The official TypeScript/JavaScript SDK for the PDF Vector API. Convert PDF and Word documents to clean, structured Markdown with optional AI enhancement, search multiple academic databases through a unified API, and fetch specific publications by DOI, PubMed ID, ArXiv ID, and more.
npm install pdfvector
# or
yarn add pdfvector
# or
pnpm add pdfvector
# or
bun add pdfvector
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
// Parse from document URL or data
const parseResult = await client.parse({
url: "https://example.com/document.pdf",
useLLM: "auto",
});
console.log(parseResult.markdown); // Clean markdown output
console.log(
`Pages: ${parseResult.pageCount}, Credits: ${parseResult.creditCount}`,
);
Get your API key from the PDF Vector dashboard. The SDK requires a valid API key for all operations.
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
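Hard-coding the key works for a quick test. In real code you will usually read it from the environment instead. The sketch below assumes hypothetical variables named PDFVECTOR_API_KEY and an optional PDFVECTOR_BASE_URL. The SDK is not documented to read any environment variable, so the helper builds the config object explicitly.

```typescript
// Shape mirrors the documented constructor options (apiKey, optional baseUrl).
interface ClientConfig {
  apiKey: string;
  baseUrl?: string;
}

// Hypothetical helper: build a client config from an environment-like map.
// PDFVECTOR_API_KEY and PDFVECTOR_BASE_URL are assumed variable names.
function configFromEnv(env: Record<string, string | undefined>): ClientConfig {
  const apiKey = env["PDFVECTOR_API_KEY"]?.trim();
  if (!apiKey) {
    // Fail fast: the SDK requires a valid API key for all operations.
    throw new Error("PDFVECTOR_API_KEY is not set");
  }
  const baseUrl = env["PDFVECTOR_BASE_URL"]?.trim();
  return baseUrl ? { apiKey, baseUrl } : { apiKey };
}
```

With this helper, the client could be created as new PDFVector(configFromEnv(process.env)).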
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const result = await client.parse({
url: "https://arxiv.org/pdf/2301.00001.pdf",
useLLM: "auto",
});
console.log(result.markdown);
import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const result = await client.parse({
data: await readFile("document.pdf"),
contentType: "application/pdf",
useLLM: "auto",
});
console.log(result.markdown);
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const searchResponse = await client.academicSearch({
query: "quantum computing",
providers: ["semantic-scholar", "arxiv", "pubmed"], // Search across multiple academic databases
limit: 20,
yearFrom: 2021,
yearTo: 2024,
});
searchResponse.results.forEach((publication) => {
console.log(`Title: ${publication.title}`);
console.log(`Authors: ${publication.authors?.map((a) => a.name).join(", ")}`);
console.log(`Year: ${publication.year}`);
console.log(`Abstract: ${publication.abstract}`);
console.log("---");
});
const searchResponse = await client.academicSearch({
query: "CRISPR gene editing",
providers: ["semantic-scholar"],
fields: ["title", "authors", "year", "provider", "providerData"], // "provider" is needed for the check below; providerData holds provider-specific metadata
});
searchResponse.results.forEach((pub) => {
if (pub.provider === "semantic-scholar" && pub.providerData) {
const data = pub.providerData;
console.log(`Influential Citations: ${data.influentialCitationCount}`);
console.log(`Fields of Study: ${data.fieldsOfStudy?.join(", ")}`);
}
});
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const response = await client.academicFetch({
ids: [
"10.1038/nature12373", // DOI
"12345678", // PubMed ID
"2301.00001", // ArXiv ID
"arXiv:2507.16298v1", // ArXiv with prefix
"ED123456", // ERIC ID
"0f40b1f08821e22e859c6050916cec3667778613", // Semantic Scholar ID
],
fields: ["title", "authors", "year", "abstract", "doi"], // Optional: specify fields
});
// Handle successful results
response.results.forEach((pub) => {
console.log(`Title: ${pub.title}`);
console.log(`Provider: ${pub.detectedProvider}`);
console.log(`Requested as: ${pub.id}`);
});
// Handle errors for IDs that couldn't be fetched
response.errors?.forEach((error) => {
console.log(`Failed to fetch ${error.id}: ${error.error}`);
});
import { PDFVector, PDFVectorError } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
try {
const result = await client.parse({
url: "https://example.com/document.pdf",
});
console.log(result.markdown);
} catch (error) {
if (error instanceof PDFVectorError) {
console.error(`API Error: ${error.message}`);
console.error(`Status: ${error.status}`);
console.error(`Code: ${error.code}`);
} else {
console.error("Unexpected Error:", error);
}
}
The client class for interacting with the PDF Vector API.
new PDFVector(config: PDFVectorConfig)
Parameters:
- config.apiKey (string): Your PDF Vector API key
- config.baseUrl (string, optional): Custom base URL (defaults to https://www.pdfvector.com)
Parse a PDF or Word document and convert it to markdown.
Parameters:
For URL parsing:
{
url: string; // Direct URL to PDF/Word document
useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}
For data parsing:
{
data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Direct data of PDF/Word document
contentType: string; // MIME type (e.g., 'application/pdf')
useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}
Returns:
{
markdown: string; // Extracted content as markdown
pageCount: number; // Number of pages processed
creditCount: number; // Credits consumed (1-2 per page)
usedLLM: boolean; // Whether AI enhancement was used
}
useLLM options:
- auto (default): Automatically decide whether AI enhancement is needed (1-2 credits per page)
- never: Standard parsing without AI (1 credit per page)
- always: Force AI enhancement (2 credits per page)
Note: Free plans are limited to useLLM: 'never'. Upgrade to a paid plan for AI enhancement.
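The credit cost depends on the useLLM mode, so it can help to budget before parsing. The sketch below uses only the per-page rates listed above, and estimateParseCredits is a hypothetical name, not part of the SDK. For 'auto' the exact cost is only known afterwards from creditCount, so the sketch returns a range.

```typescript
type UseLLM = "auto" | "always" | "never";

interface CreditRange {
  min: number;
  max: number;
}

// Hypothetical helper: estimate parse credits from the documented rates
// (never = 1 per page, always = 2 per page, auto = 1-2 per page).
function estimateParseCredits(pageCount: number, useLLM: UseLLM = "auto"): CreditRange {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  switch (useLLM) {
    case "never":
      return { min: pageCount, max: pageCount };
    case "always":
      return { min: 2 * pageCount, max: 2 * pageCount };
    case "auto":
      return { min: pageCount, max: 2 * pageCount };
    default:
      // Guards against untyped JavaScript callers passing other strings.
      throw new Error(`Unknown useLLM mode: ${String(useLLM)}`);
  }
}
```

For example, a 10-page document parsed with 'auto' should cost between 10 and 20 credits.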
Supported content types:
- PDF: application/pdf, application/x-pdf, application/acrobat, application/vnd.pdf, text/pdf, text/x-pdf
- Word: application/msword (.doc), application/vnd.openxmlformats-officedocument.wordprocessingml.document (.docx)
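When you parse from data, you must supply contentType. The sketch below is a hypothetical helper that derives it from the file extension. It uses only the MIME types listed above, with the canonical application/pdf for PDFs.

```typescript
// Hypothetical helper: map a file name to one of the supported MIME types.
const CONTENT_TYPES: Record<string, string> = {
  ".pdf": "application/pdf",
  ".doc": "application/msword",
  ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
};

function contentTypeForFile(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return undefined;
  // Compare extensions case-insensitively so "Report.PDF" also matches.
  const ext = fileName.slice(dot).toLowerCase();
  return CONTENT_TYPES[ext];
}
```

The helper returns undefined for anything else, which lets the caller reject unsupported files before spending a request.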
- Processing timeout: 3 minutes per document
- File size: No explicit limit, but larger files usually have more pages and consume more credits
- Credits: Consumed per page (1-2 credits depending on LLM usage)
Error codes:
- url-not-found: Document URL not accessible
- unsupported-content-type: File type not supported
- timeout-error: Processing timeout (3 minutes max)
- payment-required: Usage limit reached
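These codes pair naturally with the error.code property shown in the error handling example. The sketch below assumes error.code carries exactly the strings listed above. Other codes may exist, so unknown values fall through to a generic message. The helper adviceFor and its messages are hypothetical.

```typescript
// Documented parse error codes; the service may return others.
const PARSE_ERROR_CODES = [
  "url-not-found",
  "unsupported-content-type",
  "timeout-error",
  "payment-required",
] as const;
type ParseErrorCode = (typeof PARSE_ERROR_CODES)[number];

function isParseErrorCode(code: unknown): code is ParseErrorCode {
  return typeof code === "string" && (PARSE_ERROR_CODES as readonly string[]).includes(code);
}

// Hypothetical helper: turn an error code into an actionable hint.
function adviceFor(code: unknown): string {
  if (!isParseErrorCode(code)) return "Unexpected error; inspect status and message.";
  switch (code) {
    case "url-not-found":
      return "Check that the URL is publicly reachable, or send the file as data instead.";
    case "unsupported-content-type":
      return "Send a PDF or Word (.doc/.docx) document.";
    case "timeout-error":
      return "Processing exceeded 3 minutes; try a smaller document.";
    case "payment-required":
      return "Usage limit reached; check your plan in the dashboard.";
  }
}
```

Inside the PDFVectorError branch of the catch block, you could log adviceFor(error.code) alongside the status.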
Search academic publications across multiple databases.
Parameters:
{
query: string; // Search query
providers?: AcademicSearchProvider[]; // Databases to search (default: ["semantic-scholar"])
offset?: number; // Pagination offset (default: 0)
limit?: number; // Results per page, 1-100 (default: 20)
yearFrom?: number; // Filter by publication year (from) (min: 1900)
yearTo?: number; // Filter by publication year (to) (max: 2050)
fields?: AcademicSearchPublicationField[]; // Fields to include in response
}
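You can check the documented ranges client-side before spending credits on a request the API would reject. The sketch below assumes the API enforces exactly these bounds: limit 1-100, a non-negative offset, and years between 1900 and 2050. The function validateSearchParams is hypothetical.

```typescript
interface SearchParams {
  query: string;
  offset?: number;
  limit?: number;
  yearFrom?: number;
  yearTo?: number;
}

// Hypothetical helper: returns a list of problems (empty means the params look valid).
function validateSearchParams(p: SearchParams): string[] {
  const problems: string[] = [];
  if (p.offset !== undefined && (!Number.isInteger(p.offset) || p.offset < 0)) {
    problems.push("offset must be a non-negative integer");
  }
  if (p.limit !== undefined && (!Number.isInteger(p.limit) || p.limit < 1 || p.limit > 100)) {
    problems.push("limit must be an integer between 1 and 100");
  }
  if (p.yearFrom !== undefined && p.yearFrom < 1900) problems.push("yearFrom must be >= 1900");
  if (p.yearTo !== undefined && p.yearTo > 2050) problems.push("yearTo must be <= 2050");
  if (p.yearFrom !== undefined && p.yearTo !== undefined && p.yearFrom > p.yearTo) {
    problems.push("yearFrom must not be after yearTo");
  }
  return problems;
}
```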
Supported Providers:
- "semantic-scholar" - Semantic Scholar
- "arxiv" - ArXiv
- "pubmed" - PubMed
- "google-scholar" - Google Scholar
- "eric" - ERIC
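Provider names often come from user input, such as a CLI flag or a query string. It is worth narrowing them to the supported set before calling academicSearch. The SDK exports AcademicSearchProviderValues for this purpose. The sketch below instead uses a local copy of the five documented names so it runs standalone, and parseProviders is a hypothetical helper.

```typescript
// Local copy of the documented provider identifiers.
const PROVIDERS = ["semantic-scholar", "arxiv", "pubmed", "google-scholar", "eric"] as const;
type Provider = (typeof PROVIDERS)[number];

function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}

// Parse a comma-separated list such as "arxiv, pubmed" into known providers,
// throwing on unknown names and dropping duplicates.
function parseProviders(csv: string): Provider[] {
  const result: Provider[] = [];
  for (const raw of csv.split(",")) {
    const name = raw.trim().toLowerCase();
    if (name === "") continue;
    if (!isProvider(name)) throw new Error(`Unknown provider: ${name}`);
    if (!result.includes(name)) result.push(name);
  }
  return result;
}
```

An empty result can simply be omitted from the request, so the default provider (semantic-scholar) applies.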
Available Fields:
- Basic fields: "id", "doi", "title", "url", "providerURL", "authors", "date", "year", "totalCitations", "totalReferences", "abstract", "pdfURL", "provider"
- Extended field: "providerData" - Provider-specific metadata
Returns:
{
estimatedTotalResults: number; // Total results available
results: AcademicSearchPublication[]; // Array of publications
errors?: AcademicSearchProviderError[]; // Any provider errors
}
- Credits: 2 credits per search.
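The documented price is per search call rather than per result, so the cost of paging through results depends on how many requests you make. The back-of-the-envelope sketch below assumes each page is billed as one search. The helper searchCost is hypothetical.

```typescript
const CREDITS_PER_SEARCH = 2; // From the documented pricing

// Requests and credits needed to collect `wanted` results at `limit` per page.
function searchCost(wanted: number, limit: number): { requests: number; credits: number } {
  if (limit < 1 || limit > 100) throw new RangeError("limit must be between 1 and 100");
  const requests = Math.ceil(Math.max(0, wanted) / limit);
  return { requests, credits: requests * CREDITS_PER_SEARCH };
}
```

Collecting 200 results at limit 50 takes 4 requests (8 credits), while limit 20 takes 10 requests (20 credits). Under this assumption, using the largest limit you need is the cheapest option.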
Fetch specific academic publications by their IDs with automatic provider detection.
Parameters:
{
ids: string[]; // Array of publication IDs to fetch
fields?: AcademicSearchPublicationField[]; // Fields to include in response
}
Supported ID Types:
- DOI: e.g., "10.1038/nature12373"
- PubMed ID: e.g., "12345678" (numeric ID)
- ArXiv ID: e.g., "2301.00001", "arXiv:2301.00001", or "math.GT/0309136"
- Semantic Scholar ID: e.g., "0f40b1f08821e22e859c6050916cec3667778613"
- ERIC ID: e.g., "ED123456"
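The service detects the provider for each ID automatically, and its exact rules are not documented. For client-side pre-sorting or logging, you can build a rough heuristic from the example formats above. The sketch below is illustrative only, not the API's detection logic, and its regular expressions are assumptions. It cannot tell a numeric PubMed ID apart from other purely numeric identifiers.

```typescript
type GuessedIdType = "doi" | "arxiv" | "semantic-scholar" | "eric" | "pubmed" | "unknown";

// Heuristic patterns derived from the documented example IDs (assumptions).
function guessIdType(rawId: string): GuessedIdType {
  const id = rawId.trim();
  // DOIs start with the "10." directory indicator, a registrant code, and "/".
  if (/^10\.\d{4,9}\/\S+$/.test(id)) return "doi";
  // New-style arXiv IDs (YYMM.NNNNN, optional version), with or without "arXiv:".
  if (/^(arxiv:)?\d{4}\.\d{4,5}(v\d+)?$/i.test(id)) return "arxiv";
  // Old-style arXiv IDs such as "math.GT/0309136".
  if (/^(arxiv:)?[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/i.test(id)) return "arxiv";
  // Semantic Scholar paper IDs: 40 hexadecimal characters.
  if (/^[0-9a-f]{40}$/i.test(id)) return "semantic-scholar";
  // ERIC IDs: "ED" (documents) or "EJ" (journal articles) followed by digits.
  if (/^E[DJ]\d+$/.test(id)) return "eric";
  // Plain digits: most likely a PubMed ID.
  if (/^\d+$/.test(id)) return "pubmed";
  return "unknown";
}
```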
Returns:
{
results: AcademicFetchResult[]; // Successfully fetched publications
errors?: AcademicFetchError[]; // Errors for IDs that couldn't be fetched
}
Each result includes:
{
id: string; // The ID that was used to fetch
detectedProvider: string; // Provider that was used
// ... all publication fields (title, authors, abstract, etc.)
}
- Credits: 2 credits per fetch.
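A single academicFetch call can partly succeed, so it is useful to check every requested ID against both results and errors. The sketch below uses simplified local types covering only the documented id, detectedProvider, and error fields. It assumes each result's id echoes the requested ID exactly. The helper summarizeFetch is hypothetical, and it also reports IDs that appear in neither list, in case that can happen.

```typescript
// Simplified shapes based on the documented response fields.
interface FetchResultLike {
  id: string;
  detectedProvider: string;
}
interface FetchErrorLike {
  id: string;
  error: string;
}
interface FetchResponseLike {
  results: FetchResultLike[];
  errors?: FetchErrorLike[];
}

// Classify each requested ID as fetched, failed, or missing from the response.
function summarizeFetch(requested: string[], response: FetchResponseLike) {
  const fetched = new Set(response.results.map((r) => r.id));
  const failed = new Map((response.errors ?? []).map((e) => [e.id, e.error] as const));
  return {
    fetched: requested.filter((id) => fetched.has(id)),
    failed: Array.from(failed, ([id, error]) => ({ id, error })),
    missing: requested.filter((id) => !fetched.has(id) && !failed.has(id)),
  };
}
```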
The SDK is written in TypeScript and includes full type definitions:
import type {
// Core classes
PDFVector,
PDFVectorConfig,
PDFVectorError,
// Parse API types
ParseURLRequest,
ParseDataRequest,
ParseResponse,
// Academic Search API types
SearchRequest,
AcademicSearchResponse,
AcademicSearchPublication,
AcademicSearchProvider,
AcademicSearchAuthor,
AcademicSearchPublicationField,
// Academic Fetch API types
FetchRequest,
AcademicFetchResponse,
AcademicFetchResult,
AcademicFetchError,
// Provider-specific data types
AcademicSearchSemanticScholarData,
AcademicSearchGoogleScholarData,
AcademicSearchPubMedData,
AcademicSearchArxivData,
AcademicSearchEricData,
} from "pdfvector";
// Constants
import {
AcademicSearchProviderValues, // Array of valid providers
AcademicSearchPublicationFieldValues, // Array of valid fields
} from "pdfvector";
- Node.js version: Node.js 20+
- ESM: Supports ES modules (CommonJS is not supported)
- Dependencies: Uses the standard fetch API
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const documents = [
"https://example.com/doc1.pdf",
"https://example.com/doc2.pdf",
];
const results = await Promise.all(
documents.map((url) => client.parse({ url, useLLM: "auto" })),
);
results.forEach((result, index) => {
console.log(`Document ${index + 1}:`);
console.log(`Pages: ${result.pageCount}`);
console.log(`Credits: ${result.creditCount}`);
});
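Promise.all starts every request at once and rejects as soon as one document fails, so you lose the results of the documents that succeeded. For larger batches, a small concurrency limiter that records each outcome avoids both problems. The sketch below is generic and does not depend on the SDK. The helper mapSettledWithLimit is hypothetical.

```typescript
type Outcome<R> =
  | { status: "fulfilled"; value: R }
  | { status: "rejected"; reason: unknown };

// Run `task` over `items` with at most `limit` tasks in flight,
// recording each outcome in input order instead of failing fast.
async function mapSettledWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<Outcome<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be >= 1");
  const outcomes: Outcome<R>[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        outcomes[index] = { status: "fulfilled", value: await task(items[index] as T, index) };
      } catch (reason) {
        outcomes[index] = { status: "rejected", reason };
      }
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return outcomes;
}
```

For the batch above, you would pass documents, a small limit such as 3, and (url) => client.parse({ url, useLLM: "auto" }) as the task. Then check each outcome's status to separate parsed documents from failures.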
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
let offset = 0;
const limit = 50;
const maxResults = 200;
const allResults = [];
// Fetch pages until maxResults is reached or the providers run out of results
while (allResults.length < maxResults) {
const response = await client.academicSearch({
query: "climate change",
providers: ["semantic-scholar", "arxiv"],
offset,
limit,
});
allResults.push(...response.results);
offset += limit;
// Stop on an empty page (prevents an endless loop) or once the estimated total is reached
if (
response.results.length === 0 ||
allResults.length >= response.estimatedTotalResults
) {
break;
}
}
console.log(`Fetched ${allResults.length} publications`);
// For development or custom deployments
const client = new PDFVector({
apiKey: "pdfvector_api_key_here",
baseUrl: "https://pdfvector.acme.com",
});
- API Reference (Scalar): pdfvector.com/v1/api/scalar
- API Reference (Swagger): pdfvector.com/v1/api/swagger
- Dashboard: pdfvector.com/dashboard
This SDK is licensed under the MIT License.