pdfvector

1.4.0 • Public • Published

PDF Vector TypeScript/JavaScript SDK

The official TypeScript/JavaScript SDK for the PDF Vector API. It converts PDF and Word documents to clean, structured markdown with optional AI enhancement, searches multiple academic databases through a unified API, and fetches specific publications by DOI, PubMed ID, ArXiv ID, and more.

Installation

npm install pdfvector
# or
yarn add pdfvector
# or
pnpm add pdfvector
# or
bun add pdfvector

Quick Start

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

// Parse from document URL or data
const parseResult = await client.parse({
  url: "https://example.com/document.pdf",
  useLLM: "auto",
});

console.log(parseResult.markdown); // Clean markdown output
console.log(
  `Pages: ${parseResult.pageCount}, Credits: ${parseResult.creditCount}`,
);

Authentication

Get your API key from the PDF Vector dashboard. The SDK requires a valid API key for all operations.

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

Usage Examples

Parse from URL

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.parse({
  url: "https://arxiv.org/pdf/2301.00001.pdf",
  useLLM: "auto",
});

console.log(result.markdown);

Parse from data

import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.parse({
  data: await readFile("document.pdf"),
  contentType: "application/pdf",
  useLLM: "auto",
});

console.log(result.markdown);

Search academic publications

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const searchResponse = await client.academicSearch({
  query: "quantum computing",
  providers: ["semantic-scholar", "arxiv", "pubmed"], // Search across multiple academic databases
  limit: 20,
  yearFrom: 2021,
  yearTo: 2024,
});

searchResponse.results.forEach((publication) => {
  console.log(`Title: ${publication.title}`);
  console.log(`Authors: ${publication.authors?.map((a) => a.name).join(", ")}`);
  console.log(`Year: ${publication.year}`);
  console.log(`Abstract: ${publication.abstract}`);
  console.log("---");
});

Search with Provider-Specific Data

const searchResponse = await client.academicSearch({
  query: "CRISPR gene editing",
  providers: ["semantic-scholar"],
  fields: ["title", "authors", "year", "providerData"], // providerData holds provider-specific metadata
});

searchResponse.results.forEach((pub) => {
  if (pub.provider === "semantic-scholar" && pub.providerData) {
    const data = pub.providerData;
    console.log(`Influential Citations: ${data.influentialCitationCount}`);
    console.log(`Fields of Study: ${data.fieldsOfStudy?.join(", ")}`);
  }
});

Fetch Academic Publications by ID

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const response = await client.academicFetch({
  ids: [
    "10.1038/nature12373", // DOI
    "12345678", // PubMed ID
    "2301.00001", // ArXiv ID
    "arXiv:2507.16298v1", // ArXiv with prefix
    "ED123456", // ERIC ID
    "0f40b1f08821e22e859c6050916cec3667778613", // Semantic Scholar ID
  ],
  fields: ["title", "authors", "year", "abstract", "doi"], // Optional: specify fields
});

// Handle successful results
response.results.forEach((pub) => {
  console.log(`Title: ${pub.title}`);
  console.log(`Provider: ${pub.detectedProvider}`);
  console.log(`Requested as: ${pub.id}`);
});

// Handle errors for IDs that couldn't be fetched
response.errors?.forEach((error) => {
  console.log(`Failed to fetch ${error.id}: ${error.error}`);
});

Error Handling

import { PDFVector, PDFVectorError } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

try {
  const result = await client.parse({
    url: "https://example.com/document.pdf",
  });
  console.log(result.markdown);
} catch (error) {
  if (error instanceof PDFVectorError) {
    console.error(`API Error: ${error.message}`);
    console.error(`Status: ${error.status}`);
    console.error(`Code: ${error.code}`);
  } else {
    console.error("Unexpected Error:", error);
  }
}

API Reference

PDFVector is the client class for interacting with the PDF Vector API.

Constructor

new PDFVector(config: PDFVectorConfig)

Parameters:

  • config.apiKey (string): Your PDF Vector API key
  • config.baseUrl (string, optional): Custom base URL (defaults to https://www.pdfvector.com)
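
The examples in this document hard-code the API key for brevity. A minimal sketch of building the constructor config from environment variables instead is shown here. The variable names PDFVECTOR_API_KEY and PDFVECTOR_BASE_URL and the helper configFromEnv are hypothetical choices for this illustration; the SDK is not documented to read them itself.

```typescript
// Shape of the documented constructor config (apiKey required, baseUrl optional).
interface ClientConfig {
  apiKey: string;
  baseUrl?: string;
}

// PDFVECTOR_API_KEY and PDFVECTOR_BASE_URL are hypothetical names chosen
// for this sketch; they are not part of the SDK.
function configFromEnv(env: Record<string, string | undefined>): ClientConfig {
  const apiKey = env["PDFVECTOR_API_KEY"];
  if (!apiKey) {
    throw new Error("PDFVECTOR_API_KEY is not set");
  }
  const baseUrl = env["PDFVECTOR_BASE_URL"];
  // Leave baseUrl out when it is not provided so the SDK default applies.
  return baseUrl ? { apiKey, baseUrl } : { apiKey };
}
```

The result could then be passed to the constructor, for example new PDFVector(configFromEnv(process.env)).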

Methods

parse(request)

Parse a PDF or Word document and convert it to markdown.

Parameters:

For URL parsing:

{
  url: string;           // Direct URL to PDF/Word document
  useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}

For data parsing:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Direct data of PDF/Word document
  contentType: string;   // MIME type (e.g., 'application/pdf')
  useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}

Returns:

{
  markdown: string; // Extracted content as markdown
  pageCount: number; // Number of pages processed
  creditCount: number; // Credits consumed (1-2 per page)
  usedLLM: boolean; // Whether AI enhancement was used
}

LLM Usage Options

  • auto (default): Automatically decide if AI enhancement is needed (1-2 credits per page)
  • never: Standard parsing without AI (1 credit per page)
  • always: Force AI enhancement (2 credits per page)

Note: Free plans are limited to useLLM: 'never'. Upgrade to a paid plan for AI enhancement.
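
The per-page rates above make it possible to bound the cost of a parse before sending it. The following sketch turns them into a small estimator; estimateParseCredits is a hypothetical helper for illustration, not an SDK function, and the actual charge is whatever creditCount the response reports.

```typescript
type UseLLM = "auto" | "always" | "never";

interface CreditRange {
  min: number;
  max: number;
}

// Bounds derived from the documented rates: 1 credit per page without AI,
// 2 credits per page with AI, and either of the two for "auto".
function estimateParseCredits(pageCount: number, useLLM: UseLLM = "auto"): CreditRange {
  switch (useLLM) {
    case "never":
      return { min: pageCount, max: pageCount };
    case "always":
      return { min: 2 * pageCount, max: 2 * pageCount };
    default:
      return { min: pageCount, max: 2 * pageCount };
  }
}
```

For a 10-page document with the default "auto" setting, this gives a range of 10 to 20 credits.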

Supported File Types

PDF Documents
  • application/pdf
  • application/x-pdf
  • application/acrobat
  • application/vnd.pdf
  • text/pdf
  • text/x-pdf
Word Documents
  • application/msword (.doc)
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document (.docx)
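
When parsing from data, the contentType has to be supplied by the caller. One possible way to derive it from a file name is sketched below, using the primary MIME type for each format from the list above. contentTypeForPath is a hypothetical helper, and guessing from the extension alone is an assumption that fails for misnamed files.

```typescript
// Maps a file name to one of the documented MIME types, or undefined
// when the extension is not a supported PDF or Word format.
function contentTypeForPath(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined;
  const ext = path.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "pdf":
      return "application/pdf";
    case "doc":
      return "application/msword";
    case "docx":
      return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    default:
      return undefined;
  }
}
```

A caller could check for undefined before calling parse, which avoids spending a request on a file that would be rejected as unsupported-content-type.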

Usage Limits

  • Processing timeout: 3 minutes per document
  • File size: No explicit limit, but larger files usually have more pages and consume more credits

Cost

  • Credits: Consumed per page (1-2 credits depending on LLM usage)

Common error codes:

  • url-not-found: Document URL not accessible
  • unsupported-content-type: File type not supported
  • timeout-error: Processing timeout (3 minutes max)
  • payment-required: Usage limit reached
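
The codes listed above can be translated into messages for end users. The sketch below assumes the code arrives as a plain string, as in the error.code property shown in the Error Handling example; hintForErrorCode is a hypothetical helper, and the list of codes is limited to the common ones named here.

```typescript
// Turns a documented parse error code into a short, user-facing hint.
function hintForErrorCode(code: string | undefined): string {
  switch (code) {
    case "url-not-found":
      return "The document URL is not accessible.";
    case "unsupported-content-type":
      return "The file type is not supported; send a PDF or Word document.";
    case "timeout-error":
      return "Processing exceeded the 3-minute limit.";
    case "payment-required":
      return "The usage limit has been reached.";
    default:
      // Codes outside the documented list are passed through unchanged.
      return `Unrecognized error code: ${String(code)}`;
  }
}
```

Inside a catch block this could be used as console.error(hintForErrorCode(error.code)) once error is known to be a PDFVectorError.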

academicSearch(request)

Search academic publications across multiple databases.

Parameters:

{
  query: string;                              // Search query
  providers?: AcademicSearchProvider[];       // Databases to search (default: ["semantic-scholar"])
  offset?: number;                            // Pagination offset (default: 0)
  limit?: number;                             // Results per page, 1-100 (default: 20)
  yearFrom?: number;                          // Filter by publication year (from) (min: 1900)
  yearTo?: number;                            // Filter by publication year (to) (max: 2050)
  fields?: AcademicSearchPublicationField[];  // Fields to include in response
}

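Supported Providers: The examples in this document use "semantic-scholar", "arxiv", and "pubmed". The exported AcademicSearchProviderValues constant lists every valid provider.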

Available Fields:

  • Basic fields: "id", "doi", "title", "url", "providerURL", "authors", "date", "year", "totalCitations", "totalReferences", "abstract", "pdfURL", "provider"
  • Extended field: "providerData" - Provider-specific metadata
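
The documented ranges for limit, yearFrom, and yearTo can be checked locally before a request is sent. The sketch below does so with a hypothetical validateSearchParams helper; the check that yearFrom does not exceed yearTo is an added assumption, not a rule stated in this reference.

```typescript
// Subset of the documented search parameters that carry numeric limits.
interface SearchParams {
  query: string;
  limit?: number;
  yearFrom?: number;
  yearTo?: number;
}

// Returns a list of problems; an empty list means the documented ranges hold.
function validateSearchParams(params: SearchParams): string[] {
  const problems: string[] = [];
  const { limit, yearFrom, yearTo } = params;
  if (limit !== undefined && (limit < 1 || limit > 100)) {
    problems.push("limit must be between 1 and 100");
  }
  if (yearFrom !== undefined && yearFrom < 1900) {
    problems.push("yearFrom must be 1900 or later");
  }
  if (yearTo !== undefined && yearTo > 2050) {
    problems.push("yearTo must be 2050 or earlier");
  }
  // Assumption of this sketch: an inverted year range is treated as a mistake.
  if (yearFrom !== undefined && yearTo !== undefined && yearFrom > yearTo) {
    problems.push("yearFrom must not be later than yearTo");
  }
  return problems;
}
```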

Returns:

{
  estimatedTotalResults: number;              // Total results available
  results: AcademicSearchPublication[];       // Array of publications
  errors?: AcademicSearchProviderError[];     // Any provider errors
}

Cost

  • Credits: 2 credits per search.
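
As a worked example of this rate, the cost of collecting a given number of results through pagination can be estimated in advance. The sketch assumes that every academicSearch call, including each additional page, is billed as one search; this document does not state that explicitly. searchCreditsFor is a hypothetical helper.

```typescript
const CREDITS_PER_SEARCH = 2;

// Estimated credits needed to page through `totalWanted` results
// at `limit` results per call (limit must be within the documented 1-100).
function searchCreditsFor(totalWanted: number, limit: number): number {
  if (limit < 1 || limit > 100) {
    throw new RangeError("limit must be between 1 and 100");
  }
  const calls = Math.ceil(totalWanted / limit);
  return calls * CREDITS_PER_SEARCH;
}
```

Under that assumption, collecting 200 results at 50 per page takes 4 calls and therefore 8 credits.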

academicFetch(request) / fetch(request)

Fetch specific academic publications by their IDs with automatic provider detection.

Parameters:

{
  ids: string[];                               // Array of publication IDs to fetch
  fields?: AcademicSearchPublicationField[];   // Fields to include in response
}

Supported ID Types:

  • DOI: e.g., "10.1038/nature12373"
  • PubMed ID: e.g., "12345678" (numeric ID)
  • ArXiv ID: e.g., "2301.00001" or "arXiv:2301.00001" or "math.GT/0309136"
  • Semantic Scholar ID: e.g., "0f40b1f08821e22e859c6050916cec3667778613"
  • ERIC ID: e.g., "ED123456"

Returns:

{
  results: AcademicFetchResult[];    // Successfully fetched publications
  errors?: AcademicFetchError[];     // Errors for IDs that couldn't be fetched
}

Each result includes:

{
  id: string; // The ID that was used to fetch
  detectedProvider: string; // Provider that was used
  // ... all publication fields (title, authors, abstract, etc.)
}

Cost

  • Credits: 2 credits per fetch.

TypeScript Support

The SDK is written in TypeScript and includes full type definitions:

import type {
  // Core classes (type-only here; use a regular import to construct them or use instanceof)
  PDFVector,
  PDFVectorConfig,
  PDFVectorError,
  // Parse API types
  ParseURLRequest,
  ParseDataRequest,
  ParseResponse,
  // Academic Search API types
  SearchRequest,
  AcademicSearchResponse,
  AcademicSearchPublication,
  AcademicSearchProvider,
  AcademicSearchAuthor,
  AcademicSearchPublicationField,
  // Academic Fetch API types
  FetchRequest,
  AcademicFetchResponse,
  AcademicFetchResult,
  AcademicFetchError,
  // Provider-specific data types
  AcademicSearchSemanticScholarData,
  AcademicSearchGoogleScholarData,
  AcademicSearchPubMedData,
  AcademicSearchArxivData,
  AcademicSearchEricData,
} from "pdfvector";

// Constants
import {
  AcademicSearchProviderValues, // Array of valid providers
  AcademicSearchPublicationFieldValues, // Array of valid fields
} from "pdfvector";

Node.js Support

  • Node.js version: Node.js 20+
  • ESM: Supports ES modules (CommonJS is not supported)
  • Dependencies: Uses standard fetch API

Examples

Batch Processing

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const documents = [
  "https://example.com/doc1.pdf",
  "https://example.com/doc2.pdf",
];

const results = await Promise.all(
  documents.map((url) => client.parse({ url, useLLM: "auto" })),
);

results.forEach((result, index) => {
  console.log(`Document ${index + 1}:`);
  console.log(`Pages: ${result.pageCount}`);
  console.log(`Credits: ${result.creditCount}`);
});
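
With Promise.all, a single failing document rejects the whole batch. A possible alternative is Promise.allSettled, which reports each outcome separately. The sketch below summarizes such outcomes; summarizeBatch and the local types are hypothetical, with ParseOutcome mirroring the documented parse response and Settled mirroring the shape of the entries that Promise.allSettled returns.

```typescript
// Mirrors the documented parse response.
interface ParseOutcome {
  markdown: string;
  pageCount: number;
  creditCount: number;
  usedLLM: boolean;
}

// Same shape as the entries produced by Promise.allSettled.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

interface BatchSummary {
  succeeded: { url: string; result: ParseOutcome }[];
  failed: { url: string; reason: string }[];
  totalCredits: number;
}

// Pairs each settled outcome with its URL and totals the credits consumed.
function summarizeBatch(urls: string[], settled: Settled<ParseOutcome>[]): BatchSummary {
  const summary: BatchSummary = { succeeded: [], failed: [], totalCredits: 0 };
  settled.forEach((outcome, i) => {
    const url = urls[i] ?? "(unknown)";
    if (outcome.status === "fulfilled") {
      summary.succeeded.push({ url, result: outcome.value });
      summary.totalCredits += outcome.value.creditCount;
    } else {
      const reason =
        outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason);
      summary.failed.push({ url, reason });
    }
  });
  return summary;
}
```

It could be called as summarizeBatch(documents, await Promise.allSettled(documents.map((url) => client.parse({ url })))), so that one unreachable URL does not discard the results of the others.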

Academic Search with Pagination

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

let offset = 0;
const limit = 50;
const allResults = [];

// Fetch first page
let response = await client.academicSearch({
  query: "climate change",
  providers: ["semantic-scholar", "arxiv"],
  offset,
  limit,
});

allResults.push(...response.results);

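// Fetch more pages as needed, up to 200 publications
while (
  allResults.length < response.estimatedTotalResults &&
  allResults.length < 200
) {
  offset += limit;
  response = await client.academicSearch({
    query: "climate change",
    providers: ["semantic-scholar", "arxiv"],
    offset,
    limit,
  });
  // Stop when a page comes back empty: the total is only an estimate,
  // so the loop could otherwise run forever
  if (response.results.length === 0) break;
  allResults.push(...response.results);
}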

console.log(`Fetched ${allResults.length} publications`);

Custom Base URL

// For development or custom deployments
const client = new PDFVector({
  apiKey: "pdfvector_api_key_here",
  baseUrl: "https://pdfvector.acme.com",
});

License

This SDK is licensed under the MIT License.

Collaborators

  • phuctm97