@krakz999/tabula-node

1.0.6 • Public • Published

A package for extracting tables from PDF files. It is a wrapper written entirely in TypeScript, built on top of the popular tabula-java library.

Installation

To use this package, install it via your favourite package manager:

pnpm i @krakz999/tabula-node

Usage

import { extractTables } from '@krakz999/tabula-node';

const results = await extractTables("./test.pdf", {
    pages: "all",
    guess: true
});

console.log(results);
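
The call above can fail (for example, on a missing or unreadable file), so when processing several PDFs it may help to collect per-file outcomes instead of stopping at the first error. The following is a minimal sketch, not part of the package: `extractMany`, `Extractor`, `ExtractOptions` and `Outcome` are hypothetical names, and the result is typed as `unknown` because the return shape of `extractTables` is not documented here. The extractor is passed in as a parameter so the sketch stays self-contained.

```typescript
// Hypothetical helper (not part of @krakz999/tabula-node).
// Only the two options used in the Usage example are modelled here.
type ExtractOptions = { pages?: string; guess?: boolean };

// Assumed call shape: (path, options) => Promise of some result.
// The result type is left as `unknown` because it is not documented.
type Extractor = (path: string, options: ExtractOptions) => Promise<unknown>;

interface Outcome {
  path: string;
  ok: boolean;
  result?: unknown;
  error?: string;
}

async function extractMany(
  extract: Extractor,
  paths: string[],
  options: ExtractOptions = { pages: "all", guess: true },
): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const path of paths) {
    try {
      // Files are processed one at a time, so one failure does not
      // prevent the remaining files from being attempted.
      const result = await extract(path, options);
      outcomes.push({ path, ok: true, result });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      outcomes.push({ path, ok: false, error });
    }
  }
  return outcomes;
}
```

Assuming `extractTables` matches the call shape shown in the Usage example, it could be passed in directly: `await extractMany(extractTables, ["./a.pdf", "./b.pdf"])`.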

Options

The extractTables function accepts an options object with the following properties:

  • area: Portion of the page to analyze. Example: "269.875,12.75,790.5,561". Accepts top,left,bottom,right i.e. y1,x1,y2,x2 where all values are in points relative to the top left corner. If all values are between 0-100 (inclusive) and preceded by '%', input will be taken as % of actual height or width of the page. Example: "%0,0,100,50". To specify multiple areas, pass an array. Default is entire page.
  • columns: X coordinates of column boundaries. Example: "10.1,20.2,30.3". If all values are between 0-100 (inclusive) and preceded by '%', input will be taken as % of actual width of the page. Example: "%25,50,80.6"
  • format: Output format, one of CSV, TSV or JSON. Default: CSV
  • guess: Guess the portion of the page to analyze per page.
  • lattice: Force PDF to be extracted using lattice-mode extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
  • noSpreadsheet: [Deprecated in favor of stream, i.e. tabula-java's -t/--stream] Force PDF not to be extracted using spreadsheet-style extraction (if there are no ruling lines separating each cell)
  • pages: Comma-separated list of page numbers or ranges, or "all". Examples: "1-3,5-7", "3" or "all". Default is "1"
  • password: Password to decrypt document. Default is empty
  • stream: Force PDF to be extracted using stream-mode extraction (if there are no ruling lines separating each cell)
  • useLineReturns: Use embedded line returns in cells. (Only in spreadsheet mode.)
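
The `area` and `columns` strings are easy to get wrong by hand, in particular the top,left,bottom,right order and the '%' prefix. The following is a minimal sketch of two builders for those strings. `Rect`, `areaString` and `columnsString` are hypothetical helpers, not part of the package, and the sanity checks (non-negative values, a non-empty rectangle) are assumptions rather than documented rules.

```typescript
// Hypothetical helpers (not part of @krakz999/tabula-node) that build the
// comma-separated strings described for the `area` and `columns` options.

interface Rect {
  top: number;    // y1
  left: number;   // x1
  bottom: number; // y2
  right: number;  // x2
}

// Build "top,left,bottom,right". With percent = true the string is
// prefixed with '%' and every value must lie in 0-100 inclusive.
function areaString(rect: Rect, percent = false): string {
  const values = [rect.top, rect.left, rect.bottom, rect.right];
  if (values.some((v) => !Number.isFinite(v) || v < 0)) {
    throw new RangeError("area values must be finite and non-negative");
  }
  // Assumption: a useful area is a non-empty rectangle.
  if (rect.bottom <= rect.top || rect.right <= rect.left) {
    throw new RangeError("bottom/right must be greater than top/left");
  }
  if (percent && values.some((v) => v > 100)) {
    throw new RangeError("percent values must be between 0 and 100");
  }
  return (percent ? "%" : "") + values.join(",");
}

// Build "x1,x2,..." for column boundaries, with the same '%' convention.
function columnsString(xs: number[], percent = false): string {
  if (xs.length === 0) {
    throw new RangeError("at least one column boundary is required");
  }
  if (xs.some((x) => !Number.isFinite(x) || x < 0 || (percent && x > 100))) {
    throw new RangeError("column boundaries are out of range");
  }
  return (percent ? "%" : "") + xs.join(",");
}
```

For instance, `areaString({ top: 0, left: 0, bottom: 100, right: 50 }, true)` yields the "%0,0,100,50" example from the list above, which could then be passed as the `area` option.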

License

MIT

Unpacked Size

13.4 MB

Total Files

15

Collaborators

  • krakz