pdf-text-tools

0.0.0-development • Public • Published

A set of tools for processing text extracted from a PDF, for use with LLMs: for example, finding headers and splitting text at headers. Particularly useful for pages of PDF text that are not structured in a way that is easy to process.

Install

npm install pdf-text-tools

Usage

/**
 * Find header titles in PDF text using regex-like heuristics
 */
import { findHeaderTitles } from 'pdf-text-tools';

findHeaderTitles('..some text string from pdf..');
//=> ['header1', 'header2'] 

/**
 * Split text at header titles
 *  - Useful for grabbing the last part of a page
 */
import { splitAtHeader } from 'pdf-text-tools';

splitAtHeader('..some text string from pdf..', 'last');
//=> ['text before the header', 'the header and the text after it']
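
To make the two steps above more concrete, here is a rough, self-contained sketch of how regex-based header detection and a split at the last header could work. It is not the package's implementation: `HEADER_PATTERN`, `sketchFindHeaders` and `sketchSplitAtLastHeader` are hypothetical names, and the rule for what counts as a header (a short numbered line such as "2 Methods", or a short line in capitals) is an assumption made only for this example.

```typescript
// Hypothetical stand-ins for illustration; NOT the functions exported by pdf-text-tools.
// Assumption: a header is a short line that is either numbered and capitalised
// ("2.1 Methods") or written entirely in capitals ("INTRODUCTION").
const HEADER_PATTERN =
  /^(?:\d+(?:\.\d+)*\.?\s+[A-Z].{0,78}|[A-Z][A-Z0-9 ,&:-]{2,79})$/;

// Return every line of the text that looks like a header.
function sketchFindHeaders(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => HEADER_PATTERN.test(line));
}

// Split the text into [everything before the last header, the last header and what follows].
// If no header is found, the whole text is returned as the first element.
function sketchSplitAtLastHeader(text: string): [string, string] {
  const lines = text.split(/\r?\n/);
  let lastIndex = -1;
  lines.forEach((line, i) => {
    if (HEADER_PATTERN.test(line.trim())) lastIndex = i;
  });
  if (lastIndex === -1) return [text, ''];
  return [
    lines.slice(0, lastIndex).join('\n'),
    lines.slice(lastIndex).join('\n'),
  ];
}

// Example page text (made up for this sketch).
const page = [
  'results carried over from the previous page.',
  '1 Introduction',
  'This paper looks at text extraction.',
  '2 Methods',
  'We split pages at detected headers.',
].join('\n');

const headers = sketchFindHeaders(page);
//=> ['1 Introduction', '2 Methods']

const [before, fromLastHeader] = sketchSplitAtLastHeader(page);
// `fromLastHeader` starts at '2 Methods'; `before` holds everything above it.
```

A heuristic like this will misfire on some inputs (for example, a short capitalised sentence that starts with a number), so treat the pattern as a starting point to tune against your own PDFs.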

More tools coming soon!

License

MIT

Unpacked Size

13.2 kB

Total Files

7

Collaborators

  • millsit