pdf-text-tools
0.0.0-development • Public • Published

A collection of tools for processing text extracted from a PDF, intended for use with LLMs: for example, finding headers or splitting text at headers. It is particularly useful for processing pages of PDF text that are not structured in a way that is easy to process.

Install

npm install pdf-text-tools

Usage

/**
 * Find header titles in PDF text using regex-based heuristics
 */
import { findHeaderTitles } from 'pdf-text-tools';

findHeaderTitles('..some text string from pdf..');
//=> ['header1', 'header2'] 
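To give a feel for what regex-based header detection can look like, here is a minimal, self-contained sketch. It is not the implementation used by pdf-text-tools: the function name `findHeaderCandidates`, the `maxLength` parameter, and the two rules (numbered lines and all-caps lines) are assumptions made purely for illustration.

```typescript
// Hypothetical helper, NOT the pdf-text-tools implementation.
// Assumption: a header is a short line that is either numbered
// ("2.1 Methods") or written entirely in capitals ("INTRODUCTION").
function findHeaderCandidates(text: string, maxLength = 60): string[] {
  const numbered = /^\d+(\.\d+)*\.?\s+\S/; // "1 Intro", "2.1 Methods", "3. Results"
  const allCaps = /^[A-Z][A-Z0-9 \-]*$/; // "INTRODUCTION", "APPENDIX A"

  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Long lines are treated as body text, never as headers.
    .filter((line) => line.length > 0 && line.length <= maxLength)
    .filter((line) => numbered.test(line) || allCaps.test(line));
}

const samplePage = [
  "INTRODUCTION",
  "This report covers the results of the survey.",
  "2.1 Methods",
  "We sampled a number of pages.",
].join("\n");

console.log(findHeaderCandidates(samplePage));
//=> ['INTRODUCTION', '2.1 Methods']
```

Heuristics like these produce false positives (a body line that starts with a number, for instance), so the result is best treated as a list of candidates rather than a definitive outline.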

/**
 * Split text at header titles
 *  - Useful for grabbing the last part of a page
 */ 
import { splitAtHeader } from 'pdf-text-tools';

splitAtHeader('..some text string from pdf..', "last");
//=> ['text before the header', 'text after the header, including the header']
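The splitting step can be sketched in the same spirit. The code below is an illustration only, not the package's code: `splitAtHeaderSketch` is a hypothetical name, it takes the header titles as an explicit argument, and it assumes that "last" means "split at the last header that occurs in the text" (the README does not spell out what the second argument does).

```typescript
// Hypothetical helper, NOT the pdf-text-tools implementation.
// Assumption: "last" selects the header occurring latest in the text,
// "first" the one occurring earliest.
type Which = "first" | "last";

function splitAtHeaderSketch(
  text: string,
  headers: string[],
  which: Which = "last"
): [string, string] {
  let splitIndex = -1;

  for (const header of headers) {
    const index =
      which === "last" ? text.lastIndexOf(header) : text.indexOf(header);
    if (index === -1) continue; // this header does not occur in the text

    const better =
      splitIndex === -1 ||
      (which === "last" ? index > splitIndex : index < splitIndex);
    if (better) splitIndex = index;
  }

  // No header found: everything counts as "before".
  if (splitIndex === -1) return [text, ""];

  // The header itself stays with the second part.
  return [text.slice(0, splitIndex), text.slice(splitIndex)];
}

const sampleText = "Intro text\nMETHODS\nmethod text\nRESULTS\nresult text";

console.log(splitAtHeaderSketch(sampleText, ["METHODS", "RESULTS"], "last"));
//=> ['Intro text\nMETHODS\nmethod text\n', 'RESULTS\nresult text']
```

Keeping the header with the second part matches the output shape shown in the usage example above, where the text after the split includes the header.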

More tools coming soon!

License

MIT

Collaborators

millsit