Pdf-figure-extractor
Extract figures (the non-text content) from PDF files.
Required Packages
Install the system dependencies (OpenCV and Tesseract OCR):
sudo apt-get install libopencv-dev libcv-dev libtesseract-dev tesseract-ocr
Installation
Install project dependencies:
npm install
Run
To run it from the command line, install the package globally:
npm install -g pdf-figure-extractor
Usage:
  Usage: pdf-figure-extractor [options]

  Options:

    -h, --help             output usage information
    -V, --version          output the version number
    -o, --output <path>    Directory to put results
    -i, --input <path>     Directory to process
    -t, --tmp <path>       Directory to put temporary files
    -p, --partials <path>  Directory to put figure directory
For instance:
pdf-figure-extractor --input "pdf" --output "output"
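A minimal sketch for running the globally installed CLI from a Node script. It only uses the flags listed in the usage text; the options object and the `buildArgs` / `runExtractor` names are hypothetical, and the CLI is assumed to be on the `PATH`.

```javascript
// Hypothetical wrapper around the pdf-figure-extractor CLI.
// Only the documented flags (--input, --output, --tmp, --partials) are used.
const { spawnSync } = require('child_process');

// Build the argument list from an options object; unset options are left out.
function buildArgs({ input, output, tmp, partials } = {}) {
  const args = [];
  if (input) args.push('--input', input);
  if (output) args.push('--output', output);
  if (tmp) args.push('--tmp', tmp);
  if (partials) args.push('--partials', partials);
  return args;
}

// Run the CLI synchronously and return its exit status.
function runExtractor(options) {
  const result = spawnSync('pdf-figure-extractor', buildArgs(options), {
    stdio: 'inherit', // forward the CLI's own output to this terminal
  });
  if (result.error) throw result.error; // e.g. the CLI is not on PATH
  return result.status;
}

module.exports = { buildArgs, runExtractor };
```

For example, `runExtractor({ input: 'pdf', output: 'output' })` runs the same command as the example above. Building the argument array explicitly (instead of a shell string) avoids quoting problems with paths that contain spaces.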
To use it as a Node.js module:
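const pfe = require('pdf-figure-extractor');

// input, output, partials and tmp are directory paths
const config = {
  pdfInputPath: input,
  directoryOutputPath: output,
  directoryPartialPath: partials,
  tmp: tmp,
  debug: true
};

// The original call was lost in extraction; it passes `config` to the module.
// `pfe(config)` is an assumed signature: check the module's exports.
pfe(config);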
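A minimal sketch for building the config object shown above while making sure the directories exist first. It assumes the module expects directory paths under these keys and does not create missing directories itself; `prepareConfig` is a hypothetical helper, not part of pdf-figure-extractor.

```javascript
// Hypothetical helper: validates the input directory, creates the other
// directories, and returns a config object with the keys used above.
const fs = require('fs');
const path = require('path');

function prepareConfig({ input, output, partials, tmp, debug = false }) {
  // The input directory must already contain the PDFs to process.
  if (!fs.existsSync(input) || !fs.statSync(input).isDirectory()) {
    throw new Error(`Input directory not found: ${input}`);
  }
  // Create output, partials and tmp directories (no error if they exist).
  for (const dir of [output, partials, tmp]) {
    fs.mkdirSync(dir, { recursive: true });
  }
  // Absolute paths, assuming the module accepts them.
  return {
    pdfInputPath: path.resolve(input),
    directoryOutputPath: path.resolve(output),
    directoryPartialPath: path.resolve(partials),
    tmp: path.resolve(tmp),
    debug,
  };
}

module.exports = { prepareConfig };
```

The returned object can be used in place of the hand-written `config`. Failing early on a missing input directory gives a clearer error than a failure deep inside the extraction step.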
TODO
- Extract arrays
- Extract graphs (partially done: reuses the array extraction when the graph has a grid inside)
- Extract images