arm-pdf-scrape

npm module to scrape the assembly instructions from an ARM manual.

This is intended for the ARMv7-M Architecture Reference Manual. You must provide your own copy of the manual; this package is only a scraper.

Install

npm install --save arm-pdf-scrape

Usage

const {loadPdfFromPath, generateInstructions, instructionToText} = require("arm-pdf-scrape");

const filepath = "/path/to/manual.pdf";
loadPdfFromPath(filepath)
  .then(manual => generateInstructions(manual))
  .then(instructions => {
    // Print each scraped instruction as plain text
    instructions.forEach(i => console.log(instructionToText(i)));
  })
  .catch(e => console.error(`Something went wrong: ${e}`));

Fluff

Scraping a PDF is imprecise, so the scraper relies on expected patterns in the manual to guide it. For example:

Each entry begins with a section number matching A7.7.[0-9]+ near the start of the page text.
The syntax follows the words "Assembler syntax" in bold font.
Encodings are marked by "Encoding 1", etc., in bold font.
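As a rough illustration of the first heuristic, the check for a section number near the start of a page might look like the sketch below. The function name, the constant names, and the 200-character window are assumptions made for this example; they are not part of the arm-pdf-scrape API.

```javascript
// Hypothetical helper, not part of arm-pdf-scrape.
// Matches a section number such as "A7.7.12"; the dots are escaped so
// they match literally rather than matching any character.
const ENTRY_HEADING = /\bA7\.7\.([0-9]+)\b/;

// Assumption: "near the start" means within the first 200 characters.
const SEARCH_WINDOW = 200;

// Returns the entry number (the part after "A7.7.") if the page looks
// like the start of an entry, otherwise null.
function findEntryNumber(pageText) {
  const match = ENTRY_HEADING.exec(pageText.slice(0, SEARCH_WINDOW));
  return match ? Number(match[1]) : null;
}
```

Restricting the search to a window at the top of the page is one way to avoid treating cross-references deeper in the body text as new entries.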

Steps:

Get text chunks of each page
Strip the runners (headers and footers)
Sort chunks and combine same-line items when possible
Extract regions of section-body
Merge all regions into one array
Separate regions into instructions
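The runner-stripping and line-combining steps above can be sketched as follows. This is only an illustration under assumptions: the chunk shape ({x, y, text} with y measured from the top of the page), the function names, the margin, and the tolerance are all hypothetical and may differ from what the package actually does.

```javascript
// Hypothetical chunk shape: { x, y, text }, with y measured downward
// from the top of the page. None of these names come from arm-pdf-scrape.

// Drop chunks that fall in the header or footer band.
// Assumption: runners live within `margin` units of the page edges.
function stripRunners(chunks, pageHeight, margin = 50) {
  return chunks.filter(c => c.y > margin && c.y < pageHeight - margin);
}

// Group chunks into lines by y coordinate, then order each line
// left to right and join its text with single spaces.
function combineLines(chunks, tolerance = 2) {
  const byY = [...chunks].sort((a, b) => a.y - b.y);
  const groups = [];
  for (const chunk of byY) {
    const group = groups[groups.length - 1];
    // Compare against the first chunk of the current line so small
    // baseline differences do not accumulate across the line.
    if (group && Math.abs(chunk.y - group[0].y) <= tolerance) {
      group.push(chunk);
    } else {
      groups.push([chunk]);
    }
  }
  return groups.map(group =>
    group.sort((a, b) => a.x - b.x).map(c => c.text).join(" ")
  );
}
```

Grouping by line before sorting by x keeps the reading order correct even when chunks on the same line report slightly different y values.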

TODO:

Nested bullets in SSBB, PSSBB
Math in QADD
Spacing of bold, italic, verbatim
