@xapp/arachne-cli
1.8.13 • Public • Published

A command-line crawler based on Puppeteer.

Example Usage

To crawl a site and save the pages to a local ./temp directory:

$ arachne crawl http://www.thecoffeefaq.com/ -d ./temp

To also save Markdown and schema.org FAQs:

$ arachne crawl http://www.thecoffeefaq.com/ -a -t markdown -d ./temp

With a whitelist patterns file:

$ arachne crawl http://www.thecoffeefaq.com/ -a -t markdown -d ./temp -w ./temp/whitelist.md

With a settling period:

$ arachne crawl http://www.thecoffeefaq.com/ -d ./temp -b 5000 -o 9000

Windows & WSL2 Notes

Follow the instructions in this Puppeteer issue comment to set up your environment: https://github.com/puppeteer/puppeteer/issues/1837#issuecomment-689006806

Start XLaunch before running the CLI. In its setup dialog, select multiple windows, choose to start no client, and turn off access control.

Another option is to add the -h flag to run headless (no browser application is launched).

If the commands above do not work, you may need to pass the browser's executablePath (-e) and run headless (-h):

$ arachne crawl http://www.thecoffeefaq.com/ -e /usr/bin/google-chrome -h
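If you switch between machines where Chrome may or may not be installed at a known path, the fallback above could be wrapped in a small shell helper. The following is only a sketch: the function name arachne_extra_flags and the default path are hypothetical, and only the -e and -h flags are taken from the notes above.

```shell
# Hypothetical helper (not part of arachne-cli): print extra flags for `arachne crawl`.
# Assumption: the default Chrome path below matches your system; pass another path as $1.
arachne_extra_flags() {
  chrome_path="${1:-/usr/bin/google-chrome}"
  if [ -x "$chrome_path" ]; then
    # Browser binary found: pass it explicitly and run headless.
    printf '%s\n' "-e $chrome_path -h"
  else
    # No binary at that path: only request headless mode.
    printf '%s\n' "-h"
  fi
}

# Example call (commented out so that sourcing this file has no side effects):
# arachne crawl http://www.thecoffeefaq.com/ $(arachne_extra_flags)
```

The unquoted $(...) in the example call relies on word splitting in sh or bash, so this sketch will not work with an executable path that contains spaces.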

Install

npm i @xapp/arachne-cli

Version

1.8.13

License

Apache-2.0

Unpacked Size

54.5 kB

Total Files

18

Collaborators

  • petehaas
  • michaelmyers
  • chrsdietz
  • xappbot
  • opendog