A command-line crawler built on Puppeteer
To crawl a site and save the pages to a local ./temp directory
$ arachne crawl http://www.thecoffeefaq.com/ -d ./temp
To also save markdown and schema.org FAQs
$ arachne crawl http://www.thecoffeefaq.com/ -a -t markdown -d ./temp
With a file of whitelisted URL patterns
$ arachne crawl http://www.thecoffeefaq.com/ -a -t markdown -d ./temp -w ./temp/whitelist.md
With a settling period
$ arachne crawl http://www.thecoffeefaq.com/ -d ./temp -b 5000 -o 9000
Follow the instructions here to set up: https://github.com/puppeteer/puppeteer/issues/1837#issuecomment-689006806
Before running the CLI, start XLaunch, select "Multiple windows", choose "Start no client", and disable access control.
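The linked issue covers running a browser from WSL against an X server on the Windows host. A common companion step (an assumption here, not part of the original instructions) is pointing DISPLAY at the host, whose address WSL2 exposes as the nameserver in /etc/resolv.conf:

```shell
# Assumption: WSL2 with XLaunch (VcXsrv) running on the Windows host.
# The host's address is the nameserver entry in /etc/resolv.conf.
export DISPLAY=$(awk '/nameserver/ {print $2; exit}' /etc/resolv.conf):0
echo "$DISPLAY"
```

Add this to your shell profile if you want it set for every session.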
Alternatively, add the -h flag to run headless (no browser window is launched).
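For example, combining the -h flag with the earlier sample crawl (same URL and output directory as above):

$ arachne crawl http://www.thecoffeefaq.com/ -d ./temp -h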
If the commands above don't work, you may need to pass the browser's executablePath (-e) and run headless (-h).
$ arachne crawl http://www.thecoffeefaq.com/ -e /usr/bin/google-chrome -h