jul11co-sitespider
Jul11Co's Website Spider - a tool for crawling websites.
Installation
From npm
npm install -g jul11co-sitespider
Usage
Commandline
Usage: sitespider [OPTIONS...] [page_url] <output_dir>

    --download       : Download
    --resume         : Resume
    --update         : Update (and check for incompleted links)
    --add-link       : Add link
    --fix-links      : Fix links
    --verbose        : Verbose
    --images         : Download images (default: false)
    --scripts        : Download scripts (default: false)
    --stylesheets    : Download stylesheets (default: false)
    --max-depth=X    : Specify max depth
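The usage line above can also be driven from a Node script. The following is a minimal sketch, not part of the package itself: `buildSitespiderArgs` and `runSitespider` are hypothetical helper names, the URL and output directory are placeholders, and it assumes `sitespider` was installed globally so that the binary is on the PATH. Only flags listed in the usage text are used; `--add-link` and `--fix-links` are left out because their exact behavior is not described here.

```javascript
// Sketch: build a sitespider command line from an options object.
const { spawnSync } = require('child_process');

// Boolean flags taken from the usage text.
const BOOLEAN_FLAGS = ['download', 'resume', 'update', 'verbose',
                       'images', 'scripts', 'stylesheets'];

// Hypothetical helper: turn (pageUrl, outputDir, opts) into CLI arguments,
// following the order "sitespider [OPTIONS...] [page_url] <output_dir>".
function buildSitespiderArgs(pageUrl, outputDir, opts) {
  opts = opts || {};
  const args = [];
  BOOLEAN_FLAGS.forEach(function (flag) {
    if (opts[flag]) args.push('--' + flag);
  });
  if (opts.maxDepth !== undefined) {
    args.push('--max-depth=' + opts.maxDepth);
  }
  if (pageUrl) args.push(pageUrl); // [page_url] is optional
  args.push(outputDir);            // <output_dir> is required
  return args;
}

// Hypothetical helper: run the globally installed binary (assumed on PATH).
function runSitespider(pageUrl, outputDir, opts) {
  return spawnSync('sitespider',
                   buildSitespiderArgs(pageUrl, outputDir, opts),
                   { stdio: 'inherit' });
}

// Placeholder URL and directory, for illustration only.
const exampleArgs = buildSitespiderArgs(
  'http://example.com/', './example-site',
  { download: true, images: true, maxDepth: 2 }
);
console.log('sitespider ' + exampleArgs.join(' '));
```

Keeping argument construction separate from the `spawnSync` call makes it possible to inspect the command line before anything is actually crawled.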
API
    var Spider = require('jul11co-sitespider');

    var spider = new Spider({
      config_file: "PATH_TO_CONFIG_FILE", // optional
      state_file: "PATH_TO_STATE_FILE" // optional
    });

    ...

    spider.start(start_link, output_dir, options);

    ...
Extend with scrapers
Content of scraper script:
    // sitespider_scrapers/example.js
    module.exports = {
      name: 'Example',
      ... {
        // matching rules
        // return true or false
      },
      ... {
        // scraping here...
      }
    }

    // example-spider.js
    var Spider = require('jul11co-sitespider');
    var spider = new Spider(...);
    ...
License
Licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)