spider
Get your schemon! Tim Bushell
Scrape Schema.org objects into mongoose schema files, the elioWay.
This is a requirement of bones, but it can also serve as the boilerplate for a web-spidering project with scheming intentions.
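To make the idea concrete, here is a minimal sketch of what turning one Schema.org type into a mongoose-style schema definition could look like. It is not the code spider actually runs: the `person` input shape, the `toSchemaDefinition` function, the `PRIMITIVES` mapping, and the `forceStrings` parameter are all hypothetical, and type names are written as plain strings so the sketch runs without mongoose installed.

```javascript
// Hypothetical input: one scraped Schema.org type and the type each property expects.
const person = {
  name: 'Person',
  properties: {
    givenName: 'Text',
    birthDate: 'Date',
    worksFor: 'Organization', // points at another Thing
  },
};

// Assumed mapping from Schema.org data types to schema type names.
const PRIMITIVES = new Map([
  ['Text', 'String'],
  ['URL', 'String'],
  ['Number', 'Number'],
  ['Boolean', 'Boolean'],
  ['Date', 'Date'],
]);

// Build a plain definition object for one type.
// forceStrings = true stores links to other Things as strings;
// forceStrings = false stores them as references to the other type.
function toSchemaDefinition(thing, forceStrings) {
  const definition = {};
  for (const [field, expected] of Object.entries(thing.properties)) {
    if (PRIMITIVES.has(expected)) {
      definition[field] = { type: PRIMITIVES.get(expected) };
    } else if (forceStrings) {
      definition[field] = { type: 'String' };
    } else {
      definition[field] = { type: 'ObjectId', ref: expected };
    }
  }
  return definition;
}

console.log(toSchemaDefinition(person, true));
console.log(toSchemaDefinition(person, false));
```

With `forceStrings` set, `worksFor` becomes a plain string field; without it, `worksFor` becomes a reference to `Organization`. Forcing strings keeps each generated schema independent of the others, at the cost of losing the link between types.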
Install
npm install @elioWay/spider --save
Usage
// yourapp.js
const Spider = require('@elioWay/spider');

const today = new Date();

// Create schemon the spider. Arguments are positional.
const schemon = new Spider(
  // version: do change. getMonth() is zero-based, so add 1.
  today.getFullYear() + '.' + (today.getMonth() + 1) + '.' + today.getDate(),
  2, // depth: the deeper you go, the more objects you get. Go crazy.
  '#thing_tree', // thingsSelector: don't change, but there is a bigger tree on the page.
  true // useObjectFields: force String type instead of 1 to 1 relationships to other Things.
);

// Let schemon do spider things.
schemon.spider(
  // Wrap what schemon scraped.
  data => Spider.optimize(data)
);
node yourapp
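The `depth` argument limits how far down the type tree the spider walks. As a rough illustration only, the sketch below walks a small hand-written tree. The `tree` data, the `collectTypes` function, and the convention that the root sits at depth 0 are assumptions made for this example, not the package's implementation.

```javascript
// Hypothetical stand-in for the type tree on the page.
const tree = {
  name: 'Thing',
  children: [
    { name: 'Person', children: [] },
    {
      name: 'Place',
      children: [
        {
          name: 'Accommodation',
          children: [{ name: 'Apartment', children: [] }],
        },
      ],
    },
  ],
};

// Collect type names from the root down to maxDepth (the root counts as depth 0).
function collectTypes(node, maxDepth, depth = 0) {
  if (depth > maxDepth) return [];
  const names = [node.name];
  for (const child of node.children) {
    names.push(...collectTypes(child, maxDepth, depth + 1));
  }
  return names;
}

console.log(collectTypes(tree, 1)); // [ 'Thing', 'Person', 'Place' ]
console.log(collectTypes(tree, 2)); // [ 'Thing', 'Person', 'Place', 'Accommodation' ]
```

Each extra level adds every child of the types already collected, which is why a larger depth yields more objects.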
Seeing is believing
git clone https://gitlab.com/elioschemers/spider/
cd spider
node test_spider
Credits
- http://sinonjs.org/
- https://github.com/underscopeio/sinon-mongoose
- https://cheerio.js.org/
- https://stackoverflow.com/questions/34368419/web-scraper-iterating-over-pages-with-rx-js
License
MIT Tim Bushell