SQLite queue for Simplecrawler
This is an implementation of the simplecrawler FetchQueue interface that uses SQLite as its backend.
Advantage: a running job can be paused, stopped, killed, or terminated without losing the queue state.
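For orientation, the FetchQueue interface in simplecrawler is callback based. The following is a minimal sketch of a few of its methods, written against a hypothetical in-memory stand-in (MemoryQueue is an illustrative name, not part of this package, and the behaviour is simplified). A SQLite backend would answer the same calls from a table instead of an array, which is what lets the state survive a restart.

```javascript
// Hypothetical in-memory stand-in, not part of this package. It mirrors the
// callback style of a few FetchQueue methods in simplified form.
class MemoryQueue {
  constructor() {
    this.items = []
  }

  // add(queueItem, force, callback): refuse duplicate URLs unless `force` is set.
  add(queueItem, force, callback) {
    const duplicate = this.items.some(item => item.url === queueItem.url)
    if (duplicate && !force) {
      return callback(new Error('Resource already exists in queue'))
    }
    const stored = Object.assign({}, queueItem, {
      id: this.items.length,
      status: 'queued'
    })
    this.items.push(stored)
    callback(null, stored)
  }

  // exists(url, callback): report whether the URL has been queued.
  exists(url, callback) {
    callback(null, this.items.some(item => item.url === url))
  }

  // getLength(callback): report the number of items in the queue.
  getLength(callback) {
    callback(null, this.items.length)
  }

  // oldestUnfetchedItem(callback): hand back the first item still waiting.
  oldestUnfetchedItem(callback) {
    const next = this.items.find(item => item.status === 'queued')
    callback(null, next || null)
  }
}
```

Every method reports its result through an error-first callback, so a database-backed queue can do its work asynchronously without changing how the crawler calls it.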
Installation
Install from GitHub
npm install git+https://github.com/LeMoussel/SQLite-simplecrawler-queue#master
Install from npm
npm install --save SQLite-simplecrawler-queue
Usage
All you need is the path to the SQLite database file.
try {
    const sqliteDatabaseName = 'crawlsite.sqlite3'

    // Drop the database if it exists
    SQLiteFetchQueue.dropDatabase(sqliteDatabaseName)
    // Connect to a disk file database by passing the path to the database file.
    const crawlerQueue = new SQLiteFetchQueue(sqliteDatabaseName)
    // Initialization of the database
    crawlerQueue

    // Initializing simplecrawler
    const crawler = new Crawler('http://example.com')
    crawler.maxDepth = 3
    crawler.allowInitialDomainChange = false
    crawler.filterByDomain = true
    crawler.queue = crawlerQueue
    crawler.start()
} catch (err) {
    console.error(err)
}
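Because the queue lives in the database file, a crawl can be interrupted and picked up again later. The sketch below shows one possible way to stop the crawl loop on Ctrl+C. stopOnSignal and its options are hypothetical names introduced for illustration. The only simplecrawler call it relies on is crawler.stop(). Whether the database handle needs to be closed explicitly before exit is not covered here.

```javascript
// Hypothetical helper (not part of this package): stop the crawl loop when
// the process receives a signal, leaving the persisted queue untouched.
function stopOnSignal(crawler, options = {}) {
  const signal = options.signal || 'SIGINT'
  const proc = options.proc || process // injectable, which eases testing

  const handler = () => {
    crawler.stop() // simplecrawler: ends the crawl run loop
    if (typeof options.onStopped === 'function') {
      options.onStopped()
    }
  }

  // `once` so that a second signal is not swallowed by this handler
  proc.once(signal, handler)
  return handler
}

// Intended use, assuming `crawler` was set up as in the usage snippet above:
//   stopOnSignal(crawler, { onStopped: () => process.exit(0) })
//   crawler.start()
```

To resume later, the database would have to be kept rather than dropped before reconnecting.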
Test
npm test
Check the test folder for more usage examples.
Additional utilities
- Drop the queue using the dropQueue method.
  // Connect to a disk file database by passing the path to the database file.
  const crawlerQueue = new SQLiteFetchQueue('sqliteDatabaseName', 'queue')
  // Drop the 'queue' table
  crawlerQueue.dropQueue()
  // Initialization of the database
  crawlerQueue
- Drop the database using the SQLiteFetchQueue.dropDatabase static method.
  // Drop the database if it exists
  SQLiteFetchQueue.dropDatabase('sqliteDatabaseName')
  // Connect to a disk file database by passing the path to the database file.
  const crawlerQueue = new SQLiteFetchQueue('sqliteDatabaseName', 'queue')
  // Initialization of the database
  crawlerQueue
- Export the queue to a JSON file on disk.
  // Flexible queue system which can be frozen to disk
  crawlerQueue
- Import the queue from a frozen JSON file on disk.
  // Flexible queue system which can be defrosted from disk
  crawlerQueue
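In simplecrawler's FetchQueue interface, export and import are exposed as freeze(filename, callback) and defrost(filename, callback). The following sketch shows that round trip with a hypothetical in-memory stand-in (JsonFreezableQueue is an illustrative name). It does not reproduce how this package reads or writes its SQLite table, and the file layout (a plain JSON array of items) is an assumption.

```javascript
const fs = require('fs')

// Hypothetical stand-in for a queue. It only mimics the
// freeze(filename, callback) / defrost(filename, callback) call shape,
// not this package's SQLite storage.
class JsonFreezableQueue {
  constructor() {
    this.items = []
  }

  // Write every queue item to `filename` as a JSON array.
  freeze(filename, callback) {
    const json = JSON.stringify(this.items, null, 2)
    fs.writeFile(filename, json, err => callback(err || null))
  }

  // Replace the in-memory items with the array stored in `filename`.
  defrost(filename, callback) {
    fs.readFile(filename, 'utf8', (err, text) => {
      if (err) return callback(err)

      let parsed
      try {
        parsed = JSON.parse(text)
      } catch (parseErr) {
        return callback(parseErr)
      }

      if (!Array.isArray(parsed)) {
        return callback(new Error('Expected a JSON array of queue items'))
      }
      this.items = parsed
      callback(null, this)
    })
  }
}
```

A frozen file produced this way is plain JSON, so it can be inspected or edited by hand before being defrosted into a fresh queue.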
Resources
License
MIT licensed; all its dependencies are MIT or BSD licensed.