downode

0.1.6 • Public • Published

downode is an easy-to-use, general-purpose scraper. Simple but powerful.

Installation

npm i -S downode

Features

  • Composable: downode supports nested Rules, so you can reuse and compose your Page Rules and Rules arbitrarily.

  • Concurrency control: Control all network requests with simple config options.

  • Reference mechanism: You can reference other scraped data easily and asynchronously.

Documentation

Examples

There is an example that scrapes Douban Top Rated 250 Movies.

API

downode(entryURL, pageRule, globalOptions?)

Scrapes the page at the given URL with the given Page Rule.

NOTE: if you're using CommonJS modules, you'll need to use require('downode').default to get this main function.

Params

  • string entryURL - The target URL you want to start with.
  • Object pageRule - The Page Rule for the entry page, a set of Rule.
  • Object globalOptions - Global config options.
    • totalConcurrent (number? = 50) - Maximum number of concurrent tasks in the global task priority queue. See Concurrent Control.
    • mode ('default' | 'df' | 'bf') - Mode of the global task priority queue. See Concurrent Control.
    • entryCookie (string) - Cookie for the entry request.
    • rate (number? = 0) - Default rate option for Rules.
    • concurrent (number? = 5) - Default concurrent option for Rules.
    • request (Object? = 0) - Default request option for Rules.
    • userAgents ((string[] | string)? = MOST_COMMON_USER_AGENTS) - Default userAgents option for Rules.
    • retry (number? = 3) - Default retry option for Rules.
    • retryTimeout (number? = 2000) - Default retryTimeout option for Rules.

Return

  • Promise - resolves to a result Object with the same structure as your Page Rule.
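
To make the documented defaults concrete, here is a minimal sketch of how a globalOptions object could be overlaid on them. resolveGlobalOptions is a hypothetical helper, not part of downode, and the library's own merging logic may differ. The request and userAgents options are left out, and the URL in the trailing comment is a placeholder.

```javascript
// Defaults copied from the Params list (request and userAgents omitted).
const DOCUMENTED_DEFAULTS = {
  totalConcurrent: 50,
  rate: 0,
  concurrent: 5,
  retry: 3,
  retryTimeout: 2000,
};

// Hypothetical helper (not part of downode): caller-supplied values
// override the documented defaults; everything else is kept.
function resolveGlobalOptions(globalOptions = {}) {
  return { ...DOCUMENTED_DEFAULTS, ...globalOptions };
}

const options = resolveGlobalOptions({ totalConcurrent: 10, mode: 'bf', retry: 1 });
console.log(options);

// With downode installed, a CommonJS call would then look roughly like this
// (pageRule is whatever Page Rule you have defined for the entry page):
//   const downode = require('downode').default;
//   downode('https://example.com', pageRule, options)
//     .then((result) => console.log(result));
```

You normally only pass the options you want to change; the helper above just shows which values the remaining ones are documented to take.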

waitFor(...refPaths, callback)

Function overloading:

  • waitFor(refPathsArray, callback)
  • waitFor(refPathsObject, callback)

Creates a Reference Variable Waiter that invokes the callback once all Reference Variables are available.

To learn more about the reference mechanism, see reference-mechanism.

Params

  • string[] refPaths: Reference Paths, passed one by one.
    • or string[] refPathsArray: An array containing all Reference Paths.
    • or object refPathsObject: An object mapping keys to Reference Paths.
  • Function callback

Return

  • any - Returns whatever the callback returns.
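
The three call shapes can be told apart by inspecting the arguments. The sketch below is only an illustration of that idea: normalizeWaitForArgs is a hypothetical helper, not downode's implementation, and the path strings 'title' and 'rating' are placeholders (the real Reference Path syntax is covered in reference-mechanism). It does not model how resolved values are handed to the callback.

```javascript
// Hypothetical helper, not part of downode: normalizes the three documented
// waitFor call shapes into one { refPaths, keys, callback } record.
function normalizeWaitForArgs(...args) {
  const callback = args[args.length - 1];
  if (typeof callback !== 'function') {
    throw new TypeError('the last argument must be a callback');
  }
  const rest = args.slice(0, -1);
  const first = rest[0];

  // waitFor(refPathsArray, callback)
  if (rest.length === 1 && Array.isArray(first)) {
    return { refPaths: [...first], keys: null, callback };
  }
  // waitFor(refPathsObject, callback): remember the keys alongside the paths
  if (rest.length === 1 && first !== null && typeof first === 'object') {
    const keys = Object.keys(first);
    return { refPaths: keys.map((k) => first[k]), keys, callback };
  }
  // waitFor(...refPaths, callback)
  return { refPaths: rest, keys: null, callback };
}

const cb = () => {};
console.log(normalizeWaitForArgs('title', 'rating', cb).refPaths);
console.log(normalizeWaitForArgs(['title', 'rating'], cb).refPaths);
console.log(normalizeWaitForArgs({ t: 'title', r: 'rating' }, cb).keys);
```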

Debug

# set the environment variable
export DEBUG=downode:*

# `downode:info` - basic information, such as requests and downloads
# `downode:warn` - retried requests, useless rules
# `downode:error` - error information, including request errors, download errors, etc.

Related

downode is inspired by these projects:

Roadmap

  • Proxy Rule Option
  • Post Rule Option
  • Authorization/Cookie propagation
  • CLI support
  • Incremental scrape
  • Support for scraping dynamically generated websites

License

MIT

Collaborators

  • ceoimon