tokio

Version 0.1.2

Web scraping made simple.

Features

  • Built on top of jsdom.
  • Runs inline and external scripts on the page.
  • Lets you add a resource filter to skip loading certain external resources.
  • Simple and fast: only 100 SLOC, and it does not require Electron or Chromium.

Install

yarn add tokio

or, with npm:

npm i tokio

Usage

const Tokio = require('tokio')
 
const tokio = new Tokio({
  url: 'https://some-website.com'
})
 
tokio.fetch().then(html => {
  console.log(html) //=> string
 
  // Query HTML with cheerio (server-side jQuery)
  // https://github.com/cheeriojs/cheerio
  const $ = tokio.query(html)
})

API

new Tokio(options)

options

options.url
  • Type: string
  • Required: yes

The URL to fetch.

options.wait
  • Type: number | string
  • Default: 50

Wait for a fixed time in milliseconds (when a number) or until a DOM element shows up (when a string).
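The two behaviours of options.wait can be pictured with the sketch below. It is not tokio's source: the helper name waitFor is hypothetical, treating a string as a CSS selector is an assumption, and the polling interval and timeout are illustrative values that tokio does not document.

```javascript
// Hypothetical sketch of the options.wait semantics, not tokio's actual source.
// Assumptions: a string is treated as a CSS selector; interval and timeout
// are illustrative values, not documented tokio settings.
function waitFor(document, wait = 50, { interval = 20, timeout = 10000 } = {}) {
  // A number means "sleep this many milliseconds"
  if (typeof wait === 'number') {
    return new Promise(resolve => setTimeout(resolve, wait))
  }
  // A string means "poll until an element matching the selector exists"
  return new Promise((resolve, reject) => {
    const started = Date.now()
    const timer = setInterval(() => {
      if (document.querySelector(wait)) {
        clearInterval(timer)
        resolve()
      } else if (Date.now() - started > timeout) {
        clearInterval(timer)
        reject(new Error(`Timed out waiting for ${wait}`))
      }
    }, interval)
  })
}
```

In a jsdom or browser context you would pass window.document as the first argument.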

options.manually
  • Type: boolean | string

Instead of relying on options.wait, set this to true and call window.__tokio_ready__() from your page to signal that it is ready to be captured.

It can also be a string such as i_am_ready, in which case the page calls window.i_am_ready() instead.
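On the page side, a guard keeps the call harmless when the page is opened in an ordinary browser. This is a minimal sketch: signalReady is a hypothetical helper, and it assumes the hook only exists on window when the page is loaded by tokio with options.manually set.

```javascript
// Page-side sketch: signal tokio once your app has finished rendering.
// `signalReady` is a hypothetical helper name, not part of tokio.
// Assumption: the hook is only defined when tokio loads the page with
// options.manually set; elsewhere it is undefined, so do nothing.
function signalReady(win, hookName = '__tokio_ready__') {
  if (typeof win[hookName] !== 'function') return false
  win[hookName]() // call as a method so `this` stays bound to win
  return true
}
```

After rendering, the page would call signalReady(window) for the default hook, or signalReady(window, 'i_am_ready') when options.manually is 'i_am_ready'.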

options.resourceFilter
  • Type: resource => boolean

A function that decides whether a given external resource should be loaded: return true to load it or false to skip it. Check out the resource type.
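A filter that skips third-party analytics scripts might look like the sketch below. The shape of resource is an assumption here: the code expects its address in resource.url, either as a string or as an object with an href, so check the resource type before relying on it. The blocked host list is illustrative.

```javascript
// Hypothetical filter that skips third-party analytics scripts.
// Assumption: resource.url is a string or an object with an href.
const BLOCKED_HOSTS = ['www.google-analytics.com', 'connect.facebook.net']

function resourceFilter(resource) {
  const href = typeof resource.url === 'string' ? resource.url : resource.url.href
  const { hostname } = new URL(href)
  // true = load the resource, false = skip it
  return !BLOCKED_HOSTS.includes(hostname)
}
```

It would then be passed as new Tokio({ url: 'https://some-website.com', resourceFilter }).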

options.requestOptions
  • Type: object

Options for the underlying HTTP requests:
  • proxy: string, the URL of an HTTP proxy to use for the requests.
  • agent: an http(s).Agent instance to use.
  • agentOptions: the agent options; defaults to { keepAlive: true, keepAliveMsecs: 115000 }. See the http API for more details.
  • strictSSL: if true, requires SSL certificates to be valid; defaults to true. See the request module for more details.
  • userAgent: the user agent string used in requests; defaults to Node.js (#process.platform#; U; rv:#process.version#).
  • headers: an object of headers used when loading the HTML from options.url, if applicable.
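An illustrative requestOptions object combining several of these fields is sketched below. Every value is an example: the user agent string is hypothetical, and agentOptions here replaces the documented default rather than extending it.

```javascript
// Illustrative requestOptions; all values are examples.
const requestOptions = {
  // Options for the HTTP agent (overrides the documented default)
  agentOptions: { keepAlive: true, maxSockets: 4 },
  // Keep certificate validation on (already the default)
  strictSSL: true,
  // Hypothetical user agent string for your scraper
  userAgent: 'Mozilla/5.0 (compatible; my-scraper/1.0)',
  // Extra headers sent when loading options.url
  headers: { 'Accept-Language': 'en-US' }
}

// Passed alongside the other options:
// new Tokio({ url: 'https://some-website.com', requestOptions })
```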

tokio.fetch()

  • Type: () => Promise<string>

Fetch options.url and resolve with the corresponding HTML. JavaScript on the page is evaluated before the HTML is returned.
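Because fetch() returns a promise, the Usage example can also be written with async/await. In this sketch fetchHtml is a hypothetical helper, and scraper is anything with a fetch(): Promise<string> method, such as new Tokio({ url }).

```javascript
// Sketch: the Usage flow with async/await and error handling.
// `fetchHtml` is a hypothetical helper; `scraper` is e.g. new Tokio({ url }).
async function fetchHtml(scraper) {
  try {
    return await scraper.fetch()
  } catch (err) {
    // A rejected fetch() promise lands here; add the context you need
    throw new Error(`tokio fetch failed: ${err.message}`)
  }
}
```

For example: fetchHtml(new Tokio({ url: 'https://some-website.com' })).then(html => console.log(html.length)).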

tokio.query(html, opts)

This is essentially cheerio.load(html, opts): it returns a cheerio instance ($) for querying the HTML.

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

Author

tokio © egoist, released under the MIT License.
Authored and maintained by egoist with help from contributors (list).

github.com/egoist · GitHub @egoist · Twitter @_egoistlily
