Search results

221 packages found

Generic web crawler powered by Node.js

published version 1.4.1, 8 years ago · 5 dependents · licensed under BSD-2-Clause
280

Automatically scrape/crawl articles from any site. Makes any web page readable, whether it is in Chinese or English.

published version 0.5.6, 7 years ago · 2 dependents · licensed under MIT
215

A Node.js module for downloading a single image or multiple images to disk from a given URL (checks whether the URL exists and detects the image type)

published version 1.0.3, 8 years ago · 1 dependent · licensed under MIT
219
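Detecting an image type usually means reading the file's leading "magic bytes" rather than trusting the URL's extension. The sketch below shows that general technique for a few common formats; `sniff_image_type` is a hypothetical helper, not this package's API, and the signature table is an illustrative subset.

```python
from typing import Optional

# Hypothetical helper: known signature prefixes for a few common image formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff_image_type(header: bytes) -> Optional[str]:
    """Return a file extension guessed from the first bytes, or None if unknown."""
    for magic, ext in _SIGNATURES:
        if header.startswith(magic):
            return ext
    # WebP files start with "RIFF", a 4-byte size, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None
```

Only the first dozen bytes of a download are needed, so a crawler can decide the file extension before writing the rest of the stream to disk.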

A library for scraping manga from various websites.

published version 1.1.2, 2 years ago · 0 dependents · licensed under MIT
200

Crawler tool that scrapes data according to the supplied configuration and rules

published version 1.1.4-0, 5 years ago · 0 dependents · licensed under ISC
200

Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.

published version 0.1.1, 8 years ago · 4 dependents · licensed under BSD-2-Clause
192

Spy on files

published version 1.2.4, 4 years ago · 1 dependent · licensed under MIT
189

Website extensions for the Sajari API. Automatically index site content, add user profiles, render search and recommendations, etc.

published version 0.12.0, 2 years ago · 1 dependent · licensed under MIT
178

JSpider 3 is a crawler framework that runs in Chrome DevTools and includes full crawler support.

published version 3.2.3, 3 years ago · 0 dependents · licensed under Apache-2.0
180

Super configurable async web spider

published version 0.3.0, 9 years ago · 0 dependents · licensed under MIT
155

SyphonX is a tool that extracts data from HTML, transforming it into JSON of any shape or size. It combines the power of CSS selectors and jQuery, regular expressions, and JavaScript into a declarative template format to elegantly solve the simplest

published version 1.2.66, a year ago · 0 dependents · licensed under MIT
151

Crawl a site to generate a backstopjs config

published version 2.3.1, 7 years ago · 0 dependents · licensed under MIT
139

HTTP library specifically designed for crawling the web, with built-in caching and per-domain queueing.

published version 0.7.7, 5 years ago · 3 dependents · licensed under ISC
130
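Per-domain queueing keeps a separate queue for each host so that a crawler spreads requests across sites instead of hammering one server. The sketch below illustrates one common way to do this (round-robin between hosts); `PerDomainQueue` is a hypothetical class, not this library's API, and it omits the caching and rate limiting a real crawler would add.

```python
from collections import OrderedDict, deque
from typing import Deque, Optional
from urllib.parse import urlsplit


class PerDomainQueue:
    """Hypothetical sketch: one FIFO per host, served round-robin."""

    def __init__(self) -> None:
        # Host -> pending URLs; dict order decides which host is served next.
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def push(self, url: str) -> None:
        host = urlsplit(url).netloc.lower()
        self._queues.setdefault(host, deque()).append(url)

    def pop(self) -> Optional[str]:
        """Take the next URL from the front host, then rotate that host to the back."""
        if not self._queues:
            return None
        host, queue = next(iter(self._queues.items()))
        url = queue.popleft()
        if queue:
            self._queues.move_to_end(host)  # give other hosts a turn
        else:
            del self._queues[host]  # drop hosts with nothing left to fetch
        return url

    def __len__(self) -> int:
        return sum(len(q) for q in self._queues.values())
```

With two URLs queued for one host and one for another, `pop()` alternates between the hosts rather than draining the first host's queue.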

A fetch utility specialized for spiders (web crawlers).

published version 1.0.13, a year ago · 0 dependents · licensed under MIT
144

A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.

published version 2.0.2, 9 years ago · 0 dependents · licensed under MIT
111

Promise based parser for robots.txt files.

published version 3.2.0, 5 years ago · 0 dependents · licensed under MIT
112
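A robots.txt parser answers one question for a crawler: may this user agent fetch this path? The package above exposes a Promise-based API; as a language-neutral illustration of the same rule matching, the sketch below uses Python's standard-library `urllib.robotparser` instead. The robots.txt content and the `MyCrawler` user agent are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt (hypothetical): block every agent from /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from lines instead of fetching a URL

print(parser.can_fetch("MyCrawler", "https://example.com/private/page"))  # False
print(parser.can_fetch("MyCrawler", "https://example.com/public/page"))   # True
```

Note that `urllib.robotparser` applies the first matching rule in file order, which can differ from the longest-match behavior some other parsers implement.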

Walk the nodes in a JSON Schema.

published version 1.0.6, 6 years ago · 1 dependent · licensed under MIT
109
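Walking a JSON Schema means recursively visiting every nested subschema reachable through keywords such as `properties`, `items`, and `anyOf`. The sketch below is a minimal depth-first walker under that assumption; `walk_schema` is a hypothetical function, not this package's API, and it handles only an illustrative subset of keywords (for example, it skips array-form `items` and `$defs`).

```python
from typing import Any, Dict, Iterator, Tuple

# Illustrative subset of JSON Schema keywords that contain subschemas.
_SINGLE_KEYS = ("items", "additionalProperties", "not")      # value is one schema
_MAP_KEYS = ("properties", "definitions", "patternProperties")  # name -> schema
_LIST_KEYS = ("allOf", "anyOf", "oneOf")                      # list of schemas


def walk_schema(schema: Dict[str, Any], path: str = "#") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (pointer-like path, subschema) for every nested schema, depth-first."""
    yield path, schema
    for key in _SINGLE_KEYS:
        sub = schema.get(key)
        if isinstance(sub, dict):  # skips boolean schemas and array-form "items"
            yield from walk_schema(sub, f"{path}/{key}")
    for key in _MAP_KEYS:
        for name, sub in schema.get(key, {}).items():
            if isinstance(sub, dict):
                yield from walk_schema(sub, f"{path}/{key}/{name}")
    for key in _LIST_KEYS:
        for i, sub in enumerate(schema.get(key, [])):
            if isinstance(sub, dict):
                yield from walk_schema(sub, f"{path}/{key}/{i}")
```

For a schema with a `name` string property and a `tags` array of strings, the walker visits `#`, `#/properties/name`, `#/properties/tags`, and `#/properties/tags/items`, in that order.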

Quickly crawl the information (e.g. followers, tags) of an Instagram profile. No login required!

published version 2.0.2, 6 years ago · 0 dependents · licensed under MIT
106

A Model Context Protocol (MCP) server that provides search and crawl functionality using Search1API

published version 0.1.3, 2 months ago · 0 dependents · licensed under MIT
98

A simple file search utility.

published version 0.0.133, 9 years ago · 0 dependents · licensed under MIT
100