FetchFox is an AI-powered scraping, automation, and data extraction library.
It can scrape data from any webpage using just plain English. It is made by the developers of the FetchFox AI scraper.
Install the package and Playwright:
npm i fetchfox
npx playwright install-deps
npx playwright install
Then use it. Here is the callback style:
import { fox } from 'fetchfox';

// Plan a workflow that extracts the name and number of the first 3 Pokemon.
const workflow = await fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .plan();

// The callback receives each delta while the workflow runs.
const results = await workflow
  .run(null, (delta) => { console.log(delta.item) });

for (const result of results) {
  console.log('Item:', result.item);
}
If you prefer, you can use the streaming style:
import { fox } from 'fetchfox';

const stream = fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .stream();

for await (const delta of stream) {
  console.log(delta.item);
}
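The streaming loop keeps reading until the stream ends, so it can be handy to stop after a fixed number of items. Here is a minimal sketch of that pattern. It uses a stand-in async generator (fakeStream, with made-up data) in place of a real FetchFox stream, and takeFromStream is an illustrative helper, not part of the library. Whether leaving the loop also stops FetchFox's background work is not confirmed here.

```javascript
// Stand-in for a FetchFox stream (hypothetical data): yields deltas shaped like { item }.
async function* fakeStream() {
  const names = ['Bulbasaur', 'Ivysaur', 'Venusaur', 'Charmander'];
  for (const [i, name] of names.entries()) {
    yield { item: { name, number: String(i + 1).padStart(4, '0') } };
  }
}

// Collect at most `max` items from any async iterable of deltas, then stop reading.
async function takeFromStream(stream, max) {
  const items = [];
  if (max <= 0) return items;
  for await (const delta of stream) {
    items.push(delta.item);
    if (items.length >= max) break; // leaving the loop closes the iterator
  }
  return items;
}
```

With a real stream, you would pass the value returned by .stream() in place of fakeStream().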
You'll often want to scrape over multiple levels. You can do this using the url field. If you extract a url field, FetchFox will follow that URL on the next step.
For example, you can get the HP and attack values from each Pokemon's detail page:
const workflow = await fox
  .init('https://pokemondb.net/pokedex/national')
  // Step 1: extract the profile URL along with basic fields.
  .extract({
    url: 'URL of pokemon profile',
    name: 'Pokemon name',
    number: 'Pokemon number'
  })
  // Step 2: runs on the page behind each extracted url.
  .extract({
    hp: 'Pokemon HP',
    attack: 'Pokemon attack power',
  })
  .limit(3)
  .plan();

const results = await workflow
  .run(null, (delta) => { console.log(delta.item) });

for (const result of results) {
  console.log('Item:', result.item);
}
This scraper will start at https://pokemondb.net/pokedex/national, and then go to detail pages like https://pokemondb.net/pokedex/pikachu to get the HP and attack values.
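Links on a listing page are often written relative to the site, for example /pokedex/pikachu. Whether FetchFox returns the url field already resolved is not stated here, so if you post-process url values yourself, the sketch below shows one way to normalize them with the standard URL constructor. resolveItemUrls and the sample items are hypothetical.

```javascript
// Resolve each item's `url` field against the page it was extracted from.
// `resolveItemUrls` is an illustrative helper, not a FetchFox API.
function resolveItemUrls(items, baseUrl) {
  return items.map((item) => {
    if (!item.url) return item; // nothing to resolve
    try {
      return { ...item, url: new URL(item.url, baseUrl).href };
    } catch {
      return { ...item, url: null }; // not parseable as a URL
    }
  });
}

// Hypothetical items, as they might look after the first extract step.
const sampleItems = [
  { name: 'Pikachu', url: '/pokedex/pikachu' },
  { name: 'Bulbasaur', url: 'https://pokemondb.net/pokedex/bulbasaur' },
];
const resolvedItems = resolveItemUrls(sampleItems, 'https://pokemondb.net/pokedex/national');
console.log(resolvedItems.map((item) => item.url));
```

Relative paths are resolved against the start page, and URLs that are already absolute pass through unchanged.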
You'll need to give an API key for the AI provider you are using, such as OpenAI. There are a few ways to do this.
The easiest option is to set the OPENAI_API_KEY environment variable. This will get picked up by the FetchFox library, and all AI calls will go through that key. To use this option, run your code like this:
OPENAI_API_KEY=sk-your-key node index.js
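If the variable is missing, the problem may only surface once the first AI call is made (an assumption; the exact behavior is not described here). A small fail-fast check at the top of your script makes the cause obvious. requireEnv is a hypothetical helper, not part of FetchFox.

```javascript
// Return the value of an environment variable, or throw a clear error if it is unset.
// `requireEnv` is an illustrative helper, not part of FetchFox.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Example: call this before building a workflow.
// requireEnv('OPENAI_API_KEY');
```

The second parameter defaults to process.env and exists only so the helper can be exercised with a plain object.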
Alternatively, you can pass in your API key in code, like this:
import { fox } from 'fetchfox';

const results = await fox
  .config({ ai: { model: 'openai:gpt-4o-mini', apiKey: 'sk-your-key' }})
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run();
This will use OpenAI's gpt-4o-mini model, and the API key you specify. You can also use OpenRouter to access AI models from other providers:
const results = await fox
  .config({ ai: { model: 'openrouter:google/gemini-flash-1.5', apiKey: 'your-openrouter-key' }})
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run();
Choose the AI model that best suits your needs.
The following providers are supported:
- OpenAI: Model strings are openai:..., for example openai:gpt-4o
- Google: Model strings are google:..., for example google:gemini-1.5-flash
- OpenRouter: Model strings are openrouter:..., for example openrouter:anthropic/claude-3.5-haiku
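Each model string has the form provider:model. The sketch below splits such a string at the first colon and checks the provider against the three listed here. It only illustrates the format and is not FetchFox's own parsing code; parseModelString is a hypothetical name.

```javascript
// Providers listed in this README.
const KNOWN_PROVIDERS = ['openai', 'google', 'openrouter'];

// Split "provider:model" at the first colon only, since the model part
// may itself contain slashes or further punctuation.
function parseModelString(modelString) {
  const index = modelString.indexOf(':');
  if (index <= 0 || index === modelString.length - 1) {
    throw new Error(`Expected "provider:model", got "${modelString}"`);
  }
  const provider = modelString.slice(0, index);
  const model = modelString.slice(index + 1);
  if (!KNOWN_PROVIDERS.includes(provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return { provider, model };
}

console.log(parseModelString('openrouter:anthropic/claude-3.5-haiku'));
```

A check like this can catch a typo in the provider prefix before any AI call is made.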
By default, FetchFox uses OpenAI's gpt-4o-mini model. We've found this model to provide a good tradeoff between cost, runtime, and accuracy. We have a public benchmarks dashboard where you can review performance data on recent commits.