fetchfox

0.0.38 • Public • Published

FetchFox

FetchFox is an AI-powered scraping, automation, and data extraction library.

It can scrape data from any webpage using just plain English. It is made by the developers of the FetchFox AI scraper.

Getting started

Install the package, then install Playwright's system dependencies and browsers:

npm i fetchfox
npx playwright install-deps
npx playwright install

Then use it. Here is the callback style:

import { fox } from 'fetchfox';

const workflow = await fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .plan();

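// Await run() so that results is iterable in the loop below.
const results = await workflow.run(null, (delta) => {
  // The callback receives each delta while the workflow runs.
  console.log(delta.item);
});

// Iterate over the final results once the run has finished.
for (const result of results) {
  console.log('Item:', result.item);
}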

If you prefer, you can use the streaming style:

import { fox } from 'fetchfox';

const stream = fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .stream();

for await (const delta of stream) {
  console.log(delta.item);
}
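
The streaming example above has no limit, so it may help to cap how many items you read. Here is a minimal sketch of one way to do that. It assumes only that the stream is an async iterable whose deltas carry an item property, as in the loop above. The collectItems helper is hypothetical and is not part of FetchFox, and whether breaking out of the loop also stops work inside the library is not confirmed here.

```javascript
// Collect up to `max` items from any async iterable of deltas.
// Assumes each delta exposes an `item` property, as in the streaming loop.
async function collectItems(stream, max = Infinity) {
  const items = [];
  if (max <= 0) return items;
  for await (const delta of stream) {
    items.push(delta.item);
    // Stop reading once the cap is reached.
    if (items.length >= max) break;
  }
  return items;
}
```

With the stream from the example above, this could be called as `const firstThree = await collectItems(stream, 3);`.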

Following URLs

You'll often want to scrape over multiple levels. You can do this using the url field. If you extract a url field, FetchFox will follow that URL on the next step.

For example, you can get HP and attack from each Pokemon's detail page, one level below the Pokedex list:

const workflow = await fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ 
    url: 'URL of pokemon profile', 
    name: 'Pokemon name', 
    number: 'Pokemon number'
  })
  .extract({ 
    hp: 'Pokemon HP', 
    attack: 'Pokemon attack power', 
  })
  .limit(3)
  .plan();

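// Await run() so that results is iterable in the loop below.
const results = await workflow.run(null, (delta) => {
  // The callback receives each delta while the workflow runs.
  console.log(delta.item);
});

// Iterate over the final results once the run has finished.
for (const result of results) {
  console.log('Item:', result.item);
}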

This scraper will start at https://pokemondb.net/pokedex/national, and then go to detail pages like https://pokemondb.net/pokedex/pikachu to get the HP and attack values.
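
Conceptually, following the url field works like a two-level loop: extract a list, then visit each item's URL and extract more fields. The sketch below illustrates that idea only. It is not FetchFox's internal code. The names followUrls, extractList, and extractDetail are hypothetical, and merging the list fields with the detail fields is an assumption of this sketch.

```javascript
// Conceptual two-level crawl: extract a list, then visit each item's `url`.
// `extractList` and `extractDetail` are hypothetical async callbacks, not FetchFox APIs.
async function followUrls(startUrl, extractList, extractDetail, limit = Infinity) {
  const results = [];
  const listItems = await extractList(startUrl);
  for (const item of listItems) {
    if (results.length >= limit) break;
    // Resolve relative links such as "/pokedex/pikachu" against the start page.
    const detailUrl = new URL(item.url, startUrl).href;
    const detail = await extractDetail(detailUrl);
    // Assumption: list fields and detail fields are merged into one item.
    results.push({ ...item, ...detail, url: detailUrl });
  }
  return results;
}
```

The URL class is built into Node.js and browsers, so a relative link found on https://pokemondb.net/pokedex/national resolves to an absolute detail page URL before it is visited.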

Enter your API key

You'll need to provide an API key for the AI provider you are using, such as OpenAI. There are a few ways to do this.

The easiest option is to set the OPENAI_API_KEY environment variable. This will get picked up by the FetchFox library, and all AI calls will go through that key. To use this option, run your code like this:

OPENAI_API_KEY=sk-your-key node index.js
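
If you rely on the environment variable, you may want your script to fail early with a clear message when it is missing. Here is a small sketch of such a check. The requireOpenAiKey helper is hypothetical and is not part of FetchFox.

```javascript
// Return the OpenAI key from the environment, or throw a clear error if it is missing.
// `env` defaults to process.env; passing a plain object makes the check easy to test.
function requireOpenAiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || '').trim();
  if (!key) {
    throw new Error(
      'OPENAI_API_KEY is not set. Run: OPENAI_API_KEY=sk-your-key node index.js'
    );
  }
  return key;
}
```

Calling `requireOpenAiKey()` at the top of index.js, before any workflow is started, surfaces the problem before any scraping begins.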

Alternatively, you can pass in your API key in code, like this:

import { fox } from 'fetchfox';

const results = await fox
  .config({ ai: { model: 'openai:gpt-4o-mini', apiKey: 'sk-your-key' }})
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run();

This uses OpenAI's gpt-4o-mini model with the API key you specify. You can also use OpenRouter to access AI models from other providers:

const results = await fox
  .config({ ai: { model: 'openrouter:google/gemini-flash-1.5', apiKey: 'your-openrouter-key' }})
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run();

Choose the AI model that best suits your needs.
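
To avoid hard-coding keys in source files, you could build the config object from environment variables instead. The sketch below returns the same { ai: { model, apiKey } } shape used in the examples above. The aiConfigFromEnv helper is hypothetical, and OPENROUTER_API_KEY is a variable name assumed for this sketch; only OPENAI_API_KEY is named in this README.

```javascript
// Map a provider prefix to the environment variable this sketch reads.
const KEY_VARS = new Map([
  ['openai', 'OPENAI_API_KEY'],
  ['openrouter', 'OPENROUTER_API_KEY'], // assumed name, not taken from the README
]);

// Build a config object in the { ai: { model, apiKey } } shape shown above.
function aiConfigFromEnv(model, env = process.env) {
  const provider = model.split(':')[0];
  const varName = KEY_VARS.get(provider);
  if (!varName) {
    throw new Error(`No key variable known for provider "${provider}"`);
  }
  const apiKey = env[varName];
  if (!apiKey) {
    throw new Error(`${varName} is not set`);
  }
  return { ai: { model, apiKey } };
}
```

The result can be passed straight to config, for example `fox.config(aiConfigFromEnv('openai:gpt-4o-mini'))`.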

The following providers are supported:

  • OpenAI: Model strings are openai:..., for example openai:gpt-4o
  • Google: Model strings are google:..., for example google:gemini-1.5-flash
  • OpenRouter: Model strings are openrouter:..., for example openrouter:anthropic/claude-3.5-haiku
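
Each model string above is a provider prefix, a colon, and then the provider's own model name, which may itself contain slashes. As an illustration of that format only, here is a sketch that splits a model string at its first colon. The parseModelString helper is hypothetical and is not FetchFox's own parser.

```javascript
// Split a "provider:model" string at the first colon.
// The model part is kept whole, so names like "anthropic/claude-3.5-haiku" survive.
function parseModelString(modelString) {
  const idx = modelString.indexOf(':');
  if (idx <= 0 || idx === modelString.length - 1) {
    throw new Error(`Expected "provider:model", got "${modelString}"`);
  }
  return {
    provider: modelString.slice(0, idx),
    model: modelString.slice(idx + 1),
  };
}
```

For example, `parseModelString('openrouter:anthropic/claude-3.5-haiku')` gives the provider `openrouter` and the model `anthropic/claude-3.5-haiku`.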

By default, FetchFox uses OpenAI's gpt-4o-mini model. We've found this model to provide a good tradeoff between cost, runtime, and accuracy. We have a public benchmarks dashboard where you can review performance data on recent commits.

Package details

  • Weekly downloads: 83
  • Version: 0.0.38
  • License: MIT
  • Unpacked size: 166 MB
  • Total files: 1039
  • Collaborators: ortutay