Nemo WebMiner

Nemo-webminer is a Node.js toolkit for scraping content from web pages and exporting it as clean, structured JSON or Excel files. It also includes an interactive demo UI.

Features

Export scraped data to .xlsx Excel files or JSON.
Use CSS selectors or group results by HTML tag name.
Promise-based, modular API: nemoMine({ url, selectors, format, output })
Interactive demo web page for testing and visualization.

Demo Link:

https://adcloud.space/webminer/

Installation

npm install nemo-webminer
npm install playwright xlsx   # Peer dependencies
npx playwright install       # Install Playwright browsers

Usage

Basic Example

// ESM
import { nemoMine } from 'nemo-webminer';
// or CommonJS (use one import style, not both in the same file)
const { nemoMine } = require('nemo-webminer');

// Using object selectors
nemoMine({
  url: 'https://example.com',
  selectors: { Title: 'h1', Links: 'a' },
  format: 'json'
}).then(data => console.log(data));

// Using string selectors
nemoMine({
  url: 'https://example.com',
  selectors: 'Title: h1, Links: a',
  format: 'json'
}).then(data => console.log(data));

// No selectors (groups by tag name)
nemoMine({ url: 'https://example.com', format: 'json' })
  .then(data => console.log(data));

// No selectors, saved to an Excel file
nemoMine({
  url: 'https://example.com',
  output: 'result_tags.xlsx'
}).then(filepath => {
  console.log('Saved file grouped by tags to:', filepath);
});
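
The same calls can be written with async/await, which is convenient when scraping several pages in sequence. The sketch below is an illustration rather than part of the package: `mineAll` is a hypothetical helper, and the mining function is passed in as a parameter (in real use, `nemoMine`) so the helper can also be tried with a stub.

```javascript
// Hypothetical helper: scrape several URLs one after another.
// `mine` is any function with the call shape shown above:
//   ({ url, selectors, format }) => Promise<result>
async function mineAll(urls, mine, selectors) {
  const results = [];
  for (const url of urls) {
    try {
      const data = await mine({ url, selectors, format: 'json' });
      results.push({ url, data });
    } catch (error) {
      // Record the failure and continue with the next URL.
      results.push({ url, error: error.message });
    }
  }
  return results;
}

// Usage (assumes nemo-webminer is installed):
// const { nemoMine } = require('nemo-webminer');
// mineAll(['https://example.com'], nemoMine, { Title: 'h1' })
//   .then(results => console.log(results));
```

Running the pages sequentially keeps only one scrape in flight at a time; whether parallel calls are safe depends on the library and is not assumed here.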

Step-by-Step API Guide

This section provides a detailed walkthrough of how to use the Nemo WebMiner API in different scenarios.

Basic Setup

Install the module in your project:
```
npm install nemo-webminer
```

Import the module:

const { nemoMine } = require('nemo-webminer');
// Or, if using a local build directly (instead of the line above):
// const { nemoMine } = require('./path/to/nemo-webminer/dist/index.js');

Ensure Playwright browsers are installed:
```
npx playwright install
```

Scenario 1: Scraping with Specific Selectors to JSON

Define target URL and selectors:

const url = 'https://example.com';
const selectors = {
  'Title': 'h1',                   // Get main headings
  'Paragraphs': 'p',               // Get paragraphs
  'Links': 'a',                    // Get links
  'Images': 'img',                 // Get images (will include src and alt attributes)
  'ListItems': 'ul > li, ol > li'  // Get list items from unordered and ordered lists
};

Call nemoMine with JSON output format:

nemoMine({
  url,
  selectors,
  format: 'json'
})
.then(data => {
  console.log('Scraped data:', data);
  
})
.catch(error => {
  console.error('Scraping failed:', error.message);
});

The resulting data structure will be:

{
  "Title": ["Example Domain", "Another Heading", ...],
  "Paragraphs": ["This domain is for use in examples...", ...],
  "Links": ["More information...", ...],
  "Images": ["Image description", ...],
  "ListItems": ["List item 1", "List item 2", ...]
}
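
Because each key maps to an array of values, a quick sanity check after scraping is to count how many matches each selector produced. This is a small sketch; `countMatches` is a hypothetical helper and the sample data is made up to mirror the structure above.

```javascript
// Hypothetical helper: count how many values were collected per selector name.
function countMatches(data) {
  const counts = {};
  for (const [name, values] of Object.entries(data)) {
    counts[name] = Array.isArray(values) ? values.length : 0;
  }
  return counts;
}

// Sample data in the shape shown above (values are illustrative only).
const sample = {
  Title: ['Example Domain'],
  Paragraphs: ['First paragraph', 'Second paragraph'],
  Links: []
};

console.log(countMatches(sample));
// { Title: 1, Paragraphs: 2, Links: 0 }
```

A count of zero usually means the selector did not match anything on the page.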

Scenario 2: Scraping All Tags to Excel File

Define target URL (no selectors necessary):

const url = 'https://example.com';
const outputFile = 'all_tags_data.xlsx';

Call nemoMine with file output format:

nemoMine({
  url,
  // With no selectors provided, all tags are scraped
  output: outputFile,
  format: 'file' // Default
})
.then(filePath => {
  console.log(`Excel file saved to: ${filePath}`);
  // Use the Excel file for further analysis
})
.catch(error => {
  console.error('Scraping failed:', error.message);
});
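
When scraping more than one site, it can be handy to derive the output file name from the URL instead of hard-coding it. The following is a sketch under that assumption; `outputNameFor` is a hypothetical helper that uses only the standard `URL` class.

```javascript
// Hypothetical helper: derive an .xlsx file name from the page's hostname.
function outputNameFor(url, suffix = 'tags') {
  const { hostname } = new URL(url); // throws a TypeError on invalid URLs
  const safe = hostname.replace(/[^a-z0-9]+/gi, '_');
  return `${safe}_${suffix}.xlsx`;
}

console.log(outputNameFor('https://example.com'));
// example_com_tags.xlsx
```

The returned name can then be passed as the `output` option shown above.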

Scenario 3: Using String-Format Selectors

Define your target URL and selectors as a string:

const url = 'https://example.com';
// Format: "Name1: selector1, Name2: selector2"
const selectors = 'Headings: h1, h2, h3, Content: p, .content, Article: article';

Call nemoMine with your preferred output format:

nemoMine({
  url,
  selectors,
  format: 'json' // Or 'file' with an output filename
})
.then(result => {
  console.log('Result:', result);
})
.catch(error => {
  console.error('Error:', error.message);
});

Scenario 4: Integration with Express API

Set up an Express route to handle scraping requests:

const express = require('express');
const { nemoMine } = require('nemo-webminer');
const app = express();

app.use(express.json());

app.post('/api/scrape', async (req, res) => {
  try {
    const { url, selectors, format = 'json' } = req.body;
    
    if (!url) {
      return res.status(400).json({ error: 'URL is required' });
    }
    
    const result = await nemoMine({ url, selectors, format });
    
    if (format === 'json') {
      // Return JSON data
      return res.json(result);
    } else {
      // For file output, return the path of the saved file
      return res.json({ filePath: result });
    }
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3005, () => {
  console.log('API server running on port 3005');
});
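
The route above only checks that a URL is present. A slightly stricter check, sketched below, rejects values that are not well-formed http(s) URLs before they reach the scraper. `isHttpUrl` is a hypothetical helper, not part of nemo-webminer, and it does not address other concerns such as requests to internal hosts.

```javascript
// Hypothetical helper: accept only well-formed http:// or https:// URLs.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // new URL() throws on strings it cannot parse.
    return false;
  }
}

// Possible use inside the route handler:
// if (!isHttpUrl(url)) {
//   return res.status(400).json({ error: 'A valid http(s) URL is required' });
// }
```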

Demo Web UI

Start the demo server:

npm run demo

Open your browser at http://localhost:3004
Enter a URL and optionally enter selectors in one of these formats:
- Comma-separated: h1, a, p
- Comma-separated with names: Title: h1, Links: a
- Or leave blank to scrape all HTML tags
Click "Download XLSX" to get an Excel file or "Get JSON Data" to see the JSON output directly in the browser.

The demo interface provides immediate feedback, making it easy to experiment with different selectors.

API Notes

JSON Output Structure:
- With selectors: { "SelectorName": ["value1", "value2", ...] }
- Without selectors: { "tagName": ["value1", "value2", ...] }
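
Since the JSON output is organized by column (one array per name), converting it into row objects can make it easier to display as a table. The sketch below assumes the structure described above; `toRows` is a hypothetical helper, and shorter columns are padded with `null`. Values at the same index are not guaranteed to belong to the same page element.

```javascript
// Hypothetical helper: turn { name: [values...] } into an array of row objects.
function toRows(data) {
  const names = Object.keys(data);
  const length = Math.max(0, ...names.map(name => data[name].length));
  const rows = [];
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const name of names) {
      // Pad with null when this column has fewer values than the longest one.
      row[name] = i < data[name].length ? data[name][i] : null;
    }
    rows.push(row);
  }
  return rows;
}

console.log(toRows({ Title: ['A'], Links: ['x', 'y'] }));
// [ { Title: 'A', Links: 'x' }, { Title: null, Links: 'y' } ]
```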

Selector Formats Explained

Nemo-webminer supports multiple ways to specify which elements to scrape. Below are simple explanations of each format:

1. Object Format

// Define selectors as an object
const selectors = {
  "Title": "h1",                  // Key is the column name, value is the CSS selector
  "Content": "p",
  "Links": "a",
  "ImportantText": ".highlight"    // Can use any valid CSS selector
};

// Pass to nemoMine
nemoMine({ url: 'https://example.com', selectors, format: 'json' });

2. String Format with Names

// Define selectors as a string with name:selector pairs
const selectors = "Title: h1, Content: p, Links: a, ImportantText: .highlight";

// Pass to nemoMine
nemoMine({ url: 'https://example.com', selectors, format: 'json' });

3. Simple String Format

// Just list the selectors (column names will be the same as selectors)
const selectors = "h1, p, a, .highlight";

// This is equivalent to:
// { "h1": "h1", "p": "p", "a": "a", ".highlight": ".highlight" }

nemoMine({ url: 'https://example.com', selectors, format: 'json' });
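
The equivalence noted in the comment above can be illustrated with a few lines of code. This is only a sketch of the documented mapping, not the library's actual parser: `simpleListToObject` is a hypothetical helper, and it does not handle selectors that themselves contain commas.

```javascript
// Hypothetical helper: expand "h1, p, a" into { h1: 'h1', p: 'p', a: 'a' }.
function simpleListToObject(list) {
  const result = {};
  for (const part of list.split(',')) {
    const selector = part.trim();
    if (selector) result[selector] = selector; // skip empty segments
  }
  return result;
}

console.log(simpleListToObject('h1, p, a, .highlight'));
// { h1: 'h1', p: 'p', a: 'a', '.highlight': '.highlight' }
```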

4. No Selectors (All Tags)

// Omit the selectors parameter or pass an empty string
nemoMine({ url: 'https://example.com', format: 'json' });

// Results will be grouped by HTML tag name
// { "h1": [...], "p": [...], "a": [...], "div": [...], ... }

License

MIT
