sitemap2doc

0.0.4 • Public • Published


Sitemap 2 Doc

This module downloads every web page listed in a sitemap.xml file and merges them into a single document.

Designed for AI Embedding Generation
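The description above is a two-step pipeline: read the `<loc>` entries from the sitemap, then fetch each page and join the results. The sketch below shows only that idea. `extractSitemapUrls()` and `mergePages()` are hypothetical helpers, not part of sitemap2doc's API, and the module's own parsing and merging may differ. The regular expression is adequate for simple, well-formed sitemaps but is not a full XML parser.

```javascript
// Hypothetical helper: pull every <loc>...</loc> URL out of a sitemap.xml string.
// A regex is enough for simple, well-formed sitemaps; it is not a full XML parser,
// and XML entities such as &amp; are not decoded here.
function extractSitemapUrls(xml) {
  const urls = []
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g
  for (const match of xml.matchAll(pattern)) {
    urls.push(match[1])
  }
  return urls
}

// Hypothetical helper: join downloaded page texts into one document,
// prefixing each page with its source URL so its origin stays traceable.
function mergePages(pages) {
  return pages
    .map(({ url, text }) => `# ${url}\n\n${text.trim()}`)
    .join('\n\n---\n\n')
}

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>`

const urls = extractSitemapUrls(sitemap)
console.log(urls) // [ 'https://example.com/', 'https://example.com/docs' ]
```

Keeping the source URL in front of each page is one way to preserve provenance when the merged text is later split into chunks for embedding.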

Quickstart

Terminal

npm init -y && npm i sitemap2doc

index.mjs

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()

// Download every page listed in the sitemap and merge them into one document
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )

Terminal

node index.mjs


Methods

getDocument()

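Key         | Type    | Description                              | Required | Default
----------- | ------- | ---------------------------------------- | -------- | -------
projectName | String  | Name of the project                      | true     | -
sitemapUrl  | String  | URL of the sitemap.xml file to process   | true     | -
silent      | Boolean | Suppress terminal output when true       | false    | false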
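The options table can be read as a small contract: two required strings and one optional boolean that defaults to `false`. The sketch below expresses that contract in code with a hypothetical `normalizeOptions()`. It is an illustration of the table, not sitemap2doc's own validation, and requiring non-empty strings is an assumption made here.

```javascript
// Hypothetical validator mirroring the documented getDocument() options.
// It illustrates the options table; it is not code taken from sitemap2doc.
function normalizeOptions(options = {}) {
  const { projectName, sitemapUrl, silent = false } = options

  // Assumption: both required keys must be non-empty strings.
  if (typeof projectName !== 'string' || projectName.length === 0) {
    throw new TypeError('projectName is required and must be a non-empty string')
  }
  if (typeof sitemapUrl !== 'string' || sitemapUrl.length === 0) {
    throw new TypeError('sitemapUrl is required and must be a non-empty string')
  }
  if (typeof silent !== 'boolean') {
    throw new TypeError('silent must be a boolean')
  }

  // silent defaults to false, so terminal output stays on unless disabled.
  return { projectName, sitemapUrl, silent }
}

const normalized = normalizeOptions({
  projectName: 'test',
  sitemapUrl: 'https://example.com/sitemap.xml'
})
console.log(normalized.silent) // false
```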

Example

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()

// silent defaults to false, so progress is printed to the terminal
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )

Terminal output:

  Get Sitemap     https://...
  Get Pages       0 1 2 3 4 5 6 7 8 9
  Merge           0

getConfig()

Returns the current configuration. The default configuration is defined in ./src/data/config.mjs.

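import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()

// Read the current config and change the download chunk size
const config = s2d.getConfig()
config['download']['chunkSize'] = 4

// Apply the modified config, then build the document
// ({ ... } stands for the getDocument() options shown above)
await s2d
   .setConfig( { config } )
   .getDocument( { ... } )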
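The example changes `download.chunkSize`, but this README does not say what that key controls. A plausible reading, and only an assumption here, is that pages are downloaded in batches of `chunkSize` requests at a time. The sketch below shows that batching pattern with hypothetical `chunk()` and `downloadInChunks()` helpers; the fetch function is injected so the sketch runs without network access.

```javascript
// Split an array into consecutive slices of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  const chunks = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Hypothetical downloader: requests inside one chunk run in parallel and
// chunks run one after another, so at most `size` requests are in flight.
async function downloadInChunks(urls, size, fetchText) {
  const results = []
  for (const group of chunk(urls, size)) {
    const texts = await Promise.all(group.map((url) => fetchText(url)))
    group.forEach((url, i) => results.push({ url, text: texts[i] }))
  }
  return results
}

// Stub fetcher so the sketch runs offline; a real one would perform an HTTP request.
const fakeFetch = async (url) => `content of ${url}`

const pageUrls = Array.from({ length: 10 }, (_, i) => `https://example.com/page-${i}`)
console.log(chunk(pageUrls, 4).map((c) => c.length)) // [ 4, 4, 2 ]

downloadInChunks(pageUrls, 4, fakeFetch).then((pages) => {
  console.log(pages.length) // 10
})
```

A smaller chunk size trades speed for fewer simultaneous requests against the target server.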

setConfig()

All module settings are stored in a config file, ./src/data/config.mjs. The configuration can be completely overridden by passing a config object to setConfig() before calling getDocument().

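import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()

// Start from the current config so that all other settings are preserved
const config = s2d.getConfig()
config['download']['chunkSize'] = 4

// Override the config, then build the document
// ({ ... } stands for the getDocument() options shown above)
await s2d
   .setConfig( { config } )
   .getDocument( { ... } )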
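Because the configuration is completely overridden, the example starts from getConfig() and edits one key instead of passing a partial object. The sketch below illustrates why, assuming setConfig() replaces the whole config rather than merging into it. The config shape is hypothetical: only `download.chunkSize` appears in this README, and the real defaults live in ./src/data/config.mjs. `replaceConfig()` and `mergeConfig()` are illustrative helpers, not library functions.

```javascript
// Deep copy for plain JSON-like config objects.
const clone = (obj) => JSON.parse(JSON.stringify(obj))

// Hypothetical defaults: only download.chunkSize is documented;
// the other keys are invented for illustration.
const defaults = {
  download: { chunkSize: 10, timeoutMs: 5000 },
  output: { directory: './result' }
}

// Full replacement: whatever is passed becomes the entire config.
function replaceConfig(_current, next) {
  return clone(next)
}

// Recursive merge, for contrast: keys missing from `next` keep their current values.
function mergeConfig(current, next) {
  const result = clone(current)
  for (const [key, value] of Object.entries(next)) {
    const valueIsObject = value !== null && typeof value === 'object' && !Array.isArray(value)
    const targetIsObject = result[key] !== null && typeof result[key] === 'object'
    result[key] = valueIsObject && targetIsObject ? mergeConfig(result[key], value) : value
  }
  return result
}

// Under replacement, passing only the changed key drops every other setting...
const partial = replaceConfig(defaults, { download: { chunkSize: 4 } })
console.log(partial.output) // undefined

// ...while the getConfig()-then-edit pattern keeps them.
const edited = clone(defaults)
edited.download.chunkSize = 4
console.log(replaceConfig(defaults, edited).output) // { directory: './result' }

// A merging setter would make the partial object safe as well.
console.log(mergeConfig(defaults, { download: { chunkSize: 4 } }).download) // { chunkSize: 4, timeoutMs: 5000 }
```

Editing a copy returned by getConfig() works under either behavior, which is why it is the safer pattern when the setter's semantics are not documented.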

License

The module is available as open source under the terms of the Apache License 2.0.

Collaborators

  • a6b8