@lolojs/htmlindexer

1.1.0 • Public • Published

A library for indexing an HTML document: it extracts the unique non-stopword tokens and counts the frequency of each.

To index a document, call IndexDocument and listen for the indexFinished event, which fires when indexing has completed. The extracted tokens are then available through the tokens property, a Map from each term to its frequency.

const HtmlIndexer = require('./htmlIndexer');
const indexer = new HtmlIndexer();

// Register the listener before starting so the event cannot be missed.
indexer.on("indexFinished", () => {
    // tokens is a Map: term -> frequency
    for (const [term, freq] of indexer.tokens) {
        console.log(`Term : ${term}    Frequency : ${freq}`);
    }
});
indexer.IndexDocument("tests/test.html");
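
Because tokens is a plain Map, the usual Map and Array operations apply to it. As a minimal sketch, the snippet below ranks terms by frequency. The `topTerms` helper is hypothetical and not part of the library, and the sample Map stands in for `indexer.tokens` with made-up values.

```javascript
// Stand-in for indexer.tokens: a Map of term -> frequency (values are invented).
const tokens = new Map([['html', 3], ['index', 5], ['token', 1]]);

// Hypothetical helper, not part of the library: returns the n most frequent terms.
function topTerms(tokenMap, n) {
    return [...tokenMap.entries()]
        .sort((a, b) => b[1] - a[1])   // highest frequency first
        .slice(0, n)
        .map(([term, freq]) => ({ term, freq }));
}

console.log(topTerms(tokens, 2));
```

In real use, `topTerms(indexer.tokens, 10)` would be called inside the indexFinished listener, once the Map has been filled.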

You can also read the generated tokens as a stream by calling getOutPutStream with the chunk size, that is, the number of tokens per chunk. The output is JSON-based, in the format { term: 'test', freq: 1, isFirstChunk: true, isLastChunk: true }

const stream = indexer.getOutPutStream(2);
stream.on('data', (data) => console.log(data));
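
The isFirstChunk and isLastChunk flags can be used to tell when the stream is complete. The sketch below assumes that each data event delivers one object in the format shown above; the README does not confirm this, so check the actual payload before relying on it. The `addRecord` helper is hypothetical, and a `Readable.from` stream with invented records stands in for `indexer.getOutPutStream(2)`.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: stores one record in the Map and
// returns true when the record is flagged as the last chunk.
function addRecord(map, record) {
    map.set(record.term, record.freq);
    return record.isLastChunk === true;
}

// Stand-in for indexer.getOutPutStream(2); the records are invented for illustration.
const stream = Readable.from([
    { term: 'test', freq: 1, isFirstChunk: true, isLastChunk: false },
    { term: 'html', freq: 2, isFirstChunk: false, isLastChunk: true },
]);

const collected = new Map();
stream.on('data', (record) => {
    if (addRecord(collected, record)) {
        console.log(`Received ${collected.size} terms`);
    }
});
```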

Install

npm i @lolojs/htmlindexer

License

ISC

Collaborators

  • lolojs