@axeptio/links-classifier

0.0.6 • Public • Published

Links Classifier

Use Case

Given the links found on a webpage, this module filters them and classifies the remaining ones by document type, such as Privacy Policy or Terms of Service.

Approach

The module exposes two functions: filterLinks, which removes external, invalid, and duplicate links, and classifyLinks, which assigns the remaining links to document types.
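To make the filtering step concrete, here is a minimal standalone sketch of what removing external, invalid, and duplicate links can look like. It is not the package's implementation: the function name sketchFilterLinks, its parameters, and the choice to ignore URL fragments when detecting duplicates are assumptions made for illustration, and it ignores the locale and subdomain options that filterLinks accepts.

```javascript
// Illustrative sketch only; NOT the implementation used by the package.
// sketchFilterLinks is a hypothetical name. It takes href strings and a base URL.
function sketchFilterLinks(hrefs, baseHref) {
  const base = new URL(baseHref);
  const seen = new Set();
  const kept = [];

  for (const href of hrefs) {
    let url;
    try {
      // Resolves relative hrefs against the base; throws on unparseable input.
      url = new URL(href, base);
    } catch {
      continue; // invalid link
    }

    // Skip non-web schemes such as mailto: or javascript:.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;

    // Skip links pointing to another host (assumption: subdomains count as external).
    if (url.hostname !== base.hostname) continue;

    // Assumption: two links that differ only by fragment are duplicates.
    url.hash = '';

    if (seen.has(url.href)) continue; // duplicate link
    seen.add(url.href);
    kept.push(url.href);
  }

  return kept;
}

const sampleHrefs = [
  '/privacy',
  'https://example.com/privacy#top',
  'https://other.org/terms',
  'mailto:hi@example.com',
  'http://[bad',
  '/terms',
];

const sampleFiltered = sketchFilterLinks(sampleHrefs, 'https://example.com/home');
console.log(sampleFiltered);
```

The sketch works on plain strings so it can run outside a browser; in a page, the href values could be read from the anchors returned by document.querySelectorAll('a').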

Usage

const { filterLinks, classifyLinks, keywords } = require('@axeptio/links-classifier');

const links = document.querySelectorAll('a');

const filteredLinks = filterLinks(
  links, // the links to filter
  window.location, // the context
  ['en', 'fr', 'it'], // valid locales (other languages will be ignored)
  false, // follow subdomains
  console.log // logger function
);

const classifiedLinks = classifyLinks(filteredLinks, keywords, 'fr');

console.log(classifiedLinks);

/*
{
 'privacy_policy': Array(2),
 'terms_of_service': Array(1),
}
 */

Data

The module ships with its own dataset, located in data/keywords.js, which contains keyword variations for each document type. The dataset is exported from the index as keywords, but you are free to pass your own dataset instead.
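The exact shape of data/keywords.js is not documented here, so the following is only a sketch of how keyword-based classification with a custom dataset might work. The dataset shape (document type, then locale, then a list of variations), the link shape ({ text, href }), and the name sketchClassifyLinks are all assumptions for illustration; check data/keywords.js before building a dataset to pass to classifyLinks.

```javascript
// Hypothetical dataset shape, assumed for illustration:
// { [documentType]: { [locale]: string[] } }
const sampleKeywords = {
  privacy_policy: {
    en: ['privacy policy', 'privacy'],
    fr: ['politique de confidentialité', 'confidentialité'],
  },
  terms_of_service: {
    en: ['terms of service', 'terms'],
    fr: ['conditions générales', 'cgu'],
  },
};

// Illustrative sketch only; NOT the package's classifyLinks.
// Assigns each link to the first document type whose keyword variations
// (for the given locale) appear in the link text or href.
function sketchClassifyLinks(links, dataset, locale) {
  const result = {};

  for (const link of links) {
    const haystack = `${link.text} ${link.href}`.toLowerCase();

    for (const [type, byLocale] of Object.entries(dataset)) {
      const variations = byLocale[locale] || [];
      const matches = variations.some((v) => haystack.includes(v.toLowerCase()));

      if (matches) {
        if (!result[type]) result[type] = [];
        result[type].push(link);
        break; // one document type per link in this sketch
      }
    }
  }

  return result;
}

const sampleLinks = [
  { text: 'Politique de confidentialité', href: 'https://example.com/confidentialite' },
  { text: 'CGU', href: 'https://example.com/cgu' },
  { text: 'Contact', href: 'https://example.com/contact' },
];

const sampleClassified = sketchClassifyLinks(sampleLinks, sampleKeywords, 'fr');
console.log(Object.keys(sampleClassified));
```

Links that match no variation, such as the Contact link above, are simply left out of the result, which mirrors the sample output in the Usage section where only matched document types appear as keys.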

Install

npm i @axeptio/links-classifier

License

ISC

Collaborators

  • nikolas.olivier
  • a_ng_d_axeptio
  • axeptiotech
  • mcriel-axeptio
  • achalhii
  • rombat
  • romainbessuges