@shelf/text-normalizer

1.1.0 • Public • Published

text-normalizer

Originally adapted from openai/whisper and rewritten in TypeScript.

TypeScript library for normalizing English text. It provides a utility class, EnglishTextNormalizer, that normalizes contractions, abbreviations, spacing, and similar surface variations. EnglishTextNormalizer is built from other classes that you can reuse independently:

  • EnglishSpellingNormalizer - replaces English words with their American spelling, using a dictionary stored in a JSON file named english.json.
  • EnglishNumberNormalizer - converts numbers written as English words into actual digits.
  • BasicTextNormalizer - provides methods for removing special characters and diacritics from text, as well as splitting words into separate letters.
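
To make two of these building blocks more concrete, here is a minimal, self-contained sketch of the kind of transformation they perform. It does not use the package: the function names americanizeSpelling and stripDiacritics and the two-entry dictionary are hypothetical, chosen only to illustrate the ideas above, and should not be read as the library's actual implementation or API.

```typescript
// Hypothetical stand-in for the spelling dictionary kept in english.json.
const spellingMap = new Map<string, string>([
  ['colour', 'color'],
  ['organise', 'organize'],
]);

// Replace each space-separated word with its American spelling when the map has one.
function americanizeSpelling(text: string): string {
  return text
    .split(' ')
    .map(word => spellingMap.get(word) ?? word)
    .join(' ');
}

// Decompose accented characters (NFKD), then drop the combining marks.
function stripDiacritics(text: string): string {
  return text.normalize('NFKD').replace(/[\u0300-\u036f]/g, '');
}

console.log(americanizeSpelling('colour organise text')); // color organize text
console.log(stripDiacritics('café')); // cafe
```

A Map is used instead of a plain object so that words such as "constructor" are not accidentally matched against inherited object properties.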

Install

$ yarn add @shelf/text-normalizer

Usage

import {EnglishTextNormalizer} from '@shelf/text-normalizer';

const normalizer = new EnglishTextNormalizer();

console.log(normalizer.normalize("Let's")); // Output: let us
console.log(normalizer.normalize("he's like")); // Output: he is like
console.log(normalizer.normalize("she's been like")); // Output: she has been like
console.log(normalizer.normalize('10km')); // Output: 10 km
console.log(normalizer.normalize('10mm')); // Output: 10 mm
console.log(normalizer.normalize('RC232')); // Output: rc 232
console.log(
  normalizer.normalize('Mr. Park visited Assoc. Prof. Kim Jr.')
); // Output: mister park visited associate professor kim junior
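
One possible use of a normalizer is to check whether two strings differ only in surface form. The sketch below assumes only what the usage example shows, namely an object with a normalize(text) method that returns a string. The TextNormalizer interface, the equalAfterNormalization helper, and the lowercaseNormalizer stand-in are hypothetical names introduced for illustration and are not part of the package.

```typescript
// Minimal shape assumed from the usage example: an object with normalize(text).
interface TextNormalizer {
  normalize(text: string): string;
}

// True when both strings normalize to the same text.
function equalAfterNormalization(
  normalizer: TextNormalizer,
  a: string,
  b: string
): boolean {
  return normalizer.normalize(a) === normalizer.normalize(b);
}

// Stand-in normalizer so the sketch runs without the package installed:
// it lowercases, collapses whitespace runs, and trims.
const lowercaseNormalizer: TextNormalizer = {
  normalize: text => text.toLowerCase().replace(/\s+/g, ' ').trim(),
};

console.log(
  equalAfterNormalization(lowercaseNormalizer, 'Hello  World', 'hello world')
); // true
```

With the package installed, an EnglishTextNormalizer instance could presumably be passed in place of the stand-in, provided its normalize method has the shape shown in the usage example.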

Publish

$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags

License

MIT © Shelf
