wiktionary-scraper

0.0.1-patch.3 • Public • Published

A lightweight scraper to fetch information about words in various languages from Wiktionary.

Usage

To start using the scraper, install it with the following command:

npm install wiktionary-scraper

The simplest way of using the scraper is as follows:

import * as Wiktionary from "wiktionary-scraper";

const results = await Wiktionary.get("word");

You can change the language of the target word by setting lemmaLanguage:

import * as Wiktionary from "wiktionary-scraper";

const results = await Wiktionary.get("o", {
  lemmaLanguage: "Romanian",
});
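
Because the same spelling can exist in several languages, it can be handy to query more than one language in a single call. The following is a minimal sketch, not part of the library: getInLanguages, LookupOptions and Lookup are hypothetical names, and the lookup function is passed in as a parameter because the exact signature and result type of get() are not described here.

```typescript
// Only the option used by this sketch; the library's real option type may differ.
interface LookupOptions {
  lemmaLanguage?: string;
}

// Stand-in for the shape of Wiktionary.get(). The result type is generic
// because this README does not describe what get() resolves to.
type Lookup<T> = (word: string, options?: LookupOptions) => Promise<T>;

// Hypothetical helper: looks up one spelling in several languages concurrently
// and returns the results keyed by language name.
async function getInLanguages<T>(
  lookup: Lookup<T>,
  word: string,
  languages: string[],
): Promise<Map<string, T>> {
  const entries = await Promise.all(
    languages.map(
      async (language): Promise<[string, T]> => [
        language,
        await lookup(word, { lemmaLanguage: language }),
      ],
    ),
  );
  return new Map(entries);
}
```

Assuming get() is compatible with the Lookup type above, it could be called as getInLanguages(Wiktionary.get, "o", ["Romanian", "Portuguese"]). Each language triggers its own request, so keep the list short.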

To follow redirects, set followRedirects to true:

import * as Wiktionary from "wiktionary-scraper";

// Redirects to and returns results for "Germany".
const results = await Wiktionary.get("germany", {
  followRedirects: true,
});

By default, requests are sent with a User-Agent header that identifies wiktionary-scraper.

To remove it, set userAgent to undefined.

If you want to change it, specify userAgent:

import * as Wiktionary from "wiktionary-scraper";

const results = await Wiktionary.get("word", {
  userAgent: "Your App (https://example.com)",
});
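
If several parts of an application make requests, building the value in one place keeps it consistent. The sketch below assumes the "Name (contact URL)" shape from the example above; buildUserAgent is a hypothetical helper, not something the library exports.

```typescript
// Hypothetical helper: builds a User-Agent value in the
// "App name (contact URL)" shape used in the example above.
function buildUserAgent(appName: string, contactUrl: string): string {
  const name = appName.trim();
  if (name.length === 0) {
    throw new Error("appName must not be empty");
  }
  // Reject values that are clearly not an HTTP(S) contact URL.
  if (!/^https?:\/\//.test(contactUrl)) {
    throw new Error(`contactUrl must start with http:// or https://, got "${contactUrl}"`);
  }
  return `${name} (${contactUrl})`;
}
```

The returned string can then be passed as the userAgent option, for example userAgent: buildUserAgent("Your App", "https://example.com").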

You can also parse the HTML of a Wiktionary page directly, bypassing the fetch step.

ℹ️ Notice that, as opposed to get(), parse() is synchronous:

import * as Wiktionary from "wiktionary-scraper";

const results = Wiktionary.parse(html);
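
One reason to use parse() is to control the HTTP request yourself, for instance to add caching or custom error handling. Here is a minimal sketch of that flow, with stated assumptions: wiktionaryUrl, fetchAndParse, HttpResponse, Fetcher and Parser are hypothetical names, and both the fetch function and the parser are passed in as parameters so the block does not depend on the library's exact types.

```typescript
// Minimal shape of an HTTP response; the built-in fetch() Response satisfies it.
interface HttpResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

type Fetcher = (url: string) => Promise<HttpResponse>;

// Stand-in for Wiktionary.parse(); the result type is left generic.
type Parser<T> = (html: string) => T;

// Builds the English Wiktionary page URL for a word.
function wiktionaryUrl(word: string): string {
  return `https://en.wiktionary.org/wiki/${encodeURIComponent(word)}`;
}

// Fetches the page with the supplied fetcher, then hands the HTML to the parser.
async function fetchAndParse<T>(
  word: string,
  fetcher: Fetcher,
  parse: Parser<T>,
): Promise<T> {
  const response = await fetcher(wiktionaryUrl(word));
  if (!response.ok) {
    throw new Error(`Request for "${word}" failed with status ${response.status}`);
  }
  return parse(await response.text());
}
```

On a runtime with a global fetch (such as Node.js 18 or later), this could be called as fetchAndParse("word", fetch, Wiktionary.parse), assuming parse() accepts the full page HTML as a string.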

Completeness

This library currently only supports the English version of Wiktionary.

Features

  • Parses both single- and multiple-etymology entries.
  • Recognises standard, non-standard and some explicitly disallowed parts of speech, as defined by Wiktionary. In total, there are 60+ recognised parts of speech, which should cover the vast majority of definitions.
    • Note, however, that the library may fail to recognise certain niche, non-standard parts of speech. Should you come across any, please open an issue.

Section support

  • [ ] Description
  • [ ] Glyph origin
  • [x] Etymology
  • [ ] Pronunciation
  • [ ] Production
  • [x] Definitions
  • [ ] Usage notes
  • [ ] Reconstruction notes
  • [ ] Inflection sections:
    • [ ] Inflection
    • [ ] Conjugation
    • [ ] Declension
  • [ ] Mutation
  • [ ] Quotations
  • [ ] Alternative forms
  • [ ] Alternative reconstructions
  • [ ] Relations:
    • [ ] Synonyms
    • [ ] Antonyms
    • [ ] Hypernyms
    • [ ] Hyponyms
    • [ ] Meronyms
    • [ ] Holonyms
    • [ ] Comeronyms
    • [ ] Troponyms
    • [ ] Parasynonyms
    • [ ] Coordinate terms
    • [ ] Derived terms
    • [ ] Related terms
  • [ ] Translations
  • [ ] Trivia
  • [ ] See also
  • [ ] References
  • [ ] Further reading
  • [ ] Anagrams
  • [ ] Examples

Recognised parts of speech

Parts of speech
  • Adjective
  • Adverb
  • Ambiposition
  • Article
  • Circumposition
  • Classifier
  • Conjunction
  • Contraction
  • Counter
  • Determiner
  • Ideophone
  • Interjection
  • Noun
  • Numeral
  • Participle
  • Particle
  • Postposition
  • Preposition
  • Pronoun
  • Proper noun
  • Verb
Morphemes
  • Circumfix
  • Combining form
  • Infix
  • Interfix
  • Prefix
  • Root
  • Suffix
Symbols
  • Diacritical mark
  • Letter
  • Ligature
  • Number
  • Punctuation mark
  • Syllable
  • Symbol
Phrases
  • Phrase
  • Proverb
  • Prepositional phrase
Han characters and language-specific varieties
  • Han character
  • Hanzi
  • Kanji
  • Hanja
Other
  • Romanization
  • Logogram
  • Determinative
Explicitly disallowed parts of speech

You know, just in case somebody didn't follow the rules on Wiktionary.

  • Abbreviation
  • Acronym
  • Initialism
  • Cardinal-number
  • Ordinal-number
  • Cardinal-numeral
  • Ordinal-numeral
  • Clitic
  • Gerund
  • Idiom
Library additions
  • Adposition
  • Affix
  • Character

License

MIT

Collaborators

  • vxern