stopwords-iso

1.1.0 • Public • Published

The most comprehensive collection of stopwords for multiple languages.

The collection is keyed by ISO 639-1 language codes.

If you only need stopwords for a specific language, there is a separate collection for each.

Usage

The collection is in JSON format, and you are free to use it any way you like. It is currently published only on npm, Bower, and pip.

Node/JavaScript

$ npm install stopwords-iso
$ bower install stopwords-iso

// Node
const stopwords = require('stopwords-iso');  // object of stopwords for multiple languages
const english = stopwords.en;  // English stopwords
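
As a minimal sketch of how the collection might be used, the snippet below filters stopwords out of a sentence. It assumes each language code maps to an array of lowercase strings. The `sampleStopwords` object is a tiny hypothetical stand-in for the object returned by `require('stopwords-iso')`, and `removeStopwords` is an illustrative helper, not part of the package.

```javascript
// Hypothetical stand-in for require('stopwords-iso'); the real object is much larger.
// Assumption: each language code maps to an array of lowercase strings.
const sampleStopwords = {
  en: ['a', 'is', 'of', 'the'],
  de: ['der', 'die', 'ist', 'und'],
};

// Illustrative helper (not part of the package): drop stopwords from a text.
function removeStopwords(text, stopwords, lang) {
  const list = stopwords[lang];
  if (!Array.isArray(list)) {
    throw new Error(`No stopwords available for language code "${lang}"`);
  }
  const blocked = new Set(list); // a Set gives constant-time membership checks
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0 && !blocked.has(token));
}

const tokens = removeStopwords('The collection is a list of words', sampleStopwords, 'en');
console.log(tokens);
```

With the real package, the second argument would be the object from `require('stopwords-iso')`. Splitting on whitespace is a deliberate simplification; text with punctuation, or languages that do not separate words with spaces, would need a proper tokenizer.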

Contributing

If you wish to remove or update some of the stopwords, please file an issue before sending a PR to the repo of the specific language.

If you would like to add a stopword or a new set of stopwords, please add them as a new text file in the repo of the corresponding language.

Credits

All stopwords sources are listed here.

List of Included Languages

This table lists the entire set of ISO 639-1:2002 codes, with a check mark indicating those language codes that are found in stopwords-iso.json.

The list of codes comes from www.loc.gov, which hosts the official "language codes list" and is linked from www.iso.org.

ISO 639-1 Code Language Included Here
aa Afar
ab Abkhazian
af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic
an Aragonese
hy Armenian
as Assamese
av Avaric
ae Avestan
ay Aymara
az Azerbaijani
ba Bashkir
bm Bambara
eu Basque
be Belarusian
bn Bengali
bh Bihari languages
bi Bislama
bo Tibetan
bs Bosnian
br Breton
bg Bulgarian
my Burmese
ca Catalan; Valencian
cs Czech
ch Chamorro
ce Chechen
zh Chinese
cu Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic
cv Chuvash
kw Cornish
co Corsican
cr Cree
cy Welsh
da Danish
de German
dv Divehi; Dhivehi; Maldivian
nl Dutch; Flemish
dz Dzongkha
el Greek, Modern (1453-)
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
fa Persian
fj Fijian
fi Finnish
fr French
fy Western Frisian
ff Fulah
ka Georgian
gd Gaelic; Scottish Gaelic
ga Irish
gl Galician
gv Manx
gn Guarani
gu Gujarati
ht Haitian; Haitian Creole
ha Hausa
he Hebrew
hz Herero
hi Hindi
ho Hiri Motu
hr Croatian
hu Hungarian
ig Igbo
is Icelandic
io Ido
ii Sichuan Yi; Nuosu
iu Inuktitut
ie Interlingue; Occidental
ia Interlingua (International Auxiliary Language Association)
id Indonesian
ik Inupiaq
it Italian
jv Javanese
ja Japanese
kl Kalaallisut; Greenlandic
kn Kannada
ks Kashmiri
kr Kanuri
kk Kazakh
km Central Khmer
ki Kikuyu; Gikuyu
rw Kinyarwanda
ky Kirghiz; Kyrgyz
kv Komi
kg Kongo
ko Korean
kj Kuanyama; Kwanyama
ku Kurdish
lo Lao
la Latin
lv Latvian
li Limburgan; Limburger; Limburgish
ln Lingala
lt Lithuanian
lb Luxembourgish; Letzeburgesch
lu Luba-Katanga
lg Ganda
mk Macedonian
mh Marshallese
ml Malayalam
mi Maori
mr Marathi
ms Malay
mg Malagasy
mt Maltese
mn Mongolian
na Nauru
nv Navajo; Navaho
nr Ndebele, South; South Ndebele
nd Ndebele, North; North Ndebele
ng Ndonga
ne Nepali
nn Norwegian Nynorsk; Nynorsk, Norwegian
nb Bokmål, Norwegian; Norwegian Bokmål
no Norwegian
ny Chichewa; Chewa; Nyanja
oc Occitan (post 1500)
oj Ojibwa
or Oriya
om Oromo
os Ossetian; Ossetic
pa Panjabi; Punjabi
pi Pali
pl Polish
pt Portuguese
ps Pushto; Pashto
qu Quechua
rm Romansh
ro Romanian; Moldavian; Moldovan
rn Rundi
ru Russian
sg Sango
sa Sanskrit
si Sinhala; Sinhalese
sk Slovak
sl Slovenian
se Northern Sami
sm Samoan
sn Shona
sd Sindhi
so Somali
st Sotho, Southern
es Spanish; Castilian
sc Sardinian
sr Serbian
ss Swati
su Sundanese
sw Swahili
sv Swedish
ty Tahitian
ta Tamil
tt Tatar
te Telugu
tg Tajik
tl Tagalog
th Thai
ti Tigrinya
to Tonga (Tonga Islands)
tn Tswana
ts Tsonga
tk Turkmen
tr Turkish
tw Twi
ug Uighur; Uyghur
uk Ukrainian
ur Urdu
uz Uzbek
ve Venda
vi Vietnamese
vo Volapük
wa Walloon
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
za Zhuang; Chuang
zu Zulu
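
Whether a given code is included can also be checked programmatically. The sketch below assumes the collection is a plain object keyed by ISO 639-1 code. Here `sampleStopwords` is a hypothetical stand-in whose contents do not reflect the package's actual coverage, and `includedCodes` is an illustrative helper, not part of the package.

```javascript
// Hypothetical stand-in for require('stopwords-iso'); its keys are chosen for
// illustration only and say nothing about which languages the package covers.
const sampleStopwords = {
  en: ['the'],
  fr: ['le'],
};

// Illustrative helper (not part of the package): keep only the codes that
// appear as own keys of the collection.
function includedCodes(stopwords, codes) {
  return codes.filter((code) =>
    Object.prototype.hasOwnProperty.call(stopwords, code)
  );
}

const result = includedCodes(sampleStopwords, ['aa', 'en', 'fr', 'zu']);
console.log(result);
```

Calling `Object.keys` on the real object would list every included code directly.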

License

MIT

Collaborators

  • genediazjr