A powerful text sanitization library with support for multiple languages, symbol normalization, and flexible configuration options.
- Built-in dictionaries for multiple languages
- Symbol normalization (e.g., @ → a, $ → s)
- Partial or full word replacement
- Left-to-right or right-to-left replacement direction
- Customizable replacement characters
- Ignore list support
- Full or partial word matching
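As a rough illustration of what symbol normalization means, the TypeScript sketch below maps look-alike characters back to letters before any matching happens. It is not fod4se's implementation: `SYMBOL_MAP` and `normalizeSymbols` are hypothetical names, and only the `@` and `$` mappings come from the list above (the `4` entry is inferred from the `c4t` example later in this README).

```typescript
// Hypothetical symbol map, for illustration only.
// "@" and "$" come from the feature list; "4" is an assumption.
const SYMBOL_MAP: Record<string, string> = {
  "@": "a",
  "$": "s",
  "4": "a",
};

// Lowercase the text and swap each mapped symbol for its letter.
function normalizeSymbols(text: string): string {
  return Array.from(text.toLowerCase())
    .map((ch) => SYMBOL_MAP[ch] ?? ch)
    .join("");
}

console.log(normalizeSymbols("c4t")); // "cat"
console.log(normalizeSymbols("cl@$$")); // "class"
```

A filter would then run its dictionary match against the normalized text rather than the raw input, which is why a disguised spelling can still be caught.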
npm install fod4se
import { LanguageFilter } from "fod4se";
// Create a filter with English dictionary
const filter = new LanguageFilter({ baseLanguage: "en" });
// Clean text
const cleaned = filter.getSafe("Your text here");
You can also use the getSafeText and analyzeText functions directly, without creating a LanguageFilter instance.
import { getSafeText } from "fod4se";
const cleaned = getSafeText("Your text here", { baseLanguage: "en" });
console.log(cleaned); // Returns sanitized text
import { analyzeText } from "fod4se";
const result = analyzeText("Text to analyze", { baseLanguage: "en" });
console.log(result.cleaned); // Sanitized text
console.log(result.profanity); // true if anything was found
console.log(result.matches); // Array of matches with details
import { LanguageFilter } from "fod4se";
const filter = new LanguageFilter({
  baseLanguage: "en", // Use the built-in English dictionary
});
filter.getSafe("Text to clean"); // Returns sanitized text
The base dictionaries are at an early stage of development and are far from complete. If a word is missing, see the contributing section.
import { LanguageFilter } from "fod4se";
const filter = new LanguageFilter({
  baseLanguage: "none", // No built-in dictionary, use only the custom lists
  config: {
    profanity: ["word1", "word2"],
    ignore: ["goodword1", "goodword2"],
  },
});
import { LanguageFilter } from "fod4se";
const filter = new LanguageFilter({ baseLanguage: "en" });
const result = filter.analyze("Text to analyze");
console.log(result.cleaned); // Sanitized text
console.log(result.profanity); // true if anything was found
console.log(result.matches); // Array of matches with details
import { LanguageFilter, regexTemplate } from "fod4se";
const filter = new LanguageFilter({
  baseLanguage: "en",
  config: {
    replaceString: "#@", // Pattern used for replacement (e.g. "What is this #@#@#")
    replaceRatio: 0.5, // Replace 50% of each matched word
    replaceDirection: "LTR", // Replace from left to right
    matchTemplate: regexTemplate.partialMatch, // Match partial words
    ignoreSymbols: true, // Don't normalize symbols
  },
});
Option | Type | Default | Description |
---|---|---|---|
baseLanguage | "none" \| "en" \| "pt-br" | - | Built-in dictionary to use |
config | FSConfig | - | Configuration object |
Option | Type | Default | Description |
---|---|---|---|
profanity | string[] | [] | Custom list of words to filter |
ignore | string[] | [] | Words to exclude from filtering |
replaceString | string | "*" | Character(s) used for replacement |
replaceRatio | number | 1 | Portion of word to replace (0 to 1) |
replaceDirection | "LTR" \| "RTL" | "RTL" | Direction of partial replacement |
matchTemplate | string | regexTemplate.fullWord | Word matching pattern |
ignoreSymbols | boolean | false | Disable symbol normalization |
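To make the interaction between replaceString, replaceRatio, and replaceDirection more concrete, here is a minimal TypeScript sketch of how a single matched word could be masked. `maskWord` is a hypothetical helper, not part of fod4se. The defaults mirror the table above, but the rounding rule, the way the pattern is repeated, and the reading of "LTR" as "mask starting from the first character" are all assumptions.

```typescript
type Direction = "LTR" | "RTL";

// Hypothetical helper: mask a portion of `word` with a repeated pattern.
function maskWord(
  word: string,
  replaceString = "*",
  replaceRatio = 1,
  replaceDirection: Direction = "RTL",
): string {
  if (replaceString.length === 0) return word; // nothing to mask with

  // Clamp the ratio to [0, 1]; rounding to the nearest character is an assumption.
  const ratio = Math.min(1, Math.max(0, replaceRatio));
  const count = Math.round(word.length * ratio);

  // Repeat the pattern until it covers `count` characters, then trim the excess.
  const mask = replaceString
    .repeat(Math.ceil(count / replaceString.length))
    .slice(0, count);

  // Assumed semantics: LTR masks the start of the word, RTL masks the end.
  return replaceDirection === "LTR"
    ? mask + word.slice(count)
    : word.slice(0, word.length - count) + mask;
}

console.log(maskWord("category", "*", 0.5, "LTR")); // "****gory"
console.log(maskWord("category", "*", 0.5, "RTL")); // "cate****"
console.log(maskWord("word", "#@")); // "#@#@"
```

The library's actual output for a given configuration may differ from this sketch, so it is worth checking real results with getSafeText before relying on a particular masking shape.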
import { getSafeText, regexTemplate } from "fod4se";
const text = "c4t category [cat]";
const profanity = ["cat"];
const templates = [
  // regexTemplate.fullWord matches "cat" but not "category"
  regexTemplate.fullWord,
  // regexTemplate.partialMatch matches both "cat" and "category"
  regexTemplate.partialMatch,
  // Custom template that matches only "[cat]"
  "\\[{0}\\]",
];
templates
  .map((matchTemplate) =>
    getSafeText(text, { baseLanguage: "none", config: { profanity, matchTemplate } })
  )
  .forEach((result) => console.log(result));
/*
Outputs:
*** category [***] //full
*** ***egory [***] //partial
c4t category ***** //custom
*/
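The custom template above uses `{0}` as a placeholder for the dictionary word. The TypeScript sketch below shows one way such a template could be expanded into a regular expression and applied. It is an illustration under stated assumptions, not the library's code: `FULL_WORD`, `PARTIAL_MATCH`, `buildMatcher`, and `censor` are hypothetical names, and the two template strings are guesses at what regexTemplate.fullWord and regexTemplate.partialMatch might contain. Symbol normalization is left out, so the input uses `cat` instead of `c4t`.

```typescript
// Assumed template strings; the real regexTemplate values may differ.
const FULL_WORD = "\\b{0}\\b";
const PARTIAL_MATCH = "{0}";
const BRACKETED = "\\[{0}\\]"; // the custom template from the example above

// Escape regex metacharacters so the word is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Substitute the escaped word for every {0} placeholder in the template.
function buildMatcher(template: string, word: string): RegExp {
  return new RegExp(template.split("{0}").join(escapeRegExp(word)), "gi");
}

// Replace every match with asterisks of the same length.
function censor(text: string, template: string, word: string): string {
  return text.replace(buildMatcher(template, word), (m) => "*".repeat(m.length));
}

const sample = "cat category [cat]";
console.log(censor(sample, FULL_WORD, "cat")); // "*** category [***]"
console.log(censor(sample, PARTIAL_MATCH, "cat")); // "*** ***egory [***]"
console.log(censor(sample, BRACKETED, "cat")); // "cat category *****"
```

Escaping the word before substitution matters whenever a dictionary entry contains characters such as `.` or `+`, which would otherwise be interpreted as regex syntax.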
MIT License - see LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.