This repository provides a lightweight, pure JavaScript implementation of Hugging Face's tokenizers, based on the tokenizers in the transformers.js library. By removing heavy dependencies such as the ONNX runtime, it focuses solely on efficient text tokenization, offering a streamlined solution without unnecessary overhead.
This project is ideal for those who require a simple and efficient way to tokenize text data using Hugging Face's tokenizers in JavaScript environments, without the need for heavy or unnecessary components.
You can install the package via npm:
```bash
npm install @flexpilot-ai/tokenizers.js
```
Here is a basic example of how to use the tokenizer:
```javascript
import { AutoTokenizer } from "@flexpilot-ai/tokenizers.js";

const tokenizer = await AutoTokenizer.from_pretrained("Xenova/bert-base-uncased");
const { input_ids } = await tokenizer("I love tokenizers.js!");
```
If you encounter any issues, please report them on the repository's issue tracker.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.