Tokenizers (Tokenizers v0.4.0)
Elixir bindings to Hugging Face Tokenizers.
Hugging Face describes the Tokenizers library as:
Fast State-of-the-art tokenizers, optimized for both research and production
🤗 Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. These tokenizers are also used in 🤗 Transformers.
A tokenizer is effectively a pipeline of transformations that take
a text input and return an encoded version of that text
(Tokenizers.Encoding.t/0).
The main entry point to this library is the Tokenizers.Tokenizer
module, which defines the Tokenizers.Tokenizer.t/0 struct, a
container holding the constituent parts of the pipeline. Most
functionality is in that module.
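As a minimal sketch of the flow described above, the script below loads a tokenizer, encodes a string into a Tokenizers.Encoding.t/0, and reads the result back. It assumes network access (Mix.install fetches the package, and from_pretrained downloads the tokenizer definition from the Hugging Face Hub). The model name "bert-base-cased" and the input text are illustrative choices, not something this page prescribes.

```elixir
# Assumption: run as a standalone script (e.g. `elixir example.exs`) with network access.
Mix.install([{:tokenizers, "~> 0.4.0"}])

alias Tokenizers.{Encoding, Tokenizer}

# Load a pretrained tokenizer. "bert-base-cased" is an illustrative model name.
{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")

# Run the text through the pipeline; the result is a Tokenizers.Encoding.t/0.
{:ok, encoding} = Tokenizer.encode(tokenizer, "Hello there!")

# Inspect the encoded form: the string tokens and their integer ids.
tokens = Encoding.get_tokens(encoding)
ids = Encoding.get_ids(encoding)

IO.inspect(tokens, label: "tokens")
IO.inspect(ids, label: "ids")

# Map the ids back to text with the same tokenizer.
{:ok, text} = Tokenizer.decode(tokenizer, ids)
IO.inspect(text, label: "decoded")
```

Every fallible call here returns an {:ok, value} tuple, so the pattern matches will raise if loading or encoding fails. In application code a case or with expression is likely a better fit than a bare match.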