View Source Tokenizers (Tokenizers v0.3.2)
Elixir bindings to Hugging Face Tokenizers.
Hugging Face describes the Tokenizers library as:
Fast State-of-the-art tokenizers, optimized for both research and production
🤗 Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. These tokenizers are also used in 🤗 Transformers.
This library has bindings to use pretrained tokenizers. Support for building and training a tokenizer from scratch is forthcoming.
A tokenizer is effectively a pipeline of transforms to take some input text and return a
Tokenizers.Encoding.t()
. The main entrypoint to this library is the Tokenizers.Tokenizer
module, which holds the Tokenizers.Tokenizer.t()
struct, a container holding the constituent
parts of the pipeline. Most functionality is there.