Tokenizers (Tokenizers v0.5.0)

Elixir bindings to Hugging Face Tokenizers.

Hugging Face describes the Tokenizers library as:

Fast State-of-the-art tokenizers, optimized for both research and production

🤗 Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. These tokenizers are also used in 🤗 Transformers.

A tokenizer is effectively a pipeline of transformations that take a text input and return an encoded version of that text (Tokenizers.Encoding.t/0).

The main entrypoint to this library is the Tokenizers.Tokenizer module, which defines the Tokenizers.Tokenizer.t/0 struct, a container holding the constituent parts of the pipeline. Most functionality is in that module.
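
As a rough illustration of the flow described above, the following sketch loads a tokenizer, encodes a string into a Tokenizers.Encoding.t/0, inspects the result, and decodes the ids back into text. It assumes the script runs outside a Mix project (hence Mix.install/1), that network access to the Hugging Face Hub is available, and that the "bert-base-cased" model is a suitable choice; the model name and the input string are only examples, not something this page prescribes.

```elixir
# Minimal sketch, assuming a standalone script with network access.
Mix.install([{:tokenizers, "~> 0.5.0"}])

# Load a pretrained tokenizer from the Hugging Face Hub.
# "bert-base-cased" is an illustrative choice, not a requirement.
{:ok, tokenizer} = Tokenizers.Tokenizer.from_pretrained("bert-base-cased")

# Run the text through the tokenizer pipeline to obtain an encoding.
{:ok, encoding} = Tokenizers.Tokenizer.encode(tokenizer, "Hello there!")

# An encoding exposes, among other things, the tokens and their ids.
tokens = Tokenizers.Encoding.get_tokens(encoding)
ids = Tokenizers.Encoding.get_ids(encoding)

# Decoding maps the ids back to a string.
{:ok, decoded} = Tokenizers.Tokenizer.decode(tokenizer, ids)

IO.inspect(tokens, label: "tokens")
IO.inspect(ids, label: "ids")
IO.inspect(decoded, label: "decoded")
```

Both encode and decode return tagged tuples, so the pattern matches above will raise if loading or encoding fails; a real application would likely handle the {:error, reason} case explicitly.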