Gateway for tokenizing and detokenizing text using Hugging Face tokenizers.
This gateway provides encoding and decoding functionality for text, which is useful for:
- Counting tokens to manage context windows
- Understanding token usage for cost estimation
- Debugging token-related issues
The gateway uses the tokenizers library, which provides Rust-based
tokenizers via Rustler NIF bindings for high performance.
Examples
iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
iex> tokens = Mojentic.LLM.Gateways.TokenizerGateway.encode(tokenizer, "Hello, world!")
iex> text = Mojentic.LLM.Gateways.TokenizerGateway.decode(tokenizer, tokens)
iex> text
"Hello, world!"
Types
@type t() :: %Mojentic.LLM.Gateways.TokenizerGateway{ tokenizer: Tokenizers.Tokenizer.t() }
Functions
@spec count_tokens(t(), String.t()) :: non_neg_integer()
Counts the number of tokens in a text string.
This is a convenience function that encodes the text and returns the token count.
Parameters
- gateway - The TokenizerGateway instance
- text - The text to count tokens for
Returns
- count - The number of tokens
Examples
iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
iex> count = Mojentic.LLM.Gateways.TokenizerGateway.count_tokens(tokenizer, "Hello, world!")
iex> count > 0
true
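The context-window and cost use cases mentioned in the module description can be sketched as a small helper. This is an illustrative sketch, not part of Mojentic: the module name `ContextBudget`, its functions, and the `price_per_1k` input are hypothetical. The helper takes any one-argument counting function, so in practice you would pass something like `&Mojentic.LLM.Gateways.TokenizerGateway.count_tokens(gateway, &1)`.

```elixir
defmodule ContextBudget do
  @moduledoc false
  # Hypothetical helper for illustration; not part of Mojentic.

  # Returns true when the text, plus tokens reserved for the model's reply,
  # fits within `max_tokens`.
  #
  # `count_fun` is any 1-arity function that returns a token count, e.g.
  #   &Mojentic.LLM.Gateways.TokenizerGateway.count_tokens(gateway, &1)
  def fits?(text, count_fun, max_tokens, reserved_for_reply \\ 0) do
    count_fun.(text) + reserved_for_reply <= max_tokens
  end

  # Rough cost estimate from a price per 1,000 tokens.
  # The price is an input you supply; no real pricing is assumed here.
  def estimate_cost(text, count_fun, price_per_1k) do
    count_fun.(text) / 1000 * price_per_1k
  end
end
```

Passing the counter as a function keeps the helper independent of any particular tokenizer model, which also makes it easy to exercise with a stub counter.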
decode(gateway, tokens)
Decodes tokens back into text.
Parameters
- gateway - The TokenizerGateway instance
- tokens - List of token IDs to decode
Returns
- text - The decoded text
Examples
iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
iex> tokens = Mojentic.LLM.Gateways.TokenizerGateway.encode(tokenizer, "Hello!")
iex> text = Mojentic.LLM.Gateways.TokenizerGateway.decode(tokenizer, tokens)
iex> text
"Hello!"
encode(gateway, text)
Encodes text into tokens.
Parameters
- gateway - The TokenizerGateway instance
- text - The text to encode
Returns
- tokens - List of token IDs
Examples
iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
iex> tokens = Mojentic.LLM.Gateways.TokenizerGateway.encode(tokenizer, "Hello, world!")
iex> is_list(tokens) and length(tokens) > 0
true
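Because encode returns a list of token IDs and decode accepts one, the two can be combined to trim text to a token budget. The following is a minimal sketch under stated assumptions: `TokenTruncate` and its `truncate/4` function are hypothetical names, and the `tokenizer_mod` argument exists only so the tokenizer module can be swapped for a stub. Cutting a token list can split a word or drop trailing punctuation, so the decoded result is not guaranteed to be an exact prefix of the input.

```elixir
defmodule TokenTruncate do
  @moduledoc false
  # Hypothetical helper for illustration; not part of Mojentic.

  # Trims `text` to at most `max_tokens` tokens.
  # `tokenizer_mod` must export encode/2 and decode/2 with the shapes
  # documented above (text -> list of token IDs, list of token IDs -> text).
  def truncate(gateway, text, max_tokens, tokenizer_mod \\ Mojentic.LLM.Gateways.TokenizerGateway)
      when is_integer(max_tokens) and max_tokens >= 0 do
    tokens = tokenizer_mod.encode(gateway, text)

    if length(tokens) <= max_tokens do
      # Already within budget: return the original text untouched.
      text
    else
      # Keep the first `max_tokens` IDs and turn them back into text.
      kept = Enum.take(tokens, max_tokens)
      tokenizer_mod.decode(gateway, kept)
    end
  end
end
```

With a real gateway this would be called as `TokenTruncate.truncate(gateway, long_text, 512)`, relying on the default module.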
new(model \\ "gpt2")
Creates a new TokenizerGateway with the specified model.
Parameters
- model - The model name to load. Defaults to "gpt2" which uses a BPE tokenizer similar to GPT models. Other options include model identifiers from Hugging Face.
Returns
- {:ok, gateway} - Successfully created gateway
- {:error, reason} - Failed to load tokenizer
Examples
iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
iex> is_struct(tokenizer, Mojentic.LLM.Gateways.TokenizerGateway)
true
iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new("bert-base-uncased")
iex> is_struct(tokenizer, Mojentic.LLM.Gateways.TokenizerGateway)
true
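Since new returns either `{:ok, gateway}` or `{:error, reason}`, callers can match on the result and fall back to another model when loading fails. The sketch below is illustrative only: `TokenizerFallback` and `first_available/2` are hypothetical names, and the `loader` argument defaults to the documented `new/1` so that a stub can be substituted.

```elixir
defmodule TokenizerFallback do
  @moduledoc false
  # Hypothetical helper for illustration; not part of Mojentic.

  # Tries each model name in order and returns the first gateway that loads.
  # If every attempt fails, returns the last {:error, reason} seen.
  # `loader` is any 1-arity function returning {:ok, gateway} | {:error, reason}.
  def first_available(models, loader \\ &Mojentic.LLM.Gateways.TokenizerGateway.new/1) do
    Enum.reduce_while(models, {:error, :no_models_given}, fn model, _last_result ->
      case loader.(model) do
        {:ok, gateway} -> {:halt, {:ok, gateway}}
        {:error, reason} -> {:cont, {:error, reason}}
      end
    end)
  end
end
```

A call such as `TokenizerFallback.first_available(["bert-base-uncased", "gpt2"])` would then try the two model names used in the examples above, in that order.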
new!(model \\ "gpt2")
Creates a new TokenizerGateway with the specified model, raising on error.
Parameters
- model - The model name to load. Defaults to "gpt2".
Returns
- gateway - Successfully created gateway
Raises
- RuntimeError - If the tokenizer fails to load
Examples
iex> tokenizer = Mojentic.LLM.Gateways.TokenizerGateway.new!()
iex> is_struct(tokenizer, Mojentic.LLM.Gateways.TokenizerGateway)
true