Text tokenization and detokenization.
Summary
Functions
bos_token/1
Returns the BOS (beginning of sentence) token ID.
decode/2
Decodes a list of token IDs back into text.
encode/3
Encodes text into a list of token IDs.
eog?/2
Returns whether a token is an end-of-generation token.
eos_token/1
Returns the EOS (end of sentence) token ID.
token_to_piece/2
Converts a single token ID to its text representation.
vocab_size/1
Returns the vocabulary size.
Functions
@spec bos_token(LlamaCppEx.Model.t()) :: integer()
Returns the BOS (beginning of sentence) token ID.
@spec decode(LlamaCppEx.Model.t(), [integer()]) :: {:ok, String.t()} | {:error, String.t()}
Decodes a list of token IDs back into text.
@spec encode(LlamaCppEx.Model.t(), String.t(), keyword()) :: {:ok, [integer()]} | {:error, String.t()}
Encodes text into a list of token IDs.
Options
:add_special - Add special tokens (BOS/EOS). Defaults to true.
:parse_special - Parse special token text (e.g., <|im_start|>). Defaults to true.
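The following is a minimal sketch of how the two options might be combined with decode for a round trip. It rests on assumptions: this page does not name the module, so the sketch assumes it is LlamaCppEx.Tokenizer and takes the module as an argument; TokenizerSketch and round_trip/3 are hypothetical names that are not part of the library; and model is assumed to be an already loaded LlamaCppEx.Model.t().

```elixir
defmodule TokenizerSketch do
  # Hypothetical helper, not part of the library.
  # `tokenizer` is the module exposing encode/3 and decode/2 as specified above
  # (assumed to be LlamaCppEx.Tokenizer); `model` is an already loaded model.

  # Encodes `text` with both options disabled (no BOS/EOS added, special token
  # text treated as plain text), then decodes the IDs back into text.
  # Returns {:ok, ids, decoded} or the first {:error, reason} encountered.
  def round_trip(tokenizer, model, text) do
    opts = [add_special: false, parse_special: false]

    with {:ok, ids} <- tokenizer.encode(model, text, opts),
         {:ok, decoded} <- tokenizer.decode(model, ids) do
      {:ok, ids, decoded}
    end
  end
end
```

Under those assumptions, a call would look like TokenizerSketch.round_trip(LlamaCppEx.Tokenizer, model, "Hello"). Depending on the model's tokenizer, the decoded text may differ from the input in leading whitespace, so an exact equality check is not always appropriate.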
@spec eog?(LlamaCppEx.Model.t(), integer()) :: boolean()
Returns whether a token is an end-of-generation token.
@spec eos_token(LlamaCppEx.Model.t()) :: integer()
Returns the EOS (end of sentence) token ID.
@spec token_to_piece(LlamaCppEx.Model.t(), integer()) :: String.t()
Converts a single token ID to its text representation.
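As an illustration of how eog?/2 and token_to_piece/2 might work together, the sketch below joins the pieces of a token list and stops at the first end-of-generation token. PieceSketch and pieces_until_eog/3 are hypothetical names, and the tokenizer module (assumed to be LlamaCppEx.Tokenizer, which this page does not name) is passed in as an argument.

```elixir
defmodule PieceSketch do
  # Hypothetical helper, not part of the library.
  # `tokenizer` is the module exposing eog?/2 and token_to_piece/2 as specified
  # above; `model` is an already loaded model.

  # Converts token IDs to text pieces one at a time and concatenates them,
  # dropping the first end-of-generation token and everything after it.
  def pieces_until_eog(tokenizer, model, ids) do
    ids
    |> Enum.take_while(fn id -> not tokenizer.eog?(model, id) end)
    |> Enum.map(fn id -> tokenizer.token_to_piece(model, id) end)
    |> IO.iodata_to_binary()
  end
end
```

The pieces are concatenated as raw binaries because, for some tokenizers, a single piece may not be a complete character on its own.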
@spec vocab_size(LlamaCppEx.Model.t()) :: integer()
Returns the vocabulary size.
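One possible use of the vocabulary size is to sanity-check token IDs before passing them to decode. The sketch below assumes that valid IDs are the integers from 0 up to vocab_size - 1, which this page does not state; VocabSketch and valid_ids?/3 are hypothetical names, and the tokenizer module (assumed to be LlamaCppEx.Tokenizer) is passed in as an argument.

```elixir
defmodule VocabSketch do
  # Hypothetical helper, not part of the library.
  # Assumption: valid token IDs are the integers 0..vocab_size - 1.

  # Returns true when every element of `ids` is an integer inside that range.
  def valid_ids?(tokenizer, model, ids) do
    n = tokenizer.vocab_size(model)
    Enum.all?(ids, fn id -> is_integer(id) and id >= 0 and id < n end)
  end
end
```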