Tinkex.TrainingClient.Tokenizer (Tinkex v0.3.4)
Tokenizer integration and operations for TrainingClient.
Provides functions to get tokenizers, encode text, and decode token IDs using the training client's model information.
Summary
Functions
decode/3
Decode token IDs using the training client's tokenizer.
encode/3
Encode text using the training client's tokenizer.
get_tokenizer/2
Get a tokenizer for the training client's model.
Functions
@spec decode(pid(), [integer()], keyword()) :: {:ok, String.t()} | {:error, Tinkex.Error.t()}
Decode token IDs using the training client's tokenizer.
Convenience wrapper around Tinkex.Tokenizer.decode/3 that automatically
resolves the tokenizer from the training client's model info.
Examples
{:ok, text} = Tokenizer.decode(client, [1, 2, 3])
Options
- :load_fun - Custom tokenizer loader function
- :info_fun - Custom info fetcher for testing
@spec encode(pid(), String.t(), keyword()) :: {:ok, [integer()]} | {:error, Tinkex.Error.t()}
Encode text using the training client's tokenizer.
Convenience wrapper around Tinkex.Tokenizer.encode/3 that automatically
resolves the tokenizer from the training client's model info.
Examples
{:ok, ids} = Tokenizer.encode(client, "Hello world")
Options
- :load_fun - Custom tokenizer loader function
- :info_fun - Custom info fetcher for testing
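Since encode/3 and decode/3 both return tagged tuples, they chain naturally in a `with` expression. The following is a minimal sketch, not part of Tinkex: the module name `RoundTripSketch`, the function `round_trip/3`, and the injectable `tokenizer` argument are hypothetical and exist only so the helper can be tried against a stub. It assumes the options argument of encode/3 and decode/3 is optional, as the examples above suggest.

```elixir
defmodule RoundTripSketch do
  # Hypothetical helper: encodes `text`, then decodes the resulting IDs.
  # `tokenizer` defaults to the documented module; the parameter is an
  # assumption of this sketch so that a stub can be substituted in tests.
  def round_trip(client, text, tokenizer \\ Tinkex.TrainingClient.Tokenizer) do
    with {:ok, ids} <- tokenizer.encode(client, text),
         {:ok, decoded} <- tokenizer.decode(client, ids) do
      {:ok, ids, decoded}
    end

    # Any {:error, error} from either call falls through `with` unchanged.
  end
end
```

Whether `decoded` equals the original `text` depends on the tokenizer; some tokenizers normalize whitespace or add special tokens, so an exact round trip should not be assumed.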
@spec get_tokenizer(pid(), keyword()) :: {:ok, Tinkex.Tokenizer.handle()} | {:error, Tinkex.Error.t()}
Get a tokenizer for the training client's model.
Fetches model info to determine the tokenizer ID, applies heuristics (e.g., Llama-3 gating workaround), and loads/caches the tokenizer.
Options
- :load_fun - Custom tokenizer loader function (default: HuggingFace)
- :info_fun - Custom info fetcher for testing
Examples
{:ok, _tokenizer} = Tokenizer.get_tokenizer(client)
{:ok, ids} = Tokenizer.encode(client, "Hello world")
Errors
Returns {:error, %Tinkex.Error{}} if:
- Model info cannot be fetched
- Tokenizer cannot be loaded
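Because get_tokenizer/2 both fetches model info and loads the tokenizer, calling it once up front is one way to surface either failure before any text is encoded. The sketch below is illustrative only: `WarmUpSketch`, `warm_up/2`, and the injectable `tokenizer` argument are hypothetical names, and it assumes that later encode/decode calls reuse the cached tokenizer, which the text above implies but does not state outright.

```elixir
defmodule WarmUpSketch do
  # Hypothetical helper: loads the tokenizer eagerly and reports the outcome.
  # `tokenizer` defaults to the documented module; it is injectable here only
  # so the sketch can be exercised without a live training client.
  def warm_up(client, tokenizer \\ Tinkex.TrainingClient.Tokenizer) do
    case tokenizer.get_tokenizer(client) do
      {:ok, _handle} ->
        :ok

      {:error, error} ->
        # Per the docs this is a %Tinkex.Error{}; it is passed through
        # untouched because its fields are not described in this section.
        {:error, error}
    end
  end
end
```

A caller might run `WarmUpSketch.warm_up(client)` right after creating the training client and stop early on `{:error, _}`.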