AI.Tokenizer behaviour (fnord v0.5.8)
Oh yeah? I'm gonna make my own tokenizer, with blackjack and hookers!
-- ~Bender~ ChatGPT
When this was written, the only tokenizer modules available for Elixir were either older and did not count tokens correctly for OpenAI's newer models (Gpt3Tokenizer), or could not be used in an escript because they require priv access or OTP support beyond an escript's abilities (Tokenizers).
Callbacks
Functions
Splits a string into chunks of max_tokens tokens using the algorithm defined for the specified model.
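As a rough illustration of the chunking idea (not fnord's actual implementation): encode the text to tokens, split the token list into groups of at most max_tokens, and decode each group back to text. The `ChunkSketch` module and its whitespace-based toy tokenizer are hypothetical stand-ins; a real tokenizer would use the model's own vocabulary and return integer IDs.

```elixir
defmodule ChunkSketch do
  # Hypothetical toy tokenizer: one token per whitespace-separated word.
  # The "token IDs" are just the words themselves, to keep the sketch
  # self-contained; a real model tokenizer returns integer IDs.
  def encode(text, _model), do: String.split(text)
  def decode(tokens, _model), do: Enum.join(tokens, " ")

  # Split `text` into chunks of at most `max_tokens` tokens each.
  def chunk(text, max_tokens, model) when max_tokens > 0 do
    text
    |> encode(model)
    |> Enum.chunk_every(max_tokens)
    |> Enum.map(&decode(&1, model))
  end
end

ChunkSketch.chunk("the quick brown fox jumps", 2, "toy-model")
# => ["the quick", "brown fox", "jumps"]
```

Chunking on token boundaries rather than character counts means each chunk stays within a model's token limit, which is the point of counting tokens in the first place.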
Decodes a list of token IDs into a text string using the algorithm defined for the specified model.
Encodes a text string into a list of token IDs using the algorithm defined for the specified model.
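The encode and decode operations are inverses of each other. The sketch below shows that round trip with a hypothetical behaviour and a toy byte-level implementation. The names `TokenizerSketch` and `ByteTokenizer`, and the callback signatures, are assumptions made for illustration; they are not taken from fnord's source.

```elixir
defmodule TokenizerSketch do
  # Hypothetical behaviour: callback names and types are assumptions.
  @callback encode(text :: String.t(), model :: String.t()) :: [non_neg_integer()]
  @callback decode(token_ids :: [non_neg_integer()], model :: String.t()) :: String.t()
end

defmodule ByteTokenizer do
  @behaviour TokenizerSketch

  # Toy algorithm: each UTF-8 byte is one token ID (0..255).
  # The model argument is ignored here; a real implementation would
  # select a vocabulary based on it.
  @impl true
  def encode(text, _model), do: :binary.bin_to_list(text)

  @impl true
  def decode(token_ids, _model), do: :binary.list_to_bin(token_ids)
end

ids = ByteTokenizer.encode("héllo", "toy-model")
IO.inspect(length(ids))
# prints 6, because "é" occupies two bytes in UTF-8

IO.inspect(ByteTokenizer.decode(ids, "toy-model"))
# prints "héllo"
```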
Returns the tokenizer implementation module currently in use. This is defined in the application config, under fnord/tokenizer, allowing it to be overridden for testing.
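A minimal sketch of a config-driven lookup like the one described, assuming "fnord/tokenizer" means the `:tokenizer` key of the `:fnord` application environment. `ImplLookupSketch`, its `Default` fallback, and `StubTokenizer` are hypothetical names; how fnord actually chooses a default is not stated here.

```elixir
defmodule ImplLookupSketch do
  # Hypothetical default, used when nothing is configured.
  defmodule Default do
    def encode(text, _model), do: String.to_charlist(text)
  end

  # Read the implementation module from the application config
  # (assumed key: :fnord, :tokenizer), falling back to Default.
  def get_impl, do: Application.get_env(:fnord, :tokenizer, Default)
end

# A stub that a test could swap in to avoid real tokenization.
defmodule StubTokenizer do
  def encode(_text, _model), do: [1, 2, 3]
end

Application.put_env(:fnord, :tokenizer, StubTokenizer)
ImplLookupSketch.get_impl().encode("anything", "toy-model")
# => [1, 2, 3]
```

Resolving the module at call time, rather than at compile time, is what lets a test replace the tokenizer without recompiling the caller.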