AI.Tokenizer behaviour (fnord v0.5.8)

Oh yeah? I'm gonna make my own tokenizer, with blackjack and hookers!

                                                -- ~Bender~ ChatGPT

The only tokenizer modules available for Elixir when this was written are either too old to correctly count tokens for OpenAI's newer models (Gpt3Tokenizer) or cannot be used in an escript because they require priv directory access or OTP support beyond what an escript allows (Tokenizers).

Summary

Functions

chunk(input, max_tokens, model)
Splits a string into chunks of max_tokens tokens using the algorithm defined for the specified model.

decode(token_ids, model)
Decodes a list of token IDs into a text string using the algorithm defined for the specified model.

encode(text, model)
Encodes a text string into a list of token IDs using the algorithm defined for the specified model.

impl()
Returns the tokenizer implementation module currently in use. This is defined in the application config, under fnord/tokenizer, allowing it to be overridden for testing.

Callbacks

decode(token_ids, model)

@callback decode(
  token_ids :: list(),
  model :: String.t()
) :: String.t()

encode(text, model)

@callback encode(
  text :: String.t(),
  model :: String.t()
) :: list()
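
A minimal, hypothetical sketch of a module implementing these callbacks (the module name and the word-splitting "token" scheme are illustrative assumptions, not how the real implementation works; a real tokenizer would return integer token IDs):

defmodule MyApp.FakeTokenizer do
  @behaviour AI.Tokenizer

  # Hypothetical test double: treats each whitespace-delimited word as a
  # "token" and ignores the model argument.
  @impl AI.Tokenizer
  def encode(text, _model), do: String.split(text)

  @impl AI.Tokenizer
  def decode(token_ids, _model), do: Enum.join(token_ids, " ")
end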

Functions

chunk(input, max_tokens, model)

Splits a string into chunks of max_tokens tokens using the algorithm defined for the specified model.
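
A hypothetical usage sketch (the file, chunk size, and model name are illustrative assumptions):

iex> text = File.read!("README.md")
iex> chunks = AI.Tokenizer.chunk(text, 1024, "gpt-4o")
iex> is_list(chunks)
true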

decode(token_ids, model)

Decodes a list of token IDs into a text string using the algorithm defined for the specified model.

encode(text, model)

Encodes a text string into a list of token IDs using the algorithm defined for the specified model.
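
A sketch of a round trip through encode/2 and decode/2 (the model name is an illustrative assumption; exact token IDs depend on the underlying implementation, and a lossless tokenizer is expected to recover the original string):

iex> ids = AI.Tokenizer.encode("Good news, everyone!", "gpt-4o")
iex> is_list(ids)
true
iex> AI.Tokenizer.decode(ids, "gpt-4o")
"Good news, everyone!"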

impl()

Returns the tokenizer implementation module currently in use. This is defined in the application config, under fnord/tokenizer, allowing it to be overridden for testing.
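
A sketch of overriding the implementation for tests, assuming fnord/tokenizer maps to the :tokenizer key of the :fnord application environment (the mock module name is hypothetical):

# config/test.exs
import Config

config :fnord, :tokenizer, MyApp.FakeTokenizer

With that in place, impl/0 returns the configured module, so calls to encode/2, decode/2, and chunk/3 dispatch to the mock instead of the default tokenizer.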