AI.PretendTokenizer (fnord v0.9.29)
OpenAI's tokenizer uses regexes that are not compatible with Erlang's regex engine. A couple of tokenizer modules are available on Hex, but all of them require a working Python installation, access to rustc, a number of external dependencies, and environment flags set before they will compile.
Rather than impose that on end users, this module uses a deliberately conservative token estimator. It guesstimates token counts with extra room for token-dense inputs so callers can choose chunk sizes with a buffer for inaccuracy.
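For illustration only, here is a minimal sketch of the kind of conservative character-based estimate described above. The module name and divisor are assumptions for the example, not fnord's actual constants: English text averages roughly four characters per token, so dividing by a smaller figure deliberately over-counts.

defmodule PretendTokenizerSketch do
  # Assumed divisor: ~4 chars/token is typical, so 3 over-estimates,
  # leaving headroom for token-dense inputs such as code.
  @chars_per_token 3

  @spec guesstimate_tokens(String.t()) :: non_neg_integer()
  def guesstimate_tokens(input) do
    input
    |> String.length()
    |> div(@chars_per_token)
    # Round up so short inputs never estimate to zero tokens.
    |> Kernel.+(1)
  end
end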
Summary
Types
@type chunk_size() :: non_neg_integer() | AI.Model.t()
@type chunked_input() :: [String.t()]
@type input() :: String.t()
@type reduction_factor() :: float()
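To show how these types might fit together, here is a purely illustrative chunking sketch: a chunk_size() token budget, scaled down by a reduction_factor() to absorb estimator error, is converted back into a character budget. (When a chunk_size() is an AI.Model.t(), the budget presumably comes from the model's context window.) The function and constants below are assumptions for the example, not fnord's implementation:

defmodule ChunkingSketch do
  # Same assumed divisor as the estimator sketch above.
  @chars_per_token 3

  @spec chunk(String.t(), non_neg_integer(), float()) :: [String.t()]
  def chunk(input, chunk_size, reduction_factor) do
    # Shrink the token budget, then convert it to a conservative
    # character budget using the assumed chars-per-token divisor.
    chars_per_chunk = trunc(chunk_size * reduction_factor * @chars_per_token)

    input
    |> String.graphemes()
    |> Enum.chunk_every(chars_per_chunk)
    |> Enum.map(&Enum.join/1)
  end
end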