Fallback tokenizer using a heuristic character/word-based approximation.
Roughly mirrors GPT tokenization by counting word boundaries and punctuation.
Summary
Functions
Counts tokens for a conversation message by summing the content token count and a fixed role-based overhead (2 tokens for most roles, 4 for tool messages).
Estimates the token count of a text string using a heuristic word-and-punctuation scan. Splits on word boundaries and counts each alphanumeric run and punctuation character as one token, roughly approximating GPT tokenization.
Functions
Counts tokens for a conversation message by summing the content token count and a fixed role-based overhead (2 tokens for most roles, 4 for tool messages).
Parameters
message — A %CommBus.Message{} struct.
opts — Forwarded to count_tokens/2.
Returns
A non-negative integer representing the estimated token count.
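The role-overhead rule above can be sketched as follows. The module and function names (MessageTokens, count_message_tokens/2) and the bare-map argument are assumptions for illustration; the real code operates on a %CommBus.Message{} struct.

```elixir
defmodule MessageTokens do
  # Sketch of the fixed role-based overhead: 4 tokens for :tool
  # messages, 2 for every other role. Names here are hypothetical.
  def count_message_tokens(%{role: role, content: content}, opts \\ []) do
    overhead = if role == :tool, do: 4, else: 2
    count_tokens(content, opts) + overhead
  end

  # Stand-in for the heuristic text counter: one token per
  # alphanumeric run, one per punctuation character.
  defp count_tokens(text, _opts) do
    Regex.scan(~r/[[:alnum:]]+|[^[:alnum:]\s]/u, text) |> length()
  end
end
```

For example, `MessageTokens.count_message_tokens(%{role: :user, content: "Hello, world!"})` counts "Hello", ",", "world", "!" (4 content tokens) plus 2 tokens of role overhead, giving 6.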
count_tokens/2

Estimates the token count of a text string using a heuristic word-and-punctuation scan. Splits on word boundaries and counts each alphanumeric run and punctuation character as one token, roughly approximating GPT tokenization.
Parameters
text — The text string to count tokens for.
_opts — Ignored; present for callback conformance.
Returns
A non-negative integer token count estimate.
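A minimal Elixir sketch of the scan described above, under the assumption that a single regex alternation (one token per alphanumeric run, one per non-whitespace punctuation character) captures the heuristic; the module name is hypothetical.

```elixir
defmodule HeuristicTokenizer do
  # One token per alphanumeric run, one per punctuation
  # (non-alphanumeric, non-whitespace) character. The /u flag
  # keeps the scan Unicode-aware.
  def count_tokens(text, _opts \\ []) do
    ~r/[[:alnum:]]+|[^[:alnum:]\s]/u
    |> Regex.scan(text)
    |> length()
  end
end

HeuristicTokenizer.count_tokens("Hello, world!")
# "Hello", ",", "world", "!" -> 4
```

Because every match is either a word run or a single punctuation character, the result is always a non-negative integer, and an empty string yields 0.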