CommBus.Tokenizer behaviour (CommBus v0.1.0)


Token counting façade with pluggable backends.

Configure with:

config :comm_bus, :tokenizer, CommBus.Tokenizer.Simple

or pass tokenizer: MyTokenizer in the opts of an individual function call; a per-call option takes precedence over the application config.
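As a rough illustration of where the application-wide setting might live, the fragment below assumes a standard Mix project layout and a hypothetical backend module MyApp.WordTokenizer (not part of CommBus) that implements this behaviour.

```elixir
# config/config.exs
import Config

# Application-wide default backend. MyApp.WordTokenizer is a hypothetical
# module used for illustration; any module implementing the
# CommBus.Tokenizer behaviour could be named here.
config :comm_bus, :tokenizer, MyApp.WordTokenizer
```

A single call can still pick a different backend, for example CommBus.Tokenizer.token_count("some text", tokenizer: CommBus.Tokenizer.Simple).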

Summary

Functions

annotate_entries(entries, opts \\ [])
Annotates a list of entries with token counts by calling annotate_entry/2 on each entry.

annotate_entry(entry, opts \\ [])
Fills in the token_count field of an entry by counting tokens in its content.

message_count(message, opts \\ [])
Counts the number of tokens in a message, including role overhead.

token_count(text, opts \\ [])
Counts the number of tokens in the given text string using the configured tokenizer backend.

tokenizer(opts \\ [])
Returns the tokenizer module resolved from options or application config.

Callbacks

count_message(t, keyword)

@callback count_message(
  CommBus.Message.t(),
  keyword()
) :: non_neg_integer()

count_tokens(t, keyword)

@callback count_tokens(
  String.t(),
  keyword()
) :: non_neg_integer()
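A custom backend only needs to implement these two callbacks. The following is a minimal sketch, not a backend shipped with CommBus: the module name MyApp.WordTokenizer, the word-based counting, the fixed role overhead of 4, and the assumption that a message keeps its text under a :content key are all illustrative choices. It is meant to be compiled inside a project that depends on comm_bus.

```elixir
defmodule MyApp.WordTokenizer do
  @moduledoc "Hypothetical backend: approximates tokens as whitespace-separated words."
  @behaviour CommBus.Tokenizer

  # Assumed fixed per-message cost for the role marker; not a CommBus value.
  @role_overhead 4

  @impl true
  def count_tokens(text, _opts) when is_binary(text) do
    # String.split/1 splits on whitespace and drops empty strings.
    text |> String.split() |> length()
  end

  @impl true
  def count_message(message, opts) do
    # Assumption: the message exposes its text under :content.
    content = Map.get(message, :content) || ""
    count_tokens(content, opts) + @role_overhead
  end
end
```

Word counts are only a crude proxy for model tokens; a real backend would typically delegate to the tokenizer of the target model.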

Functions

annotate_entries(entries, opts \\ [])

@spec annotate_entries(
  [CommBus.Entry.t()],
  keyword()
) :: [CommBus.Entry.t()]

Annotates a list of entries with token counts by calling annotate_entry/2 on each entry.

Parameters

  • entries — List of %CommBus.Entry{} structs.
  • opts — Keyword options forwarded to the tokenizer backend.

Returns

A list of entries with token_count fields populated.

annotate_entry(entry, opts \\ [])

@spec annotate_entry(
  CommBus.Entry.t(),
  keyword()
) :: CommBus.Entry.t()

Fills in the token_count field of an entry by counting tokens in its content.

If the entry already has a non-nil token_count, it is returned unchanged.

Parameters

  • entry — A %CommBus.Entry{} struct.
  • opts — Keyword options forwarded to the tokenizer backend.

Returns

The entry with token_count populated.
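The skip-if-already-counted rule can be sketched as two function clauses. This is an illustration under stated assumptions rather than the library's source: AnnotateSketch and its nested Entry struct are stand-ins that carry only the content and token_count fields named above, and the counter is a placeholder word count instead of a configured backend.

```elixir
defmodule AnnotateSketch do
  defmodule Entry do
    # Stand-in for CommBus.Entry with only the two fields the docs mention.
    defstruct content: "", token_count: nil
  end

  # Already annotated: return the entry untouched.
  def annotate_entry(%Entry{token_count: existing} = entry) when not is_nil(existing) do
    entry
  end

  # Not yet annotated: fill in token_count from the content.
  def annotate_entry(%Entry{content: content} = entry) do
    %{entry | token_count: count(content)}
  end

  # List version: annotate each entry independently.
  def annotate_entries(entries), do: Enum.map(entries, &annotate_entry/1)

  # Placeholder counter: whitespace-separated words, not a real tokenizer.
  defp count(text), do: text |> String.split() |> length()
end
```

Because a non-nil token_count short-circuits the call, annotating the same list twice does not recount entries that were already processed.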

message_count(message, opts \\ [])

@spec message_count(
  CommBus.Message.t(),
  keyword()
) :: non_neg_integer()

Counts the number of tokens in a message, including role overhead.

Parameters

  • message — A %CommBus.Message{} struct.
  • opts — Keyword options; :tokenizer overrides the configured backend.

Returns

A non-negative integer representing the total token count for the message.

token_count(text, opts \\ [])

@spec token_count(
  String.t(),
  keyword()
) :: non_neg_integer()

Counts the number of tokens in the given text string using the configured tokenizer backend.

Parameters

  • text — The text string to tokenize.
  • opts — Keyword options; :tokenizer overrides the configured backend.

Returns

A non-negative integer representing the token count.

tokenizer(opts \\ [])

@spec tokenizer(keyword()) :: module()

Returns the tokenizer module resolved from options or application config.

Resolution order: the :tokenizer key in opts, then the :tokenizer application env for :comm_bus, and finally the default CommBus.Tokenizer.Simple.

Parameters

  • opts — Keyword options; :tokenizer overrides the configured backend.

Returns

The tokenizer module (an atom implementing the CommBus.Tokenizer behaviour).
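The lookup order described for tokenizer/1 could be written roughly as follows. ResolveSketch is a hypothetical module used only to illustrate the three-step fallback; the actual implementation may differ, for instance in how it treats an explicit tokenizer: nil.

```elixir
defmodule ResolveSketch do
  # Final fallback named in the documentation.
  @default CommBus.Tokenizer.Simple

  def tokenizer(opts \\ []) do
    # 1. per-call option, 2. application env for :comm_bus, 3. default.
    Keyword.get(opts, :tokenizer) ||
      Application.get_env(:comm_bus, :tokenizer, @default)
  end
end
```

In this sketch a nil value under :tokenizer in opts falls through to the application env, which is one reasonable reading of the documented order.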