IREE.Tokenizers.EncodeStream (iree_tokenizers v0.6.0)


Streaming encoder state.

Use this module to feed input to a tokenizer incrementally, as a sequence of binary chunks, while producing the same token IDs that one-shot encoding of the full input would produce.
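Here is a minimal sketch of the intended usage pattern: feed each chunk in order, then finalize, and concatenate every returned ID list. The `StreamingEncodeSketch` module and its `encode_chunks/3` function are hypothetical and not part of iree_tokenizers. The sketch also assumes, as the wording of `feed/2` and `finalize/1` suggests, that `finalize/1` returns only the IDs that `feed/2` has not already returned. The stream module is passed as a parameter only so the sketch can run against a stand-in.

```elixir
defmodule StreamingEncodeSketch do
  @moduledoc false
  # Hypothetical helper; not part of iree_tokenizers.

  # Feeds `chunks` in order into an already-created encode stream, then
  # finalizes it and returns every token ID as one flat list. Stops at the
  # first {:error, _} from feed/2 or finalize/1 and returns it unchanged.
  # Assumes finalize/1 returns only the IDs not already returned by feed/2.
  # `mod` defaults to the real module; it is a parameter only so the
  # sketch can be exercised with a stand-in.
  def encode_chunks(stream, chunks, mod \\ IREE.Tokenizers.EncodeStream) do
    fed =
      Enum.reduce_while(chunks, {:ok, []}, fn chunk, {:ok, batches} ->
        case mod.feed(stream, chunk) do
          # Keep each batch of IDs, newest first; reversed once at the end.
          {:ok, ids} -> {:cont, {:ok, [ids | batches]}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)

    with {:ok, batches} <- fed,
         {:ok, tail} <- mod.finalize(stream) do
      # The finalize/1 IDs go last, after every feed/2 batch.
      {:ok, [tail | batches] |> Enum.reverse() |> Enum.concat()}
    end
  end
end
```

With a real tokenizer, obtained in a way this page does not cover, you would call `{:ok, stream} = IREE.Tokenizers.EncodeStream.new(tokenizer)` and then `StreamingEncodeSketch.encode_chunks(stream, ["Hello, ", "world!"])`.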

Summary

Types

t()

Mutable streaming encode state owned by the NIF.

Functions

feed(stream, chunk)

Feeds a binary chunk into the stream and returns any newly produced token IDs.

finalize(stream)

Flushes any remaining state and returns the final token IDs.

new(tokenizer, opts \\ [])

Creates a new encode stream for the given tokenizer.

Types

t()

@type t() :: %IREE.Tokenizers.EncodeStream{resource: reference()}

Mutable streaming encode state owned by the NIF.

Functions

feed(stream, chunk)

@spec feed(t(), binary()) :: {:ok, [integer()]} | {:error, {atom(), binary()}}

Feeds a binary chunk into the stream and returns any newly produced token IDs.
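Each `feed/2` call returns only the IDs produced so far. Token IDs can therefore be consumed as input arrives, rather than after all of it has been read. The hypothetical `LazyTokenSketch.token_stream/3` below wraps an enumerable of chunks in a lazy Elixir `Stream`. Raising on `{:error, _}` is a choice made for this sketch, not library behavior. The same assumption applies here: `finalize/1` is taken to return only the IDs not yet returned by `feed/2`.

```elixir
defmodule LazyTokenSketch do
  @moduledoc false
  # Hypothetical helper; not part of iree_tokenizers.

  # Returns a lazy Stream of token IDs. Nothing is fed until the result is
  # enumerated, and finalize/1 runs only after every chunk has been fed.
  # Raises ArgumentError on {:error, reason} (a choice made for this sketch).
  # `mod` is a parameter only so the sketch can run against a stand-in.
  def token_stream(stream, chunks, mod \\ IREE.Tokenizers.EncodeStream) do
    fed =
      Stream.flat_map(chunks, fn chunk ->
        unwrap!(mod.feed(stream, chunk), :feed)
      end)

    # A one-element source keeps the finalize/1 call lazy as well.
    finalized =
      Stream.flat_map([:finalize], fn :finalize ->
        unwrap!(mod.finalize(stream), :finalize)
      end)

    Stream.concat(fed, finalized)
  end

  defp unwrap!({:ok, ids}, _step), do: ids

  defp unwrap!({:error, reason}, step) do
    raise ArgumentError, "#{step} failed: #{inspect(reason)}"
  end
end
```

Enumerating only part of the result, for example with `Enum.take/2`, feeds only as many chunks as needed and never calls `finalize/1`.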

finalize(stream)

@spec finalize(t()) :: {:ok, [integer()]} | {:error, {atom(), binary()}}

Flushes any remaining state and returns the final token IDs, including any post-processing special tokens when :add_special_tokens is true.

new(tokenizer, opts \\ [])

@spec new(
  IREE.Tokenizers.Tokenizer.t(),
  keyword()
) :: {:ok, t()} | {:error, {atom(), binary()}}

Creates a new encode stream for the given tokenizer.

Options:

  • :add_special_tokens - whether special tokens added by post-processing are emitted during finalization. Defaults to true.
  • :max_chunk_bytes - maximum chunk size, in bytes, that feed/2 expects. Defaults to 65536 (64 KiB).
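This page does not say what `feed/2` does with a chunk larger than `:max_chunk_bytes`. It also does not say whether a chunk may end partway through a multi-byte UTF-8 character. A conservative approach, sketched below with those questions left open, is to split the input at codepoint boundaries into pieces no larger than the configured limit. `ChunkSketch.split/2` is a hypothetical helper, not part of the library.

```elixir
defmodule ChunkSketch do
  @moduledoc false
  # Hypothetical helper; not part of iree_tokenizers.

  # Splits `text` into binaries of at most `max_bytes` bytes each, cutting
  # only between codepoints so no multi-byte UTF-8 character is split.
  # A single codepoint wider than `max_bytes` is emitted as its own chunk.
  def split(text, max_bytes)
      when is_binary(text) and is_integer(max_bytes) and max_bytes > 0 do
    text
    |> String.codepoints()
    |> Enum.chunk_while(
      {[], 0},
      fn cp, {acc, size} ->
        cp_size = byte_size(cp)

        if acc != [] and size + cp_size > max_bytes do
          # Current chunk is full: emit it and start a new one with `cp`.
          {:cont, flush(acc), {[cp], cp_size}}
        else
          {:cont, {[cp | acc], size + cp_size}}
        end
      end,
      fn
        {[], _size} -> {:cont, {[], 0}}
        {acc, _size} -> {:cont, flush(acc), {[], 0}}
      end
    )
  end

  # Codepoints are accumulated newest-first, so reverse before joining.
  defp flush(acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()
end
```

For example, you could create a stream with `IREE.Tokenizers.EncodeStream.new(tokenizer, add_special_tokens: false, max_chunk_bytes: 4096)` and pass each piece of `ChunkSketch.split(text, 4096)` to `feed/2` in order.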