Result returned by encoding operations.
This module intentionally mirrors the most useful Tokenizers.Encoding
helpers so callers can inspect token IDs, offsets, masks, and derived
metadata without dealing with the NIF directly.
Summary
Functions
Returns the attention mask.
Returns the token IDs.
Returns the number of tokens in the encoding.
Returns the number of sequences represented by the encoding.
Returns byte offsets for each token.
Returns overflowing encodings, if any.
Returns sequence IDs for each token, with special tokens represented as nil.
Returns the special-tokens mask.
Returns the token strings corresponding to the encoding.
Returns the type IDs.
Returns the attention mask packed into a little-endian u32 binary.
Returns the token IDs packed into a little-endian u32 binary.
Returns the special-tokens mask packed into a little-endian u32 binary.
Returns the type IDs packed into a little-endian u32 binary.
Returns word IDs for each token.
Alias for get_length/1.
Pads the encoding to target_length.
Replaces all sequence IDs in the encoding with the given value.
Applies a list of transformations in order.
Truncates the encoding to max_length.
Types
@type t() :: %IREE.Tokenizers.Encoding{
        attention_mask: [non_neg_integer()],
        ids: [integer()],
        offsets: nil | [{non_neg_integer(), non_neg_integer()}],
        special_tokens_mask: [non_neg_integer()],
        tokens: [binary()],
        type_ids: [non_neg_integer()]
      }
An encoded token sequence with optional offsets and derived masks.
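The per-token fields of `t()` are parallel lists with one entry per token. A minimal sketch of that shape using a plain map (the values are illustrative, not real tokenizer output):

```elixir
# Plain-map stand-in for the %IREE.Tokenizers.Encoding{} shape.
encoding = %{
  ids: [101, 2054, 102],
  tokens: ["[CLS]", "what", "[SEP]"],
  attention_mask: [1, 1, 1],
  type_ids: [0, 0, 0],
  special_tokens_mask: [1, 0, 1],
  offsets: [{0, 0}, {0, 4}, {0, 0}]
}

# Every per-token field has exactly one entry per token.
3 = length(encoding.ids)
3 = length(encoding.tokens)
3 = length(encoding.attention_mask)
```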
Functions
Returns the attention mask.
Returns the token IDs.
@spec get_length(t()) :: non_neg_integer()
Returns the number of tokens in the encoding.
@spec get_n_sequences(t()) :: non_neg_integer()
Returns the number of sequences represented by the encoding.
The current IREE-backed implementation only emits single-sequence encodings.
Returns byte offsets for each token.
Returns overflowing encodings, if any.
The current implementation does not emit overflowing pieces and always returns an empty list.
@spec get_sequence_ids(t()) :: [non_neg_integer() | nil]
Returns sequence IDs for each token, with special tokens represented as nil.
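Since the current implementation only emits single-sequence encodings, a position's sequence ID is `0` unless it holds a special token. A pure-Elixir sketch of that mapping, assuming the result can be derived from the special-tokens mask:

```elixir
# 1 marks a special token (sequence ID nil); 0 marks a regular token
# belonging to sequence 0.
special_tokens_mask = [1, 0, 0, 0, 1]

sequence_ids =
  Enum.map(special_tokens_mask, fn
    1 -> nil
    0 -> 0
  end)

# [nil, 0, 0, 0, nil]
```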
Returns the special-tokens mask.
Returns the token strings corresponding to the encoding.
Returns the type IDs.
Returns the attention mask packed into a little-endian u32 binary.
Returns the token IDs packed into a little-endian u32 binary.
Returns the special-tokens mask packed into a little-endian u32 binary.
Returns the type IDs packed into a little-endian u32 binary.
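The packed variants concatenate each value as a 4-byte little-endian unsigned integer, so the result can be handed to a buffer-oriented runtime without further conversion. A minimal sketch of the packing in plain Elixir (the ID values are illustrative):

```elixir
ids = [101, 2054, 102]

# Encode each ID as an unsigned little-endian u32 and concatenate.
packed = for id <- ids, into: <<>>, do: <<id::unsigned-little-32>>

# Each value occupies 4 bytes, so the binary is 4 * length(ids) bytes.
12 = byte_size(packed)
```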
@spec get_word_ids(t()) :: [nil]
Returns word IDs for each token.
The current implementation does not track word IDs and returns nil entries.
@spec n_tokens(t()) :: non_neg_integer()
Alias for get_length/1.
@spec pad(t(), non_neg_integer(), keyword()) :: t()
Pads the encoding to target_length.
Supported options:
* `:direction` - `:left` or `:right`, defaults to `:right`
* `:pad_id` - token ID used for padding, defaults to `0`
* `:pad_type_id` - type ID used for padding, defaults to `0`
* `:pad_token` - token string used for padding, defaults to `"[PAD]"`
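The padding behavior on a single field can be sketched in plain Elixir; `pad/3` is described as applying this across all parallel fields at once (this helper is illustrative, not part of the module):

```elixir
# Right-pad a list of token IDs with pad_id until it reaches
# target_length; a target shorter than the list is a no-op.
pad_ids = fn ids, target_length, pad_id ->
  ids ++ List.duplicate(pad_id, max(target_length - length(ids), 0))
end

[101, 2054, 102, 0, 0, 0] = pad_ids.([101, 2054, 102], 6, 0)
[101, 2054, 102] = pad_ids.([101, 2054, 102], 2, 0)
```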
@spec set_sequence_id(t(), non_neg_integer()) :: t()
Replaces all sequence IDs in the encoding with the given value.
@spec transform(t(), [IREE.Tokenizers.Encoding.Transformation.t()]) :: t()
Applies a list of transformations in order.
@spec truncate(t(), non_neg_integer(), keyword()) :: t()
Truncates the encoding to max_length.
Supported options:
* `:direction` - `:left` or `:right`, defaults to `:right`
* `:stride` - accepted for compatibility, currently not applied
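The direction option controls which end of the encoding survives. A sketch of the behavior on the ID list alone (an illustrative helper, not the module's API):

```elixir
# :right (the default) keeps the head of the sequence;
# :left keeps the tail.
truncate_ids = fn
  ids, max_length, :right -> Enum.take(ids, max_length)
  ids, max_length, :left -> Enum.take(ids, -max_length)
end

[101, 2054] = truncate_ids.([101, 2054, 102], 2, :right)
[2054, 102] = truncate_ids.([101, 2054, 102], 2, :left)
```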