Tokenizers.Decoder (Tokenizers v0.5.1)
Decoders and decoding functions.
A decoder transforms a sequence of token IDs back into a readable piece of text. Some normalizers and pre-tokenizers use special characters or identifiers that require special logic to revert.
Summary
Functions
- `bpe/1` - Creates a BPE decoder.
- `byte_fallback/0` - Creates a ByteFallback decoder.
- `byte_level/0` - Creates a ByteLevel decoder.
- `ctc/1` - Creates a CTC decoder.
- `decode/2` - Decodes tokens into a string with the provided decoder.
- `fuse/0` - Creates a Fuse decoder.
- `metaspace/1` - Creates a Metaspace decoder.
- `replace/2` - Creates a Replace decoder.
- `sequence/1` - Combines a list of decoders into a single sequential decoder.
- `strip/3` - Creates a Strip decoder.
- `word_piece/1` - Creates a WordPiece decoder.
Types
@type t() :: %Tokenizers.Decoder{resource: reference()}
Functions
Creates a BPE decoder.
Options
* `:suffix` - the suffix to add to the end of each word. Defaults to `</w>`
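A minimal sketch of building and using a BPE decoder, assuming the `Tokenizers` package is compiled and that `decode/2` returns `{:ok, text}` on success:

```elixir
# Build a BPE decoder that treats "</w>" as the end-of-word suffix
decoder = Tokenizers.Decoder.bpe(suffix: "</w>")

# Decoding joins the subword pieces and turns each suffix into a word boundary
{:ok, text} = Tokenizers.Decoder.decode(decoder, ["Hello</w>", "friend</w>"])
```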
@spec byte_fallback() :: t()
Creates a ByteFallback decoder.
@spec byte_level() :: t()
Creates a ByteLevel decoder.
Creates a CTC decoder.
Options
* `:pad_token` - the token used for padding. Defaults to `<pad>`
* `:word_delimiter_token` - the token used as a word delimiter. Defaults to `|`
* `:cleanup` - whether to clean up tokenization artifacts. Defaults to `true`
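As a sketch (assuming the `Tokenizers` package is available), a CTC decoder with explicit options looks like this; during decoding it collapses repeated tokens and drops the pad token, as is standard for CTC output:

```elixir
# Decoder for CTC-style output such as ["h", "h", "e", "l", "l", "o", "<pad>"]
decoder =
  Tokenizers.Decoder.ctc(
    pad_token: "<pad>",
    word_delimiter_token: "|",
    cleanup: true
  )
```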
Decodes tokens into a string with the provided decoder.
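A minimal usage sketch, assuming `decode/2` returns `{:ok, text}` on success and `{:error, reason}` otherwise:

```elixir
# A Fuse decoder simply concatenates the tokens into a single string
decoder = Tokenizers.Decoder.fuse()
{:ok, text} = Tokenizers.Decoder.decode(decoder, ["Hel", "lo"])
```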
@spec fuse() :: t()
Creates a Fuse decoder.
Creates a Metaspace decoder.
Options
* `:replacement` - the replacement character. Defaults to `▁` (as a char)
* `:prepend_scheme` - whether to add a space to the first word if there isn't already one. This lets us treat "hello" exactly like "say hello". Either of `:always`, `:never` or `:first`. `:first` means the space is only added on the first token (relevant when special tokens or other pre-tokenizers are used). Defaults to `:always`
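For instance, a Metaspace decoder with the defaults spelled out might be built like this (a sketch; note the replacement is passed as a char, matching the documented default):

```elixir
# Metaspace pre-tokenizers encode spaces as "▁"; this decoder reverts that
decoder = Tokenizers.Decoder.metaspace(replacement: ?▁, prepend_scheme: :always)
```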
Creates a Replace decoder.
Combines a list of decoders into a single sequential decoder.
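A sketch of composing decoders, assuming the `Tokenizers` package is available; the decoders are applied in list order:

```elixir
# First revert Metaspace markers, then fuse the pieces into one string
decoder =
  Tokenizers.Decoder.sequence([
    Tokenizers.Decoder.metaspace(),
    Tokenizers.Decoder.fuse()
  ])
```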
@spec strip(char(), non_neg_integer(), non_neg_integer()) :: t()
Creates a Strip decoder.
It expects a character and the number of times to strip that character from the left and right sides.
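Following the spec above, a sketch of a Strip decoder (the character is given as an Elixir char literal):

```elixir
# Strip at most one leading "_" and no trailing ones from every token
decoder = Tokenizers.Decoder.strip(?_, 1, 0)
```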
Creates a WordPiece decoder.
Options
* `:prefix` - the prefix to use for subwords. Defaults to `##`
* `:cleanup` - whether to clean up tokenization artifacts. Defaults to `true`
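A minimal sketch, assuming `decode/2` returns `{:ok, text}`; a WordPiece decoder merges tokens carrying the continuation prefix back into whole words:

```elixir
# "##" marks subword continuations; the decoder joins them to the previous token
decoder = Tokenizers.Decoder.word_piece(prefix: "##", cleanup: true)
{:ok, text} = Tokenizers.Decoder.decode(decoder, ["Hel", "##lo", "there"])
```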