Tokenizers.Decoder (Tokenizers v0.5.1)
Decoders and decoding functions.
A decoder transforms a sequence of token IDs back into a readable piece of text.
Some normalizers and pre-tokenizers use special characters or identifiers that need special logic to be reverted.
Summary
Functions
Creates a BPE decoder.
Creates a ByteFallback decoder.
Creates a ByteLevel decoder.
Creates a CTC decoder.
Decodes tokens into a string using the provided decoder.
Creates a Fuse decoder.
Creates a Metaspace decoder.
Creates a Replace decoder.
Combines a list of decoders into a single sequential decoder.
Creates a Strip decoder.
Creates a WordPiece decoder.
Types
@type t() :: %Tokenizers.Decoder{resource: reference()}
Functions
Creates a BPE decoder.
Options
* `:suffix` - the suffix to add to the end of each word. Defaults to `</w>`
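A hedged sketch of how this might look, assuming the `tokenizers` dependency is installed (the token strings are purely illustrative):

```elixir
# The suffix marks word endings, so decoding turns it back into spaces.
decoder = Tokenizers.Decoder.bpe(suffix: "</w>")
Tokenizers.Decoder.decode(decoder, ["hello</w>", "world</w>"])
#=> {:ok, "hello world"}
```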
@spec byte_fallback() :: t()
Creates a ByteFallback decoder.
@spec byte_level() :: t()
Creates a ByteLevel decoder.
Creates a CTC decoder.
Options
* `:pad_token` - the token used for padding. Defaults to `<pad>`
* `:word_delimiter_token` - the token used as the word delimiter. Defaults to `|`
* `:cleanup` - whether to clean up tokenization artifacts. Defaults to `true`
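CTC output is typically produced frame by frame, so the same character repeats and padding fills the gaps. A sketch with the default options (tokens are illustrative; assumes the `tokenizers` dependency is installed):

```elixir
# Consecutive duplicates are collapsed and <pad> tokens are dropped,
# so a genuinely repeated letter must be separated by a pad to survive.
decoder = Tokenizers.Decoder.ctc()
Tokenizers.Decoder.decode(decoder, ["h", "h", "e", "l", "<pad>", "l", "o"])
```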
Decodes tokens into a string using the provided decoder.
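For illustration (assuming the `tokenizers` dependency is installed), decoding with a ByteLevel decoder, which maps the byte-level alphabet back to raw bytes (for example, "Ġ" stands for a leading space):

```elixir
decoder = Tokenizers.Decoder.byte_level()
Tokenizers.Decoder.decode(decoder, ["Hello", "Ġworld"])
#=> {:ok, "Hello world"}
```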
@spec fuse() :: t()
Creates a Fuse decoder.
Creates a Metaspace decoder.
Options
* `:replacement` - the replacement character. Defaults to `▁` (as a char)
* `:prepend_scheme` - whether to add a space to the first word if there isn't already one. This lets us treat "hello" exactly like "say hello". Either of `:always`, `:never` or `:first`. `:first` means the space is only added on the first token (relevant when special tokens or other pre-tokenizers are used). Defaults to `:always`
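A sketch with the defaults spelled out (illustrative tokens; note that `:replacement` is given as a char):

```elixir
decoder = Tokenizers.Decoder.metaspace(replacement: ?▁, prepend_scheme: :always)
Tokenizers.Decoder.decode(decoder, ["▁Hey", "▁friend"])
#=> {:ok, "Hey friend"}
```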
Creates a Replace decoder.
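Assuming `replace/2` takes the pattern first and the replacement content second, a hedged sketch (pattern and tokens are illustrative):

```elixir
# Replace every "_" in the decoded tokens with a space.
decoder = Tokenizers.Decoder.replace("_", " ")
Tokenizers.Decoder.decode(decoder, ["Hello", "_world"])
#=> {:ok, "Hello world"}
```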
Combines a list of decoders into a single sequential decoder.
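The decoders run in the order given, each transforming the output of the previous one. A sketch combining two decoders from this module (illustrative tokens):

```elixir
decoder = Tokenizers.Decoder.sequence([
  Tokenizers.Decoder.replace("_", " "),
  Tokenizers.Decoder.fuse()
])
Tokenizers.Decoder.decode(decoder, ["Hello", "_world"])
#=> {:ok, "Hello world"}
```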
@spec strip(char(), non_neg_integer(), non_neg_integer()) :: t()
Creates a Strip decoder.
It expects a character and the number of times to strip that character from the left and right sides.
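A sketch (illustrative: strip at most one leading underscore from each token and nothing from the right; assumes the `tokenizers` dependency is installed):

```elixir
decoder = Tokenizers.Decoder.strip(?_, 1, 0)
Tokenizers.Decoder.decode(decoder, ["_Hello", "_there"])
```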
Creates a WordPiece decoder.
Options
* `:prefix` - the prefix to use for subwords. Defaults to `##`
* `:cleanup` - whether to clean up tokenization artifacts. Defaults to `true`
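A hedged sketch with the defaults spelled out (illustrative tokens):

```elixir
# Subword tokens carrying the "##" prefix are glued to the previous token.
decoder = Tokenizers.Decoder.word_piece(prefix: "##", cleanup: true)
Tokenizers.Decoder.decode(decoder, ["un", "##believ", "##able"])
#=> {:ok, "unbelievable"}
```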