Tokenizers.Decoder (Tokenizers v0.4.0)
Decoders and decoding functions.
A decoder transforms a sequence of token IDs back into a readable piece of text.
Some normalizers and pre-tokenizers use special characters or identifiers that require special logic to be reverted.
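For illustration, here is a minimal sketch of the typical flow, assuming a byte-level decoder and `decode/2` called with a list of token strings; the output shown is what such a decoder is expected to produce, not taken from the library docs:

    # Byte-level tokenizers encode a leading space as "Ġ"; the ByteLevel
    # decoder maps that representation back to plain text.
    decoder = Tokenizers.Decoder.byte_level()
    Tokenizers.Decoder.decode(decoder, ["Hello", "Ġworld"])
    #=> {:ok, "Hello world"}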
Summary
Functions
Creates a BPE decoder.
Creates a ByteFallback decoder.
Creates a ByteLevel decoder.
Creates a CTC decoder.
Decodes tokens into a string using the provided decoder.
Creates a Fuse decoder.
Creates a Metaspace decoder.
Creates a Replace decoder.
Combines a list of decoders into a single sequential decoder.
Creates a Strip decoder.
Creates a WordPiece decoder.
Types
@type t() :: %Tokenizers.Decoder{resource: reference()}
Functions
Creates a BPE decoder.
Options

  * `suffix` - the suffix to add to the end of each word. Defaults to `</w>`
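A sketch of building and using a BPE decoder with the `suffix` option; the decoded output is illustrative of how a BPE decoder replaces the word-end suffix with spaces:

    decoder = Tokenizers.Decoder.bpe(suffix: "</w>")
    Tokenizers.Decoder.decode(decoder, ["hello</w>", "world</w>"])
    #=> {:ok, "hello world"}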
@spec byte_fallback() :: t()
Creates a ByteFallback decoder.
@spec byte_level() :: t()
Creates a ByteLevel decoder.
Creates a CTC decoder.
Options

  * `pad_token` - the token used for padding. Defaults to `<pad>`
  * `word_delimiter_token` - the token used as the word delimiter. Defaults to `|`
  * `cleanup` - whether to clean up tokenization artifacts. Defaults to `true`
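As a sketch, a CTC decoder collapses repeated tokens, drops the pad token, and turns the word delimiter into a space; the tokens and output below are illustrative assumptions:

    decoder =
      Tokenizers.Decoder.ctc(
        pad_token: "<pad>",
        word_delimiter_token: "|",
        cleanup: true
      )

    Tokenizers.Decoder.decode(decoder, ["h", "e", "l", "<pad>", "l", "o", "|", "w", "o", "r", "l", "d"])
    #=> {:ok, "hello world"}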
Decodes tokens into a string using the provided decoder.
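A minimal usage sketch, assuming `decode/2` takes a decoder struct and a list of token strings and returns an `{:ok, string}` tuple; the Fuse decoder simply concatenates the tokens:

    decoder = Tokenizers.Decoder.fuse()
    Tokenizers.Decoder.decode(decoder, ["Hel", "lo", " there"])
    #=> {:ok, "Hello there"}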
@spec fuse() :: t()
Creates a Fuse decoder.
Creates a Metaspace decoder.
Options

  * `replacement` - the replacement character. Defaults to `▁` (as char)
  * `add_prefix_space` - whether to add a space to the first word. Defaults to `true`
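A sketch of a Metaspace decoder with its defaults spelled out; the `?▁` char literal and the decoded output are illustrative assumptions:

    decoder = Tokenizers.Decoder.metaspace(replacement: ?▁, add_prefix_space: true)
    Tokenizers.Decoder.decode(decoder, ["▁Say", "▁hi"])
    #=> {:ok, "Say hi"}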
Creates a Replace decoder.
Combines a list of decoders into a single sequential decoder.
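As a sketch, decoders run in the order given; the combination below mirrors a common SentencePiece-style setup and assumes `replace/2` takes a pattern and its replacement content (the output is illustrative):

    decoder =
      Tokenizers.Decoder.sequence([
        # Turn the metaspace marker back into spaces.
        Tokenizers.Decoder.replace("▁", " "),
        # Recombine byte tokens, then merge everything into one string.
        Tokenizers.Decoder.byte_fallback(),
        Tokenizers.Decoder.fuse(),
        # Drop the single leading space introduced by the replacement.
        Tokenizers.Decoder.strip(?\s, 1, 0)
      ])

    Tokenizers.Decoder.decode(decoder, ["▁Hello", "▁world"])
    #=> {:ok, "Hello world"}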
@spec strip(char(), non_neg_integer(), non_neg_integer()) :: t()
Creates a Strip decoder.
It expects a character and the number of times to strip that character from the left and right sides.
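A construction sketch matching the spec above; the char literal and the decoded output are illustrative:

    # Strip one leading "_" (and nothing from the right).
    decoder = Tokenizers.Decoder.strip(?_, 1, 0)
    Tokenizers.Decoder.decode(decoder, ["_Hello"])
    #=> {:ok, "Hello"}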
Creates a WordPiece decoder.
Options

  * `prefix` - the prefix to use for subwords. Defaults to `##`
  * `cleanup` - whether to clean up tokenization artifacts. Defaults to `true`
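A sketch of a WordPiece decoder merging `##`-prefixed subwords back into words; the output shown is the expected result, given for illustration:

    decoder = Tokenizers.Decoder.word_piece(prefix: "##", cleanup: true)
    Tokenizers.Decoder.decode(decoder, ["Hel", "##lo", "there"])
    #=> {:ok, "Hello there"}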