IREE.Tokenizers.Model.WordPiece (iree_tokenizers v0.7.0)


Builds WordPiece model specifications compatible with IREE.Tokenizers.Tokenizer.init/1.

Summary

Types

options()

Options for WordPiece model construction.

Functions

empty()

Returns an empty WordPiece model specification.

from_file(vocab_path, options \\ [])

Builds a WordPiece model specification from a newline-delimited vocabulary file.

init(vocab, options \\ [])

Builds a WordPiece model specification from an in-memory vocabulary.

Types

options()

@type options() :: [
  unk_token: String.t(),
  max_input_chars_per_word: number(),
  continuing_subword_prefix: String.t()
]

Options for WordPiece model construction.
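The option names match the conventional WordPiece algorithm. In that algorithm, a word is split greedily into the longest pieces found in the vocabulary, and every non-initial piece carries continuing_subword_prefix. A word that cannot be fully segmented, or that is longer than max_input_chars_per_word characters, becomes unk_token. The reference above does not spell these semantics out, so the following standalone sketch assumes the library follows them and shows how the three options interact. WordPieceSketch and its defaults ("[UNK]", 100, "##") are hypothetical. They are not the IREE.Tokenizers implementation or its defaults. The sketch counts characters as Unicode codepoints.

```elixir
defmodule WordPieceSketch do
  @moduledoc """
  Illustrative greedy longest-match-first WordPiece segmentation.
  Hypothetical: not the IREE.Tokenizers implementation; option semantics
  and defaults are assumed from the conventional WordPiece algorithm.
  """

  # Hypothetical defaults mirroring common BERT conventions.
  @defaults [unk_token: "[UNK]", max_input_chars_per_word: 100, continuing_subword_prefix: "##"]

  # Segments one pre-tokenized word; `vocab` is a token => id map.
  def tokenize(word, vocab, opts \\ []) do
    opts = Keyword.merge(@defaults, opts)
    chars = String.codepoints(word)

    if length(chars) > opts[:max_input_chars_per_word] do
      # Over-long words are mapped straight to the unknown token.
      [opts[:unk_token]]
    else
      case split(chars, vocab, opts[:continuing_subword_prefix], true, []) do
        :error -> [opts[:unk_token]]
        pieces -> pieces
      end
    end
  end

  defp split([], _vocab, _prefix, _at_start, acc), do: Enum.reverse(acc)

  defp split(chars, vocab, prefix, at_start, acc) do
    # Try the longest remaining substring first, shrinking until a vocab hit.
    match =
      Enum.find_value(length(chars)..1//-1, fn len ->
        piece = chars |> Enum.take(len) |> Enum.join()
        candidate = if at_start, do: piece, else: prefix <> piece
        if Map.has_key?(vocab, candidate), do: {candidate, len}
      end)

    case match do
      # No prefix of the remainder is in the vocabulary: the whole word is unknown.
      nil -> :error
      {token, len} -> split(Enum.drop(chars, len), vocab, prefix, false, [token | acc])
    end
  end
end
```

For example, with vocab = %{"un" => 0, "##aff" => 1, "##able" => 2, "[UNK]" => 3}, WordPieceSketch.tokenize("unaffable", vocab) yields ["un", "##aff", "##able"], while WordPieceSketch.tokenize("xyz", vocab) yields ["[UNK]"].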

Functions

empty()

@spec empty() :: {:ok, IREE.Tokenizers.Model.t()}

Returns an empty WordPiece model specification.

from_file(vocab_path, options \\ [])

@spec from_file(String.t(), options()) ::
  {:ok, IREE.Tokenizers.Model.t()} | {:error, term()}

Builds a WordPiece model specification from a newline-delimited vocabulary file.
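The reference does not describe the file format beyond "newline-delimited". A common convention, used by BERT-style vocab.txt files, is one token per line, with the zero-based line number as the token id. The sketch below assumes that convention and shows how such a file maps onto the %{String.t() => integer()} shape that init/2 accepts. VocabFileSketch is a hypothetical helper. from_file/2 may differ in details such as how it handles blank lines or duplicate tokens.

```elixir
defmodule VocabFileSketch do
  @moduledoc """
  Hypothetical helper (not part of IREE.Tokenizers): parses a newline-delimited
  vocabulary into a token => id map, assuming id = zero-based line number.
  """

  # Parses vocabulary text that is already in memory.
  @spec parse(String.t()) :: %{String.t() => non_neg_integer()}
  def parse(contents) do
    lines = String.split(contents, ["\r\n", "\n"])

    # A trailing newline yields one empty final element; drop it so it is not a token.
    lines = if List.last(lines) == "", do: Enum.drop(lines, -1), else: lines

    # Pair each token with its line index; a later duplicate overwrites an earlier one.
    lines
    |> Enum.with_index()
    |> Map.new()
  end

  # Reads and parses a file, returning {:ok, vocab} or {:error, posix_reason}.
  @spec load(Path.t()) :: {:ok, map()} | {:error, File.posix()}
  def load(path) do
    with {:ok, contents} <- File.read(path) do
      {:ok, parse(contents)}
    end
  end
end
```

Because load/1 returns {:ok, vocab} or {:error, reason}, it mirrors the return shape in the from_file/2 spec.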

init(vocab, options \\ [])

@spec init(%{required(String.t()) => integer()}, options()) ::
  {:ok, IREE.Tokenizers.Model.t()}

Builds a WordPiece model specification from an in-memory vocabulary.
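The vocabulary map passed to init/2 uses string tokens as keys and integer ids as values. The hypothetical helper below builds such a map from an ordered token list. It also checks that a value has the shape the spec requires. Appending the unknown token when it is missing is a conventional precaution assumed here. The reference above does not state it.

```elixir
defmodule InMemoryVocabSketch do
  @moduledoc """
  Hypothetical helper (not part of IREE.Tokenizers) for building the
  %{String.t() => integer()} vocabulary map that init/2 expects.
  """

  # Assigns ids in list order, appends unk_token if absent, and drops duplicates.
  @spec build([String.t()], String.t()) :: %{String.t() => non_neg_integer()}
  def build(tokens, unk_token \\ "[UNK]") do
    tokens = if unk_token in tokens, do: tokens, else: tokens ++ [unk_token]

    tokens
    |> Enum.uniq()
    |> Enum.with_index()
    |> Map.new()
  end

  # True when every key is a string and every value an integer, as the spec requires.
  @spec valid?(term()) :: boolean()
  def valid?(vocab) when is_map(vocab) do
    Enum.all?(vocab, fn {token, id} -> is_binary(token) and is_integer(id) end)
  end

  def valid?(_other), do: false
end
```

The resulting map could then be passed as IREE.Tokenizers.Model.WordPiece.init(vocab, unk_token: "[UNK]"), which returns {:ok, model} according to the spec above.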