TFLiteElixir.Tokenizer.WordpieceTokenizer (tflite_elixir v0.3.7)

Runs WordPiece tokenization.

Summary

Functions

Tokenizes a piece of text into its word pieces.

Functions

@spec tokenize(String.t(), map()) :: [String.t()]

Tokenizes a piece of text into its word pieces.

This uses a greedy longest-match-first algorithm to perform tokenization using the given vocabulary.

For example:

input = "unaffable"
output = ["una", "##ffa", "##ble"]
input = "unaffableX"
output = ["[UNK]"]
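The library's own implementation isn't shown here, but the greedy longest-match-first idea can be sketched in plain Elixir. This is an illustrative sketch, not the module's actual code: the module name `WordpieceSketch` is invented, and the vocabulary is assumed to be a `MapSet` of known pieces (the real `tokenize/2` takes a map) with continuation pieces stored under a `"##"` prefix.

```elixir
defmodule WordpieceSketch do
  # Hypothetical sketch of greedy longest-match-first WordPiece tokenization.
  # `vocab` is assumed to be a MapSet of pieces such as "una" and "##ffa".
  @unk "[UNK]"

  def tokenize(word, vocab) do
    do_tokenize(word, vocab, [], true)
  end

  # All characters consumed: return the collected pieces in order.
  defp do_tokenize("", _vocab, acc, _first?), do: Enum.reverse(acc)

  defp do_tokenize(rest, vocab, acc, first?) do
    case longest_match(rest, vocab, first?) do
      # No prefix of the remainder is in the vocabulary: the whole word
      # collapses to [UNK], as in the "unaffableX" example above.
      nil -> [@unk]
      {piece, remainder} -> do_tokenize(remainder, vocab, [piece | acc], false)
    end
  end

  # Try the longest prefix first, shrinking by one character at a time.
  # Non-initial pieces are looked up with the "##" continuation prefix.
  defp longest_match(rest, vocab, first?) do
    len = String.length(rest)

    Enum.find_value(len..1//-1, fn n ->
      sub = String.slice(rest, 0, n)
      candidate = if first?, do: sub, else: "##" <> sub

      if MapSet.member?(vocab, candidate) do
        {candidate, String.slice(rest, n..-1//1)}
      end
    end)
  end
end
```

With a toy vocabulary `MapSet.new(["una", "##ffa", "##ble"])`, `WordpieceSketch.tokenize("unaffable", vocab)` reproduces the first example above, and `"unaffableX"` falls through to `["[UNK]"]` because the trailing `"X"` has no matching continuation piece.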

Related link: https://github.com/tensorflow/examples/blob/master/lite/examples/bert_qa/ios/BertQACore/Models/Tokenizers/WordpieceTokenizer.swift