# `IREE.Tokenizers.Model.BPE`
[🔗](https://github.com/goodhamgupta/iree_tokenizers/blob/v0.7.0/lib/iree/tokenizers/model/bpe.ex#L1)

BPE model specification compatible with `IREE.Tokenizers.Tokenizer.init/1`.

Use this module when you already have a vocabulary and merge list in memory
or on disk and want to build an IREE-backed tokenizer from those pieces.

# `options`

```elixir
@type options() :: [
  cache_capacity: number(),
  dropout: float(),
  unk_token: String.t(),
  continuing_subword_prefix: String.t(),
  end_of_word_suffix: String.t(),
  fuse_unk: boolean(),
  byte_fallback: boolean()
]
```

Options for BPE model construction.

The supported options are intentionally close to those of `elixir-nx/tokenizers`,
but only the subset that the current IREE-backed load path can represent is
applied.
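
As a rough sketch, options are passed as a keyword list using the keys from the type above. The values here (including the `"<unk>"` token) are illustrative assumptions, and this page does not say which keys the IREE-backed load path actually honours.

```elixir
# Illustrative values only; every key comes from the options() type above.
opts = [
  unk_token: "<unk>",
  fuse_unk: false,
  byte_fallback: false
]

# A keyword list is an ordinary list of {key, value} tuples, so the standard
# Keyword functions can be used to inspect it before handing it to the model.
Keyword.fetch!(opts, :unk_token)
Keyword.get(opts, :dropout)
```

A key that is absent, such as `:dropout` here, is simply not passed, and its handling is left to the library.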

# `empty`

```elixir
@spec empty() :: {:ok, IREE.Tokenizers.Model.t()}
```

Returns an empty BPE model specification.

# `from_file`

```elixir
@spec from_file(String.t(), String.t(), options()) ::
  {:ok, IREE.Tokenizers.Model.t()} | {:error, term()}
```

Builds a BPE model specification from a vocabulary JSON file and a merges file.

The vocabulary file is expected to be a JSON object mapping token strings to
integer IDs. The merges file is expected to contain one merge pair per line.
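
The following is a minimal sketch of the two files and the call, not a definitive recipe. The scratch paths and the toy vocabulary are hypothetical, and the space-separated layout of each merge line is an assumption, since the text above only states that there is one pair per line.

```elixir
alias IREE.Tokenizers.Model.BPE

# Hypothetical scratch locations; any readable paths should work.
dir = System.tmp_dir!()
vocab_path = Path.join(dir, "vocab.json")
merges_path = Path.join(dir, "merges.txt")

# Vocabulary: a JSON object mapping token strings to integer IDs.
File.write!(vocab_path, ~s({"h": 0, "e": 1, "l": 2, "o": 3, "he": 4, "ll": 5}))

# Merges: one pair per line. The space separator is an assumption.
File.write!(merges_path, "h e\nl l\n")

# from_file/3 can fail, so both tuple shapes from the spec are handled.
case BPE.from_file(vocab_path, merges_path, []) do
  {:ok, model} ->
    model

  {:error, reason} ->
    raise "could not load BPE model: #{inspect(reason)}"
end
```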

# `init`

```elixir
@spec init(
  %{required(String.t()) => integer()},
  [{String.t(), String.t()}],
  options()
) ::
  {:ok, IREE.Tokenizers.Model.t()}
```

Builds a BPE model specification from an in-memory vocabulary and merge list.

The returned `%IREE.Tokenizers.Model{}` can be passed to
`IREE.Tokenizers.Tokenizer.init/1`.
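
A small sketch of that flow, assuming a toy vocabulary and merge list invented for illustration. The return shape of `IREE.Tokenizers.Tokenizer.init/1` is not documented in this section, so the result is bound without pattern matching.

```elixir
alias IREE.Tokenizers.Model.BPE
alias IREE.Tokenizers.Tokenizer

# Toy data: token strings mapped to integer IDs, plus merge pairs as tuples.
vocab = %{"h" => 0, "e" => 1, "l" => 2, "o" => 3, "he" => 4, "ll" => 5}
merges = [{"h", "e"}, {"l", "l"}]

# Per the spec above, init/3 returns {:ok, model}.
{:ok, model} = BPE.init(vocab, merges, [])

# Hand the model specification to the tokenizer constructor.
# The exact return value is not shown on this page, so it is not destructured.
result = Tokenizer.init(model)
```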

---

*Consult [api-reference.md](api-reference.md) for complete listing*
