# `Mojentic.LLM.Gateways.TokenizerGateway`
[🔗](https://github.com/svetzal/mojentic-ex/blob/v1.2.0/lib/mojentic/llm/gateways/tokenizer_gateway.ex#L1)

Gateway for tokenizing and detokenizing text using Hugging Face tokenizers.

The gateway encodes text into token IDs and decodes token IDs back into text.
This is useful for:
- Counting tokens to manage context windows
- Estimating token usage for cost calculations
- Debugging token-related issues

The gateway uses the `tokenizers` library, which provides Rust-based
tokenizers via Rustler NIF bindings for high performance.

## Examples

    iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
    iex> tokens = Mojentic.LLM.Gateways.TokenizerGateway.encode(tokenizer, "Hello, world!")
    iex> text = Mojentic.LLM.Gateways.TokenizerGateway.decode(tokenizer, tokens)
    iex> text
    "Hello, world!"

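As a rough illustration of the context-window use case, the sketch below checks whether a piece of text fits within a token budget. The `ContextBudget` module, its `fits?/3` function, and the budget of `1_000` are hypothetical names and values chosen for this example; only `new/1` and `count_tokens/2` come from this module.

```elixir
defmodule ContextBudget do
  # Hypothetical helper for illustration; not part of Mojentic.
  alias Mojentic.LLM.Gateways.TokenizerGateway

  @doc "Returns true when `text` encodes to at most `max_tokens` tokens."
  def fits?(gateway, text, max_tokens) do
    TokenizerGateway.count_tokens(gateway, text) <= max_tokens
  end
end

{:ok, gateway} = Mojentic.LLM.Gateways.TokenizerGateway.new()

# The budget here is arbitrary; use the limit of the model you are targeting.
fits = ContextBudget.fits?(gateway, "Hello, world!", 1_000)
```

Token counts are specific to the tokenizer that was loaded, so a count from the default model is only an estimate for a model that uses a different tokenizer.
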
# `t`

```elixir
@type t() :: %Mojentic.LLM.Gateways.TokenizerGateway{
  tokenizer: Tokenizers.Tokenizer.t()
}
```

# `count_tokens`

```elixir
@spec count_tokens(t(), String.t()) :: non_neg_integer()
```

Counts the number of tokens in a text string.

This is a convenience function that encodes the text and returns
the token count.

## Parameters

  * `gateway` - The TokenizerGateway instance
  * `text` - The text to count tokens for

## Returns

  * `count` - The number of tokens

## Examples

    iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
    iex> count = Mojentic.LLM.Gateways.TokenizerGateway.count_tokens(tokenizer, "Hello, world!")
    iex> count > 0
    true
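
The description above suggests that `count_tokens/2` is equivalent to encoding the text and taking the length of the result. A minimal sketch of that relationship, assuming the two functions agree as described:

```elixir
alias Mojentic.LLM.Gateways.TokenizerGateway

{:ok, gateway} = TokenizerGateway.new()
text = "Hello, world!"

# Assumption: count_tokens/2 behaves like encode/2 followed by length/1.
count = TokenizerGateway.count_tokens(gateway, text)
manual = gateway |> TokenizerGateway.encode(text) |> length()

same? = count == manual
```

Prefer `count_tokens/2` when only the number is needed, and `encode/2` when the token IDs themselves are used afterwards.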

# `decode`

```elixir
@spec decode(t(), [integer()]) :: String.t()
```

Decodes a list of token IDs back into text.

## Parameters

  * `gateway` - The TokenizerGateway instance
  * `tokens` - List of token IDs to decode

## Returns

  * `text` - The decoded text

## Examples

    iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
    iex> tokens = Mojentic.LLM.Gateways.TokenizerGateway.encode(tokenizer, "Hello!")
    iex> text = Mojentic.LLM.Gateways.TokenizerGateway.decode(tokenizer, tokens)
    iex> text
    "Hello!"

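Combining `encode/2` and `decode/2` allows text to be cut to a token budget. The sketch below is one possible approach; `TokenTruncate` and `truncate/3` are hypothetical names that do not exist in Mojentic.

```elixir
defmodule TokenTruncate do
  # Hypothetical helper for illustration; not part of Mojentic.
  alias Mojentic.LLM.Gateways.TokenizerGateway

  @doc "Keeps at most `max_tokens` leading tokens of `text` and decodes them."
  def truncate(gateway, text, max_tokens) when max_tokens >= 0 do
    tokens = TokenizerGateway.encode(gateway, text)

    if length(tokens) <= max_tokens do
      # Nothing to cut, so return the input untouched.
      text
    else
      tokens
      |> Enum.take(max_tokens)
      |> then(&TokenizerGateway.decode(gateway, &1))
    end
  end
end

{:ok, gateway} = Mojentic.LLM.Gateways.TokenizerGateway.new()
short = TokenTruncate.truncate(gateway, "Hello, world!", 2)
```

Cutting at a token boundary can split a word, and with some tokenizers a multi-byte character, so the decoded result may not be a clean prefix of the input.
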
# `encode`

```elixir
@spec encode(t(), String.t()) :: [integer()]
```

Encodes text into a list of token IDs.

## Parameters

  * `gateway` - The TokenizerGateway instance
  * `text` - The text to encode

## Returns

  * `tokens` - List of token IDs

## Examples

    iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
    iex> tokens = Mojentic.LLM.Gateways.TokenizerGateway.encode(tokenizer, "Hello, world!")
    iex> is_list(tokens) and length(tokens) > 0
    true

# `new`

```elixir
@spec new(String.t()) :: {:ok, t()} | {:error, term()}
```

Creates a new TokenizerGateway with the specified model.

## Parameters

  * `model` - The model name to load. Defaults to `"gpt2"`, which uses a byte-pair
    encoding (BPE) tokenizer like the GPT family of models. Other Hugging Face model
    identifiers, such as `"bert-base-uncased"`, are also accepted.

## Returns

  * `{:ok, gateway}` - Successfully created gateway
  * `{:error, reason}` - Failed to load tokenizer

## Examples

    iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new()
    iex> is_struct(tokenizer, Mojentic.LLM.Gateways.TokenizerGateway)
    true

    iex> {:ok, tokenizer} = Mojentic.LLM.Gateways.TokenizerGateway.new("bert-base-uncased")
    iex> is_struct(tokenizer, Mojentic.LLM.Gateways.TokenizerGateway)
    true
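
Because loading can fail, callers will usually want to handle the `{:error, reason}` tuple instead of matching only on `{:ok, _}`. A minimal sketch follows; the printed message and the `:error` fallback value are choices made for this example.

```elixir
alias Mojentic.LLM.Gateways.TokenizerGateway

result =
  case TokenizerGateway.new("bert-base-uncased") do
    {:ok, gateway} ->
      TokenizerGateway.count_tokens(gateway, "Hello, world!")

    {:error, reason} ->
      # The spec types `reason` as term(), so inspect it rather than assuming a shape.
      IO.puts("Could not load tokenizer: #{inspect(reason)}")
      :error
  end
```

Here `result` is either a token count or `:error`, so the caller can decide whether to retry, fall back to another model, or skip token counting.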

# `new!`

```elixir
@spec new!(String.t()) :: t()
```

Creates a new TokenizerGateway with the specified model, raising on error.

## Parameters

  * `model` - The model name to load. Defaults to `"gpt2"`.

## Returns

  * `gateway` - Successfully created gateway

## Raises

  * `RuntimeError` - If the tokenizer fails to load

## Examples

    iex> tokenizer = Mojentic.LLM.Gateways.TokenizerGateway.new!()
    iex> is_struct(tokenizer, Mojentic.LLM.Gateways.TokenizerGateway)
    true
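
`new!/1` suits scripts and setup code where a missing tokenizer should stop execution. The sketch below loads one gateway and reuses it across several texts; the sample strings are placeholders.

```elixir
alias Mojentic.LLM.Gateways.TokenizerGateway

# new!/1 returns the gateway directly and raises if loading fails.
gateway = TokenizerGateway.new!()

texts = ["Hello, world!", "Tokenizers split text into sub-word units."]

# Reuse the same gateway for every text rather than loading a tokenizer per call.
counts = Enum.map(texts, &TokenizerGateway.count_tokens(gateway, &1))
total = Enum.sum(counts)
```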

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
