# `LlamaCppEx.Context`
[🔗](https://github.com/nyo16/llama_cpp_ex/blob/main/lib/llama_cpp_ex/context.ex#L1)

An inference context with its own KV cache.

# `t`

```elixir
@type t() :: %LlamaCppEx.Context{model: LlamaCppEx.Model.t(), ref: reference()}
```

# `clear`

```elixir
@spec clear(t()) :: :ok
```

Clears the KV cache.
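A minimal sketch of reusing one context for independent prompts by clearing the cache between them. It assumes `ctx` was created with `create/2` and `first_tokens`/`second_tokens` are token lists obtained elsewhere:

```elixir
# Decode one prompt, then reset the cache before an unrelated one.
:ok = LlamaCppEx.Context.decode(ctx, first_tokens)
:ok = LlamaCppEx.Context.clear(ctx)

# The cache is now empty; this decode starts from position 0.
:ok = LlamaCppEx.Context.decode(ctx, second_tokens)
```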

# `create`

```elixir
@spec create(
  LlamaCppEx.Model.t(),
  keyword()
) :: {:ok, t()} | {:error, String.t()}
```

Creates a new inference context for the given model.

## Options

  * `:n_ctx` - Context size (max tokens). Defaults to `2048`.
  * `:n_batch` - Max tokens per decode batch. Defaults to the value of `:n_ctx`.
  * `:n_ubatch` - Max tokens per micro-batch. Defaults to `512`.
  * `:n_threads` - Number of threads for generation. Defaults to system CPU count.
  * `:n_threads_batch` - Number of threads for prompt processing. Defaults to the value of `:n_threads`.
  * `:embeddings` - Enable embedding extraction. Defaults to `false`.
  * `:pooling_type` - Pooling type for embeddings. Defaults to `:unspecified`.
    Values: `:unspecified`, `:none`, `:mean`, `:cls`, `:last`.
  * `:n_seq_max` - Max number of concurrent sequences. Defaults to `1`.
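The options above might be combined like this, assuming `model` is an already-loaded `LlamaCppEx.Model.t()`:

```elixir
# A generation-oriented context.
{:ok, ctx} =
  LlamaCppEx.Context.create(model,
    n_ctx: 4096,
    n_batch: 512,
    n_threads: System.schedulers_online()
  )

# An embedding-oriented context, if the model supports it.
{:ok, emb_ctx} =
  LlamaCppEx.Context.create(model, embeddings: true, pooling_type: :mean)
```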

# `decode`

```elixir
@spec decode(t(), [integer()]) :: :ok | {:error, String.t()}
```

Decodes a list of tokens through the model.
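A sketch of decoding a prompt and handling failure. The `LlamaCppEx.Model.tokenize/2` call is an assumption; substitute whatever tokenizer function the library actually exposes:

```elixir
# Tokenizer function name is assumed, not confirmed by this reference.
{:ok, tokens} = LlamaCppEx.Model.tokenize(model, "Hello, world")

case LlamaCppEx.Context.decode(ctx, tokens) do
  :ok -> :ready_to_sample
  {:error, reason} -> raise "decode failed: #{reason}"
end
```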

# `generate`

```elixir
@spec generate(t(), LlamaCppEx.Sampler.t(), [integer()], keyword()) ::
  {:ok, String.t()} | {:error, String.t()}
```

Runs the generation loop: decodes prompt tokens and generates up to `max_tokens` new tokens.

Returns the generated text (not including the prompt).

## Options

  * `:max_tokens` - Maximum tokens to generate. Defaults to `256`.
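A sketch of a full generation call. The `LlamaCppEx.Sampler.create/1` constructor and the `LlamaCppEx.Model.tokenize/2` helper are assumptions about the rest of the library's API; consult their own docs for the real names:

```elixir
# Sampler construction is sketched; the actual LlamaCppEx.Sampler
# API may differ.
{:ok, sampler} = LlamaCppEx.Sampler.create(temperature: 0.8)
{:ok, tokens} = LlamaCppEx.Model.tokenize(model, "Once upon a time")

{:ok, text} =
  LlamaCppEx.Context.generate(ctx, sampler, tokens, max_tokens: 128)
```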

# `n_ctx`

```elixir
@spec n_ctx(t()) :: integer()
```

Returns the context size.

# `n_seq_max`

```elixir
@spec n_seq_max(t()) :: integer()
```

Returns the max number of sequences.
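These two getters are useful for guarding against oversized prompts before decoding. A sketch, assuming `tokens` is the tokenized prompt:

```elixir
# Reject prompts that cannot fit in the context window.
if length(tokens) > LlamaCppEx.Context.n_ctx(ctx) do
  {:error, :prompt_too_long}
else
  LlamaCppEx.Context.decode(ctx, tokens)
end
```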

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
