LlamaCppEx.Context (LlamaCppEx v0.7.0)


Inference context with KV cache.

Summary

Functions

  • clear(context) - Clears the KV cache.
  • create(model, opts \\ []) - Creates a new inference context for the given model.
  • decode(context, tokens) - Decodes a list of tokens through the model.
  • generate(context, sampler, tokens, opts \\ []) - Runs the generation loop: decodes prompt tokens and generates up to max_tokens new tokens.
  • n_ctx(context) - Returns the context size.
  • n_seq_max(context) - Returns the max number of sequences.

Types

t()

@type t() :: %LlamaCppEx.Context{model: LlamaCppEx.Model.t(), ref: reference()}

Functions

clear(context)

@spec clear(t()) :: :ok

Clears the KV cache.

create(model, opts \\ [])

@spec create(
  LlamaCppEx.Model.t(),
  keyword()
) :: {:ok, t()} | {:error, String.t()}

Creates a new inference context for the given model.

Options

  • :n_ctx - Context size (max tokens). Defaults to 2048.
  • :n_batch - Max tokens per decode batch. Defaults to n_ctx.
  • :n_ubatch - Max tokens per micro-batch. Defaults to 512.
  • :n_threads - Number of threads for generation. Defaults to system CPU count.
  • :n_threads_batch - Number of threads for prompt processing. Defaults to :n_threads.
  • :embeddings - Enable embedding extraction. Defaults to false.
  • :pooling_type - Pooling type for embeddings. Defaults to :unspecified. Values: :unspecified, :none, :mean, :cls, :last.
  • :n_seq_max - Max number of concurrent sequences. Defaults to 1.
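A minimal sketch of creating a context. Note that `LlamaCppEx.Model.load/1` is an assumed name for the model-loading function and may differ from the actual API; the option values are illustrative only.

```elixir
# Load a model (function name assumed; adjust to the actual Model API).
{:ok, model} = LlamaCppEx.Model.load("model.gguf")

# Create a context with a larger window and explicit thread count.
{:ok, ctx} =
  LlamaCppEx.Context.create(model,
    n_ctx: 4096,
    n_threads: System.schedulers_online()
  )

# Query the configured context size.
LlamaCppEx.Context.n_ctx(ctx)
```

Unspecified options fall back to the defaults listed above, so `create(model)` alone yields a 2048-token context with a single sequence.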

decode(context, tokens)

@spec decode(t(), [integer()]) :: :ok | {:error, String.t()}

Decodes a list of tokens through the model, updating the context's KV cache.
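A hedged sketch of feeding tokens and resetting the cache between unrelated prompts. The token ids here are placeholders; in practice they come from the model's tokenizer (not shown in this module), and `model` is a previously loaded model.

```elixir
{:ok, ctx} = LlamaCppEx.Context.create(model)

# Placeholder token ids; real ids come from tokenizing a prompt.
tokens = [1, 15043, 3186]

case LlamaCppEx.Context.decode(ctx, tokens) do
  :ok -> :ok
  {:error, reason} -> IO.warn("decode failed: " <> reason)
end

# Clear the KV cache before decoding an unrelated prompt,
# so stale context does not leak into the next sequence.
:ok = LlamaCppEx.Context.clear(ctx)
```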

generate(context, sampler, tokens, opts \\ [])

@spec generate(t(), LlamaCppEx.Sampler.t(), [integer()], keyword()) ::
  {:ok, String.t()} | {:error, String.t()}

Runs the generation loop: decodes prompt tokens and generates up to max_tokens new tokens.

Returns the generated text (not including the prompt).

Options

  • :max_tokens - Maximum tokens to generate. Defaults to 256.
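A sketch of the full generation flow. `LlamaCppEx.Sampler.create/1` and its options are placeholders for whatever the actual sampler-construction API is, and `model` is a previously loaded model; only `Context.create/2` and `Context.generate/4` follow the signatures documented here.

```elixir
{:ok, ctx} = LlamaCppEx.Context.create(model)

# Sampler construction is assumed; consult LlamaCppEx.Sampler for the real API.
{:ok, sampler} = LlamaCppEx.Sampler.create(temperature: 0.8)

# Placeholder prompt token ids (normally produced by the tokenizer).
prompt_tokens = [1, 15043]

{:ok, text} =
  LlamaCppEx.Context.generate(ctx, sampler, prompt_tokens, max_tokens: 128)

# `text` holds only the generated continuation, not the prompt.
IO.puts(text)
```

Generation stops early if the sampler produces an end-of-sequence token before `:max_tokens` is reached, so 128 is an upper bound, not a guarantee.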

n_ctx(context)

@spec n_ctx(t()) :: integer()

Returns the context size.

n_seq_max(context)

@spec n_seq_max(t()) :: integer()

Returns the max number of sequences.