Model loading and introspection.
Types
@type t() :: %LlamaCppEx.Model{ref: reference()}
Functions
Returns the chat template string embedded in the model, or nil if none.
Returns a human-readable description of the model.
Loads a GGUF model from the given file path.
Options
- `:n_gpu_layers` - Number of layers to offload to GPU. Use `-1` for all layers. Defaults to `99` (offload all layers).
- `:use_mmap` - Whether to memory-map the model file. Defaults to `true`.
- `:main_gpu` - GPU device index for single-GPU mode. Defaults to `0`.
- `:split_mode` - How to split the model across GPUs: `:none`, `:layer`, or `:row`. Defaults to `:none`.
- `:tensor_split` - List of floats specifying the proportion of work per GPU (e.g. `[0.5, 0.5]` for two GPUs). Defaults to `[]`.
- `:use_mlock` - Pin model memory in RAM to prevent swapping. Defaults to `false`.
- `:use_direct_io` - Bypass page cache when loading (takes precedence over mmap). Defaults to `false`.
- `:vocab_only` - Load vocabulary and metadata only, skip weights. Defaults to `false`.
Examples
```elixir
{:ok, model} = LlamaCppEx.Model.load("path/to/model.gguf", n_gpu_layers: -1)
{:ok, model} = LlamaCppEx.Model.load("path/to/model.gguf", split_mode: :layer, tensor_split: [0.5, 0.5])
{:ok, model} = LlamaCppEx.Model.load("path/to/model.gguf", vocab_only: true)
```
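The examples above only show the success path. A minimal sketch of defensive loading with a CPU-only fallback follows; note that the `{:error, reason}` failure shape is an assumption here, as this page only documents the `{:ok, model}` case:

```elixir
# Sketch: load with full GPU offload, falling back to CPU-only settings.
# Assumes load/2 returns {:error, reason} on failure (not confirmed by this page).
defmodule MyApp.ModelLoader do
  def load_with_fallback(path) do
    case LlamaCppEx.Model.load(path, n_gpu_layers: -1) do
      {:ok, model} ->
        {:ok, model}

      {:error, _reason} ->
        # Retry without GPU offload or memory-mapping, e.g. on constrained hosts.
        LlamaCppEx.Model.load(path, n_gpu_layers: 0, use_mmap: false)
    end
  end
end
```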
Returns the training context size of the model.
Returns the embedding dimension of the model.
Returns the number of model parameters.
Returns the model file size in bytes.
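The introspection functions above might be combined to summarize a loaded model. The function names used below (`desc/1`, `n_ctx_train/1`, `n_embd/1`, `n_params/1`, `size/1`) are hypothetical, inferred from the descriptions on this page rather than confirmed by it:

```elixir
# Sketch only: all LlamaCppEx.Model accessor names below are guesses
# inferred from their one-line descriptions above.
{:ok, model} = LlamaCppEx.Model.load("path/to/model.gguf", vocab_only: true)

IO.puts(LlamaCppEx.Model.desc(model))
IO.puts("context: #{LlamaCppEx.Model.n_ctx_train(model)} tokens")
IO.puts("embedding dim: #{LlamaCppEx.Model.n_embd(model)}")
IO.puts("parameters: #{LlamaCppEx.Model.n_params(model)}")
IO.puts("file size: #{LlamaCppEx.Model.size(model)} bytes")
```

Loading with `vocab_only: true` keeps this cheap, since metadata is read without loading the weights.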