erllama_model_backend behaviour (erllama v0.1.0)
Behaviour describing the operations the erllama_model gen_statem needs from a backing inference engine.
Two backends ship with this release:

erllama_model_stub: deterministic phash2-based stubs, used by tests that don't have a GGUF file on disk.
erllama_model_llama: real llama.cpp inference via the NIF.

Future backends (a mock for fault injection, a remote backend for distributed inference, etc.) can plug in through this same surface.
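As a sketch of what a custom backend can look like, here is a skeletal module in the spirit of erllama_model_stub: every callback is implemented, with erlang:phash2/2 standing in for real inference. The module name, the 32000-entry fake vocabulary, and all internal details are illustrative assumptions, not part of the library; only the callback names and signatures come from the behaviour below.

    -module(my_stub_backend).
    -behaviour(erllama_model_backend).
    -export([apply_chat_template/2, configure_sampler/2, decode_one/2,
             detokenize/2, embed/2, kv_pack/2, prefill/2, seq_rm_last/2,
             terminate/1, tokenize/2]).

    %% State is opaque to the caller; a stub needs nothing in it.

    apply_chat_template(_State, #{messages := Messages}) ->
        %% Hash each message term into a fake token id.
        {ok, [erlang:phash2(M, 32000) || M <- Messages]}.

    configure_sampler(State, _SamplerOpts) ->
        {ok, State}.                        %% Stub: sampling options are ignored.

    decode_one(_State, ContextTokens) ->
        %% Derive the "next token" deterministically from the whole context.
        {ok, erlang:phash2(ContextTokens, 32000)}.

    detokenize(_State, TokenIds) ->
        iolist_to_binary([integer_to_binary(T) || T <- TokenIds]).

    embed(_State, TokenIds) ->
        {ok, [erlang:phash2(T, 1000) / 1000 || T <- TokenIds]}.

    kv_pack(_State, Tokens) ->
        term_to_binary(Tokens).             %% Stand-in for a packed KV cache.

    prefill(_State, _TokenIds) ->
        ok.

    seq_rm_last(_State, _NTokens) ->
        ok.

    terminate(_State) ->
        ok.

    tokenize(_State, Text) ->
        [erlang:phash2(C, 32000) || C <- binary_to_list(Text)].

Note that this stub never returns {eog, Token} from decode_one/2, so a caller must bound generation itself; a real backend signals end-of-generation through that variant.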
Types
-type chat_request() ::
          #{messages := [chat_message()],
            system => binary() | undefined,
            tools => [chat_tool()] | undefined}.
-type sampler_opts() ::
          #{grammar => binary(),
            repetition_penalty => float(),
            top_k => non_neg_integer(),
            top_p => float(),
            min_p => float(),
            temperature => float(),
            seed => non_neg_integer()}.
-type state() :: term().
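A minimal sketch of how a caller might construct these maps. The shape of each message is not specified here (chat_message() is defined elsewhere), and every value below is an illustrative assumption; only the keys come from the type specs above.

    %% Messages :: [chat_message()], built elsewhere.
    Request = #{messages => Messages,
                system => <<"You are a helpful assistant.">>,
                tools => undefined},
    Sampler = #{temperature => 0.8,        %% all values here are made up
                top_k => 40,
                top_p => 0.95,
                min_p => 0.05,
                repetition_penalty => 1.1,
                seed => 42}.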
Callbacks
-callback apply_chat_template(state(), Request :: chat_request()) -> {ok, [erllama_nif:token_id()]} | {error, term()}.
-callback configure_sampler(state(), sampler_opts()) -> {ok, state()} | {error, term()}.
-callback decode_one(state(), ContextTokens :: [erllama_nif:token_id()]) -> {ok, erllama_nif:token_id()} | {eog, erllama_nif:token_id()} | {error, term()}.
-callback detokenize(state(), [erllama_nif:token_id()]) -> binary() | {error, term()}.
-callback embed(state(), [erllama_nif:token_id()]) -> {ok, [float()]} | {error, term()}.
-callback kv_pack(state(), Tokens :: [erllama_nif:token_id()]) -> binary() | {error, term()}.
-callback prefill(state(), [erllama_nif:token_id()]) -> ok | {error, term()}.
-callback seq_rm_last(state(), NTokens :: pos_integer()) -> ok | {error, term()}.
-callback terminate(state()) -> ok.
-callback tokenize(state(), Text :: binary()) -> [erllama_nif:token_id()] | {error, term()}.
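Taken together, the callbacks support a straightforward prefill-then-decode loop. The sketch below shows one plausible way a caller such as erllama_model could drive a backend; it is an assumption about usage, not a copy of the gen_statem's implementation. Backend is a module implementing this behaviour and State its opaque state.

    generate(Backend, State0, Request, SamplerOpts, MaxNew) ->
        {ok, Prompt} = Backend:apply_chat_template(State0, Request),
        {ok, State} = Backend:configure_sampler(State0, SamplerOpts),
        ok = Backend:prefill(State, Prompt),
        case decode_loop(Backend, State, Prompt, MaxNew, []) of
            {ok, NewTokens}  -> {ok, Backend:detokenize(State, NewTokens)};
            {error, _} = Err -> Err
        end.

    decode_loop(_Backend, _State, _Ctx, 0, Acc) ->
        {ok, lists:reverse(Acc)};
    decode_loop(Backend, State, Ctx, N, Acc) ->
        case Backend:decode_one(State, Ctx) of
            %% Appending with ++ is O(length(Ctx)); fine for a sketch.
            {ok, Tok}        -> decode_loop(Backend, State, Ctx ++ [Tok],
                                            N - 1, [Tok | Acc]);
            {eog, _Tok}      -> {ok, lists:reverse(Acc)};   %% end of generation
            {error, _} = Err -> Err
        end.

For brevity the sketch ignores that detokenize/2 can itself return {error, term()}, and it does not exercise embed/2, kv_pack/2, or seq_rm_last/2, which serve embedding, cache-snapshot, and context-rollback paths respectively.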