erllama_cache_meta_srv (erllama v0.1.0)
Sole writer for the cache meta and LRU ETS tables; arbitrates claim/release and the reservation state machine for save publication.
Two read-mostly ETS tables, owned by this process and protected
so any caller can read them without a server hop:
  erllama_cache_meta : set, key = cache_key, row layout per
    include/erllama_cache.hrl ?POS_* constants
  erllama_cache_lru : ordered_set, key = {LastUsedNs, cache_key},
    value = []
Two server-internal maps in process state:
  holders : MonRef -> {Pid, Key}; one entry per active claim
  reservations : Key -> #reservation{}; one entry per in-flight save
Plus a waiters map for lookup_exact_or_wait/2, which defers its reply
until the in-flight save publishes (or the per-call deadline fires).
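The ownership model above can be illustrated with a minimal sketch. It assumes hypothetical table names (sketch_meta, sketch_lru) and a made-up row layout; the real rows follow the ?POS_* constants, which are not reproduced here. It only shows that a protected table can be read from any process while writes from a non-owner fail.

```erlang
-module(meta_tables_sketch).
-export([demo/0]).

%% Hypothetical names and row layout, for illustration only.
demo() ->
    Parent = self(),
    {Owner, Ref} = spawn_monitor(fun() ->
        %% protected: only the owning process may write, anyone may read.
        ets:new(sketch_meta, [set, protected, named_table]),
        ets:new(sketch_lru, [ordered_set, protected, named_table]),
        Key = <<"key-a">>,
        LastUsedNs = erlang:monotonic_time(nanosecond),
        ets:insert(sketch_meta, {Key, ram, 128, LastUsedNs}),
        ets:insert(sketch_lru, {{LastUsedNs, Key}, []}),
        Parent ! {ready, self()},
        receive stop -> ok end
    end),
    receive {ready, Owner} -> ok end,
    %% Read directly from the caller: no message to the owner is needed.
    Read = ets:lookup(sketch_meta, <<"key-a">>),
    %% A write from a non-owner raises badarg on a protected table.
    WriteResult =
        try ets:insert(sketch_meta, {<<"key-b">>, ram, 1, 0})
        catch error:badarg -> denied
        end,
    Owner ! stop,
    %% Wait for the owner to exit so its tables are gone before returning.
    receive {'DOWN', Ref, process, Owner, _} -> ok end,
    {Read, WriteResult}.
```

Keeping every write inside one process is what lets readers skip the server hop without risking concurrent updates to the same row.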
The reservation state machine has two stages, pre_link and
post_link, to make crash cleanup correct: a writer that died
before link/2 leaves no file; a writer that died after link/2
may have left a valid .kvc we can validate-and-adopt.
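The two-stage cleanup rule can be sketched as a small decision function. This is an illustration under assumptions, not the server's code: on_writer_down/2, the Validate callback, and the discard outcome for an invalid file are all hypothetical; the source only states that a pre_link crash leaves no file and that a post_link crash may leave a .kvc worth validating and adopting.

```erlang
-module(reservation_sketch).
-export([on_writer_down/2]).

%% Decide what to do with a reservation whose writer process has died.
%% Validate is an assumed callback returning true when the file at Path
%% is a well-formed .kvc.
on_writer_down({pre_link, _Path}, _Validate) ->
    %% Writer died before link/2: no file was published, drop the reservation.
    drop;
on_writer_down({post_link, Path}, Validate) ->
    %% Writer died after link/2: a file may exist, so check it first.
    case Validate(Path) of
        true -> {adopt, Path};
        false -> {discard, Path}   % assumed handling of an invalid file
    end.
```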
Types
-type state() :: #state{
    holders :: #{reference() => {pid(), erllama_cache:cache_key()}},
    reservations :: #{erllama_cache:cache_key() => #reservation{
        writer :: pid(),
        token :: reference(),
        monref :: reference(),
        expires_ns :: integer(),
        stage :: pre_link | post_link,
        tier :: erllama_cache:tier(),
        path :: file:name() | undefined
    }},
    waiters :: #{erllama_cache:cache_key() => [{gen_server:from(), integer(), reference()}]},
    sweep_timer :: reference() | undefined
}.
Functions
-spec announce_saved(erllama_cache:cache_key(), reference(), non_neg_integer(), binary()) -> ok | {error, expired}.
-spec announce_saved(erllama_cache:cache_key(), reference(), non_neg_integer(), binary(), binary() | undefined) -> ok | {error, expired}.
-spec cancel_reservation(erllama_cache:cache_key(), reference()) -> ok.
-spec check_reservation(erllama_cache:cache_key(), reference()) -> ok | {error, expired}.
-spec checkin(reference()) -> ok.
-spec checkout(erllama_cache:cache_key(), pid()) -> {ok, reference(), erllama_cache:tier(), term(), binary(), term()} | {error, busy} | miss.
-spec dump() -> [tuple()].
-spec dump(erllama_cache:cache_key()) -> {ok, tuple()} | miss.
-spec evict_bytes(non_neg_integer()) -> {evicted, non_neg_integer(), non_neg_integer()}.
Evict oldest available rows until at least TargetBytes have been freed (or no more candidates remain). Returns the number of rows evicted and the bytes actually freed.
-spec evict_bytes(non_neg_integer(), all | [erllama_cache:tier()]) -> {evicted, non_neg_integer(), non_neg_integer()}.
Evict oldest available rows whose tier is in Tiers until at least
TargetBytes have been freed. Tiers = all matches every tier;
otherwise it must be a list drawn from [ram, ram_file, disk].
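An oldest-first eviction walk over an ordered_set LRU table might look like the sketch below. It assumes a simplified meta row of {Key, Tier, Size} and ignores claim state (the real function only evicts available rows); the module name and the demo data are hypothetical.

```erlang
-module(evict_sketch).
-export([evict_bytes/4, demo/0]).

%% Meta rows are {Key, Tier, Size} here; this layout is an assumption.
%% LRU keys are {LastUsedNs, Key}, so ets:first/1 yields the oldest row.
evict_bytes(Meta, Lru, TargetBytes, Tiers) ->
    walk(Meta, Lru, ets:first(Lru), TargetBytes, Tiers, 0, 0).

walk(_Meta, _Lru, '$end_of_table', _Target, _Tiers, Count, Freed) ->
    {evicted, Count, Freed};
walk(_Meta, _Lru, _LruKey, Target, _Tiers, Count, Freed) when Freed >= Target ->
    {evicted, Count, Freed};
walk(Meta, Lru, {_Ns, Key} = LruKey, Target, Tiers, Count, Freed) ->
    %% Fetch the successor before deleting the current key.
    Next = ets:next(Lru, LruKey),
    case ets:lookup(Meta, Key) of
        [{Key, Tier, Size}] ->
            case Tiers =:= all orelse lists:member(Tier, Tiers) of
                true ->
                    ets:delete(Meta, Key),
                    ets:delete(Lru, LruKey),
                    walk(Meta, Lru, Next, Target, Tiers, Count + 1, Freed + Size);
                false ->
                    walk(Meta, Lru, Next, Target, Tiers, Count, Freed)
            end;
        [] ->
            walk(Meta, Lru, Next, Target, Tiers, Count, Freed)
    end.

demo() ->
    Meta = ets:new(meta, [set]),
    Lru = ets:new(lru, [ordered_set]),
    Rows = [{1, <<"a">>, ram, 100}, {2, <<"b">>, disk, 200}, {3, <<"c">>, ram, 300}],
    [begin
         ets:insert(Meta, {K, T, S}),
         ets:insert(Lru, {{Ns, K}, []})
     end || {Ns, K, T, S} <- Rows],
    %% Only ram rows are candidates, so the disk row is skipped.
    Result = evict_bytes(Meta, Lru, 150, [ram]),
    ets:delete(Meta),
    ets:delete(Lru),
    Result.
```

In the demo the walk evicts the 100-byte and 300-byte ram rows and stops at the end of the table, so the freed total can overshoot the target.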
-spec gc() -> {evicted, non_neg_integer()}.
-spec init([]) -> {ok, state()}.
-spec insert_available(erllama_cache:cache_key(), erllama_cache:tier(), non_neg_integer(), binary(), term()) -> ok.
-spec insert_available(erllama_cache:cache_key(), erllama_cache:tier(), non_neg_integer(), binary(), term(), binary() | undefined) -> ok.
-spec lookup_exact(erllama_cache:cache_key()) -> {ok, tuple()} | miss.
-spec lookup_exact_or_wait(erllama_cache:cache_key(), non_neg_integer()) -> {ok, tuple()} | miss.
-spec lookup_longest_prefix(map(), [erllama_nif:token_id()], pos_integer(), pos_integer()) -> {ok, pos_integer(), tuple()} | miss.
Walk Tokens backward in Stride steps and return the row for the longest cached prefix. Pure ETS reads, no server hop. Used by stateless callers (HTTP front-end, agent loops) that resend a full conversation each turn and don't have a parent_key to thread.
Stops at the MinTokens floor and returns miss if nothing matches.
Walks at most length(Tokens) / Stride rows; with the default
2048-token stride that's ~15 lookups for a 30k-token prompt.
Encodes the full token list to a binary once at entry, then passes
binary:part(TokensBin, 0, N*4) sub-binaries (O(1) views) to
erllama_cache_key:make/4 per probe. Avoids re-traversing the list
and re-allocating the binary on every step; the only per-probe cost
is the SHA-256 over N*4 bytes plus the ETS lookup.
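The encode-once, probe-by-sub-binary approach can be sketched as follows. Several details are assumptions rather than the module's behaviour: key/1 is a hypothetical stand-in for erllama_cache_key:make/4 that hashes only the token bytes, tokens are encoded as 32-bit little-endian integers, the walk starts at the largest multiple of Stride that fits, and the argument order is chosen for the sketch.

```erlang
-module(prefix_sketch).
-export([longest_prefix/4, key/1, demo/0]).

%% Hypothetical stand-in for erllama_cache_key:make/4.
key(Bin) -> crypto:hash(sha256, Bin).

longest_prefix(Tab, Tokens, Stride, MinTokens) ->
    %% Encode the list once; 4 bytes per token (endianness is an assumption).
    TokensBin = << <<T:32/little>> || T <- Tokens >>,
    Count = byte_size(TokensBin) div 4,
    Start = (Count div Stride) * Stride,
    probe(Tab, TokensBin, Start, Stride, MinTokens).

probe(_Tab, _Bin, N, _Stride, MinTokens) when N < MinTokens; N =< 0 ->
    miss;
probe(Tab, Bin, N, Stride, MinTokens) ->
    %% binary:part/3 returns a sub-binary view; the token data is not copied.
    Prefix = binary:part(Bin, 0, N * 4),
    case ets:lookup(Tab, key(Prefix)) of
        [Row] -> {ok, N, Row};
        [] -> probe(Tab, Bin, N - Stride, Stride, MinTokens)
    end.

demo() ->
    Tab = ets:new(prefix_tab, [set]),
    Tokens = lists:seq(1, 10),
    %% Pretend the first four tokens were cached earlier.
    Cached = << <<T:32/little>> || T <- lists:seq(1, 4) >>,
    ets:insert(Tab, {key(Cached), ram}),
    Hit = longest_prefix(Tab, Tokens, 2, 2),
    Miss = longest_prefix(Tab, Tokens, 2, 6),
    ets:delete(Tab),
    {Hit, Miss}.
```

With a stride of 2 the first call probes prefixes of 10, 8, 6 and then 4 tokens and hits at 4; the second call stops once the next probe would drop below the 6-token floor.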
-spec mark_published(erllama_cache:cache_key(), reference(), file:name()) -> ok | {error, expired}.
-spec reserve_save(erllama_cache:cache_key(), erllama_cache:tier(), pid()) -> {ok, reference()} | {error, already_present | conflict}.