erllama_cache_key (erllama v0.1.0)

Types

components()

-type components() ::
          #{fingerprint := <<_:256>>,
            quant_type := quant_type(),
            ctx_params_hash := <<_:256>>,
            tokens := [non_neg_integer()]}.

key()

-type key() :: <<_:256>>.

quant_type()

-type quant_type() ::
          f32 | f16 | bf16 | q4_0 | q4_1 | q5_0 | q5_1 | q8_0 | q2_k | q3_k_s | q3_k_m | q3_k_l |
          q4_k_m | q4_k_s | q5_k_m | q5_k_s | q6_k | q8_k | iq1_s | iq1_m | iq2_xxs | iq2_xs | iq2_s |
          iq2_m | iq3_xxs | iq3_xs | iq3_s | iq3_m | iq4_nl | iq4_xs |
          atom().

Functions

decode_tokens(Bin)

-spec decode_tokens(binary()) -> [non_neg_integer()].

effective_fingerprint/2

-spec effective_fingerprint(<<_:256>>, [{<<_:256>>, float()}]) -> <<_:256>>.

Compute an effective fingerprint from a base model fingerprint and a list of attached LoRA adapters.

LoRA changes the model's logits, not its inputs, so attached adapters must enter the cache key. Two requests on the same model with different adapter sets or scales must never collide or produce a false hit.

effective_fp = sha256(model_fp || sorted_pairs), where sorted_pairs is the byte concatenation of (adapter_sha256 || u64_le(scale_q32)) for every attached adapter, sorted by adapter_sha256 for determinism. scale_q32 is the scale multiplied by 2^32 and rounded to an int64, so the key depends on a fixed-point value rather than on the float's bit representation.

An empty adapter list returns the base fingerprint unchanged.
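
The formula above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's source: the module name effective_fp_sketch and the helper scale_q32/1 are hypothetical, and the 64-bit little-endian encoding of the rounded scale is inferred from the u64_le notation in the text.

```erlang
-module(effective_fp_sketch).
-export([effective_fingerprint/2]).

%% Sketch only: mirrors the formula in the documentation text.
-spec effective_fingerprint(<<_:256>>, [{<<_:256>>, float()}]) -> <<_:256>>.
effective_fingerprint(ModelFp, []) when byte_size(ModelFp) =:= 32 ->
    %% No adapters attached: the base fingerprint is returned unchanged.
    ModelFp;
effective_fingerprint(ModelFp, Adapters) when byte_size(ModelFp) =:= 32 ->
    %% Sort by adapter SHA-256 so attachment order does not affect the key.
    Sorted = lists:keysort(1, Adapters),
    %% Each pair is adapter_sha256 || scale_q32 as a 64-bit little-endian integer.
    Pairs = [<<Sha:32/binary, (scale_q32(Scale)):64/little>>
             || {Sha, Scale} <- Sorted],
    crypto:hash(sha256, [ModelFp | Pairs]).

%% Hypothetical helper: fixed-point scale, i.e. Scale * 2^32 rounded to an integer.
scale_q32(Scale) ->
    round(Scale * 4294967296).
```

Because the pairs are sorted before hashing, attaching the same adapters in a different order yields the same fingerprint, while changing any scale yields a different one.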

encode_tokens(Tokens)

-spec encode_tokens([non_neg_integer()]) -> binary().

make/1

-spec make(components()) -> key().

make(Fp, QT, CtxHash, TokensBin)

-spec make(<<_:256>>, quant_type(), <<_:256>>, binary()) -> key().

Variant of make/1 that takes a pre-encoded TokensBin (u32-LE per token, matching encode_tokens/1). The longest-prefix walk uses it so a caller can encode the token list once and pass binary:part(AllTokensBin, 0, N*4) sub-binaries for each probe, avoiding a list traversal and list-comprehension allocation on every attempt. Sub-binaries are O(1) views, so the per-probe cost reduces to the SHA-256 work.
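
The encode-once pattern described above might look like the sketch below. Everything here is illustrative: the module prefix_walk_sketch, its local encode_tokens/1 (a stand-in that assumes the u32-LE layout), and the KeyFun and LookupFun callbacks are hypothetical, and the walk simply probes every prefix length from longest to shortest, which may differ from how the real walk chooses its probes.

```erlang
-module(prefix_walk_sketch).
-export([encode_tokens/1, longest_prefix/3]).

%% Assumed layout from the text: one unsigned 32-bit little-endian
%% integer per token.
encode_tokens(Tokens) ->
    << <<T:32/little>> || T <- Tokens >>.

%% KeyFun maps a token sub-binary to a cache key; LookupFun reports
%% whether that key is cached. Both are hypothetical callbacks.
longest_prefix(Tokens, KeyFun, LookupFun) ->
    AllTokensBin = encode_tokens(Tokens),  % encode the full list once
    walk(AllTokensBin, length(Tokens), KeyFun, LookupFun).

walk(_Bin, 0, _KeyFun, _LookupFun) ->
    none;
walk(Bin, N, KeyFun, LookupFun) ->
    %% Sub-binary covering the first N tokens (4 bytes each); no copy.
    Prefix = binary:part(Bin, 0, N * 4),
    case LookupFun(KeyFun(Prefix)) of
        true  -> {ok, N};
        false -> walk(Bin, N - 1, KeyFun, LookupFun)
    end.
```

In a real caller, KeyFun would presumably wrap the four-argument make, for example fun(P) -> erllama_cache_key:make(Fp, QT, CtxHash, P) end, with Fp, QT, and CtxHash fixed for the whole walk.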

quant_atom/1

-spec quant_atom(0..255) -> {ok, quant_type()} | {error, unknown_quant}.

quant_byte/1

-spec quant_byte(quant_type()) -> 0..255.