Dsxir.Optimizer.LabeledFewShot
(dsxir v0.1.0)
Optimizer that slots up to :max_labeled_demos examples from trainset into
each predictor's Dsxir.Program.State.demos. No LM call is made. The metric
argument is accepted but unused; it exists only for interface uniformity
with richer optimizers.
Multi-predictor pipelines
A labeled demo is slotted into a predictor only when the demo's data keys cover the predictor's declared input + output fields. Trainset examples that fit one predictor's signature but not another's are routed only to predictors they fit; non-matching predictors get an empty labeled-demos list. This keeps saved artifacts well-formed and mirrors DSPy's per-predictor demo handling.
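The coverage rule above can be illustrated with a minimal sketch. It does not use dsxir itself: the module `DemoRoutingSketch`, its `route/2` function, and the plain maps standing in for examples and predictor signatures are all hypothetical names chosen for illustration, so the library's real internals may differ.

```elixir
defmodule DemoRoutingSketch do
  # Hypothetical helper, not part of dsxir. For each predictor, keeps only
  # the demos whose data keys cover every declared input and output field.
  def route(demos, predictors) do
    Map.new(predictors, fn {name, %{inputs: inputs, outputs: outputs}} ->
      required = MapSet.new(inputs ++ outputs)

      fitting =
        Enum.filter(demos, fn demo ->
          # Extra keys on the demo are fine; missing keys are not.
          MapSet.subset?(required, MapSet.new(Map.keys(demo)))
        end)

      {name, fitting}
    end)
  end
end

# Two assumed predictor signatures with disjoint fields.
predictors = %{
  answer: %{inputs: [:question], outputs: [:answer]},
  summarize: %{inputs: [:document], outputs: [:summary]}
}

# Both examples fit :answer; neither fits :summarize.
trainset = [
  %{question: "2 + 2?", answer: "4"},
  %{question: "Capital of France?", answer: "Paris", source: "quiz"}
]

routed = DemoRoutingSketch.route(trainset, predictors)
IO.inspect(routed)
```

In this sketch the `:summarize` predictor ends up with an empty list, which corresponds to the "empty labeled-demos list" case described above.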
Options
:max_labeled_demos (default 16): upper bound on demos per predictor. Clamped to length(trainset); no padding.
:deterministic (default false): when true, demo selection is reproducible across runs (Enum.sort_by/2 by :erlang.phash2/1, then take). When false, Enum.take_random/2 is used.
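The two selection modes can be sketched as follows. This is an approximation built only from the functions named above (`Enum.sort_by/2`, `:erlang.phash2/1`, `Enum.take_random/2`); the module `DemoSelectionSketch` and its `select/2` function are hypothetical and are not the library's actual code.

```elixir
defmodule DemoSelectionSketch do
  # Hypothetical helper, not part of dsxir. Picks at most
  # :max_labeled_demos items from the trainset.
  def select(trainset, opts \\ []) do
    limit = Keyword.get(opts, :max_labeled_demos, 16)

    if Keyword.get(opts, :deterministic, false) do
      # Order by term hash, then take: the same trainset always yields
      # the same selection.
      trainset
      |> Enum.sort_by(&:erlang.phash2/1)
      |> Enum.take(limit)
    else
      # Random sample without replacement.
      Enum.take_random(trainset, limit)
    end
  end
end

trainset = for i <- 1..5, do: %{question: "q#{i}", answer: "a#{i}"}

first = DemoSelectionSketch.select(trainset, max_labeled_demos: 3, deterministic: true)
second = DemoSelectionSketch.select(trainset, max_labeled_demos: 3, deterministic: true)
all = DemoSelectionSketch.select(trainset)

IO.inspect(first == second)
IO.inspect(length(all))
```

Both `Enum.take/2` and `Enum.take_random/2` return at most as many elements as the list holds, so the clamp to `length(trainset)` needs no explicit code in this sketch.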
Returned stats
%{labeled_demos: non_neg_integer(),
  predictor_count: non_neg_integer(),
  deterministic: boolean()}

The trainset hash is stored in program.metadata.trainset_hash so subsequent
compiles can detect a trainset change without re-encoding the trainset.
Trainset hash
metadata.trainset_hash is :crypto.hash(:sha256, :erlang.term_to_binary(trainset)) |> Base.encode16(case: :lower).
The hash is intentionally permutation-sensitive and representation-sensitive:
reordering the trainset, or using string keys instead of atom keys in
%Dsxir.Example{data: ...}, will produce a different hash even when the
labeled content is semantically equal. Use it as a fast change-detector,
not as a semantic equality oracle. For stable hashing, always use atom keys
in Example.data.
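The sensitivity described above can be checked with a short sketch that applies the same hashing expression to plain maps. The maps stand in for %Dsxir.Example{} structs, and the module name `TrainsetHashSketch` is hypothetical; only the `:crypto`, `:erlang`, and `Base` calls come from the text.

```elixir
defmodule TrainsetHashSketch do
  # Hypothetical helper mirroring the documented expression:
  # SHA-256 over the external term format, hex-encoded in lowercase.
  def hash(trainset) do
    binary = :erlang.term_to_binary(trainset)

    :sha256
    |> :crypto.hash(binary)
    |> Base.encode16(case: :lower)
  end
end

atom_keys = [
  %{question: "2 + 2?", answer: "4"},
  %{question: "3 + 3?", answer: "6"}
]

# Same labeled content, but with string keys instead of atom keys.
string_keys = [
  %{"question" => "2 + 2?", "answer" => "4"},
  %{"question" => "3 + 3?", "answer" => "6"}
]

base = TrainsetHashSketch.hash(atom_keys)
reordered = TrainsetHashSketch.hash(Enum.reverse(atom_keys))
restrung = TrainsetHashSketch.hash(string_keys)

IO.inspect(base == reordered)
IO.inspect(base == restrung)
```

Both comparisons come out false: reordering the list and switching key types each change the serialized binary, and therefore the digest, even though the labeled content is the same.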