Dsxir.Optimizer.LabeledFewShot (dsxir v0.1.0)

Optimizer that slots up to :max_labeled_demos examples from trainset into each predictor's Dsxir.Program.State.demos. It makes no LM calls. The metric argument is accepted but unused; it exists only so the interface matches that of richer optimizers.

Multi-predictor pipelines

A labeled demo is slotted into a predictor only when the demo's data keys cover all of the predictor's declared input and output fields. Trainset examples that fit one predictor's signature but not another's are routed only to the predictors they fit; a predictor that no example fits gets an empty labeled-demos list. This keeps saved artifacts well-formed and mirrors DSPy's per-predictor demo handling.
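
The coverage rule above can be illustrated with a minimal sketch. It is not dsxir's implementation: the module DemoRoutingSketch, its functions, and the representation of predictors as a map of name => {input_fields, output_fields} are all hypothetical, and plain maps stand in for the data of a %Dsxir.Example{}.

```elixir
defmodule DemoRoutingSketch do
  # Hypothetical helper, not part of dsxir: true when the demo's data keys
  # include every declared input and output field of a predictor.
  def fits?(demo_data, input_fields, output_fields) do
    required = MapSet.new(input_fields ++ output_fields)
    MapSet.subset?(required, MapSet.new(Map.keys(demo_data)))
  end

  # Routes each demo only to the predictors whose fields it covers.
  # `predictors` is an assumed shape: name => {input_fields, output_fields}.
  def route(demos, predictors) do
    Map.new(predictors, fn {name, {inputs, outputs}} ->
      {name, Enum.filter(demos, &fits?(&1, inputs, outputs))}
    end)
  end
end

demos = [
  %{question: "2+2?", answer: "4"},
  %{text: "long passage", summary: "short"}
]

predictors = %{
  qa: {[:question], [:answer]},
  summarize: {[:text], [:summary]},
  classify: {[:text], [:label]}
}

routed = DemoRoutingSketch.route(demos, predictors)
IO.inspect(routed)
```

In this sketch the :classify predictor ends up with an empty list, because no demo carries a :label key.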

Options

  • :max_labeled_demos (default 16) — upper bound on demos per predictor. Clamped to length(trainset); no padding.
  • :deterministic (default false) — when true, demo selection is reproducible across runs (Enum.sort_by/2 by :erlang.phash2/1, then take). When false, Enum.take_random/2 is used.
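
As a rough illustration of how the two options interact, the selection step might look like the sketch below. The module DemoSelectionSketch and its select/2 function are hypothetical names; only the option names, the default of 16, and the use of Enum.sort_by/2, :erlang.phash2/1, and Enum.take_random/2 come from the descriptions above.

```elixir
defmodule DemoSelectionSketch do
  # Hypothetical function, not dsxir's API: picks demos from a trainset
  # following the documented option semantics.
  def select(trainset, opts \\ []) do
    max = Keyword.get(opts, :max_labeled_demos, 16)
    # Clamp to the trainset size; the result is never padded.
    n = min(max, length(trainset))

    if Keyword.get(opts, :deterministic, false) do
      # Sort by a hash of each example, then take the first n.
      # The same trainset yields the same selection on every run.
      trainset |> Enum.sort_by(&:erlang.phash2/1) |> Enum.take(n)
    else
      # Random sample without replacement; varies between runs.
      Enum.take_random(trainset, n)
    end
  end
end

trainset = for i <- 1..5, do: %{question: "q#{i}", answer: "a#{i}"}

first = DemoSelectionSketch.select(trainset, max_labeled_demos: 3, deterministic: true)
second = DemoSelectionSketch.select(trainset, max_labeled_demos: 3, deterministic: true)
IO.inspect(first == second)
IO.inspect(length(DemoSelectionSketch.select(trainset)))
```

With the default :max_labeled_demos of 16 and a five-example trainset, the last call returns all five examples, which shows the clamping behavior.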

Returned stats

%{labeled_demos: non_neg_integer(),
  predictor_count: non_neg_integer(),
  deterministic: boolean()}

The trainset hash is stored in program.metadata.trainset_hash so that subsequent compiles can detect a trainset change without re-encoding the trainset.

Trainset hash

metadata.trainset_hash is :crypto.hash(:sha256, :erlang.term_to_binary(trainset)) |> Base.encode16(case: :lower). The hash is intentionally permutation-sensitive and representation-sensitive: reordering the trainset, or using string keys instead of atom keys in %Dsxir.Example{data: ...}, will produce a different hash even when the labeled content is semantically equal. Use it as a fast change-detector, not as a semantic equality oracle. For stable hashing, always use atom keys in Example.data.
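
The sensitivity described above can be checked directly with the hashing expression. In this sketch the module TrainsetHashSketch is a hypothetical wrapper, and plain maps stand in for the data field of %Dsxir.Example{} so that the snippet runs without dsxir installed.

```elixir
defmodule TrainsetHashSketch do
  # Same expression as documented for metadata.trainset_hash, wrapped in a
  # hypothetical helper for illustration.
  def hash(trainset) do
    :crypto.hash(:sha256, :erlang.term_to_binary(trainset))
    |> Base.encode16(case: :lower)
  end
end

atom_keys = [
  %{question: "2+2?", answer: "4"},
  %{question: "3+3?", answer: "6"}
]

string_keys = [
  %{"question" => "2+2?", "answer" => "4"},
  %{"question" => "3+3?", "answer" => "6"}
]

h = TrainsetHashSketch.hash(atom_keys)

# Reordering changes the hash (permutation-sensitive).
IO.inspect(h == TrainsetHashSketch.hash(Enum.reverse(atom_keys)))

# String keys change the hash (representation-sensitive).
IO.inspect(h == TrainsetHashSketch.hash(string_keys))

# An identical trainset hashes identically.
IO.inspect(h == TrainsetHashSketch.hash(atom_keys))
```

The first two comparisons print false and the third prints true, which is why the hash works as a change detector but not as a test of semantic equality.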