Dsxir.Optimizer.BootstrapFewShot
(dsxir v0.1.0)
Two-phase optimizer: slot labeled demos from the trainset (phase 1), then augment with bootstrapped demos captured from successful traces (phase 2).
Phases
Labeled. Up to :max_labeled_demos examples are picked from the trainset (uniform random; deterministic-by-hash when :deterministic is set). Each chosen example is slotted as %Dsxir.Demo{kind: :labeled} only into predictors whose declared input and output fields are all covered by the demo's data keys; non-matching predictors get no labeled demo from that example. No LM call.
Bootstrap. For each round in 1..max_rounds, the trainset is walked example-by-example. For each example, the program is run inside a Dsxir.with_trace/1 frame with per-call opts seeded for diversity (temperature: cfg.diversity_temperature, cache: false, plus a per-round per-example nonce). When the coerced metric is >= :threshold, each trace entry is pushed into the matching predictor's demos_pool as %Dsxir.Demo{kind: :bootstrapped, source: %{round: R, example_index: I}} until :max_bootstrapped_demos is reached.
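The phase 1 selection and field-coverage rule can be sketched in plain Elixir. This is only an illustration: the module LabeledSlotSketch, the predictor shape %{inputs: ..., outputs: ...}, and the use of :erlang.phash2/1 for hash-stable ordering are assumptions, not Dsxir's actual internals.

```elixir
defmodule LabeledSlotSketch do
  # Hypothetical helper, not part of Dsxir: true when the example's data keys
  # cover every input and output field the predictor declares.
  def covers?(example, %{inputs: inputs, outputs: outputs}) do
    Enum.all?(inputs ++ outputs, &Map.has_key?(example, &1))
  end

  # Pick up to `max` examples. Sorting by :erlang.phash2/1 is one assumed way
  # to get a hash-stable order; otherwise take a uniform random sample.
  def pick(trainset, max, deterministic?) do
    if deterministic? do
      trainset |> Enum.sort_by(&:erlang.phash2/1) |> Enum.take(max)
    else
      Enum.take_random(trainset, max)
    end
  end
end
```

Under these assumptions, an example with only a :question key would not be slotted into a predictor that declares both :question and :answer.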
Diversity is delivered by pushing a Dsxir.Settings.context/2 frame that
swaps the resolved :lm config tuple with one carrying the diversity
keywords. The LM dispatcher reads :lm from settings and merges per-call
opts on top, so the temperature lever reaches the wire protocol.
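A minimal sketch of that merge, assuming the :lm config tuple has the shape {adapter, opts} and that the nonce is passed under a :nonce key. Both are guesses made for illustration; DiversitySketch is a hypothetical module, not part of Dsxir.

```elixir
defmodule DiversitySketch do
  # Hypothetical: build the per-call diversity opts described above.
  # The nonce shape {round, example_index} and the :nonce key are assumptions.
  def diversity_opts(temperature, round, example_index) do
    [temperature: temperature, cache: false, nonce: {round, example_index}]
  end

  # Return an LM tuple whose opts carry the diversity keywords.
  # Keyword.merge/2 lets the right-hand side win on duplicate keys, so the
  # diversity temperature replaces any base temperature.
  def with_diversity({adapter, base_opts}, diversity) do
    {adapter, Keyword.merge(base_opts, diversity)}
  end
end
```

Because the nonce differs per round and per example, two otherwise identical calls would not share the same opts, which is what keeps phase 2 outputs diverse even if some layer caches on opts.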
Options
:max_labeled_demos (default 4): cap on phase 1 demos per predictor.
:max_bootstrapped_demos (default 4): cap on phase 2 demos per predictor.
:max_rounds (default 1): number of bootstrap passes over the trainset.
:threshold (default 1.0): coerced metric must meet or exceed this to keep the trace. Accepted threshold types: true | false | integer() | float(). Booleans coerce to 1.0 / 0.0. Other values raise FunctionClauseError during option parsing; bootstrap is a fail-fast operation on bad configuration.
:max_errors (default 10): aggregate cap on per-example errors. Exceeding returns a framework-classed error.
:deterministic (default false): when true, phase 1 selection is hash-stable and phase 2 trainset order is hash-stable. Phase 2 LM outputs are still nondeterministic via temperature.
:diversity_temperature (default 1.0): temperature forwarded as per-call opt during phase 2.
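The :threshold coercion rule can be illustrated with function clauses. This is a sketch under assumptions: ThresholdSketch is a hypothetical module, and applying the same coercion to the metric value is inferred from the wording above rather than confirmed.

```elixir
defmodule ThresholdSketch do
  # Hypothetical coercion mirroring the accepted types: booleans map to
  # 1.0 / 0.0, integers widen to floats, floats pass through.
  def coerce(true), do: 1.0
  def coerce(false), do: 0.0
  def coerce(n) when is_integer(n), do: n * 1.0
  def coerce(f) when is_float(f), do: f
  # No catch-all clause: any other value raises FunctionClauseError.

  # A trace is kept when the coerced metric meets or exceeds the threshold.
  def keep?(metric, threshold), do: coerce(metric) >= coerce(threshold)
end
```

With the default threshold of 1.0, a boolean metric keeps a trace only when it returns true, while a float metric of 0.5 would be rejected.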
Returned stats
%{
  labeled_demos: non_neg_integer(),
  bootstrapped_demos: non_neg_integer(),
  predictor_count: non_neg_integer(),
  rounds: non_neg_integer(),
  error_count: non_neg_integer(),
  max_errors: non_neg_integer(),
  threshold: float()
}
Errors
Per-example raises are caught and stamped with
path: [:bootstrap_few_shot, :"round_R", :"example_I"]. When
error_count > max_errors, compile/4 returns
{:error, %Dsxir.Errors.Framework.OptimizerError{optimizer: __MODULE__, inner: aggregate}} where inner is an aggregate produced via Splode's
to_class helper on Dsxir.Errors. Callers can traverse per-predictor
sub-errors via Splode's traverse_errors helper.
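The path stamp and the cap check can be sketched as follows. ErrorPathSketch is a hypothetical module written for illustration; how Dsxir builds these atoms internally, and whether rounds and examples are numbered from 0 or 1, is not stated here.

```elixir
defmodule ErrorPathSketch do
  # Hypothetical: build the path stamped on a per-example error, with the
  # round and example index interpolated into atoms.
  def path(round, example_index) do
    [:bootstrap_few_shot, :"round_#{round}", :"example_#{example_index}"]
  end

  # The cap trips only when the count strictly exceeds :max_errors,
  # matching the error_count > max_errors condition above.
  def over_cap?(error_count, max_errors), do: error_count > max_errors
end
```

Note the strict comparison: with the default :max_errors of 10, the tenth error is still tolerated and the eleventh triggers the aggregate error.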
Trainset hash
metadata.trainset_hash is
:crypto.hash(:sha256, :erlang.term_to_binary(trainset)) |> Base.encode16(case: :lower).
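The expression above can be wrapped in a small function to recompute the hash outside the optimizer, for example to check that a stored result was produced from the same trainset. TrainsetHashSketch is a hypothetical wrapper; only the pipeline inside it comes from the documentation.

```elixir
defmodule TrainsetHashSketch do
  # Serialize the trainset with the Erlang external term format, hash it with
  # SHA-256, and hex-encode the digest in lowercase.
  def hash(trainset) do
    :crypto.hash(:sha256, :erlang.term_to_binary(trainset))
    |> Base.encode16(case: :lower)
  end
end
```

The result is a 64-character lowercase hex string. Since the whole list is serialized, reordering the examples or changing any field value yields a different hash.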