# `Dsxir.Evaluate`

Devset evaluation runner.

Rows are fanned out with `Task.Supervisor.async_stream_nolink/4` under
`Dsxir.TaskSupervisor`. Settings are snapshotted once in the calling process
and replayed in each worker via `Dsxir.Settings.run/2`, so settings-scoped
state (lm, adapter, metadata, cache) is preserved across workers.
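
The snapshot-and-replay pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Dsxir's implementation: `FanOutSketch`, `with_settings/2`, and the process-dictionary storage are hypothetical stand-ins for `Dsxir.Settings.run/2`, whose real signature and storage are not shown here.

```elixir
defmodule FanOutSketch do
  # Hypothetical stand-in for settings-scoped state. How Dsxir actually
  # stores its settings is an assumption not confirmed by this document.
  @settings_key :sketch_settings

  # Run `fun` with `settings` installed in the current process, then restore
  # whatever was there before. Plays the role of `Dsxir.Settings.run/2`.
  def with_settings(settings, fun) do
    previous = Process.get(@settings_key)
    Process.put(@settings_key, settings)

    try do
      fun.()
    after
      if is_nil(previous) do
        Process.delete(@settings_key)
      else
        Process.put(@settings_key, previous)
      end
    end
  end

  def current_settings, do: Process.get(@settings_key)

  # Snapshot the settings once on the caller, then replay them inside every
  # worker task. Worker processes do not inherit the caller's process
  # dictionary, which is why the replay step is needed.
  def map_with_settings(supervisor, items, fun, opts \\ []) do
    snapshot = current_settings()

    supervisor
    |> Task.Supervisor.async_stream_nolink(
      items,
      fn item -> with_settings(snapshot, fn -> fun.(item) end) end,
      max_concurrency: Keyword.get(opts, :num_threads, 4),
      timeout: Keyword.get(opts, :timeout, 5_000),
      on_timeout: :kill_task
    )
    |> Enum.to_list()
  end
end

# Usage: each worker sees the settings that were active on the caller.
{:ok, sup} = Task.Supervisor.start_link()

results =
  FanOutSketch.with_settings(%{lm: :fake_lm}, fn ->
    FanOutSketch.map_with_settings(sup, [1, 2, 3], fn n ->
      {n, FanOutSketch.current_settings().lm}
    end)
  end)
```

Because the tasks are not linked to the caller, a crashing or timed-out worker surfaces as an `{:exit, reason}` element in the stream rather than taking the caller down.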

Per-example errors are caught at the worker boundary, classified via
`Dsxir.Errors.class_of/1`, and counted in
`EvaluationResult.errors.by_class`. The runner does not abort on individual
row failures; `run!/2` raises after the run completes when any row errored.
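
A minimal sketch of that error-handling shape, assuming nothing about Dsxir's internals: `ErrorTallySketch`, its `class_of/1` clauses, and the `:outcomes` key are hypothetical. Only the `errors.by_class` path mirrors the field named above, and the sketch raises a plain `RuntimeError` rather than the library's own exception.

```elixir
defmodule ErrorTallySketch do
  # Hypothetical classifier standing in for `Dsxir.Errors.class_of/1`.
  def class_of(%ArgumentError{}), do: :invalid
  def class_of(%RuntimeError{}), do: :runtime
  def class_of(_other), do: :unknown

  # Worker boundary: a raised exception becomes a tagged tuple instead of
  # propagating. Throws and exits are not handled in this sketch.
  def safe_call(fun, item) do
    {:ok, fun.(item)}
  rescue
    exception -> {:error, class_of(exception)}
  end

  # Process every item, then count the failures per error class.
  def run(items, fun) do
    outcomes = Enum.map(items, &safe_call(fun, &1))

    by_class =
      outcomes
      |> Enum.flat_map(fn
        {:error, class} -> [class]
        {:ok, _value} -> []
      end)
      |> Enum.frequencies()

    %{outcomes: outcomes, errors: %{by_class: by_class}}
  end

  # Bang variant: the whole run completes first, and only then does it raise.
  def run!(items, fun) do
    result = run(items, fun)

    if map_size(result.errors.by_class) == 0 do
      result
    else
      raise "evaluation finished with errors: #{inspect(result.errors.by_class)}"
    end
  end
end

# Usage: two of the four rows fail, and the run still covers all of them.
result =
  ErrorTallySketch.run([1, 0, 2, :bad], fn
    0 -> raise ArgumentError, "zero is not allowed"
    n when is_integer(n) -> n * 10
    _other -> raise "not a number"
  end)
```

Deferring the raise until every row has been processed means a single bad example cannot hide failures in later rows.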

Telemetry:

  * `[:dsxir, :evaluate, :item]` — per row.
    Measurements: `%{duration, metric_value}` (`metric_value: nil` on error).
    Metadata: `%{example, prediction, error_class}` (`prediction: nil` on
    error, `error_class: nil` on success).
  * `[:dsxir, :evaluate, :stop]` — once.
    Measurements: `%{duration, score, total, error_count, save_as}`
    (`save_as: nil` when not set).
    Metadata: `%{evaluator, devset_size, max_errors}`.
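
The events listed above could be consumed with a handler along these lines. This is a sketch: `EvaluateTelemetrySketch`, the handler id, and the log wording are hypothetical, and it assumes the `:telemetry` package is available as a dependency. The unit of `duration` is not stated here, so the sketch does not convert or print it.

```elixir
defmodule EvaluateTelemetrySketch do
  require Logger

  @events [
    [:dsxir, :evaluate, :item],
    [:dsxir, :evaluate, :stop]
  ]

  # Attach one handler to both events. The handler id is an arbitrary,
  # hypothetical name and must be unique among attached handlers.
  def attach do
    :telemetry.attach_many("evaluate-sketch-logger", @events, &__MODULE__.handle_event/4, nil)
  end

  # Telemetry handlers take the event name, measurements, metadata, and the
  # config term passed at attach time.
  def handle_event(event, measurements, metadata, _config) do
    Logger.info(summarize(event, measurements, metadata))
  end

  # Failed rows carry `metric_value: nil` and a non-nil `error_class`.
  def summarize([:dsxir, :evaluate, :item], %{metric_value: nil}, %{error_class: class}) do
    "row failed: #{inspect(class)}"
  end

  def summarize([:dsxir, :evaluate, :item], %{metric_value: value}, _metadata) do
    "row scored #{inspect(value)}"
  end

  def summarize([:dsxir, :evaluate, :stop], measurements, _metadata) do
    %{score: score, total: total, error_count: error_count} = measurements
    "run finished: score=#{inspect(score)} total=#{total} errors=#{error_count}"
  end
end
```

Calling `EvaluateTelemetrySketch.attach()` once at application start would be enough. Keeping `summarize/3` free of side effects makes the formatting easy to test without emitting real events.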

When `:save_as` is set, the result rows are written to disk as JSON Lines
(one row per line) before `run/2` returns.

# `t`

```elixir
@type t() :: %Dsxir.Evaluate{
  devset: [Dsxir.Example.t()],
  failure_score: float(),
  max_errors: non_neg_integer(),
  metric: Dsxir.Metric.t(),
  num_threads: pos_integer(),
  save_as: nil | Path.t(),
  timeout: pos_integer()
}
```

# `run`

```elixir
@spec run(t(), Dsxir.Program.t()) :: Dsxir.EvaluationResult.t()
```

Evaluate `program` over the configured devset. Per-row failures are caught
and reported in the returned `EvaluationResult`; the run never aborts on a
single error. When `:save_as` is set, the rows are persisted as JSON Lines
before returning.

# `run!`

```elixir
@spec run!(t(), Dsxir.Program.t()) :: Dsxir.EvaluationResult.t()
```

Bang variant of `run/2`. Returns the result when no row errored. Otherwise,
once the run has completed, raises `Dsxir.Errors.Framework.PredictorError`
with the per-class error counts.

---

*Consult [api-reference.md](api-reference.md) for complete listing*
