Nous.Eval.Runner (nous v0.13.3)

Executes evaluation suites against agents.

The runner handles:

Summary

run/2: Run an evaluation suite.

run_ab/2: Run an A/B comparison between two configurations.

run_case/2: Run a single test case.

@spec run(
  Nous.Eval.Suite.t(),
  keyword()
) :: {:ok, Nous.Eval.SuiteResult.t()} | {:error, term()}

Run an evaluation suite.
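
A minimal sketch of calling `run/2` and handling both return shapes from the spec above. `EvalRunnerExample` and `run_suite/3` are hypothetical names, the suite is assumed to be built elsewhere, and the empty option list is an assumption because the accepted options are not listed here. The `runner` argument exists only so the sketch can be exercised with a stub instead of a live agent.

```elixir
defmodule EvalRunnerExample do
  @moduledoc false

  # Hypothetical wrapper around Nous.Eval.Runner.run/2.
  # `runner` defaults to the real function; pass a 2-arity stub when testing.
  def run_suite(suite, opts \\ [], runner \\ &Nous.Eval.Runner.run/2) do
    case runner.(suite, opts) do
      {:ok, suite_result} ->
        # Per the spec, suite_result is a Nous.Eval.SuiteResult.t().
        {:ok, suite_result}

      {:error, reason} ->
        # The spec only promises term(), so format it with inspect/1.
        {:error, "evaluation failed: " <> inspect(reason)}
    end
  end
end
```

Matching on the tagged tuple keeps the caller independent of the fields inside `Nous.Eval.SuiteResult`, which are not described on this page.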

@spec run_ab(
  Nous.Eval.Suite.t(),
  keyword()
) :: {:ok, map()} | {:error, term()}

Run an A/B comparison between two configurations.
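
Because the spec types the success value only as `map()`, one cautious way to explore it is to list its top-level keys before relying on any of them. The sketch below does that; `EvalAbExample` and `comparison_keys/3` are hypothetical names, and the option keys that select the two configurations are not shown on this page, so `opts` is passed through untouched.

```elixir
defmodule EvalAbExample do
  @moduledoc false

  # Hypothetical helper around Nous.Eval.Runner.run_ab/2.
  # Returns the sorted top-level keys of the comparison map, or passes
  # any non-matching value (such as {:error, reason}) through unchanged.
  def comparison_keys(suite, opts, runner \\ &Nous.Eval.Runner.run_ab/2) do
    with {:ok, comparison} when is_map(comparison) <- runner.(suite, opts) do
      {:ok, comparison |> Map.keys() |> Enum.sort()}
    end
  end
end
```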

@spec run_case(
  Nous.Eval.TestCase.t(),
  keyword()
) :: {:ok, Nous.Eval.Result.t()} | {:error, term()}

Run a single test case.
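
As a sketch of how `run_case/2` might be used outside a suite, the helper below runs a list of test cases one by one and separates the calls that returned `{:ok, result}` from those that returned `{:error, reason}`. `EvalCaseExample` and `run_cases/3` are hypothetical names, and the empty default for `opts` is an assumption. An `{:ok, result}` here only means the case executed; whether a `Nous.Eval.Result` records a pass or a failure is not described on this page.

```elixir
defmodule EvalCaseExample do
  @moduledoc false

  # Hypothetical helper around Nous.Eval.Runner.run_case/2.
  # `runner` defaults to the real function; pass a 2-arity stub when testing.
  def run_cases(test_cases, opts \\ [], runner \\ &Nous.Eval.Runner.run_case/2) do
    {oks, errors} =
      test_cases
      |> Enum.map(fn test_case -> runner.(test_case, opts) end)
      |> Enum.split_with(&match?({:ok, _}, &1))

    %{
      # Unwrap the tagged tuples, preserving input order within each group.
      results: Enum.map(oks, fn {:ok, result} -> result end),
      errors: Enum.map(errors, fn {:error, reason} -> reason end)
    }
  end
end
```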