Nous.Eval.Runner (nous v0.13.3)

Executes evaluation suites against agents.

The runner handles:

  • Running individual test cases
  • Parallel execution
  • Metrics collection
  • A/B testing
  • Error handling and retries

Summary

Functions

run(suite, opts \\ [])
Run an evaluation suite.

run_ab(suite, opts \\ [])
Run an A/B comparison between two configurations.

run_case(test_case, opts \\ [])
Run a single test case.

Functions

run(suite, opts \\ [])

@spec run(
  Nous.Eval.Suite.t(),
  keyword()
) :: {:ok, Nous.Eval.SuiteResult.t()} | {:error, term()}

Run an evaluation suite.
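
The spec above says `run/2` returns a tagged tuple, so callers would normally pattern-match on both shapes. The following is a minimal sketch of that pattern, not part of nous: the `EvalReport` module and its function names are hypothetical, and it assumes a suite has already been built elsewhere. Because the fields of `Nous.Eval.SuiteResult` are not listed here, the sketch only calls `inspect/1` on the result.

```elixir
defmodule EvalReport do
  @moduledoc """
  Hypothetical helper (not part of nous) that turns the runner's tagged
  tuple into a log-friendly string.
  """

  # Runs the suite and summarizes the outcome.
  # Requires the nous dependency to be available at runtime.
  def run_and_summarize(suite, opts \\ []) do
    suite
    |> Nous.Eval.Runner.run(opts)
    |> summarize()
  end

  # Pure function: accepts any {:ok, _} | {:error, _} tuple, so it can be
  # exercised without running a real suite.
  def summarize({:ok, result}), do: "suite finished: " <> inspect(result)
  def summarize({:error, reason}), do: "suite failed: " <> inspect(reason)
end
```

Keeping `summarize/1` separate from the call into the runner means the error-handling path can be checked with plain tuples, for example `EvalReport.summarize({:error, :timeout})`.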

run_ab(suite, opts \\ [])

@spec run_ab(
  Nous.Eval.Suite.t(),
  keyword()
) :: {:ok, map()} | {:error, term()}

Run an A/B comparison between two configurations.
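
On success `run_ab/2` returns a plain `map()` whose keys are not documented on this page. One cautious way to explore it is to list the keys before relying on any of them. The sketch below assumes exactly that: `AbReport` and its functions are hypothetical names, not part of nous, and the options are passed through untouched because their accepted values are not described here.

```elixir
defmodule AbReport do
  @moduledoc """
  Hypothetical helper (not part of nous) for inspecting the shape of an
  A/B comparison result.
  """

  # Runs the A/B comparison and returns the sorted keys of the result map.
  # Requires the nous dependency to be available at runtime.
  def run_and_list_keys(suite, opts \\ []) do
    suite
    |> Nous.Eval.Runner.run_ab(opts)
    |> comparison_keys()
  end

  # Success: return the map's keys in sorted order.
  def comparison_keys({:ok, comparison}) when is_map(comparison) do
    {:ok, comparison |> Map.keys() |> Enum.sort()}
  end

  # Failure: pass the error tuple through unchanged.
  def comparison_keys({:error, _reason} = error), do: error
end
```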

run_case(test_case, opts \\ [])

@spec run_case(
  Nous.Eval.TestCase.t(),
  keyword()
) :: {:ok, Nous.Eval.Result.t()} | {:error, term()}

Run a single test case.