Nous.Eval.Runner (nous v0.13.3)
Executes evaluation suites against agents.
The runner handles:
- Running individual test cases
- Parallel execution
- Metrics collection
- A/B testing
- Error handling and retries
Functions
@spec run(Nous.Eval.Suite.t(), keyword()) :: {:ok, Nous.Eval.SuiteResult.t()} | {:error, term()}
Run an evaluation suite.
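A minimal sketch of calling run/2 and handling both return shapes from the spec above. EvalOutcome and its functions are hypothetical helper names, not part of Nous. The suite is assumed to be a Nous.Eval.Suite struct built elsewhere, and the options list defaults to empty because the accepted keys are not listed on this page.

```elixir
defmodule EvalOutcome do
  # Hypothetical helper module for illustration; not part of Nous.

  # Runs a suite and turns the tagged tuple into a printable line.
  # `suite` is assumed to be a Nous.Eval.Suite struct built elsewhere.
  # `opts` is passed through untouched; valid keys are not documented here.
  def run_and_describe(suite, opts \\ []) do
    suite
    |> Nous.Eval.Runner.run(opts)
    |> describe()
  end

  # Matches the two shapes given in the @spec: {:ok, result} | {:error, reason}.
  def describe({:ok, result}), do: "ok: " <> inspect(result)
  def describe({:error, reason}), do: "error: " <> inspect(reason)
end
```

Because run_ab/2 and run_case/2 return the same {:ok, _} | {:error, _} shape, the describe/1 clauses in this sketch would apply to their results as well.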
@spec run_ab(Nous.Eval.Suite.t(), keyword()) :: {:ok, map()} | {:error, term()}
Run A/B comparison between two configurations.
@spec run_case(Nous.Eval.TestCase.t(), keyword()) :: {:ok, Nous.Eval.Result.t()} | {:error, term()}
Run a single test case.