Nous.Eval.Result (nous v0.13.3)

Result of a single test case evaluation.

Contains the actual output, evaluation score, metrics, and any error raised while running the test.

Fields

:test_case_id - ID of the test case
:test_case_name - Display name of the test case
:passed - Whether the test passed
:score - Numeric score (0.0 to 1.0)
:actual_output - The output from the agent
:expected_output - The expected output
:evaluation_details - Details from the evaluator
:metrics - Collected metrics (tokens, latency, etc.)
:error - Error if the test failed to run
:duration_ms - Total test duration in milliseconds
:run_at - When the test was run
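To show how these fields fit together, the module below mirrors the field list as a plain Elixir struct. It is a hypothetical sketch: `ResultSketch`, its default values, and its type annotations are assumptions for illustration, not the actual definition of `Nous.Eval.Result`.

```elixir
defmodule ResultSketch do
  # Hypothetical stand-in for Nous.Eval.Result. Field names come from the
  # documented field list; defaults and types are assumptions.
  defstruct [
    :test_case_id,
    :test_case_name,
    :actual_output,
    :expected_output,
    :evaluation_details,
    :error,
    :duration_ms,
    :run_at,
    passed: false,
    score: 0.0,
    metrics: %{}
  ]

  @type t :: %__MODULE__{
          test_case_id: term(),
          test_case_name: String.t() | nil,
          passed: boolean(),
          # Expected to lie between 0.0 and 1.0
          score: float(),
          actual_output: term(),
          expected_output: term(),
          evaluation_details: term(),
          metrics: map(),
          error: term(),
          duration_ms: non_neg_integer() | nil,
          run_at: DateTime.t() | nil
        }
end

# Example: a passing result with some collected metrics (metric keys are illustrative).
result = %ResultSketch{
  test_case_id: "greeting-1",
  test_case_name: "Greets the user",
  passed: true,
  score: 1.0,
  actual_output: "Hello!",
  expected_output: "Hello!",
  metrics: %{total_tokens: 12, latency_ms: 340},
  duration_ms: 352,
  run_at: DateTime.utc_now()
}
```

Here `:error` is left at its `nil` default because the test ran to completion.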

Summary

Types

t()

Functions

error(opts)

Create an error result (test failed to run).

failure(opts)

Create a failed result (the test ran but did not pass).

has_error?(result)

Check if the result has an error (test didn't complete).

success(opts)

Create a successful result.
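The relationship between the four functions can be sketched as follows. This is a hypothetical, self-contained approximation. It assumes that `opts` is a keyword list whose keys match the field names, that `success/1` defaults the score to 1.0, that `error/1` forces a zero score, and that `has_error?/1` only checks for a non-nil `:error`. The real implementations in Nous may differ.

```elixir
defmodule ResultConstructorsSketch do
  # Hypothetical, trimmed-down result struct; not the real Nous.Eval.Result.
  defstruct [:test_case_id, :actual_output, :error, passed: false, score: 0.0]

  # Assumed: a successful result always passes and defaults to a full score.
  def success(opts) do
    fields = opts |> Keyword.put_new(:score, 1.0) |> Keyword.put(:passed, true)
    struct(__MODULE__, fields)
  end

  # Assumed: a failed result ran to completion but did not pass.
  def failure(opts) do
    struct(__MODULE__, Keyword.put(opts, :passed, false))
  end

  # Assumed: an error result never passes and carries a zero score.
  def error(opts) do
    fields = opts |> Keyword.put(:passed, false) |> Keyword.put(:score, 0.0)
    struct(__MODULE__, fields)
  end

  # A result "has an error" when its :error field is set.
  def has_error?(%__MODULE__{error: error}), do: not is_nil(error)
end

ok = ResultConstructorsSketch.success(test_case_id: "t1", actual_output: "42")
wrong = ResultConstructorsSketch.failure(test_case_id: "t2", actual_output: "41", score: 0.4)
crashed = ResultConstructorsSketch.error(test_case_id: "t3", error: :timeout)
```

Under these assumptions, `has_error?/1` returns `false` for both `ok` and `wrong` and `true` for `crashed`. This keeps a test that ran and failed separate from one that never completed.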