Nous.Eval.Result (nous v0.13.3)

Result of a single test case evaluation.

Contains the actual output, evaluation score, metrics, and any errors.

Fields

  • :test_case_id - ID of the test case
  • :test_case_name - Display name of the test case
  • :passed - Whether the test passed
  • :score - Numeric score (0.0 to 1.0)
  • :actual_output - The output from the agent
  • :agent_result - Agent result map, or nil when absent
  • :expected_output - The expected output
  • :evaluation_details - Details from the evaluator
  • :metrics - Collected metrics (tokens, latency, etc.)
  • :error - Error if the test failed to run
  • :duration_ms - Total test duration in milliseconds
  • :run_at - When the test was run
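
As a rough illustration of how the fields above might be consumed, the sketch below computes a pass rate and a mean score over a list of results. It reads only the documented :passed and :score fields. The ResultSummary module and its functions are hypothetical helpers, not part of Nous, and plain maps stand in for %Nous.Eval.Result{} structs so the snippet runs without the library installed.

```elixir
# Hypothetical helper, not part of Nous. Plain maps stand in for
# %Nous.Eval.Result{} structs so the sketch runs without the library.
defmodule ResultSummary do
  @doc "Fraction of results whose :passed field is true (0.0 for an empty list)."
  def pass_rate([]), do: 0.0

  def pass_rate(results) do
    Enum.count(results, & &1.passed) / length(results)
  end

  @doc "Arithmetic mean of the :score fields (0.0 for an empty list)."
  def mean_score([]), do: 0.0

  def mean_score(results) do
    Enum.sum(Enum.map(results, & &1.score)) / length(results)
  end
end

# Sample data using the documented field names; the values are made up.
results = [
  %{test_case_id: "tc-1", passed: true, score: 1.0, duration_ms: 120},
  %{test_case_id: "tc-2", passed: false, score: 0.5, duration_ms: 340}
]

IO.inspect(ResultSummary.pass_rate(results))
IO.inspect(ResultSummary.mean_score(results))
```

Because the helpers use dot access on fields rather than matching on a struct name, the same functions should also accept real result structs, assuming they expose the fields as listed.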

Types

t()

@type t() :: %Nous.Eval.Result{
  actual_output: term(),
  agent_result: map() | nil,
  duration_ms: non_neg_integer(),
  error: term() | nil,
  evaluation_details: map(),
  expected_output: term(),
  metrics: Nous.Eval.Metrics.t() | nil,
  passed: boolean(),
  run_at: DateTime.t(),
  score: float(),
  test_case_id: String.t(),
  test_case_name: String.t()
}

Functions

error(opts)

@spec error(keyword()) :: t()

Create an error result (test failed to run).

failure(opts)

@spec failure(keyword()) :: t()

Create a failed result (the test ran but did not pass).

has_error?(result)

@spec has_error?(t()) :: boolean()

Check if the result has an error (test didn't complete).
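
The distinction between a result that errored and one that merely failed can be made concrete with a small triage sketch. It assumes that a run which did not complete carries a non-nil :error field; the page does not state how has_error?/1 decides, so treat that as an assumption. ResultTriage is a hypothetical helper, and plain maps again stand in for result structs.

```elixir
# Hypothetical helper, not part of Nous.
defmodule ResultTriage do
  # Assumption: a run that did not complete carries a non-nil :error.
  def classify(%{error: error}) when not is_nil(error), do: :errored
  def classify(%{passed: true}), do: :passed
  def classify(%{passed: false}), do: :failed
end

# Sample data using the documented field names; the values are made up.
results = [
  %{test_case_id: "tc-1", passed: true, error: nil},
  %{test_case_id: "tc-2", passed: false, error: nil},
  %{test_case_id: "tc-3", passed: false, error: :timeout}
]

# Group test case IDs by their triage category.
grouped = Enum.group_by(results, &ResultTriage.classify/1, & &1.test_case_id)
IO.inspect(grouped)
```

Checking for an error first keeps errored runs out of the failure count, which matters when a failure rate is meant to reflect agent quality rather than infrastructure problems.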

success(opts)

@spec success(keyword()) :: t()

Create a successful result.
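
The three constructors share the spec keyword() :: t(), but this page does not list the option keys they accept. The sketch below shows one way constructors of this shape could work, assuming the options are keyword pairs named after the struct fields. ResultSketch is a hypothetical stand-in with a subset of the fields; it is not the library's source, and the real functions may set defaults differently.

```elixir
# Hypothetical stand-in for Nous.Eval.Result; NOT the library's source.
defmodule ResultSketch do
  # A subset of the documented fields, with made-up defaults.
  defstruct test_case_id: nil,
            test_case_name: nil,
            passed: false,
            score: 0.0,
            actual_output: nil,
            expected_output: nil,
            error: nil,
            duration_ms: 0

  # Assumption: opts are keyword pairs named after the struct fields.
  def success(opts), do: struct!(__MODULE__, Keyword.merge(opts, passed: true))
  def failure(opts), do: struct!(__MODULE__, Keyword.merge(opts, passed: false))

  # Assumption: an error result never passes and scores 0.0.
  def error(opts), do: struct!(__MODULE__, Keyword.merge(opts, passed: false, score: 0.0))

  # Assumption: "has an error" means the :error field is non-nil.
  def has_error?(%__MODULE__{error: error}), do: not is_nil(error)
end

ok = ResultSketch.success(test_case_id: "tc-1", score: 1.0, actual_output: "4")
bad = ResultSketch.error(test_case_id: "tc-2", error: :timeout)

IO.inspect({ok.passed, ResultSketch.has_error?(ok)})
IO.inspect({bad.passed, ResultSketch.has_error?(bad)})
```

Using struct!/2 means an unknown option key raises a KeyError instead of being silently dropped, which is one reasonable design choice for constructors that take free-form keyword lists.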