Nous.Eval.Evaluator behaviour (nous v0.13.3)

Behaviour for evaluating agent outputs against expected results.

Evaluators determine whether an agent's output matches expectations and provide a score indicating the quality of the match.

Built-in Evaluators

Custom Evaluators

defmodule MyEvaluator do
  @behaviour Nous.Eval.Evaluator

  @impl true
  def evaluate(actual, expected, _config) do
    # Replace my_check/2 with your own evaluation logic
    if my_check(actual, expected) do
      %{score: 1.0, passed: true, reason: nil, details: %{}}
    else
      %{score: 0.0, passed: false, reason: "Did not match", details: %{}}
    end
  end

  # Placeholder check: strict equality between actual and expected
  defp my_check(actual, expected), do: actual == expected
end

Result Format

Evaluators must return a map with:

  • :score - Float from 0.0 to 1.0
  • :passed - Boolean indicating pass/fail
  • :reason - String explaining failure (or nil)
  • :details - Map with additional details
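As a rough illustration of the format above, the following sketch checks that a map carries the four documented keys with the documented value types. `ResultCheck` and `valid?/1` are hypothetical names introduced here; they are not part of nous.

```elixir
defmodule ResultCheck do
  # Hypothetical helper, not part of nous: returns true when the map has
  # at least the four documented keys and each value has the documented type.
  def valid?(%{score: score, passed: passed, reason: reason, details: details})
      when is_float(score) and score >= 0.0 and score <= 1.0 and
             is_boolean(passed) and
             (is_binary(reason) or is_nil(reason)) and
             is_map(details),
      do: true

  # Anything else (missing key, integer score, out-of-range score, ...) is rejected.
  def valid?(_other), do: false
end

ok = %{score: 1.0, passed: true, reason: nil, details: %{}}
IO.inspect(ResultCheck.valid?(ok))
```

Note that an integer score such as `1` is rejected by this sketch, because the format asks for a float.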

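Summary

Callbacks

evaluate(actual, expected, config)

Evaluate an actual output against expected output.

name()

Optional: Name of the evaluator for display purposes.

Functions

fail(reason, details \\ %{})

Helper that builds a failing result.

get_evaluator(arg1)

Get the evaluator module for an eval_type.

partial(score, reason \\ nil, details \\ %{})

Helper that builds a partial-match result.

pass(details \\ %{})

Helper that builds a passing result.

run(eval_type, actual, expected, config)

Run evaluation using the appropriate evaluator.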

Types

result()

@type result() :: %{
  score: score(),
  passed: boolean(),
  reason: String.t() | nil,
  details: map()
}

score()

@type score() :: float()

Callbacks

evaluate(actual, expected, config)

@callback evaluate(actual :: term(), expected :: term(), config :: map()) :: result()

Evaluate an actual output against expected output.

Parameters

  • actual - The actual output from the agent
  • expected - The expected output
  • config - Configuration map for the evaluator

Returns

A result() map containing :score, :passed, :reason, and :details.
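A minimal sketch of an evaluate/3 implementation that returns fractional scores and reads an option from config. The module name `KeywordCoverage` and the `:threshold` config key are assumptions made for this example; this page does not say which config keys nous itself recognises.

```elixir
defmodule KeywordCoverage do
  # With nous installed you would add `@behaviour Nous.Eval.Evaluator` and
  # mark evaluate/3 with `@impl true`; both are omitted so this runs standalone.

  # `actual` is the agent's text, `expected` a non-empty list of keyword strings.
  def evaluate(actual, expected, config)
      when is_binary(actual) and is_list(expected) and expected != [] do
    text = String.downcase(actual)

    # Keywords that do not appear in the output (case-insensitive).
    missing = Enum.reject(expected, &String.contains?(text, String.downcase(&1)))

    # Fraction of keywords found; `/` always yields a float.
    score = (length(expected) - length(missing)) / length(expected)

    # :threshold is a hypothetical config key; default requires every keyword.
    threshold = Map.get(config, :threshold, 1.0)
    passed = score >= threshold

    %{
      score: score,
      passed: passed,
      reason: if(passed, do: nil, else: "Missing keywords: " <> Enum.join(missing, ", ")),
      details: %{missing: missing}
    }
  end

  # Fallback for inputs this sketch does not handle.
  def evaluate(_actual, _expected, _config) do
    %{score: 0.0, passed: false, reason: "Unsupported input types", details: %{}}
  end
end

IO.inspect(KeywordCoverage.evaluate("Paris", ["paris", "france"], %{threshold: 0.5}))
```

Keeping the pass rule in config rather than hard-coding it lets the same evaluator serve strict and lenient test cases.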

name()

(optional)
@callback name() :: String.t()

Optional: Name of the evaluator for display purposes.
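Because name/0 is optional, code that displays evaluators cannot assume it exists. The sketch below shows one way a caller might fall back to the module name. `ExactMatch`, `LengthCheck`, and `EvaluatorLabel.display_name/1` are hypothetical; how nous itself resolves display names is not described on this page.

```elixir
defmodule ExactMatch do
  # Stand-in evaluator that defines the optional name/0 callback.
  def evaluate(actual, expected, _config) do
    passed = actual == expected
    %{score: if(passed, do: 1.0, else: 0.0), passed: passed, reason: nil, details: %{}}
  end

  def name, do: "Exact Match"
end

defmodule LengthCheck do
  # Stand-in evaluator that does NOT define name/0.
  def evaluate(actual, expected, _config) do
    passed = String.length(actual) == expected
    %{score: if(passed, do: 1.0, else: 0.0), passed: passed, reason: nil, details: %{}}
  end
end

defmodule EvaluatorLabel do
  # Hypothetical helper: use name/0 when the module exports it,
  # otherwise fall back to the inspected module name.
  def display_name(module) do
    if Code.ensure_loaded?(module) and function_exported?(module, :name, 0) do
      module.name()
    else
      inspect(module)
    end
  end
end

IO.puts(EvaluatorLabel.display_name(ExactMatch))
IO.puts(EvaluatorLabel.display_name(LengthCheck))
```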

Functions

fail(reason, details \\ %{})

@spec fail(String.t(), map()) :: result()

Helper that builds a failing result with the given reason and optional details.

get_evaluator(arg1)

@spec get_evaluator(atom()) :: module() | nil

Get the evaluator module for an eval_type. Per the spec, the return value is either a module or nil.

partial(score, reason \\ nil, details \\ %{})

@spec partial(float(), String.t() | nil, map()) :: result()

Helper that builds a partial-match result with the given score, an optional reason, and optional details.

pass(details \\ %{})

@spec pass(map()) :: result()

Helper that builds a passing result with optional details.
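The three result helpers (fail, partial, pass) are documented here only by their specs. The sketch below defines local stand-ins with the same arities to show the shape of map each one plausibly builds. It is not the library's code: `ResultHelpers` is a hypothetical module, the scores 1.0 and 0.0 are taken from the Custom Evaluators example, and the rule that a partial result passes at 0.5 or above is purely an assumption.

```elixir
defmodule ResultHelpers do
  # Local stand-ins mirroring the documented specs; NOT the nous implementation.

  # Assumption: a passing result scores 1.0 and carries no reason.
  def pass(details \\ %{}) do
    %{score: 1.0, passed: true, reason: nil, details: details}
  end

  # Assumption: a failing result scores 0.0.
  def fail(reason, details \\ %{}) do
    %{score: 0.0, passed: false, reason: reason, details: details}
  end

  # Assumption: a partial result counts as passed at a score of 0.5 or above.
  # This page does not state how the library sets :passed for partial results.
  def partial(score, reason \\ nil, details \\ %{}) when is_float(score) do
    %{score: score, passed: score >= 0.5, reason: reason, details: details}
  end
end

IO.inspect(ResultHelpers.partial(0.75, "2 of 3 fields matched", %{matched: 2}))
```

In a project that depends on nous, an evaluator would call the real helpers on Nous.Eval.Evaluator with the arities shown in the specs instead of these stand-ins.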

run(eval_type, actual, expected, config)

@spec run(atom(), term(), term(), map()) :: result()

Run an evaluation with the evaluator selected by eval_type, passing it actual, expected, and config.
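The relationship between get_evaluator and run can be sketched as a lookup followed by a call to the module's evaluate/3. Everything below is a standalone illustration: `ExactSketch`, `DispatchSketch`, the `:exact` eval_type, and the failing result returned for an unknown type are assumptions, since this page does not list the registered types or say how run handles a nil lookup.

```elixir
defmodule ExactSketch do
  # Stand-in evaluator used only by this sketch.
  def evaluate(actual, expected, _config) do
    if actual == expected do
      %{score: 1.0, passed: true, reason: nil, details: %{}}
    else
      %{score: 0.0, passed: false, reason: "Did not match", details: %{}}
    end
  end
end

defmodule DispatchSketch do
  # Hypothetical registry mapping eval_type atoms to evaluator modules.
  @evaluators %{exact: ExactSketch}

  # Returns the module for eval_type, or nil when the type is unknown.
  def get_evaluator(eval_type), do: Map.get(@evaluators, eval_type)

  # Looks up the evaluator and delegates to its evaluate/3.
  def run(eval_type, actual, expected, config) do
    case get_evaluator(eval_type) do
      nil ->
        # Assumption: an unknown type yields a failing result rather than raising.
        %{
          score: 0.0,
          passed: false,
          reason: "Unknown eval_type: #{inspect(eval_type)}",
          details: %{}
        }

      module ->
        module.evaluate(actual, expected, config)
    end
  end
end

IO.inspect(DispatchSketch.run(:exact, "hello", "hello", %{}))
```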