Nous.Eval.Evaluator behaviour (nous v0.13.3)

Behaviour for evaluating agent outputs against expected results.

Evaluators determine whether an agent's output matches expectations and provide a score indicating the quality of the match.

Built-in Evaluators

Nous.Eval.Evaluators.ExactMatch - Exact string match
Nous.Eval.Evaluators.FuzzyMatch - Similarity-based match
Nous.Eval.Evaluators.Contains - Check for substrings
Nous.Eval.Evaluators.ToolUsage - Verify tool calls
Nous.Eval.Evaluators.Schema - Validate structured output
Nous.Eval.Evaluators.LLMJudge - LLM-based evaluation

Custom Evaluators

defmodule MyEvaluator do
  @behaviour Nous.Eval.Evaluator

  @impl true
  def evaluate(actual, expected, _config) do
    # Your evaluation logic
    if my_check(actual, expected) do
      %{score: 1.0, passed: true, reason: nil, details: %{}}
    else
      %{score: 0.0, passed: false, reason: "Did not match", details: %{}}
    end
  end

  # Placeholder comparison; replace with your own check.
  defp my_check(actual, expected), do: actual == expected
end
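
Evaluators can also return scores between 0.0 and 1.0. The following is a minimal sketch of such an evaluator, not code from nous: the module name KeywordCoverage, the shape of expected (a list of keyword strings), the assumption that config is a map, and the :threshold config key are all hypothetical. The behaviour declaration is left as a comment so the block compiles without the library.

```elixir
defmodule KeywordCoverage do
  # In a project that depends on nous, declare the behaviour here:
  #   @behaviour Nous.Eval.Evaluator
  # and mark evaluate/3 with `@impl true`. Both are omitted so this
  # sketch compiles on its own.

  # Assumptions: `actual` is a string, `expected` is a list of keyword
  # strings, `config` is a map, and :threshold is a hypothetical key.
  def evaluate(actual, expected, config) when is_binary(actual) and is_list(expected) do
    threshold = Map.get(config, :threshold, 1.0)
    haystack = String.downcase(actual)

    # Case-insensitive substring check for every expected keyword.
    {found, missing} =
      Enum.split_with(expected, fn kw ->
        String.contains?(haystack, String.downcase(kw))
      end)

    # Fraction of keywords found; an empty keyword list counts as a full match.
    score =
      case expected do
        [] -> 1.0
        _ -> length(found) / length(expected)
      end

    passed = score >= threshold

    %{
      score: score,
      passed: passed,
      reason: if(passed, do: nil, else: "Missing keywords: " <> Enum.join(missing, ", ")),
      details: %{found: found, missing: missing}
    }
  end
end
```

Calling KeywordCoverage.evaluate("Paris is nice", ["paris", "france"], %{threshold: 0.5}) returns a passing result with a score of 0.5, while the same call with an empty config fails because the default threshold in this sketch is 1.0.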

Result Format

Evaluators must return a map with the following keys:

:score - Float from 0.0 to 1.0
:passed - Boolean indicating pass/fail
:reason - String explaining failure (or nil)
:details - Map with additional details
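
The format above can be checked mechanically, for example in tests for a custom evaluator. This is a sketch only: ResultCheck is a hypothetical helper, not part of nous, and it assumes the four keys are required exactly as listed.

```elixir
defmodule ResultCheck do
  # Hypothetical helper, not part of nous: returns true when `result`
  # has the four documented keys with the documented types.
  def valid?(%{score: score, passed: passed, reason: reason, details: details}) do
    is_float(score) and score >= 0.0 and score <= 1.0 and
      is_boolean(passed) and
      (is_nil(reason) or is_binary(reason)) and
      is_map(details)
  end

  # Anything else (missing keys, non-map values) is rejected.
  def valid?(_other), do: false
end
```

Note that the integer 1 is rejected as a score in this sketch, because the format asks for a float; return 1.0 instead.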

Summary

Types

result()

score()

Callbacks

evaluate(actual, expected, config)

Evaluate an actual output against the expected output.

name()

Optional: Name of the evaluator for display purposes.

Functions

fail(reason, details \\ %{})

Helper that creates a failing result.

get_evaluator(arg1)

Get the evaluator module for an eval_type.

partial(score, reason \\ nil, details \\ %{})

Helper that creates a partial-match result.

pass(details \\ %{})

Helper that creates a passing result.

run(eval_type, actual, expected, config)

Run evaluation using the appropriate evaluator.
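
One way to picture how get_evaluator and run fit together is a lookup from eval_type to a module, followed by a call to that module's evaluate/3. The sketch below illustrates that idea under stated assumptions and is not the nous implementation: DispatchSketch, its stand-in evaluators, the :exact and :contains atoms, and the handling of unknown types are all hypothetical.

```elixir
defmodule DispatchSketch do
  # Stand-in evaluators so the sketch runs without nous installed.
  defmodule Exact do
    def evaluate(actual, expected, _config) do
      if actual == expected do
        %{score: 1.0, passed: true, reason: nil, details: %{}}
      else
        %{score: 0.0, passed: false, reason: "Did not match", details: %{}}
      end
    end
  end

  defmodule Contains do
    def evaluate(actual, expected, _config) do
      if String.contains?(actual, expected) do
        %{score: 1.0, passed: true, reason: nil, details: %{}}
      else
        %{score: 0.0, passed: false, reason: "Substring not found", details: %{}}
      end
    end
  end

  # Hypothetical eval_type atoms; the real names used by nous may differ.
  @registry %{exact: Exact, contains: Contains}

  # Look up the evaluator module for an eval_type.
  def get_evaluator(eval_type), do: Map.fetch(@registry, eval_type)

  # Resolve the module, then delegate to its evaluate/3.
  def run(eval_type, actual, expected, config) do
    case get_evaluator(eval_type) do
      {:ok, module} ->
        module.evaluate(actual, expected, config)

      :error ->
        # Assumption: unknown types yield a failing result rather than raising.
        %{score: 0.0, passed: false, reason: "Unknown eval_type: #{inspect(eval_type)}", details: %{}}
    end
  end
end
```

With this sketch, DispatchSketch.run(:contains, "hello world", "world", %{}) delegates to the Contains stand-in and returns a passing result.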