Nous.Eval.TestCase (nous v0.13.3)

Defines a single test case for agent evaluation.

A test case specifies an input prompt, expected output, and evaluation criteria.

Example

alias Nous.Eval.TestCase

test_case = TestCase.new(
  id: "weather_query",
  name: "Weather Query Test",
  input: "What's the weather in Tokyo?",
  expected: %{contains: ["Tokyo", "weather"]},
  eval_type: :contains,
  tags: [:tool, :basic],
  timeout: 30_000
)

Evaluation Types

:exact_match - Output must exactly match expected string
:fuzzy_match - String similarity must exceed threshold
:contains - Output must contain all expected substrings
:tool_usage - Verify that the correct tools were called with the correct arguments
:schema - Validate output against an Ecto schema
:llm_judge - Use an LLM to judge output quality
:custom - Use a custom evaluator module
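
To make the three string-based types concrete, here is a minimal sketch in plain Elixir. It is not the Nous implementation: the module name `EvalSketch` is hypothetical, the case-insensitive comparison in `contains_all?/2` is an assumption, and `String.jaro_distance/2` stands in for whatever similarity metric the library actually uses.

```elixir
defmodule EvalSketch do
  # Hypothetical helper module for illustration; not part of Nous.

  # :exact_match - the output must equal the expected string.
  def exact_match?(output, expected) when is_binary(output) and is_binary(expected) do
    output == expected
  end

  # :contains - every expected substring must appear in the output.
  # Comparing case-insensitively is an assumption of this sketch.
  def contains_all?(output, substrings) when is_binary(output) and is_list(substrings) do
    haystack = String.downcase(output)
    Enum.all?(substrings, fn s -> String.contains?(haystack, String.downcase(s)) end)
  end

  # :fuzzy_match - the similarity score must exceed the threshold.
  # Jaro distance is an assumed metric, chosen because it ships with Elixir.
  def fuzzy_match?(output, expected, threshold \\ 0.7) do
    String.jaro_distance(output, expected) > threshold
  end
end
```

For example, `EvalSketch.contains_all?("The weather in Tokyo is sunny", ["Tokyo", "weather"])` checks both substrings against the same output string.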

Expected Formats

The expected value format depends on the eval_type:

:exact_match - "expected string"
:fuzzy_match - "expected string" (with :threshold set in eval_config, e.g. %{threshold: 0.7})
:contains - %{contains: ["word1", "word2"]} or ["word1", "word2"]
:tool_usage - %{tools_called: ["tool_name"], output_contains: ["..."]}
:schema - MyApp.Schema (module name)
:llm_judge - %{criteria: "...", rubric: "..."}
:custom - Any format understood by your evaluator
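
Because :contains accepts two shapes, an evaluator has to normalize them before checking. The following is a sketch of one way to do that with function-clause pattern matching. `ExpectedSketch` and the `{:invalid_expected, other}` error tuple are hypothetical names, not part of Nous.

```elixir
defmodule ExpectedSketch do
  # Hypothetical helper for illustration; not part of Nous.

  # Map form: %{contains: ["word1", "word2"]}
  def contains_terms(%{contains: terms}) when is_list(terms), do: {:ok, terms}

  # Bare list form: ["word1", "word2"]
  def contains_terms(terms) when is_list(terms), do: {:ok, terms}

  # Anything else is rejected rather than guessed at.
  def contains_terms(other), do: {:error, {:invalid_expected, other}}
end
```

Both documented forms yield the same list of substrings, so the downstream check only has to handle one shape.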

Summary

Types

eval_type()

t()

Functions

display_name(test_case)

Get the display name for the test case.

from_map(map)

Create a test case from a map (used by YAML loader).

new(opts)

Create a new test case.

validate(tc)

Validate a test case.

Types

eval_type()

@type eval_type() ::
  :exact_match
  | :fuzzy_match
  | :contains
  | :tool_usage
  | :schema
  | :llm_judge
  | :custom

t()

@type t() :: %Nous.Eval.TestCase{
  agent_config: keyword(),
  deps: map(),
  description: String.t() | nil,
  eval_config: map(),
  eval_type: eval_type(),
  expected: term(),
  id: String.t(),
  input: String.t() | [Nous.Message.t()],
  metadata: map(),
  name: String.t() | nil,
  tags: [atom()],
  timeout: non_neg_integer(),
  tools: [Nous.Tool.t()] | nil
}

Functions

display_name(test_case)

@spec display_name(t()) :: String.t()

Get the display name for the test case.

from_map(map)

@spec from_map(map()) :: {:ok, t()} | {:error, term()}

Create a test case from a map (used by YAML loader).
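
The accepted map shape is not documented here. As an illustration only, the sketch below shows how a string-keyed map (the shape a YAML parser typically produces) could be turned into test-case fields while preserving the `{:ok, _} | {:error, _}` return contract. `FromMapSketch`, the string keys, and the error terms are all assumptions. The defaults mirror those listed for new/1.

```elixir
defmodule FromMapSketch do
  # Hypothetical stand-in for illustration; not the Nous implementation.
  @eval_types ~w(exact_match fuzzy_match contains tool_usage schema llm_judge custom)

  # Requires the two fields that new/1 documents as required.
  def from_map(%{"id" => id, "input" => input} = map) when is_binary(id) do
    with {:ok, eval_type} <- parse_eval_type(Map.get(map, "eval_type", "contains")) do
      {:ok,
       %{
         id: id,
         input: input,
         name: Map.get(map, "name"),
         expected: Map.get(map, "expected"),
         eval_type: eval_type,
         timeout: Map.get(map, "timeout", 30_000)
       }}
    end
  end

  def from_map(_other), do: {:error, :missing_required_fields}

  # Only whitelisted strings are converted, so no arbitrary atoms are created.
  defp parse_eval_type(type) when type in @eval_types, do: {:ok, String.to_atom(type)}
  defp parse_eval_type(other), do: {:error, {:unknown_eval_type, other}}
end
```

Returning a tagged tuple rather than raising lets a loader collect errors across many files instead of stopping at the first malformed entry.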

new(opts)

@spec new(keyword()) :: t()

Create a new test case.

Options

:id - Unique identifier (required)
:input - Input prompt or messages (required)
:name - Human-readable name
:description - Longer description
:expected - Expected output (format depends on eval_type)
:eval_type - Evaluation type (default: :contains)
:eval_config - Configuration for the evaluator
:tags - List of tags for filtering
:deps - Dependencies to pass to agent
:tools - Tools to provide to the agent
:agent_config - Additional agent configuration
:timeout - Timeout in milliseconds (default: 30_000)
:metadata - Additional metadata

Examples

alias Nous.Eval.TestCase

# Simple contains check
TestCase.new(
  id: "greeting",
  input: "Say hello",
  expected: %{contains: ["hello"]}
)

# Exact match
TestCase.new(
  id: "math",
  input: "What is 2+2?",
  expected: "4",
  eval_type: :exact_match
)

# Fuzzy match with threshold
TestCase.new(
  id: "fuzzy",
  input: "What is the capital of France?",
  expected: "Paris is the capital of France",
  eval_type: :fuzzy_match,
  eval_config: %{threshold: 0.7}
)

# Tool usage verification
# (weather_tool is assumed to be a Nous.Tool struct defined elsewhere)
TestCase.new(
  id: "tool_test",
  input: "What's the weather?",
  expected: %{tools_called: ["get_weather"]},
  eval_type: :tool_usage,
  tools: [weather_tool]
)

validate(tc)

@spec validate(t()) :: :ok | {:error, term()}

Validate a test case.
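
The specific rules enforced by validate/1 are not listed here. The sketch below shows the kind of checks that follow from the documented types and required options, using plain maps in place of the struct. `ValidateSketch` and the `{:invalid, field}` error terms are hypothetical, and the real function may check more or fewer fields.

```elixir
defmodule ValidateSketch do
  # Hypothetical checks for illustration; not the rules Nous actually enforces.
  @eval_types [:exact_match, :fuzzy_match, :contains, :tool_usage, :schema, :llm_judge, :custom]

  def validate(%{} = tc) do
    cond do
      # :id is documented as required and typed String.t()
      not is_binary(tc[:id]) or tc[:id] == "" -> {:error, {:invalid, :id}}
      # :input is documented as required
      is_nil(tc[:input]) -> {:error, {:invalid, :input}}
      # :eval_type must be one of the documented atoms
      tc[:eval_type] not in @eval_types -> {:error, {:invalid, :eval_type}}
      # :timeout is typed non_neg_integer()
      not (is_integer(tc[:timeout]) and tc[:timeout] >= 0) -> {:error, {:invalid, :timeout}}
      true -> :ok
    end
  end
end
```

Callers would typically branch on the result with `case`, handling `:ok` and `{:error, reason}` separately, which matches the return type in the spec above.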