Nous.Eval.TestCase (nous v0.13.3)

Defines a single test case for agent evaluation.

A test case specifies an input prompt, expected output, and evaluation criteria.

Example

alias Nous.Eval.TestCase

test_case = TestCase.new(
  id: "weather_query",
  name: "Weather Query Test",
  input: "What's the weather in Tokyo?",
  expected: %{contains: ["Tokyo", "weather"]},
  eval_type: :contains,
  tags: [:tool, :basic],
  timeout: 30_000
)

Evaluation Types

  • :exact_match - Output must exactly match the expected string
  • :fuzzy_match - String similarity to the expected string must exceed a threshold
  • :contains - Output must contain all expected substrings
  • :tool_usage - Verify that the correct tools were called with the correct arguments
  • :schema - Validate output against an Ecto schema
  • :llm_judge - Use an LLM to judge output quality
  • :custom - Use a custom evaluator module
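
To make the first three evaluation types concrete, here is a minimal sketch of checks that match their descriptions. It is not the Nous implementation: the module name EvalSketch is hypothetical, the comparisons are case-sensitive, and String.jaro_distance/2 is an assumed stand-in because this page does not say which similarity metric is used.

```elixir
defmodule EvalSketch do
  @moduledoc false
  # Hypothetical helper module, for illustration only (not part of nous).

  # :exact_match - the output must equal the expected string.
  def exact_match?(output, expected) when is_binary(output) and is_binary(expected) do
    output == expected
  end

  # :contains - every expected substring must appear in the output.
  def contains_all?(output, substrings) when is_binary(output) and is_list(substrings) do
    Enum.all?(substrings, fn substring -> String.contains?(output, substring) end)
  end

  # :fuzzy_match - similarity must exceed the threshold.
  # Assumption: Jaro similarity from the standard library stands in for
  # whatever metric the real evaluator uses.
  def fuzzy_match?(output, expected, threshold) do
    String.jaro_distance(output, expected) > threshold
  end
end
```

With these definitions, EvalSketch.contains_all?("The weather in Tokyo is sunny", ["Tokyo", "weather"]) is true, while dropping "Tokyo" from the output makes it false.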

Expected Formats

The expected value format depends on the eval_type:

  • :exact_match - "expected string"
  • :fuzzy_match - "expected string" (with threshold in eval_config)
  • :contains - %{contains: ["word1", "word2"]} or ["word1", "word2"]
  • :tool_usage - %{tools_called: ["tool_name"], output_contains: ["..."]}
  • :schema - MyApp.Schema (module name)
  • :llm_judge - %{criteria: "...", rubric: "..."}
  • :custom - Any format understood by your evaluator
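
The :contains entry accepts two shapes, a map with a :contains key or a bare list. A sketch of how both could be reduced to one list of substrings follows; ExpectedSketch and contains_list/1 are hypothetical names, and the real evaluator may normalize differently.

```elixir
defmodule ExpectedSketch do
  @moduledoc false
  # Hypothetical helper, for illustration only (not part of nous).

  # Map form: %{contains: ["word1", "word2"]}
  def contains_list(%{contains: substrings}) when is_list(substrings), do: substrings

  # Bare list form: ["word1", "word2"]
  def contains_list(substrings) when is_list(substrings), do: substrings
end
```

Either shape then yields the same list, so a single substring check can serve both.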

Summary

Functions

display_name(test_case)
Get display name for the test case.

from_map(map)
Create a test case from a map (used by YAML loader).

new(opts)
Create a new test case.

validate(tc)
Validate a test case.

Types

eval_type()

@type eval_type() ::
  :exact_match
  | :fuzzy_match
  | :contains
  | :tool_usage
  | :schema
  | :llm_judge
  | :custom

t()

@type t() :: %Nous.Eval.TestCase{
  agent_config: keyword(),
  deps: map(),
  description: String.t() | nil,
  eval_config: map(),
  eval_type: eval_type(),
  expected: term(),
  id: String.t(),
  input: String.t() | [Nous.Message.t()],
  metadata: map(),
  name: String.t() | nil,
  tags: [atom()],
  timeout: non_neg_integer(),
  tools: [Nous.Tool.t()] | nil
}

Functions

display_name(test_case)

@spec display_name(t()) :: String.t()

Get display name for the test case.

from_map(map)

@spec from_map(map()) :: {:ok, t()} | {:error, term()}

Create a test case from a map (used by the YAML loader). Returns {:ok, test_case} on success and {:error, reason} otherwise.

new(opts)

@spec new(keyword()) :: t()

Create a new test case.

Options

  • :id - Unique identifier (required)
  • :input - Input prompt (a string) or a list of Nous.Message structs (required)
  • :name - Human-readable name
  • :description - Longer description
  • :expected - Expected output (format depends on eval_type)
  • :eval_type - Evaluation type (default: :contains)
  • :eval_config - Configuration for the evaluator
  • :tags - List of tags for filtering
  • :deps - Dependencies to pass to the agent
  • :tools - Tools to provide to the agent
  • :agent_config - Additional agent configuration
  • :timeout - Timeout in milliseconds (default: 30_000)
  • :metadata - Additional metadata

Examples

# Simple contains check
TestCase.new(
  id: "greeting",
  input: "Say hello",
  expected: %{contains: ["hello"]}
)

# Exact match
TestCase.new(
  id: "math",
  input: "What is 2+2?",
  expected: "4",
  eval_type: :exact_match
)

# Fuzzy match with threshold
TestCase.new(
  id: "fuzzy",
  input: "What is the capital of France?",
  expected: "Paris is the capital of France",
  eval_type: :fuzzy_match,
  eval_config: %{threshold: 0.7}
)

# Tool usage verification
# (weather_tool is assumed to be a Nous.Tool struct defined elsewhere)
TestCase.new(
  id: "tool_test",
  input: "What's the weather?",
  expected: %{tools_called: ["get_weather"]},
  eval_type: :tool_usage,
  tools: [weather_tool]
)
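
The examples above do not cover :schema or :llm_judge. The sketch below builds option lists for those two types using the expected formats listed under Expected Formats. The module name MoreCaseOpts, the ids, the prompts, and the criteria and rubric text are all made up for illustration, and MyApp.Schema is only the placeholder module name used earlier on this page.

```elixir
defmodule MoreCaseOpts do
  @moduledoc false
  # Hypothetical helper, for illustration only (not part of nous).

  # Options for a :schema test case; expected is a schema module name.
  def schema do
    [
      id: "schema_check",
      input: "Return the user as JSON",
      expected: MyApp.Schema,
      eval_type: :schema
    ]
  end

  # Options for an :llm_judge test case; expected carries criteria and a rubric.
  def llm_judge do
    [
      id: "judge_check",
      input: "Explain recursion to a beginner",
      expected: %{
        criteria: "Explanation is accurate and beginner friendly",
        rubric: "Pass if it defines recursion and gives one concrete example"
      },
      eval_type: :llm_judge
    ]
  end
end

# With nous installed, each list can be passed to new/1, for example:
#   Nous.Eval.TestCase.new(MoreCaseOpts.schema())
#   Nous.Eval.TestCase.new(MoreCaseOpts.llm_judge())
```

Keeping the options as plain keyword lists makes them easy to inspect or reuse before a test case is built.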

validate(tc)

@spec validate(t()) :: :ok | {:error, term()}

Validate a test case. Returns :ok if the test case is valid and {:error, reason} otherwise.
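
Because from_map/1 and validate/1 both signal failure with {:error, term()}, they chain naturally in a with expression. The sketch below relies only on the return shapes given in the specs above. TestCaseLoading and load/2 are hypothetical names, and the module is passed as an argument so the flow can be tried with a stub when nous is not installed.

```elixir
defmodule TestCaseLoading do
  @moduledoc false
  # Hypothetical helper, for illustration only (not part of nous).

  # `impl` defaults to Nous.Eval.TestCase. Any module that exposes
  # from_map/1 and validate/1 with the documented return shapes will work.
  def load(map, impl \\ Nous.Eval.TestCase) when is_map(map) do
    with {:ok, test_case} <- impl.from_map(map),
         :ok <- impl.validate(test_case) do
      {:ok, test_case}
    end
    # Any {:error, reason} from either step falls through unchanged.
  end
end
```

A caller can then match on {:ok, test_case} or {:error, reason} once, without caring which step failed.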