Nous.Eval.TestCase (nous v0.13.3)
Defines a single test case for agent evaluation.
A test case specifies an input prompt, expected output, and evaluation criteria.
Example
test_case = TestCase.new(
id: "weather_query",
name: "Weather Query Test",
input: "What's the weather in Tokyo?",
expected: %{contains: ["Tokyo", "weather"]},
eval_type: :contains,
tags: [:tool, :basic],
timeout: 30_000
)

Evaluation Types
:exact_match - Output must exactly match the expected string
:fuzzy_match - String similarity must exceed a threshold
:contains - Output must contain all expected substrings
:tool_usage - Verify the correct tools were called with the correct args
:schema - Validate output against an Ecto schema
:llm_judge - Use an LLM to judge output quality
:custom - Use a custom evaluator module
Expected Formats
The expected value format depends on the eval_type:
:exact_match - "expected string"
:fuzzy_match - "expected string" (with threshold in eval_config)
:contains - %{contains: ["word1", "word2"]} or ["word1", "word2"]
:tool_usage - %{tools_called: ["tool_name"], output_contains: ["..."]}
:schema - MyApp.Schema (module name)
:llm_judge - %{criteria: "...", rubric: "..."}
:custom - Any format understood by your evaluator
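As a sketch of the :llm_judge and :schema formats above (the criteria text, rubric text, and the WeatherApp.ForecastSchema module are hypothetical placeholders, not part of this library):

# :llm_judge - expected carries the judging criteria and rubric
judged = TestCase.new(
  id: "summary_quality",
  input: "Summarize the report in two sentences.",
  expected: %{criteria: "Concise, factual summary", rubric: "Score 1-5 for accuracy and brevity"},
  eval_type: :llm_judge
)

# :schema - expected is an Ecto schema module (hypothetical name)
structured = TestCase.new(
  id: "forecast_struct",
  input: "Give the forecast as structured data.",
  expected: WeatherApp.ForecastSchema,
  eval_type: :schema
)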
Summary
Functions
Get display name for the test case.
Create a test case from a map (used by YAML loader).
Create a new test case.
Validate a test case.
Types
@type eval_type() ::
:exact_match
| :fuzzy_match
| :contains
| :tool_usage
| :schema
| :llm_judge
| :custom
@type t() :: %Nous.Eval.TestCase{
        agent_config: keyword(),
        deps: map(),
        description: String.t() | nil,
        eval_config: map(),
        eval_type: eval_type(),
        expected: term(),
        id: String.t(),
        input: String.t() | [Nous.Message.t()],
        metadata: map(),
        name: String.t() | nil,
        tags: [atom()],
        timeout: non_neg_integer(),
        tools: [Nous.Tool.t()] | nil
      }
Functions
Get display name for the test case.
Create a test case from a map (used by YAML loader).
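A sketch of how the map form might be used, assuming the function is named from_map/1 and accepts the string-keyed maps a YAML loader typically produces (both assumptions, not confirmed by this page):

# Hypothetical sketch: string keys as a YAML loader would emit them
case_map = %{
  "id" => "greeting",
  "input" => "Say hello",
  "expected" => %{"contains" => ["hello"]},
  "eval_type" => "contains"
}
test_case = TestCase.from_map(case_map)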
Create a new test case.
Options
:id - Unique identifier (required)
:input - Input prompt or messages (required)
:name - Human-readable name
:description - Longer description
:expected - Expected output (format depends on eval_type)
:eval_type - Evaluation type (default: :contains)
:eval_config - Configuration for the evaluator
:tags - List of tags for filtering
:deps - Dependencies to pass to the agent
:tools - Tools to provide to the agent
:agent_config - Additional agent configuration
:timeout - Timeout in milliseconds (default: 30_000)
:metadata - Additional metadata
Examples
# Simple contains check
TestCase.new(
id: "greeting",
input: "Say hello",
expected: %{contains: ["hello"]}
)
# Exact match
TestCase.new(
id: "math",
input: "What is 2+2?",
expected: "4",
eval_type: :exact_match
)
# Fuzzy match with threshold
TestCase.new(
id: "fuzzy",
input: "What is the capital of France?",
expected: "Paris is the capital of France",
eval_type: :fuzzy_match,
eval_config: %{threshold: 0.7}
)
# Tool usage verification
TestCase.new(
id: "tool_test",
input: "What's the weather?",
expected: %{tools_called: ["get_weather"]},
eval_type: :tool_usage,
tools: [weather_tool]
)
Validate a test case.
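A usage sketch, assuming validate/1 follows the common Elixir convention of returning :ok or {:error, reason} (an assumption; run_case/1 is a hypothetical caller):

# Hypothetical sketch: gate execution on validation
case TestCase.validate(test_case) do
  :ok -> run_case(test_case)
  {:error, reason} -> IO.warn("invalid test case: #{inspect(reason)}")
end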