Nous.Eval.Suite (nous v0.13.3)
View SourceA collection of test cases with shared configuration.
Suites group related test cases and provide shared defaults for model, timeout, and other settings.
Example
suite = Suite.new(
name: "weather_agent_tests",
description: "Tests for the weather agent",
default_model: "lmstudio:ministral-3-14b-reasoning",
test_cases: [
TestCase.new(id: "basic", input: "What's the weather?", ...),
TestCase.new(id: "city", input: "Weather in Tokyo?", ...)
]
)Loading from YAML
{:ok, suite} = Suite.from_yaml("test/eval/suites/weather.yaml")Filtering
# Filter by tags
filtered = Suite.filter_by_tags(suite, [:basic, :tool])
# Exclude tags
filtered = Suite.exclude_tags(suite, [:slow])
Summary
Functions
Add a test case to the suite.
Add multiple test cases to the suite.
Get all unique tags used in the suite.
Get number of test cases.
Exclude test cases with any of the specified tags.
Filter test cases by tags (include only cases with ANY of the specified tags).
Load all suites from a directory.
Load a suite from a YAML file.
Load a suite from a YAML file, raising on error.
Get test case by ID.
Create a new test suite.
Validate a suite and all its test cases.
Types
@type t() :: %Nous.Eval.Suite{ default_instructions: String.t() | nil, default_model: String.t() | nil, default_timeout: non_neg_integer(), description: String.t() | nil, metadata: map(), name: String.t(), parallelism: non_neg_integer(), retry_failed: non_neg_integer(), setup: (-> map()) | nil, teardown: (map() -> :ok) | nil, test_cases: [Nous.Eval.TestCase.t()] }
Functions
@spec add_case(t(), Nous.Eval.TestCase.t()) :: t()
Add a test case to the suite.
@spec add_cases(t(), [Nous.Eval.TestCase.t()]) :: t()
Add multiple test cases to the suite.
Get all unique tags used in the suite.
@spec count(t()) :: non_neg_integer()
Get number of test cases.
Exclude test cases with any of the specified tags.
Filter test cases by tags (include only cases with ANY of the specified tags).
Load all suites from a directory.
Loads all .yaml and .yml files from the directory.
Load a suite from a YAML file.
Example YAML
name: my_suite
default_model: lmstudio:ministral-3-14b-reasoning
default_timeout: 30000
test_cases:
- id: greeting
input: "Say hello"
expected:
contains: ["hello"]
eval_type: contains
tags: [basic]
- id: math
input: "What is 2+2?"
expected: "4"
eval_type: exact_match
Load a suite from a YAML file, raising on error.
@spec get_case(t(), String.t()) :: Nous.Eval.TestCase.t() | nil
Get test case by ID.
Create a new test suite.
Options
:name- Suite name (required):description- Human-readable description:test_cases- List of TestCase structs:default_model- Default model for all test cases:default_instructions- Default system instructions:default_timeout- Default timeout in ms (default: 30_000):parallelism- Number of concurrent tests (default: 1):retry_failed- Retry count for failed tests (default: 0):setup- Function called before suite runs:teardown- Function called after suite runs:metadata- Additional metadata
Example
suite = Suite.new(
name: "my_tests",
default_model: "lmstudio:ministral-3-14b-reasoning",
test_cases: [
TestCase.new(id: "test1", input: "Hello"),
TestCase.new(id: "test2", input: "World")
]
)
Validate a suite and all its test cases.