mix nous.eval (nous v0.13.3)
Run evaluation suites for Nous agents.
Usage
# Run all suites from default directory (test/eval/suites)
mix nous.eval
# Run a specific suite file
mix nous.eval --suite test/eval/suites/basic.yaml
# Run from a different directory
mix nous.eval --dir priv/eval
# Filter by tags
mix nous.eval --tags basic,tool
# Exclude tags
mix nous.eval --exclude slow,stress
# Override model
mix nous.eval --model lmstudio:ministral-3-14b-reasoning
# Set parallelism
mix nous.eval --parallel 4
# Output format
mix nous.eval --format json
mix nous.eval --format json --output results.json
# Verbose mode
mix nous.eval --verbose
Options
--suite - Path to a specific suite file (YAML)
--dir - Directory containing suite files (default: test/eval/suites)
--tags - Only run test cases with these tags (comma-separated)
--exclude - Exclude test cases with these tags (comma-separated)
--model - Override default model for all tests
--parallel - Number of concurrent tests (default: 1)
--timeout - Default timeout in ms (default: 30000)
--format - Output format: console, json, markdown (default: console)
--output - Output file path (for json/markdown formats)
--verbose - Show detailed output including passed tests
--retry - Number of retries for failed tests (default: 0)
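The usage examples show one option at a time, and --timeout and --retry have no example at all. The sketch below combines several of the documented options into one invocation, as you might in a CI job. The wrapper function name (nous_eval_ci), the DRY_RUN variable, and the specific values (priv/eval, 60000, 2, report.md) are assumptions for illustration and are not part of Nous.

```shell
#!/bin/sh
# Sketch: build one `mix nous.eval` command from options documented above.
# nous_eval_ci and DRY_RUN are hypothetical helpers, not part of Nous.
nous_eval_ci() {
  # Positional parameters hold the command so each flag stays on its own line.
  set -- mix nous.eval \
    --dir priv/eval \
    --exclude slow,stress \
    --parallel 4 \
    --timeout 60000 \
    --retry 2 \
    --format markdown \
    --output report.md

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command that would be run.
    echo "$*"
  else
    # DRY_RUN=0: actually execute the task (requires a project that depends on Nous).
    "$@"
  fi
}

nous_eval_ci
```

By default the function only prints the assembled command. Setting DRY_RUN=0 before the call executes it instead.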
Configuration
Defaults can also be set in your application config:
config :nous, Nous.Eval,
  default_model: "lmstudio:ministral-3-14b-reasoning",
  default_timeout: 30_000,
  parallelism: 4