mix nous.eval (nous v0.13.3)

Run evaluation suites for Nous agents.

Usage

# Run all suites from the default directory (test/eval/suites)
mix nous.eval

# Run a specific suite file
mix nous.eval --suite test/eval/suites/basic.yaml

# Run from a different directory
mix nous.eval --dir priv/eval

# Filter by tags
mix nous.eval --tags basic,tool

# Exclude tags
mix nous.eval --exclude slow,stress

# Override model
mix nous.eval --model lmstudio:ministral-3-14b-reasoning

# Set parallelism
mix nous.eval --parallel 4

# Output format
mix nous.eval --format json
mix nous.eval --format json --output results.json

# Verbose mode
mix nous.eval --verbose

Options

  • --suite - Path to a specific suite file (YAML)
  • --dir - Directory containing suite files (default: test/eval/suites)
  • --tags - Only run test cases with these tags (comma-separated)
  • --exclude - Exclude test cases with these tags (comma-separated)
  • --model - Override default model for all tests
  • --parallel - Number of concurrent tests (default: 1)
  • --timeout - Default timeout in ms (default: 30000)
  • --format - Output format: console, json, markdown (default: console)
  • --output - Output file path (for json/markdown formats)
  • --verbose - Show detailed output including passed tests
  • --retry - Number of retries for failed tests (default: 0)
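
A flag combination that is used often can be wrapped in a Mix alias so it does not have to be retyped. The following is a minimal sketch of a hypothetical `mix.exs`. The project name `MyApp`, the alias name `eval.ci`, and the dependency version requirement are assumptions for illustration. Only the flags themselves come from the option list above.

```elixir
# mix.exs (hypothetical project; module, app, and alias names are illustrative)
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      deps: deps(),
      aliases: aliases()
    ]
  end

  defp deps do
    # Assumed version requirement, based on the documented release (0.13.3).
    [{:nous, "~> 0.13"}]
  end

  defp aliases do
    [
      # `mix eval.ci` excludes cases tagged slow or stress, retries failed
      # tests once, and writes a JSON report to results.json.
      "eval.ci": "nous.eval --exclude slow,stress --retry 1 --format json --output results.json"
    ]
  end
end
```

With this alias in place, `mix eval.ci` would run the same command as typing the full `mix nous.eval ...` invocation by hand.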

Configuration

You can also set defaults in your application config:

config :nous, Nous.Eval,
  default_model: "lmstudio:ministral-3-14b-reasoning",
  default_timeout: 30_000,
  parallelism: 4
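
If the defaults should differ between environments, one option is to override individual keys conditionally. The sketch below assumes a standard `config/config.exs` and uses illustrative values. The choice to lower parallelism and raise the timeout in `:test` is hypothetical, not a recommendation from the Nous documentation.

```elixir
# config/config.exs (sketch; the :test override values are illustrative)
import Config

# Baseline defaults, using the keys documented above.
config :nous, Nous.Eval,
  default_model: "lmstudio:ministral-3-14b-reasoning",
  default_timeout: 30_000,
  parallelism: 4

# Keyword-list configs for the same key are deep-merged, so only the
# keys listed here change; default_model keeps its baseline value.
if config_env() == :test do
  config :nous, Nous.Eval,
    default_timeout: 60_000,
    parallelism: 1
end
```

This page does not state how these configured defaults interact with command-line flags such as `--parallel` or `--timeout`, so verify the precedence against the task's behavior before relying on it.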