Fixture Testing Guide
ReqLLM uses a comprehensive fixture-based testing system to ensure reliability across all supported models and providers. This guide explains how "Supported Models" are validated and how the testing infrastructure works.
Overview
The testing system validates models through the mix req_llm.model_compat task, which runs capability-focused tests against models selected from the registry.
The Model Compatibility Task
Basic Usage
# Validate all models with passing fixtures (fastest)
mix req_llm.model_compat
# Alias
mix mc
This runs tests against cached fixtures only; no API calls are made. It validates models that already have passing results recorded in priv/supported_models.json.
Validating Specific Models
# Validate all Anthropic models
mix mc anthropic
# Validate specific model
mix mc "openai:gpt-4o"
# Validate all models for a provider
mix mc "xai:*"
# List all available models from registry
mix mc --available
Recording New Fixtures
To test against live APIs and (re)generate fixtures:
# Re-record fixtures for xAI models
mix mc "xai:*" --record
# Re-record all models (not recommended, expensive)
mix mc "*:*" --record
Testing Model Subsets
# Test sample models per provider (uses config/config.exs sample list)
mix mc --sample
# Test specific provider samples
mix mc --sample anthropic
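The sample list referenced above lives in config/config.exs. The snippet below is only an illustrative sketch; the key name (sample_models) and structure are assumptions and may differ from ReqLLM's actual configuration:

# config/config.exs (illustrative sketch; key names are assumptions)
import Config

config :req_llm,
  sample_models: %{
    anthropic: ["anthropic:claude-3-5-haiku-20241022"],
    openai: ["openai:gpt-4o-mini"]
  }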
Architecture
Model Registry
Model metadata lives in priv/models_dev/*.json files, automatically synced from models.dev via mix req_llm.model_sync.
Each model entry includes:
- Capabilities (tool_call, reasoning, attachment, temperature)
- Modalities (input: [:text, :image], output: [:text])
- Limits (context and output token limits)
- Costs (input and output cost per 1M tokens)
- API-specific metadata
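For illustration, a single entry might look roughly like the following; the field names and nesting are assumptions, and the real files synced from models.dev may use a different shape:

{
  "id": "claude-3-5-sonnet-20241022",
  "capabilities": {"tool_call": true, "reasoning": false, "attachment": true, "temperature": true},
  "modalities": {"input": ["text", "image"], "output": ["text"]},
  "limits": {"context": 200000, "output": 8192},
  "cost": {"input": 3.0, "output": 15.0}
}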
Fixture State
The priv/supported_models.json file tracks which models have passing fixtures. This file is auto-generated and should not be manually edited.
Comprehensive Test Macro
Tests use the ReqLLM.ProviderTest.Comprehensive macro (in test/support/provider_test/comprehensive.ex), which generates up to 9 focused tests per model based on capabilities:
- Basic generate_text (non-streaming) - All models
- Streaming with system context + creative params - Models with streaming support
- Token limit constraints - All models
- Usage metrics and cost calculations - All models
- Tool calling - multi-tool selection - Models with :tool_call capability
- Tool calling - no tool when inappropriate - Models with :tool_call capability
- Object generation (non-streaming) - Models with object generation support
- Object generation (streaming) - Models with object generation support
- Reasoning/thinking tokens - Models with :reasoning capability
Test Organization
test/coverage/
├── anthropic/
│ └── comprehensive_test.exs
├── openai/
│ └── comprehensive_test.exs
├── google/
│ └── comprehensive_test.exs
└── ...

Each provider has a single comprehensive test file:
defmodule ReqLLM.Coverage.Anthropic.ComprehensiveTest do
  use ReqLLM.ProviderTest.Comprehensive, provider: :anthropic
end

The macro automatically:
- Selects models from ModelMatrix based on provider and operation type
- Generates tests for each model based on capabilities
- Handles fixture recording and replay
- Tags tests with provider, model, and scenario
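To make the capability gating concrete, the sketch below shows how such per-model test generation can be expressed in ExUnit. It is a simplified illustration, not the actual ReqLLM.ProviderTest.Comprehensive implementation; the model data and assertions are placeholders:

defmodule ComprehensiveSketchTest do
  use ExUnit.Case, async: true

  # Placeholder model entries; the real macro selects these via ModelMatrix.
  @models [
    %{spec: "anthropic:claude-3-5-haiku-20241022", capabilities: %{tool_call: true}}
  ]

  for model <- @models do
    @model model

    @tag scenario: :basic
    test "#{model.spec} - basic generate_text" do
      # A real test would call the provider here (or replay a cached fixture).
      assert is_binary(@model.spec)
    end

    # Capability-gated: only generated when :tool_call is advertised.
    if model.capabilities.tool_call do
      @tag scenario: :tool_multi
      test "#{model.spec} - tool calling, multi-tool selection" do
        assert @model.capabilities.tool_call
      end
    end
  end
end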
How "Supported Models" is Defined
A model is considered "supported" when it:
- Has metadata in priv/models_dev/<provider>.json
- Passes comprehensive tests for its advertised capabilities
- Has fixture evidence stored for validation
The count you see in documentation ("135+ models currently pass our comprehensive fixture-based test suite") comes from the models listed in priv/supported_models.json.
Semantic Tags
Tests use structured tags for precise filtering:
@moduletag :coverage # All coverage tests
@moduletag provider: "anthropic" # Provider filter
@describetag model: "claude-3-5-sonnet" # Model filter (without provider prefix)
@tag scenario: :basic # Scenario filter

Run specific subsets:
# All coverage tests
mix test --only coverage
# Specific provider
mix test --only "provider:anthropic"
# Specific scenario
mix test --only "scenario:basic"
mix test --only "scenario:streaming"
mix test --only "scenario:tool_multi"
# Specific model
mix test --only "model:claude-3-5-haiku-20241022"
# Combine filters
mix test --only "provider:openai" --only "scenario:basic"
Environment Variables
Fixture Mode Control
# Use cached fixtures (default, no API calls)
mix mc
# Record new fixtures (makes live API calls)
REQ_LLM_FIXTURES_MODE=record mix mc
# OR
mix mc --record
Model Selection
# Test all available models
REQ_LLM_MODELS="all" mix mc
# Test all models from a provider
REQ_LLM_MODELS="anthropic:*" mix mc
# Test specific models (comma-separated)
REQ_LLM_MODELS="openai:gpt-4o,anthropic:claude-3-5-sonnet" mix mc
# Sample N models per provider
REQ_LLM_SAMPLE=2 mix mc
# Exclude specific models
REQ_LLM_EXCLUDE="gpt-4o-mini,gpt-3.5-turbo" mix mc
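As an illustration of how a selection pattern like "anthropic:*" could be expanded against the registry, here is a small sketch; it is not the mix task's actual implementation:

# Illustrative sketch of expanding a REQ_LLM_MODELS-style pattern; not ReqLLM's code.
defmodule ModelFilterSketch do
  def select(specs, "all"), do: specs

  def select(specs, pattern) do
    pattern
    |> String.split(",", trim: true)
    |> Enum.flat_map(fn entry ->
      case String.split(entry, ":") do
        [provider, "*"] -> Enum.filter(specs, &String.starts_with?(&1, provider <> ":"))
        _ -> Enum.filter(specs, &(&1 == entry))
      end
    end)
  end
end

# ModelFilterSketch.select(["openai:gpt-4o", "anthropic:claude-3-5-sonnet"], "anthropic:*")
# #=> ["anthropic:claude-3-5-sonnet"]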
Debug Output
# Verbose fixture debugging
REQ_LLM_DEBUG=1 mix mc
Fixture System Details
Fixture Storage
Fixtures are stored next to test files:
test/coverage/<provider>/fixtures/
├── basic.json
├── streaming.json
├── token_limit.json
├── usage.json
├── tool_multi.json
├── no_tool.json
├── object_basic.json
├── object_streaming.json
└── reasoning_basic.json

Fixture Format
Fixtures capture the complete API response:
{
"captured_at": "2025-01-15T10:30:00Z",
"model_spec": "anthropic:claude-3-5-sonnet-20241022",
"scenario": "basic",
"result": {
"ok": true,
"response": {
"id": "msg_123",
"model": "claude-3-5-sonnet-20241022",
"message": {...},
"usage": {...}
}
}
}

Parallel Execution
The fixture system supports parallel test execution:
- Tests run concurrently for speed
- State tracking skips models with passing fixtures
- Use --record or --record-all to regenerate
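Under the hood, each scenario boils down to a replay-or-record decision against its fixture file. The sketch below illustrates that idea; the paths, helper names, and use of the Jason library are assumptions, not ReqLLM's actual fixture code:

# Illustrative replay-or-record sketch; file layout and Jason dependency are assumptions.
defmodule FixtureSketch do
  def run(provider, scenario, live_call) when is_function(live_call, 0) do
    path = Path.join(["test/coverage", to_string(provider), "fixtures", "#{scenario}.json"])

    case System.get_env("REQ_LLM_FIXTURES_MODE") do
      "record" ->
        result = live_call.()
        File.mkdir_p!(Path.dirname(path))
        File.write!(path, Jason.encode!(%{captured_at: DateTime.utc_now(), result: result}))
        result

      _ ->
        # Replay: no API call is made; the cached response is returned as-is.
        path |> File.read!() |> Jason.decode!() |> Map.fetch!("result")
    end
  end
end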
Development Workflow
Adding a New Provider
- Implement provider module and metadata
- Create test file using Comprehensive macro
- Record initial fixtures: mix mc "<provider>:*" --record
- Verify all tests pass: mix mc "<provider>"
Updating Model Coverage
- Sync latest model metadata: mix req_llm.model_sync
- Record fixtures for new models: mix mc "<provider>:new-model" --record
- Validate updated coverage: mix mc "<provider>"
Refreshing Fixtures
Periodically refresh fixtures to catch API changes:
# Refresh specific provider
mix mc "anthropic:*" --record
# Refresh specific capability
REQ_LLM_FIXTURES_MODE=record mix test --only "scenario:streaming"
# Refresh all (expensive, requires all API keys)
mix mc "*:*" --record
Quality Commitments
We guarantee that all "supported models" (those counted in our documentation):
- Have passing fixtures for basic functionality
- Are tested against live APIs before fixture capture
- Pass capability-focused tests for advertised features
- Are regularly refreshed to catch provider-side changes
What's Tested
For each supported model:
- ✅ Text generation (streaming and non-streaming)
- ✅ Token limits and truncation behavior
- ✅ Usage metrics and cost calculation
- ✅ Tool calling (if advertised)
- ✅ Object generation (if advertised)
- ✅ Reasoning tokens (if advertised)
What's NOT Guaranteed
- Complex edge cases beyond basic capabilities
- Provider-specific features not in model metadata
- Real-time behavior (fixtures may be cached)
- Exact API response formats (providers may change)
Troubleshooting
Fixture Mismatch
If tests fail with fixture mismatches:
# Re-record the specific scenario
mix mc "provider:model" --record
Missing API Key
Tests are skipped if the required API key is unavailable:
# Set in .env file
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
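A common ExUnit approach is to exclude live tests when a key is absent, for example from test/test_helper.exs; the tag name below is an assumption used for illustration, not necessarily how ReqLLM handles it:

# test/test_helper.exs (sketch): exclude live tests when the key is missing.
missing_anthropic? = System.get_env("ANTHROPIC_API_KEY") in [nil, ""]

ExUnit.start(exclude: if(missing_anthropic?, do: [:requires_anthropic_key], else: []))

# Live-recording tests would then carry the matching tag:
#
#   @tag :requires_anthropic_key
#   test "records a live fixture" do
#     ...
#   end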
Debugging Fixture Issues
Enable verbose output:
REQ_LLM_DEBUG=1 mix test --only "provider:anthropic" --only "scenario:basic"
Best Practices
- Run locally before CI: mix mc before committing
- Record incrementally: Don't re-record all fixtures at once
- Use samples for development: mix mc --sample for quick validation
- Keep fixtures fresh: Refresh fixtures when providers update APIs
- Tag tests appropriately: Use semantic tags for precise test selection
Commands Reference
# Validation (using fixtures)
mix mc # All models with passing fixtures
mix mc anthropic # All Anthropic models
mix mc "openai:gpt-4o" # Specific model
mix mc --sample # Sample models per provider
mix mc --available # List all registry models
# Recording (live API calls)
mix mc --record # Re-record passing models
mix mc "xai:*" --record # Re-record xAI models
mix mc "<provider>:*" --record # Re-record specific provider
# Environment variables
REQ_LLM_FIXTURES_MODE=record # Force recording
REQ_LLM_MODELS="pattern" # Model selection pattern
REQ_LLM_SAMPLE=N # Sample N per provider
REQ_LLM_EXCLUDE="model1,model2" # Exclude models
REQ_LLM_DEBUG=1 # Verbose output
Summary
The fixture-based testing system provides:
- Fast local validation with cached fixtures
- Comprehensive coverage across capabilities
- Parallel execution for speed
- Clear model support guarantees backed by test evidence
- Easy provider addition with minimal boilerplate
This system is how ReqLLM backs up the claim of "135+ supported models" - each one has fixture evidence of passing comprehensive capability tests.