Fixture Testing Guide

ReqLLM uses a comprehensive fixture-based testing system to ensure reliability across all supported models and providers. This guide explains how we validate "Supported Models" and how the testing infrastructure works.

Overview

The testing system validates models through the mix req_llm.model_compat task, which runs capability-focused tests against models selected from the registry.

The Model Compatibility Task

Basic Usage

# Validate all models with passing fixtures (fastest)
mix req_llm.model_compat

# Alias
mix mc

This runs tests against cached fixtures, so no API calls are made. It validates the models whose previously passing results are recorded in priv/supported_models.json.
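
For orientation, the sketch below shows one way such a state file could be read to decide which models to replay. It is not ReqLLM's actual implementation: the exact JSON shape of priv/supported_models.json and the use of the Jason library are assumptions.

# Minimal sketch, not ReqLLM's actual code. Assumes the state file decodes to
# either a list of "provider:model" specs or a map keyed by spec, and that the
# Jason JSON library is available.
defmodule SupportedModelsSketch do
  def passing_specs(path \\ "priv/supported_models.json") do
    with {:ok, body} <- File.read(path),
         {:ok, data} <- Jason.decode(body) do
      case data do
        list when is_list(list) -> Enum.sort(list)
        map when is_map(map) -> map |> Map.keys() |> Enum.sort()
      end
    else
      _ -> []
    end
  end
end

# SupportedModelsSketch.passing_specs()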

Validating Specific Models

# Validate all Anthropic models
mix mc anthropic

# Validate specific model
mix mc "openai:gpt-4o"

# Validate all models for a provider
mix mc "xai:*"

# List all available models from registry
mix mc --available

Recording New Fixtures

To test against live APIs and (re)generate fixtures:

# Re-record fixtures for xAI models
mix mc "xai:*" --record

# Re-record all models (not recommended, expensive)
mix mc "*:*" --record

Testing Model Subsets

# Test sample models per provider (uses config/config.exs sample list)
mix mc --sample

# Test specific provider samples
mix mc --sample anthropic

Architecture

Model Registry

Model metadata lives in priv/models_dev/*.json files, automatically synced from models.dev via mix req_llm.model_sync.

Each model entry includes the following fields (a short reading sketch follows the list):

  • Capabilities (tool_call, reasoning, attachment, temperature)
  • Modalities (input: [:text, :image], output: [:text])
  • Limits (context window and maximum output tokens)
  • Costs (input and output, per 1M tokens)
  • API-specific metadata
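
As a concrete illustration, the sketch below filters one synced metadata file for models that advertise tool calling. The "models", "capabilities"/"tool_call", and "id" keys are assumptions inferred from the bullets above, not the verified schema of the priv/models_dev files.

# Hedged sketch: the JSON keys below are inferred from the capability bullets
# above and may not match the real priv/models_dev schema exactly.
defmodule MetadataSketch do
  def tool_call_models(provider) do
    "priv/models_dev/#{provider}.json"
    |> File.read!()
    |> Jason.decode!()
    |> Map.get("models", [])
    |> Enum.filter(&get_in(&1, ["capabilities", "tool_call"]))
    |> Enum.map(& &1["id"])
  end
end

# MetadataSketch.tool_call_models("anthropic")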

Fixture State

The priv/supported_models.json file tracks which models have passing fixtures. This file is auto-generated and should not be manually edited.

Comprehensive Test Macro

Tests use the ReqLLM.ProviderTest.Comprehensive macro (in test/support/provider_test/comprehensive.ex), which generates up to 9 focused tests per model based on capabilities:

  1. Basic generate_text (non-streaming) - All models
  2. Streaming with system context + creative params - Models with streaming support
  3. Token limit constraints - All models
  4. Usage metrics and cost calculations - All models
  5. Tool calling - multi-tool selection - Models with :tool_call capability
  6. Tool calling - no tool when inappropriate - Models with :tool_call capability
  7. Object generation (non-streaming) - Models with object generation support
  8. Object generation (streaming) - Models with object generation support
  9. Reasoning/thinking tokens - Models with :reasoning capability

Test Organization

test/coverage/
├── anthropic/
│   └── comprehensive_test.exs
├── openai/
│   └── comprehensive_test.exs
├── google/
│   └── comprehensive_test.exs
└── ...

Each provider has a single comprehensive test file:

defmodule ReqLLM.Coverage.Anthropic.ComprehensiveTest do
  use ReqLLM.ProviderTest.Comprehensive, provider: :anthropic
end

The macro automatically:

  • Selects models from ModelMatrix based on provider and operation type
  • Generates tests for each model based on its capabilities (see the sketch after this list)
  • Handles fixture recording and replay
  • Tags tests with provider, model, and scenario
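
Conceptually, the expansion looks something like the hand-written sketch below: a compile-time loop over models that always emits the baseline scenarios and only emits capability-gated ones when the model advertises them. This is a simplified stand-in for the real macro; the model list and capabilities here are hypothetical.

# Simplified stand-in for what the Comprehensive macro generates; the model
# and capability data below is hypothetical.
defmodule CapabilityGatedSketchTest do
  use ExUnit.Case, async: true

  @moduletag :coverage
  @moduletag provider: "anthropic"

  # Hypothetical model list; the real macro pulls models from ModelMatrix.
  @models [
    {"claude-3-5-haiku-20241022", [:tool_call]},
    {"claude-3-5-sonnet-20241022", [:tool_call, :reasoning]}
  ]

  for {model, caps} <- @models do
    @model model

    # Baseline scenario, generated for every model.
    @tag model: model
    @tag scenario: :basic
    test "basic generate_text (#{model})" do
      assert is_binary(@model)
    end

    # Capability-gated scenario, generated only when advertised.
    if :reasoning in caps do
      @tag model: model
      @tag scenario: :reasoning_basic
      test "reasoning tokens (#{model})" do
        assert is_binary(@model)
      end
    end
  end
end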

How "Supported Models" is Defined

A model is considered "supported" when it:

  1. Has metadata in priv/models_dev/<provider>.json
  2. Passes comprehensive tests for its advertised capabilities
  3. Has fixture evidence stored for validation

The count you see in documentation ("135+ models currently pass our comprehensive fixture-based test suite") comes from the models listed in priv/supported_models.json.

Semantic Tags

Tests use structured tags for precise filtering:

@moduletag :coverage                     # All coverage tests
@moduletag provider: "anthropic"         # Provider filter
@describetag model: "claude-3-5-sonnet"  # Model filter (without provider prefix)
@tag scenario: :basic                    # Scenario filter

Run specific subsets:

# All coverage tests
mix test --only coverage

# Specific provider
mix test --only "provider:anthropic"

# Specific scenario
mix test --only "scenario:basic"
mix test --only "scenario:streaming"
mix test --only "scenario:tool_multi"

# Specific model
mix test --only "model:claude-3-5-haiku-20241022"

# Combine filters
mix test --only "provider:openai" --only "scenario:basic"

Environment Variables

Fixture Mode Control

# Use cached fixtures (default, no API calls)
mix mc

# Record new fixtures (makes live API calls)
REQ_LLM_FIXTURES_MODE=record mix mc
# OR
mix mc --record

Model Selection

# Test all available models
REQ_LLM_MODELS="all" mix mc

# Test all models from a provider
REQ_LLM_MODELS="anthropic:*" mix mc

# Test specific models (comma-separated)
REQ_LLM_MODELS="openai:gpt-4o,anthropic:claude-3-5-sonnet" mix mc

# Sample N models per provider
REQ_LLM_SAMPLE=2 mix mc

# Exclude specific models
REQ_LLM_EXCLUDE="gpt-4o-mini,gpt-3.5-turbo" mix mc
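
As a rough illustration of how such selectors can be resolved, the sketch below matches "all", "provider:*", exact "provider:model", and bare provider patterns against a model spec. It is not ReqLLM's actual parser; the module name is hypothetical.

# Rough sketch of selector matching, not ReqLLM's actual parsing code.
defmodule ModelSelectorSketch do
  def matches?("all", _spec), do: true

  def matches?(pattern, spec) do
    case String.split(pattern, ":", parts: 2) do
      [provider, "*"] -> String.starts_with?(spec, provider <> ":")
      [_provider, _model] -> pattern == spec
      [provider] -> String.starts_with?(spec, provider <> ":")
    end
  end
end

# ModelSelectorSketch.matches?("anthropic:*", "anthropic:claude-3-5-haiku-20241022")
# #=> true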

Debug Output

# Verbose fixture debugging
REQ_LLM_DEBUG=1 mix mc

Fixture System Details

Fixture Storage

Fixtures are stored next to test files:

test/coverage/<provider>/fixtures/
├── basic.json
├── streaming.json
├── token_limit.json
├── usage.json
├── tool_multi.json
├── no_tool.json
├── object_basic.json
├── object_streaming.json
└── reasoning_basic.json

Fixture Format

Fixtures capture the complete API response:

{
  "captured_at": "2025-01-15T10:30:00Z",
  "model_spec": "anthropic:claude-3-5-sonnet-20241022",
  "scenario": "basic",
  "result": {
    "ok": true,
    "response": {
      "id": "msg_123",
      "model": "claude-3-5-sonnet-20241022",
      "message": {...},
      "usage": {...}
    }
  }
}
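
To make replay concrete, here is a minimal sketch of loading such a fixture instead of calling the API. The path layout and the "result" key follow the examples above, and the Jason library is assumed; the real fixture plumbing lives in the test support code and may differ.

# Minimal replay sketch; assumes the layout and keys shown above, not the
# actual fixture plumbing in test/support.
defmodule FixtureReplaySketch do
  def load(provider, scenario) do
    path = Path.join(["test", "coverage", provider, "fixtures", "#{scenario}.json"])

    with {:ok, body} <- File.read(path),
         {:ok, %{"result" => result}} <- Jason.decode(body) do
      {:ok, result}
    else
      _ -> {:error, :fixture_missing}
    end
  end
end

# FixtureReplaySketch.load("anthropic", "basic")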

Parallel Execution

The fixture system supports parallel test execution:

  • Tests run concurrently for speed
  • State tracking skips models that already have passing fixtures
  • Use --record or --record-all to regenerate

Development Workflow

Adding a New Provider

  1. Implement provider module and metadata
  2. Create a test file using the Comprehensive macro (see the sketch after these steps)
  3. Record initial fixtures:
    mix mc "<provider>:*" --record
    
  4. Verify all tests pass:
    mix mc "<provider>"
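
For step 2, the new test file mirrors the existing per-provider modules shown earlier; the provider name acme below is a hypothetical placeholder.

# test/coverage/acme/comprehensive_test.exs (hypothetical provider)
defmodule ReqLLM.Coverage.Acme.ComprehensiveTest do
  use ReqLLM.ProviderTest.Comprehensive, provider: :acme
end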
    

Updating Model Coverage

  1. Sync latest model metadata:
    mix req_llm.model_sync
    
  2. Record fixtures for new models:
    mix mc "<provider>:new-model" --record
    
  3. Validate updated coverage:
    mix mc "<provider>"
    

Refreshing Fixtures

Periodically refresh fixtures to catch API changes:

# Refresh specific provider
mix mc "anthropic:*" --record

# Refresh specific capability
REQ_LLM_FIXTURES_MODE=record mix test --only "scenario:streaming"

# Refresh all (expensive, requires all API keys)
mix mc "*:*" --record

Quality Commitments

We guarantee that all "supported models" (those counted in our documentation):

  1. Have passing fixtures for basic functionality
  2. Are tested against live APIs before fixture capture
  3. Pass capability-focused tests for advertised features
  4. Are regularly refreshed to catch provider-side changes

What's Tested

For each supported model:

  • ✅ Text generation (streaming and non-streaming)
  • ✅ Token limits and truncation behavior
  • ✅ Usage metrics and cost calculation
  • ✅ Tool calling (if advertised)
  • ✅ Object generation (if advertised)
  • ✅ Reasoning tokens (if advertised)

What's NOT Guaranteed

  • Complex edge cases beyond basic capabilities
  • Provider-specific features not in model metadata
  • Real-time behavior (fixtures may be cached)
  • Exact API response formats (providers may change)

Troubleshooting

Fixture Mismatch

If tests fail with fixture mismatches:

# Re-record fixtures for the affected model
mix mc "<provider>:<model>" --record

Missing API Key

Tests are skipped when the required API key is unavailable:

# Set in .env file
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Debugging Fixture Issues

Enable verbose output:

REQ_LLM_DEBUG=1 mix test --only "provider:anthropic" --only "scenario:basic"

Best Practices

  1. Run locally before CI: mix mc before committing
  2. Record incrementally: Don't re-record all fixtures at once
  3. Use samples for development: mix mc --sample for quick validation
  4. Keep fixtures fresh: Refresh fixtures when providers update APIs
  5. Tag tests appropriately: Use semantic tags for precise test selection

Commands Reference

# Validation (using fixtures)
mix mc                          # All models with passing fixtures
mix mc anthropic                # All Anthropic models
mix mc "openai:gpt-4o"          # Specific model
mix mc --sample                 # Sample models per provider
mix mc --available              # List all registry models

# Recording (live API calls)
mix mc --record                 # Re-record passing models
mix mc "xai:*" --record         # Re-record xAI models
mix mc "<provider>:*" --record  # Re-record specific provider

# Environment variables
REQ_LLM_FIXTURES_MODE=record    # Force recording
REQ_LLM_MODELS="pattern"        # Model selection pattern
REQ_LLM_SAMPLE=N                # Sample N per provider
REQ_LLM_EXCLUDE="model1,model2" # Exclude models
REQ_LLM_DEBUG=1                 # Verbose output

Summary

The fixture-based testing system provides:

  • Fast local validation with cached fixtures
  • Comprehensive coverage across capabilities
  • Parallel execution for speed
  • Clear model support guarantees backed by test evidence
  • Easy provider addition with minimal boilerplate

This system is how ReqLLM backs the claim of "135+ supported models": each model has fixture evidence of passing the comprehensive capability tests.