Fixture Testing Guide

ReqLLM uses a comprehensive fixture-based testing system to ensure reliability across all supported models and providers. This guide explains how we validate "Supported Models" and how the testing infrastructure works.

Overview

The testing system validates models through the mix req_llm.model_compat task, which runs capability-focused tests against models selected from the registry.

The Model Compatibility Task

Basic Usage

# Validate all models with passing fixtures (fastest)
mix req_llm.model_compat

# Alias
mix mc

This runs tests against cached fixtures, so no API calls are made. It validates the models whose previously passing results are recorded in priv/supported_models.json.
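
For orientation, the sketch below shows one way such a state file could be read to decide which models to replay. It is not ReqLLM's actual implementation: the exact JSON shape of priv/supported_models.json and the use of the Jason library are assumptions.

# Minimal sketch, not ReqLLM's actual code. Assumes the state file decodes to
# either a list of "provider:model" specs or a map keyed by spec, and that the
# Jason JSON library is available.
defmodule SupportedModelsSketch do
  def passing_specs(path \\ "priv/supported_models.json") do
    with {:ok, body} <- File.read(path),
         {:ok, data} <- Jason.decode(body) do
      case data do
        list when is_list(list) -> Enum.sort(list)
        map when is_map(map) -> map |> Map.keys() |> Enum.sort()
      end
    else
      _ -> []
    end
  end
end

# SupportedModelsSketch.passing_specs()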

Validating Specific Models

# Validate all Anthropic models
mix mc anthropic

# Validate specific model
mix mc "openai:gpt-4o"

# Validate all models for a provider
mix mc "xai:*"

# List all available models from registry
mix mc --available

Recording New Fixtures

To test against live APIs and (re)generate fixtures:

# Re-record fixtures for xAI models
mix mc "xai:*" --record

# Re-record all models (not recommended, expensive)
mix mc "*:*" --record

Testing Model Subsets

# Test sample models per provider (uses config/config.exs sample list)
mix mc --sample

# Test specific provider samples
mix mc --sample anthropic

Architecture

Model Registry

Model metadata lives in priv/models_dev/*.json files, automatically synced from models.dev via mix req_llm.model_sync.

Each model entry includes the following fields (a short reading sketch follows the list):

  • Capabilities (tool_call, reasoning, attachment, temperature)
  • Modalities (input: [:text, :image], output: [:text])
  • Limits (context window and maximum output tokens)
  • Costs (input and output, per 1M tokens)
  • API-specific metadata
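
As a concrete illustration, the sketch below filters one synced metadata file for models that advertise tool calling. The "models", "capabilities"/"tool_call", and "id" keys are assumptions inferred from the bullets above, not the verified schema of the priv/models_dev files.

# Hedged sketch: the JSON keys below are inferred from the capability bullets
# above and may not match the real priv/models_dev schema exactly.
defmodule MetadataSketch do
  def tool_call_models(provider) do
    "priv/models_dev/#{provider}.json"
    |> File.read!()
    |> Jason.decode!()
    |> Map.get("models", [])
    |> Enum.filter(&get_in(&1, ["capabilities", "tool_call"]))
    |> Enum.map(& &1["id"])
  end
end

# MetadataSketch.tool_call_models("anthropic")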

Fixture State

The priv/supported_models.json file tracks which models have passing fixtures. This file is auto-generated and should not be manually edited.

Comprehensive Test Macro

Tests use the ReqLLM.ProviderTest.Comprehensive macro (in test/support/provider_test/comprehensive.ex), which generates up to 9 focused tests per model based on capabilities:

  1. Basic generate_text (non-streaming) - All models
  2. Streaming with system context + creative params - Models with streaming support
  3. Token limit constraints - All models
  4. Usage metrics and cost calculations - All models
  5. Tool calling - multi-tool selection - Models with :tool_call capability
  6. Tool calling - no tool when inappropriate - Models with :tool_call capability
  7. Object generation (non-streaming) - Models with object generation support
  8. Object generation (streaming) - Models with object generation support
  9. Reasoning/thinking tokens - Models with :reasoning capability

Test Organization

test/coverage/
├── anthropic/
│   └── comprehensive_test.exs
├── openai/
│   └── comprehensive_test.exs
├── google/
│   └── comprehensive_test.exs
└── ...

Each provider has a single comprehensive test file:

defmodule ReqLLM.Coverage.Anthropic.ComprehensiveTest do
  use ReqLLM.ProviderTest.Comprehensive, provider: :anthropic
end

The macro automatically:

  • Selects models from ModelMatrix based on provider and operation type
  • Generates tests for each model based on its capabilities (see the sketch after this list)
  • Handles fixture recording and replay
  • Tags tests with provider, model, and scenario
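
Conceptually, the expansion looks something like the hand-written sketch below: a compile-time loop over models that always emits the baseline scenarios and only emits capability-gated ones when the model advertises them. This is a simplified stand-in for the real macro; the model list and capabilities here are hypothetical.

# Simplified stand-in for what the Comprehensive macro generates; the model
# and capability data below is hypothetical.
defmodule CapabilityGatedSketchTest do
  use ExUnit.Case, async: true

  @moduletag :coverage
  @moduletag provider: "anthropic"

  # Hypothetical model list; the real macro pulls models from ModelMatrix.
  @models [
    {"claude-3-5-haiku-20241022", [:tool_call]},
    {"claude-3-5-sonnet-20241022", [:tool_call, :reasoning]}
  ]

  for {model, caps} <- @models do
    @model model

    # Baseline scenario, generated for every model.
    @tag model: model
    @tag scenario: :basic
    test "basic generate_text (#{model})" do
      assert is_binary(@model)
    end

    # Capability-gated scenario, generated only when advertised.
    if :reasoning in caps do
      @tag model: model
      @tag scenario: :reasoning_basic
      test "reasoning tokens (#{model})" do
        assert is_binary(@model)
      end
    end
  end
end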

How "Supported Models" is Defined

A model is considered "supported" when it:

  1. Has metadata in priv/models_dev/<provider>.json
  2. Passes comprehensive tests for its advertised capabilities
  3. Has fixture evidence stored for validation

The count you see in documentation ("135+ models currently pass our comprehensive fixture-based test suite") comes from the models listed in priv/supported_models.json.

Semantic Tags

Tests use structured tags for precise filtering:

@moduletag :coverage                     # All coverage tests
@moduletag provider: "anthropic"         # Provider filter
@describetag model: "claude-3-5-sonnet"  # Model filter (without provider prefix)
@tag scenario: :basic                    # Scenario filter

Run specific subsets:

# All coverage tests
mix test --only coverage

# Specific provider
mix test --only "provider:anthropic"

# Specific scenario
mix test --only "scenario:basic"
mix test --only "scenario:streaming"
mix test --only "scenario:tool_multi"

# Specific model
mix test --only "model:claude-3-5-haiku-20241022"

# Combine filters
mix test --only "provider:openai" --only "scenario:basic"

Environment Variables

Fixture Mode Control

# Use cached fixtures (default, no API calls)
mix mc

# Record new fixtures (makes live API calls)
REQ_LLM_FIXTURES_MODE=record mix mc
# OR
mix mc --record

Model Selection

# Test all available models
REQ_LLM_MODELS="all" mix mc

# Test all models from a provider
REQ_LLM_MODELS="anthropic:*" mix mc

# Test specific models (comma-separated)
REQ_LLM_MODELS="openai:gpt-4o,anthropic:claude-3-5-sonnet" mix mc

# Sample N models per provider
REQ_LLM_SAMPLE=2 mix mc

# Exclude specific models
REQ_LLM_EXCLUDE="gpt-4o-mini,gpt-3.5-turbo" mix mc
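
As a rough illustration of how such selectors can be resolved, the sketch below matches "all", "provider:*", exact "provider:model", and bare provider patterns against a model spec. It is not ReqLLM's actual parser; the module name is hypothetical.

# Rough sketch of selector matching, not ReqLLM's actual parsing code.
defmodule ModelSelectorSketch do
  def matches?("all", _spec), do: true

  def matches?(pattern, spec) do
    case String.split(pattern, ":", parts: 2) do
      [provider, "*"] -> String.starts_with?(spec, provider <> ":")
      [_provider, _model] -> pattern == spec
      [provider] -> String.starts_with?(spec, provider <> ":")
    end
  end
end

# ModelSelectorSketch.matches?("anthropic:*", "anthropic:claude-3-5-haiku-20241022")
# #=> true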

Debug Output

# Verbose fixture debugging
REQ_LLM_DEBUG=1 mix mc

Fixture System Details

Fixture Storage

Fixtures are stored next to test files:

test/coverage/<provider>/fixtures/
├── basic.json
├── streaming.json
├── token_limit.json
├── usage.json
├── tool_multi.json
├── no_tool.json
├── object_basic.json
├── object_streaming.json
└── reasoning_basic.json

Fixture Format

Fixtures capture the complete API response:

{
  "captured_at": "2025-01-15T10:30:00Z",
  "model_spec": "anthropic:claude-3-5-sonnet-20241022",
  "scenario": "basic",
  "result": {
    "ok": true,
    "response": {
      "id": "msg_123",
      "model": "claude-3-5-sonnet-20241022",
      "message": {...},
      "usage": {...}
    }
  }
}
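
To make replay concrete, here is a minimal sketch of loading such a fixture instead of calling the API. The path layout and the "result" key follow the examples above, and the Jason library is assumed; the real fixture plumbing lives in the test support code and may differ.

# Minimal replay sketch; assumes the layout and keys shown above, not the
# actual fixture plumbing in test/support.
defmodule FixtureReplaySketch do
  def load(provider, scenario) do
    path = Path.join(["test", "coverage", provider, "fixtures", "#{scenario}.json"])

    with {:ok, body} <- File.read(path),
         {:ok, %{"result" => result}} <- Jason.decode(body) do
      {:ok, result}
    else
      _ -> {:error, :fixture_missing}
    end
  end
end

# FixtureReplaySketch.load("anthropic", "basic")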

Parallel Execution

The fixture system supports parallel test execution:

  • Tests run concurrently for speed
  • State tracking skips models that already have passing fixtures
  • Use --record or --record-all to regenerate

Development Workflow

Adding a New Provider

  1. Implement provider module and metadata
  2. Create a test file using the Comprehensive macro (see the sketch after these steps)
  3. Record initial fixtures:
    mix mc "<provider>:*" --record
    
  4. Verify all tests pass:
    mix mc "<provider>"
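
For step 2, the new test file mirrors the existing per-provider modules shown earlier; the provider name acme below is a hypothetical placeholder.

# test/coverage/acme/comprehensive_test.exs (hypothetical provider)
defmodule ReqLLM.Coverage.Acme.ComprehensiveTest do
  use ReqLLM.ProviderTest.Comprehensive, provider: :acme
end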
    

Updating Model Coverage

  1. Sync latest model metadata:
    mix req_llm.model_sync
    
  2. Record fixtures for new models:
    mix mc "<provider>:new-model" --record
    
  3. Validate updated coverage:
    mix mc "<provider>"
    

Refreshing Fixtures

Periodically refresh fixtures to catch API changes:

# Refresh specific provider
mix mc "anthropic:*" --record

# Refresh specific capability
REQ_LLM_FIXTURES_MODE=record mix test --only "scenario:streaming"

# Refresh all (expensive, requires all API keys)
mix mc "*:*" --record

Quality Commitments

We guarantee that all "supported models" (those counted in our documentation):

  1. Have passing fixtures for basic functionality
  2. Are tested against live APIs before fixture capture
  3. Pass capability-focused tests for advertised features
  4. Are regularly refreshed to catch provider-side changes

What's Tested

For each supported model:

  • ✅ Text generation (streaming and non-streaming)
  • ✅ Token limits and truncation behavior
  • ✅ Usage metrics and cost calculation
  • ✅ Tool calling (if advertised)
  • ✅ Object generation (if advertised)
  • ✅ Reasoning tokens (if advertised)

What's NOT Guaranteed

  • Complex edge cases beyond basic capabilities
  • Provider-specific features not in model metadata
  • Real-time behavior (fixtures may be cached)
  • Exact API response formats (providers may change)

Troubleshooting

Fixture Mismatch

If tests fail with fixture mismatches:

# Re-record fixtures for the affected model
mix mc "<provider>:<model>" --record

Missing API Key

Tests are skipped when the required API key is unavailable:

# Set in .env file
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Debugging Fixture Issues

Enable verbose output:

REQ_LLM_DEBUG=1 mix test --only "provider:anthropic" --only "scenario:basic"

Best Practices

  1. Run locally before CI: mix mc before committing
  2. Record incrementally: Don't re-record all fixtures at once
  3. Use samples for development: mix mc --sample for quick validation
  4. Keep fixtures fresh: Refresh fixtures when providers update APIs
  5. Tag tests appropriately: Use semantic tags for precise test selection

Commands Reference

# Validation (using fixtures)
mix mc                          # All models with passing fixtures
mix mc anthropic                # All Anthropic models
mix mc "openai:gpt-4o"          # Specific model
mix mc --sample                 # Sample models per provider
mix mc --available              # List all registry models

# Recording (live API calls)
mix mc --record                 # Re-record passing models
mix mc "xai:*" --record         # Re-record xAI models
mix mc "<provider>:*" --record  # Re-record specific provider

# Environment variables
REQ_LLM_FIXTURES_MODE=record    # Force recording
REQ_LLM_MODELS="pattern"        # Model selection pattern
REQ_LLM_SAMPLE=N                # Sample N per provider
REQ_LLM_EXCLUDE="model1,model2" # Exclude models
REQ_LLM_DEBUG=1                 # Verbose output

Summary

The fixture-based testing system provides:

  • Fast local validation with cached fixtures
  • Comprehensive coverage across capabilities
  • Parallel execution for speed
  • Clear model support guarantees backed by test evidence
  • Easy provider addition with minimal boilerplate

This system is how ReqLLM backs the claim of "135+ supported models": each model has fixture evidence of passing the comprehensive capability tests.