Smart Router

View Source

The Router provides intelligent multi-provider orchestration with automatic failover, load balancing, and task-based routing.

Overview

The Router selects the best LLM provider for each request based on configurable strategies, handles failures gracefully, and tracks provider health.

Creating a Router

alias Rag.Router

# With specific providers
{:ok, router} = Router.new(providers: [:gemini, :claude, :codex])

# Auto-detect available providers
{:ok, router} = Router.new(auto_detect: true)

# With specific strategy
{:ok, router} = Router.new(
  providers: [:gemini, :claude],
  strategy: :fallback
)

Auto-Strategy Selection

When strategy: :auto or not specified:

Provider CountStrategyReason
3+ providers:specialistTask-based routing
2 providers:fallbackReliability via retry
1 provider:fallbackPassthrough

Core API

Execute Request

# Text generation
{:ok, response, router} = Router.execute(router, :text, "Hello", [])

# With options
{:ok, response, router} = Router.execute(router, :text, "Hello",
  system_prompt: "You are helpful.",
  temperature: 0.7
)

# Embeddings
{:ok, embeddings, router} = Router.execute(router, :embeddings, ["text1", "text2"], [])

# Streaming
{:ok, stream, router} = Router.execute(router, :text, "Count to 10", stream: true)
Enum.each(stream, &IO.write/1)

Route Only (No Execution)

# Get selected provider without executing
{:ok, provider, router} = Router.route(router, :text, "Hello", [])
# provider is :gemini, :claude, or :codex

Query Providers

# Available providers
Router.available_providers(router)  # [:gemini, :claude]

# Get provider capabilities
{:ok, caps} = Router.get_provider(router, :gemini)
# %{embeddings: true, streaming: true, max_context: 1_000_000, ...}

Routing Strategies

Fallback Strategy

Tries providers in order until one succeeds.

{:ok, router} = Router.new(
  providers: [:gemini, :claude, :codex],
  strategy: :fallback,
  fallback_order: [:claude, :gemini, :codex],  # Custom order
  max_failures: 3,                              # Skip after 3 failures
  failure_decay_ms: 60_000                      # Reset after 60s
)

Behavior:

  1. Try first provider in order
  2. On failure, mark and try next
  3. Skip providers with >= max_failures recent failures
  4. Automatically recover after failure_decay_ms
  5. Success resets failure count

Configuration:

  • fallback_order - Custom provider order (default: providers list order)
  • max_failures - Failures before skipping (default: 3)
  • failure_decay_ms - Time to reset failures (default: 60,000ms)

Round-Robin Strategy

Distributes load evenly across providers.

{:ok, router} = Router.new(
  providers: [:gemini, :claude, :codex],
  strategy: :round_robin,
  weights: %{gemini: 3, codex: 2, claude: 1},  # Optional weighted
  max_consecutive_failures: 3,
  recovery_cooldown_ms: 30_000
)

Behavior:

  1. Rotate through providers
  2. With weights: {gemini: 2, codex: 1} produces gemini, gemini, codex, gemini, gemini, codex...
  3. Skip unavailable providers
  4. Mark unavailable after consecutive failures
  5. Recover after cooldown period

Configuration:

  • weights - Provider weights (default: equal)
  • max_consecutive_failures - Before marking unavailable (default: 3)
  • recovery_cooldown_ms - Recovery time (default: 30,000ms)

Specialist Strategy

Routes based on task type to best-suited provider.

{:ok, router} = Router.new(
  providers: [:gemini, :claude, :codex],
  strategy: :specialist,
  task_mappings: %{
    embeddings: :gemini,
    code_generation: :codex,
    analysis: :claude
  },
  fallback_provider: :gemini,
  max_failures: 3
)

Default Task Mappings:

TaskProviderReason
:embeddingsGeminiEmbedding support
:long_contextGemini1M token window
:multimodalGeminiImage/audio
:costGeminiCheapest
:speedGeminiFastest
:code_generationCodexCode optimized
:code_reviewCodexCode understanding
:structured_outputCodexJSON generation
:analysisClaudeDeep reasoning
:writingClaudeBest prose
:agenticClaudeMulti-step
:reasoningClaudeComplex logic
:safetyClaudeSafety focus

Task Inference:

  • Explicit: Router.execute(router, :text, prompt, task: :code_generation)
  • Automatic: Infers from prompt keywords
    • Code keywords: "write", "implement", "function", "code", "class"
    • Analysis keywords: "analyze", "explain", "review", "compare"

Configuration:

  • task_mappings - Task to provider mapping
  • fallback_provider - Default if preferred unavailable
  • max_failures - Before marking unavailable

Error Handling

case Router.execute(router, :text, "Hello", []) do
  {:ok, response, updated_router} ->
    # Success - use updated_router for subsequent calls
    IO.puts(response)

  {:error, :all_providers_failed} ->
    # All providers failed
    IO.puts("No providers available")

  {:error, reason} ->
    # Other error
    IO.puts("Error: #{inspect(reason)}")
end

Manual Result Reporting

# Report success/failure for custom execution
router = Router.report_result(router, :gemini, {:ok, "response"})
router = Router.report_result(router, :gemini, {:error, :timeout})

# Get next provider after failure
{:ok, next_provider, router} = Router.next_provider(router, :gemini)

Execution Flow

User Request
    |
    v
Router.execute()
    |
    v
Strategy.select_provider()
    |
    v
Get/Create Provider Instance
    |
    v
Provider.generate_text() or .generate_embeddings()
    |
    +---> Success: return {:ok, result, router}
    |
    +---> Failure: report_result() -> next_provider() -> retry
              |
              +--> All exhausted: {:error, :all_providers_failed}

Health Tracking

Each strategy tracks provider health differently:

Fallback

  • Counts consecutive failures per provider
  • Skips provider when count >= max_failures
  • Resets count after failure_decay_ms or on success

Round-Robin

  • Counts consecutive failures per provider
  • Marks unavailable when count >= max_consecutive_failures
  • Recovers after recovery_cooldown_ms
  • Resets count on success

Specialist

  • Counts total failures per provider
  • Marks unavailable when count >= max_failures
  • Falls back to fallback_provider
  • Resets count on success

Provider Instance Caching

The router caches provider instances:

# First use creates instance
{:ok, response, router} = Router.execute(router, :text, "Hello", [])

# Subsequent calls reuse cached instance
{:ok, response, router} = Router.execute(router, :text, "World", [])

Best Practices

  1. Use fallback for reliability - When uptime is critical
  2. Use round-robin for load balancing - When distributing load matters
  3. Use specialist for optimization - When matching task to provider matters
  4. Always use updated router - Router state changes after each call
  5. Handle all_providers_failed - Have a fallback plan
  6. Configure timeouts - Prevent hanging on slow providers

Example: Complete Setup

alias Rag.Router

# Configure with all strategies
{:ok, router} = Router.new(
  providers: [:gemini, :claude, :codex],
  strategy: :specialist,
  task_mappings: %{
    embeddings: :gemini,
    code_generation: :codex,
    analysis: :claude,
    general: :gemini
  },
  fallback_provider: :gemini,
  max_failures: 3
)

# Embeddings go to Gemini
{:ok, embeddings, router} = Router.execute(router, :embeddings, ["text"], [])

# Code tasks go to Codex
{:ok, code, router} = Router.execute(router, :text,
  "Write a fibonacci function",
  task: :code_generation
)

# Analysis goes to Claude
{:ok, analysis, router} = Router.execute(router, :text,
  "Analyze this architecture decision",
  task: :analysis
)

Next Steps