# LLM Providers

The Rag library supports multiple LLM providers through a unified interface, enabling flexible provider selection and failover.

## Available Providers
| Provider | Module | Embeddings | Tools | Streaming | Max Context |
|---|---|---|---|---|---|
| Gemini | Rag.Ai.Gemini | Yes | Yes | Yes | 1M tokens |
| Claude | Rag.Ai.Claude | No | Yes | Yes | 200K tokens |
| Codex | Rag.Ai.Codex | No | Yes | Yes | 128K tokens |
| OpenAI | Rag.Ai.OpenAI | Yes | Yes | Yes | Model-dependent |
| Ollama | Rag.Ai.Ollama | Yes | Yes | Yes | Model-dependent |
| Cohere | Rag.Ai.Cohere | Yes | Yes | Yes | Model-dependent |
| Nx | Rag.Ai.Nx | Yes | No | Config | Local |
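Because every module in the table implements the same interface, providers can be swapped without changing call sites. A minimal sketch (the atom-to-module map is illustrative, not part of the library):

```elixir
providers = %{
  gemini: Rag.Ai.Gemini,
  claude: Rag.Ai.Claude,
  codex: Rag.Ai.Codex,
  openai: Rag.Ai.OpenAI,
  ollama: Rag.Ai.Ollama
}

# Select a provider at runtime; the same calls work for any module
mod = Map.fetch!(providers, :gemini)
provider = mod.new(%{})
{:ok, response} = mod.generate_text(provider, "Hello!", [])
```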
## Provider Behaviour

All providers implement the `Rag.Ai.Provider` behaviour:

```elixir
@callback new(attrs :: map()) :: struct()
@callback generate_embeddings(provider, texts, opts) :: {:ok, list(embedding())} | {:error, any()}
@callback generate_text(provider, prompt, opts) :: {:ok, response()} | {:error, any()}
```

## Configuration
### Environment Variables

```shell
# Gemini (recommended for embeddings)
export GEMINI_API_KEY="your-api-key"

# Claude (best for analysis)
export ANTHROPIC_API_KEY="your-api-key"

# OpenAI/Codex (best for code)
export OPENAI_API_KEY="your-api-key"
# or
export CODEX_API_KEY="your-api-key"
```
## Gemini Provider

The default provider, with full embedding support.

Models are resolved through `Gemini.Config`, so you can pass alias keys (e.g., `:flash_lite_latest`) or omit `:model` to use auth-aware defaults.

Optional app-wide defaults:

```elixir
config :rag, Rag.Ai.Gemini,
  model: Gemini.Config.default_model(),
  embedding_model: Gemini.Config.default_embedding_model()
```

### Usage
```elixir
alias Rag.Ai.Gemini
alias Gemini.Config, as: GeminiConfig

# Create a provider instance
provider = Gemini.new(%{model: :flash_lite_latest})

# Text generation
{:ok, response} = Gemini.generate_text(provider, "Hello!", [])

# Streaming
{:ok, stream} = Gemini.generate_text(provider, "Hello!", stream: true)
Enum.each(stream, &IO.write/1)

# Embeddings
{:ok, embeddings} = Gemini.generate_embeddings(provider, ["text1", "text2"], [])
```

### Options
```elixir
# Text generation options
[
  stream: false,    # Enable streaming
  temperature: 0.7, # Randomness (0.0-2.0)
  max_tokens: 1024, # Max output tokens
  top_p: 0.9,       # Nucleus sampling
  top_k: 40         # Top-K sampling
]

# Embedding options
[
  task_type: :retrieval_document,               # or :retrieval_query
  model: GeminiConfig.default_embedding_model() # Auth-aware default
]
```

### Capabilities
```elixir
Gemini.supports_tools?()      # true
Gemini.supports_embeddings?() # true
Gemini.max_context_tokens()   # 1_000_000
Gemini.cost_per_1k_tokens()   # {0.000075, 0.000300}
```

## Claude Provider

Best for analysis, reasoning, and agentic workflows.

### Usage
```elixir
alias Rag.Ai.Claude

provider = Claude.new(%{
  model: "claude-sonnet-4-20250514",
  max_turns: 10
})

# Text generation
{:ok, response} = Claude.generate_text(provider, "Analyze this code", [])

# With a system prompt
{:ok, response} = Claude.generate_text(provider, "Hello",
  system_prompt: "You are a helpful assistant."
)

# Embeddings are NOT supported
{:error, :not_supported} = Claude.generate_embeddings(provider, ["text"], [])
```

### Options
```elixir
[
  stream: false,               # Enable streaming
  system_prompt: "You are...", # System instruction
  output_format: :text         # Output format
]
```

## Codex Provider (OpenAI-compatible)

Best for code generation and structured output.

### Usage
```elixir
alias Rag.Ai.Codex

provider = Codex.new(%{
  model: "gpt-4o",
  reasoning_effort: :medium # :low, :medium, or :high
})

# Text generation
{:ok, response} = Codex.generate_text(provider, "Write a function", [])

# With structured output
{:ok, response} = Codex.generate_text(provider, "Generate JSON",
  output_schema: %{type: "object", properties: %{...}}
)
```

## OpenAI Provider (Direct HTTP)

An alternative OpenAI implementation that calls the HTTP API directly, without the SDK.

### Usage
```elixir
alias Rag.Ai.OpenAI

provider = OpenAI.new(%{
  embeddings_url: "https://api.openai.com/v1/embeddings",
  embeddings_model: "text-embedding-3-small",
  text_url: "https://api.openai.com/v1/chat/completions",
  text_model: "gpt-4o",
  api_key: System.get_env("OPENAI_API_KEY")
})

{:ok, embeddings} = OpenAI.generate_embeddings(provider, ["text"], [])
{:ok, response} = OpenAI.generate_text(provider, "Hello", [])
```

## Ollama Provider (Local)

For self-hosted local models.

### Usage
```elixir
alias Rag.Ai.Ollama

provider = Ollama.new(%{
  embeddings_url: "http://localhost:11434/api/embed",
  embeddings_model: "nomic-embed-text",
  text_url: "http://localhost:11434/api/chat",
  text_model: "llama2"
})

{:ok, embeddings} = Ollama.generate_embeddings(provider, ["text"], [])
{:ok, response} = Ollama.generate_text(provider, "Hello", [])
```

## Nx Provider (On-Device)

For local inference using Bumblebee models.

### Usage
```elixir
alias Rag.Ai.Nx

# Nx.Serving instances must be pre-configured
provider = Nx.new(%{
  embeddings_serving: embedding_serving, # from Bumblebee
  text_serving: text_serving
})

{:ok, embeddings} = Nx.generate_embeddings(provider, ["text"], [])
```

## Capabilities Module

Query provider capabilities at runtime:
```elixir
alias Rag.Ai.Capabilities

# Get all providers
Capabilities.all()
# %{gemini: %{embeddings: true, ...}, claude: %{...}, codex: %{...}}

# Get available providers (those with valid credentials)
Capabilities.available()

# Check a specific capability
Capabilities.can_handle?(:gemini, :embeddings) # true
Capabilities.can_handle?(:claude, :embeddings) # false

# Get providers with a capability
Capabilities.with_capability(:embeddings)
# [{:gemini, %{...}}]

# Best provider for a task
Capabilities.best_for(:embeddings)      # :gemini
Capabilities.best_for(:code_generation) # :codex
Capabilities.best_for(:analysis)        # :claude
Capabilities.best_for(:long_context)    # :gemini
```

### Task Mappings
| Task | Best Provider | Reason |
|---|---|---|
| `:embeddings` | Gemini | Recommended embedding provider |
| `:long_context` | Gemini | 1M-token context window |
| `:multimodal` | Gemini | Image/audio support |
| `:cost` | Gemini | Most cost-effective |
| `:speed` | Gemini | Fastest inference |
| `:code_generation` | Codex | Optimized for code |
| `:structured_output` | Codex | Best JSON generation |
| `:analysis` | Claude | Deep reasoning |
| `:writing` | Claude | Best prose quality |
| `:agentic` | Claude | Multi-step workflows |
| `:reasoning` | Claude | Complex logic |
| `:safety` | Claude | Strongest safety |
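The mappings above can drive provider selection directly. A hedged sketch combining `Capabilities.best_for/1` with a module lookup (the atom-to-module map is illustrative, not part of the library):

```elixir
alias Rag.Ai.Capabilities

# Illustrative mapping from capability names to provider modules
modules = %{gemini: Rag.Ai.Gemini, codex: Rag.Ai.Codex, claude: Rag.Ai.Claude}

name = Capabilities.best_for(:code_generation) # :codex
mod = Map.fetch!(modules, name)
provider = mod.new(%{})
{:ok, response} = mod.generate_text(provider, "Write a function", [])
```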
## Streaming Responses

All major providers support streaming:

```elixir
{:ok, stream} = Router.execute(router, :text, "Count to 10", stream: true)

# Consume the stream
Enum.each(stream, fn chunk ->
  IO.write(chunk)
end)
```

## Cost Comparison
| Provider | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini | $0.075 | $0.30 |
| Claude | $3.00 | $15.00 |
| Codex/GPT-4o | $2.50 | $10.00 |
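Cost scales linearly with token counts, so a rough per-request estimate can be computed from the table. A hypothetical helper (pure Elixir, using the table's USD-per-1M-token prices):

```elixir
# USD per 1M tokens, {input, output}, taken from the table above
prices = %{gemini: {0.075, 0.30}, claude: {3.00, 15.00}, codex: {2.50, 10.00}}

estimate = fn provider, input_tokens, output_tokens ->
  {inp, out} = Map.fetch!(prices, provider)
  input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out
end

estimate.(:gemini, 100_000, 2_000) # ≈ $0.0081
estimate.(:claude, 100_000, 2_000) # ≈ $0.33
```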
## Best Practices
- Use Gemini for embeddings - It is the recommended default for embedding generation
- Use Claude for analysis - Best reasoning capabilities
- Use Codex for code - Optimized for code generation
- Configure fallback - Use Router with multiple providers for reliability
- Check capabilities first - Use `Capabilities.can_handle?/2` before calling
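For fallback, the Router is the library's intended mechanism; a manual chain can also be sketched. In this hedged example, the provider order, empty `new/1` attrs, and error handling are assumptions for illustration, not library API facts:

```elixir
defmodule MyApp.Fallback do
  alias Rag.Ai.{Gemini, Claude}

  # Try providers in order; first success wins
  @chain [Gemini, Claude]

  def generate(prompt, opts \\ []) do
    Enum.reduce_while(@chain, {:error, :all_providers_failed}, fn mod, acc ->
      case mod.generate_text(mod.new(%{}), prompt, opts) do
        {:ok, response} -> {:halt, {:ok, response}}
        {:error, _reason} -> {:cont, acc}
      end
    end)
  end
end
```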
## Next Steps
- Smart Router - Learn about routing strategies
- Embeddings - Deep dive into embedding generation