ReqLLM.Providers.Azure (ReqLLM v1.6.0)


Azure AI provider implementation.

Supports Azure's AI services for accessing models from multiple families:

OpenAI Models

  • GPT-4o, GPT-4, GPT-3.5 Turbo
  • Reasoning models (o1, o3 series)
  • Text embedding models

Anthropic Claude Models

  • Claude 3 Opus, Sonnet, Haiku
  • Claude 3.5 Sonnet
  • Extended thinking/reasoning support

Capabilities

  • Text generation (chat completions / messages)
  • Streaming responses with usage tracking
  • Tool calling (function calling)
  • Embeddings generation (OpenAI models only)
  • Multi-modal inputs (text and images)
  • Structured output generation
  • Extended thinking (Claude models)

Key Differences from Direct Provider APIs

  1. Custom endpoints: Each Azure resource has a unique base URL. Azure supports two endpoint formats, auto-detected from the domain:

    • Azure OpenAI Service (.cognitiveservices.azure.com or .openai.azure.com): URL: /deployments/{deployment}/chat/completions?api-version={version}. The model is determined by the deployment name in the URL path.
    • Azure AI Foundry (.services.ai.azure.com): URL: /models/chat/completions?api-version={version}. The model is specified in the request body (the deployment name is used).

  2. API key authentication: Uses the api-key header (x-api-key for Claude) rather than OpenAI's standard Authorization: Bearer

  3. Bearer token authentication: Prefix api_key with "Bearer " to use the Authorization: Bearer header instead

  4. Deployment names: The deployment name is used either in the URL path (Azure OpenAI Service format) or in the request body (Foundry format)

  5. No model field in body (Azure OpenAI Service format): The deployment ID in the URL determines the model
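The auto-detection rule above can be sketched as follows. This is an illustration of the documented behavior only, not the provider's actual implementation; the module and function names are hypothetical.

```elixir
defmodule AzureEndpoint do
  # Illustrative only: mirrors the documented domain-based auto-detection.
  def format(base_url) do
    if String.contains?(base_url, ".services.ai.azure.com") do
      :foundry
    else
      # .openai.azure.com and .cognitiveservices.azure.com use the
      # Azure OpenAI Service format.
      :openai
    end
  end

  # Azure OpenAI Service: deployment name in the URL path.
  def chat_path(:openai, deployment, api_version),
    do: "/deployments/#{deployment}/chat/completions?api-version=#{api_version}"

  # Azure AI Foundry: fixed path; model goes in the request body.
  def chat_path(:foundry, _deployment, api_version),
    do: "/models/chat/completions?api-version=#{api_version}"
end
```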

Authentication

Environment variables are resolved by model family:

# For OpenAI models (GPT, o1, o3, etc.)
export AZURE_OPENAI_API_KEY=your-api-key
export AZURE_OPENAI_BASE_URL=https://your-openai-resource.openai.azure.com/openai

# For Anthropic models (Claude)
export AZURE_ANTHROPIC_API_KEY=your-api-key
export AZURE_ANTHROPIC_BASE_URL=https://your-anthropic-resource.openai.azure.com/openai

# Universal fallbacks (if all models share the same Azure resource)
export AZURE_API_KEY=your-api-key
export AZURE_BASE_URL=https://your-resource.openai.azure.com/openai
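The family-specific-then-fallback resolution can be sketched like this. The helper is hypothetical (the real lookup lives inside the provider); it only illustrates the documented precedence: a family-specific variable wins over the universal AZURE_* fallback.

```elixir
defmodule AzureEnv do
  # Hypothetical helper illustrating the documented lookup order.
  def api_key(:openai), do: first_set(["AZURE_OPENAI_API_KEY", "AZURE_API_KEY"])
  def api_key(:anthropic), do: first_set(["AZURE_ANTHROPIC_API_KEY", "AZURE_API_KEY"])

  defp first_set(vars) do
    Enum.find_value(vars, fn var ->
      case System.get_env(var) do
        nil -> nil
        "" -> nil
        value -> value
      end
    end)
  end
end
```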

# Or pass directly in options (Azure OpenAI Service format)
ReqLLM.generate_text(
  "azure:gpt-4o",
  "Hello!",
  api_key: "your-api-key",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-gpt4-deployment"
)

# Using Bearer token authentication (e.g., Entra ID / Azure AD tokens)
ReqLLM.generate_text(
  "azure:gpt-4o",
  "Hello!",
  api_key: "Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-gpt4-deployment"
)

# Azure AI Foundry format (auto-detected from domain)
ReqLLM.generate_text(
  "azure:deepseek-v3",
  "Hello!",
  api_key: "your-api-key",
  base_url: "https://my-resource.services.ai.azure.com",
  deployment: "deepseek-v3"
)

Examples

# Basic usage
{:ok, response} = ReqLLM.generate_text(
  "azure:gpt-4o",
  "What is Elixir?",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-gpt4-deployment"
)

# Streaming
{:ok, response} = ReqLLM.stream_text(
  "azure:gpt-4o",
  "Tell me a story",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-gpt4-deployment"
)

# With tools
tools = [%ReqLLM.Tool{name: "get_weather", ...}]
{:ok, response} = ReqLLM.generate_text(
  "azure:gpt-4o",
  "What's the weather?",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-gpt4-deployment",
  tools: tools
)

# Embeddings
{:ok, embedding} = ReqLLM.generate_embedding(
  "azure:text-embedding-3-small",
  "Hello world",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-embedding-deployment"
)

# OpenAI reasoning models (o1, o3, o4-mini)
{:ok, response} = ReqLLM.generate_text(
  "azure:o1",
  "Solve this complex math problem step by step...",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-o1-deployment",
  max_tokens: 8000,
  provider_options: [reasoning_effort: "high"]
)

# Claude with extended thinking
{:ok, response} = ReqLLM.generate_text(
  "azure:claude-3-5-sonnet-20241022",
  "Analyze this complex problem...",
  base_url: "https://my-resource.openai.azure.com/openai",
  deployment: "my-claude-deployment",
  thinking: %{type: "enabled", budget_tokens: 10000},
  max_tokens: 4096
)

Deployment Configuration

Azure uses deployment names to route requests to specific model instances. If no deployment is specified, the model ID is used as a default (with a warning).

To find your deployment name:

  1. Go to Azure OpenAI Studio (https://oai.azure.com/)
  2. Navigate to "Deployments"
  3. Copy the deployment name (e.g., "gpt-4o-prod", "claude-sonnet")

Error Handling

Common error scenarios:

  • Missing API key: Set AZURE_API_KEY (or family-specific: AZURE_OPENAI_API_KEY, AZURE_ANTHROPIC_API_KEY)
  • Missing base URL: Set AZURE_BASE_URL (or family-specific: AZURE_OPENAI_BASE_URL, AZURE_ANTHROPIC_BASE_URL)
  • Invalid deployment: Ensure the deployment name matches your Azure resource
  • Unsupported API version: Check Azure documentation for supported versions

Extending for New Model Families

Azure hosts multiple model families (OpenAI GPT, Anthropic Claude). To add support for a new model family:

  1. Create a formatter module under ReqLLM.Providers.Azure.* (see Azure.OpenAI or Azure.Anthropic as examples)
  2. Add the model prefix to @model_families map in this module
  3. Handle any family-specific endpoint paths in get_chat_endpoint_path/3
  4. Add family-specific headers in get_anthropic_headers/2 if needed
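A skeleton for step 1 might look like the following. The callback names (translate_options/3, pre_validate_options/3, parse_response/3) are inferred from how Azure.OpenAI and Azure.Anthropic are described elsewhere on this page; check those modules for the actual formatter contract before implementing.

```elixir
# Hypothetical skeleton for a new model-family formatter. The module name
# and callback signatures are illustrative, not a guaranteed contract.
defmodule ReqLLM.Providers.Azure.MyFamily do
  # Translate ReqLLM options into family-specific parameters.
  def translate_options(_operation, _model, opts), do: {:ok, opts}

  # Enforce family-specific requirements before the request is built.
  def pre_validate_options(_operation, _model, opts), do: {:ok, opts}

  # Parse the raw response body into a ReqLLM response.
  def parse_response(_model, body, _opts), do: {:ok, body}
end
```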

Summary

Functions

attach(request, model_input, user_opts)
Attaches Azure-specific authentication and pipeline steps to a request.

attach_stream(model, context, opts, finch_name)
Builds a Finch request for streaming responses.

build_body(request)
Default implementation of build_body/1.

credential_missing?(arg1)
Checks if an error indicates missing Azure credentials.

decode_response(request_response)
Decodes Azure API responses using the appropriate model-family formatter.

decode_stream_event(event, model)
Decodes Server-Sent Events for streaming responses.

encode_body(request)
Pass-through encoding - body is pre-encoded by formatters in prepare_request.

extract_usage(body, model)
Extracts usage/token information from API responses.

pre_validate_options(operation, model, opts)
Pre-validates and transforms options before request building.

prepare_request(operation, model_spec, input, opts)
Prepares a request for Azure AI services.

thinking_constraints()
Returns thinking constraints for extended thinking support.

translate_options(operation, model, opts)
Translates ReqLLM options to provider-specific format.

Functions

attach(request, model_input, user_opts)

Attaches Azure-specific authentication and pipeline steps to a request.

Authentication is determined by the api_key format:

  • If api_key starts with "Bearer ", uses Authorization: Bearer header
  • Otherwise, uses api-key header for OpenAI models, x-api-key for Claude

Also adds model-family specific headers (e.g., anthropic-version for Claude models).
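The header-selection rule can be illustrated with a small sketch (the module name is hypothetical; attach/3 performs this internally):

```elixir
defmodule AzureAuth do
  # Mirrors the documented rule: keys prefixed with "Bearer " use the
  # Authorization header; otherwise Claude models use x-api-key and
  # OpenAI models use api-key.
  def auth_header("Bearer " <> _ = api_key, _family), do: {"authorization", api_key}
  def auth_header(api_key, :anthropic), do: {"x-api-key", api_key}
  def auth_header(api_key, _family), do: {"api-key", api_key}
end
```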

attach_stream(model, context, opts, finch_name)

Builds a Finch request for streaming responses.

Constructs the appropriate endpoint URL based on model family and adds Azure-specific headers (api-key, anthropic-version for Claude).

base_url()

build_body(request)

Default implementation of build_body/1.

Builds request body using OpenAI-compatible format for chat and embedding operations.

credential_missing?(arg1)

Checks if an error indicates missing Azure credentials.

Returns true if the error message mentions AZURE_OPENAI_API_KEY or api_key.
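The documented heuristic amounts to a substring check on the error message; a standalone sketch (hypothetical module name, not the provider's implementation):

```elixir
defmodule CredCheck do
  # Returns true if the message mentions AZURE_OPENAI_API_KEY or api_key,
  # per the documented heuristic for credential_missing?/1.
  def credential_missing?(message) when is_binary(message) do
    String.contains?(message, "AZURE_OPENAI_API_KEY") or
      String.contains?(message, "api_key")
  end
end
```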

decode_response(request_response)

Decodes Azure API responses using the appropriate model-family formatter.

Routes to Azure.OpenAI.parse_response/3 or Azure.Anthropic.parse_response/3 based on the model. Handles both successful responses and error extraction.

decode_stream_event(event, model)

Decodes Server-Sent Events for streaming responses.

Delegates to the appropriate model-family formatter for SSE parsing.

default_base_url()

default_env_key()

Callback implementation for ReqLLM.Provider.default_env_key/0.

encode_body(request)

Pass-through encoding - body is pre-encoded by formatters in prepare_request.

This follows the same pattern as Amazon Bedrock where the model-family-specific formatter handles body encoding during request preparation.

extract_usage(body, model)

Extracts usage/token information from API responses.

Delegates to the model-family formatter for provider-specific usage extraction.

pre_validate_options(operation, model, opts)

Pre-validates and transforms options before request building.

Delegates to the model-specific formatter (Azure.OpenAI or Azure.Anthropic). This handles model-specific requirements like reasoning parameter translation.

Note: This is not yet a formal Provider callback but is called by Options.process/4 if the provider exports it.

prepare_request(operation, model_spec, input, opts)

Prepares a request for Azure AI services.

Routes to the appropriate formatter (OpenAI or Anthropic) based on model family.

Operations

  • :chat - Text generation via chat completions or messages endpoint
  • :object - Structured output generation (uses tools for OpenAI, native for Claude)
  • :embedding - Vector embeddings (OpenAI embedding models only)

provider_extended_generation_schema()

provider_id()

provider_schema()

supported_provider_options()

thinking_constraints()

Returns thinking constraints for extended thinking support.

Azure hosts both OpenAI and Anthropic models with different constraints:

  • Claude models require temperature=1.0 for extended thinking (enforced in Azure.Anthropic.pre_validate_options/3)
  • OpenAI reasoning models (o1, o3, o4) use reasoning_effort parameter, not the extended thinking protocol

Returns :none since there are no universal constraints that apply to all Azure models. Model-family-specific constraints are enforced in the respective formatter modules during pre_validate_options.

translate_options(operation, model, opts)

Translates ReqLLM options to provider-specific format.

Delegates to OpenAI.translate_options/3 for GPT models or Anthropic.translate_options/3 for Claude models to handle model-specific parameter requirements.