Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

1.4.0 - 2026-01-30

Added

  • Reasoning cost breakdown with reasoning_cost field in cost calculations (#394)
  • OpenRouter enhancements (#393, #374)
    • openrouter_usage and openrouter_plugins provider options
    • Native JSON schema structured output support
  • Google URL context provider option for grounding (#392)
  • Google auth header option for streaming requests (#382)
  • Cohere Embeddings on Bedrock support (#365)
  • Structured and multimodal tool outputs (#357)
  • OpenAI strict mode for JSON schema validation (#368)
  • OpenAI verbosity support for reasoning models (#354)
  • Comprehensive usage and billing infrastructure (#371)
  • Ollama usage guide documentation (#387)
  • Model base_url override in model configuration (#366)
  • Anthropic web search tool support for real-time web content access
    • web_search provider option enables Claude to search the web during conversations
    • Configurable options: max_uses, allowed_domains, blocked_domains, user_location
    • Automatic source citations in responses
    • Works with all supported Claude models (Sonnet 4.5, Sonnet 4, Haiku 4.5, Haiku 3.5, Opus 4.5, Opus 4.1, Opus 4)
    • Can be combined with regular function tools
    • Pricing: $10 per 1,000 searches plus standard token costs
    • Example: provider_options: [web_search: %{max_uses: 5, allowed_domains: ["wikipedia.org"]}]
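    A fuller usage sketch (the model string is illustrative; the option shape follows the example above):

        {:ok, response} =
          ReqLLM.generate_text(
            "anthropic:claude-sonnet-4-5",
            "What changed in the latest Elixir release?",
            provider_options: [
              web_search: %{max_uses: 5, allowed_domains: ["elixir-lang.org", "wikipedia.org"]}
            ]
          )
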
  • ReqLLM.ModelHelpers capability helper functions for type-safe model capability access
    • Centralized helpers like ReqLLM.ModelHelpers.json_schema?/1, ReqLLM.ModelHelpers.tools_strict?/1
    • Replaces scattered get_in(model.capabilities, ...) calls across providers
    • Fixes bug in Bedrock provider where reasoning capability was checked incorrectly (was checking capabilities.reasoning map instead of capabilities.reasoning.enabled boolean)
    • Provides single source of truth for capability access patterns
    • Compile-time generated functions from capability schema paths
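    A sketch of how calling code might branch on capabilities (only the two helper names come from this entry; the surrounding logic and the pre-resolved model variable are illustrative):

        # `model` is an already-resolved model struct
        strategy =
          cond do
            ReqLLM.ModelHelpers.json_schema?(model) -> :native_json_schema
            ReqLLM.ModelHelpers.tools_strict?(model) -> :strict_tool_call
            true -> :tool_call_workaround
          end
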
  • Amazon Bedrock service tier support for request prioritization
    • service_tier option with values: "priority", "default", "flex"
    • Priority tier provides faster responses at premium cost for mission-critical workloads
    • Flex tier offers cost-effective processing for non-urgent tasks
    • Supported on compatible Bedrock models (check AWS documentation for availability)
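    A usage sketch (the model string is illustrative, and passing service_tier through provider_options is an assumption):

        ReqLLM.generate_text(
          "bedrock:anthropic.claude-sonnet-4-5",
          "Summarize this incident report for the on-call channel",
          provider_options: [service_tier: "priority"]
        )
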
  • Google Vertex AI Gemini model support
    • Gemini 2.0 Flash, 2.5 Flash, 2.5 Flash Lite, and 2.5 Pro on Google Vertex AI
    • Delegates to native Google provider format with Vertex-specific quirks handled
    • Sanitizes function call IDs (Vertex API rejects them while direct Google API includes them)
    • Full support for extended thinking/reasoning, context caching, and all Gemini features
    • Complete fixture coverage for all Vertex Gemini models (46 fixtures: 10 for 2.0, 12 each for 2.5 variants)
  • Google Context Caching for Gemini models with up to 90% cost savings
    • ReqLLM.Providers.Google.CachedContent module for cache CRUD operations
    • Create, list, update, and delete cached content
    • Support for both Google AI Studio and Vertex AI (requires Gemini models)
    • cached_content provider option to reference existing caches
    • Minimum token requirements: 1,024 (Flash) / 4,096 (Pro)
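    Hypothetical call shapes (the module name and the cached_content option come from this entry; the function arguments and cache fields are assumptions):

        # `large_context` is a reusable prompt prefix that meets the token minimum
        {:ok, cache} =
          ReqLLM.Providers.Google.CachedContent.create(
            "google:gemini-2.5-pro",
            large_context,
            ttl: "3600s"
          )

        {:ok, response} =
          ReqLLM.generate_text(
            "google:gemini-2.5-pro",
            "Answer questions about the cached document",
            provider_options: [cached_content: cache.name]
          )
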
  • OAuth2 token caching for Google Vertex AI
    • Eliminates 60-180ms auth overhead on every request
    • Tokens cached for 55 minutes (5-minute safety margin before the 1-hour expiry)
    • GenServer serializes concurrent refresh requests to prevent duplicate fetches
    • Per-node cache (no distributed coordination needed)
    • 99.9% reduction in auth overhead for typical workloads
  • Real-time stream processing with ReqLLM.StreamResponse.process_stream/2
    • Process streams incrementally with real-time callbacks
    • on_result callback for content chunks (fires immediately as text arrives)
    • on_thinking callback for reasoning/thinking chunks (fires immediately)
    • Prevents double-stream consumption bugs through single-pass processing
    • Enables real-time streaming to UIs (Phoenix LiveView, websockets, etc.)
    • No upfront Enum.to_list - callbacks fire as chunks arrive from the stream
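    A streaming sketch (the callback names come from this entry; the keyword-list option format and the chunk arguments are assumptions):

        {:ok, stream_response} =
          ReqLLM.stream_text("anthropic:claude-sonnet-4-5", "Explain GenServers")

        ReqLLM.StreamResponse.process_stream(stream_response,
          on_result: fn text -> IO.write(text) end,
          on_thinking: fn thought -> IO.write(["\n[thinking] ", thought]) end
        )
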
  • Provider alias support via llm_db integration
    • Google Vertex AI Anthropic models now accessible via :google_vertex provider
    • google_vertex_anthropic provider aliased to google_vertex implementation
    • Enables single provider module to serve models from multiple llm_db providers
    • Complete fixture coverage for all Vertex Claude models (36 fixtures: 12 per model × 3 models)
  • provider_model_id support for AWS Bedrock inference profiles
    • Models can specify API-specific identifiers separate from canonical IDs
    • Enables Bedrock streaming and on-demand throughput with inference profile prefixes
    • Applied to Claude Haiku 4.5, Sonnet 4.5, Opus 4.1, Llama 3.3 70B models
  • Credential fallback for fixture recording in providers requiring cloud credentials
    • Automatic fallback to existing fixtures when credentials are missing during RECORD mode
    • Provider-specific credential detection via optional credential_missing?/1 callback
    • Implemented in AWS Bedrock, Google, and Google Vertex AI providers
    • Enables comprehensive test coverage without requiring all developers to configure cloud credentials
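    A sketch of the optional callback a cloud-backed provider implements (the callback name and arity come from this entry; the module, argument, and environment variable are illustrative):

        defmodule MyCloudProvider do
          # Report missing credentials so RECORD mode can fall back to existing fixtures
          def credential_missing?(_model) do
            System.get_env("EXAMPLE_CLOUD_API_KEY") in [nil, ""]
          end
        end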

Changed

  • Zoi schema refactor for data structures, replacing TypedStruct (#376)
  • AWS Event Stream parser documentation clarifying Bedrock specialization
    • Explains performance rationale for single-pass parsing and header specialization
    • Documents non-goals (S3 Select, Transcribe, Kinesis incompatibility)
  • Object generation detection updated to recognize tool-based workaround
    • supports_object_generation? now accepts models with tools.enabled = true
    • Enables object generation tests for Vertex Claude models using tool workaround

Fixed

  • Image-only attachments validation for OpenAI and xAI (#389)
  • translate_options changes now preserved in provider_options (#381)
  • StreamServer termination handled gracefully in FinchClient (#379)
  • Anthropic schema constraints stripped when unsupported (#378)
  • api_key added to internal keys to prevent leakage (#355)
  • Test helper tool_budget_for/1 pattern match regression from LLMDB integration
    • Fixed pattern match to use {:ok, model} instead of obsolete {:ok, {provider, id, model}}
    • Fixed field name from model.limit to model.limits
    • Regression introduced in v1.1.0 caused test fixtures to use incorrect maxOutputTokens values
    • Primarily affected reasoning-enabled models (Gemini 2.5 Pro) where 150 token default was insufficient
  • Google provider cached token extraction from API responses
    • Extracts cachedContentTokenCount from usageMetadata for both implicit and explicit caching
    • Converts to OpenAI-compatible prompt_tokens_details.cached_tokens format
    • Fixes cached tokens always showing as 0 even when caching was active
    • Affects both google and google-vertex providers using Gemini models
  • JSV schema validation now preserves original data types instead of returning cast values
    • Prevents unwanted type coercion (e.g., 1.0 → 1 for integer schemas)
    • Validation still enforces schema constraints, but returns original input data
  • JSV schema compilation performance improved with ETS-based caching
    • Compiled schemas cached globally to avoid redundant JSV.build!/1 calls
    • Configured with read_concurrency for fast concurrent access
  • Google Vertex AI provider guide missing from documentation
    • Added google_vertex.md to mix.exs extras and Providers group

1.0.0 - 2025-11-02

Added

  • Google Vertex AI provider with comprehensive Claude 4.x support
    • OAuth2 authentication with service accounts
    • Full Claude model support (Haiku 4.5, Sonnet 4.5, Opus 4.1)
    • Extended thinking and prompt caching capabilities
    • Complete fixtures for all Vertex AI Claude models
  • AWS Bedrock inference profile models with complete fixture coverage
    • Anthropic Claude inference profiles (Haiku 4.5, Sonnet 4.5, Opus 4.1)
    • OpenAI OSS models (gpt-oss-20b, gpt-oss-120b)
    • Meta Llama inference profiles
    • Cohere Command R and Command R Plus models
  • Provider base URL override capability via application config
    • Enables testing with mock services
    • Configured per-provider in application config
  • AWS Bedrock API key authentication support (introduced by AWS in July 2025)
    • Simple Bearer token authentication as alternative to IAM credentials
    • api_key provider option with AWS_BEARER_TOKEN_BEDROCK environment variable fallback
    • Short-term keys (up to 12 hours) recommended for production
    • Long-term keys available for exploration
    • Limitations: Cannot use with InvokeModelWithBidirectionalStream, Agents, or Data Automation
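    A usage sketch (the model string is illustrative; whether the explicit key is passed under provider_options or as a top-level option is an assumption):

        # Option A: rely on the AWS_BEARER_TOKEN_BEDROCK environment variable fallback
        {:ok, response} = ReqLLM.generate_text("bedrock:anthropic.claude-haiku-4-5", "Hello!")

        # Option B: pass the key explicitly
        {:ok, response} =
          ReqLLM.generate_text("bedrock:anthropic.claude-haiku-4-5", "Hello!",
            provider_options: [api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK")]
          )
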
  • Context tools persistence for AWS Bedrock multi-turn conversations
    • Tools automatically persist in context after first request
    • Bedrock-specific implementation with zero impact on other providers
  • Schema map-subtyped list support for complex nested structures
    • Properly handles {:list, {:map, schema}} type definitions
    • Generates correct JSON Schema for nested object arrays
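    An illustrative keyword schema exercising the new type (field names, the model string, and options are made up for the example):

        schema = [
          customer: [type: :string, required: true],
          line_items: [
            type:
              {:list,
               {:map,
                [
                  sku: [type: :string, required: true],
                  quantity: [type: :pos_integer, required: true]
                ]}},
            required: true
          ]
        ]

        {:ok, response} =
          ReqLLM.generate_object(
            "openai:gpt-4o-mini",
            "Extract the order from this email...",
            schema,
            max_tokens: 300
          )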

Enhanced

  • Google provider v1beta API as default version
    • Fixes streaming compatibility issues
    • Updated all test fixtures to use v1beta
  • Test configuration expanded with additional LLM providers
    • Enhanced catalog_allow settings for broader provider coverage
  • Documentation organization with refactored guides structure
    • Improved provider-specific documentation
    • Better task organization in mix.exs

Fixed

  • Streaming protocol callback renamed from decode_sse_event to decode_stream_event
    • More protocol-agnostic naming (supports SSE, AWS Event Stream, etc.)
    • Affects all providers implementing streaming
  • Groq UTF-8 boundary handling in streaming responses
    • Prevents crashes when UTF-8 characters split across chunk boundaries
  • Schema boolean encoding preventing invalid string coercion
    • Boolean values now correctly encoded in normalized schemas
  • OpenAI model list typo corrected in documentation
  • AWS Bedrock Anthropic inference profile model ID preservation
    • Added preserve_inference_profile?/1 callback to Anthropic Bedrock formatter
    • Ensures region prefixes (global., us.) are preserved in API requests
    • Fixes 400 "invalid model identifier" errors for inference profile models
  • AWS Bedrock Converse API usage field parsing
    • Fixed parse_usage/1 to include all required fields (reasoning_tokens, total_tokens, cached_tokens)
    • Fixes KeyError when accessing usage fields from Converse API responses
  • AWS Bedrock model ID normalization for metadata lookup
    • Fixed normalize_model_id/1 to always strip region prefixes for Registry lookups
    • Enables capabilities detection for inference profile models
    • Separates metadata lookup (always normalized) from API requests (preserve_inference_profile? controls)
  • AWS Bedrock provider model family support
    • Added Meta to @model_families for Llama models using Converse API
    • Added OpenAI to @model_families for gpt-oss models
    • Cohere Command R models use Converse API directly with full tool support (no custom formatter needed)

Notes

This is the first stable 1.0 release of ReqLLM, marking production readiness with comprehensive provider support, robust streaming, and extensive test coverage. The library now supports 15+ providers with 750+ models and includes advanced features like prompt caching, structured output, tool calling, and embeddings.

1.0.0-rc.8 - 2025-10-29

Added

  • Prompt caching support for Bedrock Anthropic models (Claude on AWS Bedrock)
    • Auto-switches to native API when caching enabled with tools for full cache control
    • Supports caching of system prompts and tools
    • Emits a warning when auto-switching (can be silenced by setting use_converse explicitly)
  • Structured output (:object operation) support for AWS Bedrock provider
    • Bedrock Anthropic sub-provider using tool-calling approach
    • Bedrock Converse API for unified structured output across all models
    • Bedrock OpenAI sub-provider (gpt-oss models)
  • Google Search grounding support for Google Gemini models via built-in tools
    • New google_grounding option to enable web search during generation
    • API versioning support (v1 and v1beta) for Google provider
    • Grounding metadata included in responses when available
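    A usage sketch (the model string is illustrative; whether google_grounding is passed through provider_options and its exact value shape are assumptions):

        ReqLLM.generate_text(
          "google:gemini-2.5-flash",
          "What are today's top headlines about Elixir?",
          provider_options: [google_grounding: true]
        )
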
  • JSON Schema validation using JSV library (supports draft 2020-12 and draft 7)
    • Client-side schema validation before sending to providers
    • Better error messages for invalid schemas (e.g., embedded JSON strings vs maps)
    • Validates raw JSON schemas via ReqLLM.Schema.validate/2
  • Model catalog feature for runtime model discovery
  • Configurable metadata_timeout option for long-running streams (default: 60s)
    • Application-level configuration support
    • Fixes metadata collection timeout errors on large documents
  • HTTP streaming in StreamServer with improved lifecycle management
  • Direct JSON schema pass-through support for complex object generation
  • Base URL override capability for testing with mock services
  • API key option in provider defaults with proper precedence handling
  • task_type parameter support for Google embeddings

Enhanced

  • Bedrock provider with comprehensive fixes and improvements
    • Streaming temperature/top_p conflict resolution via Options.process pipeline
    • Extended thinking (reasoning) support with proper reasoning_effort translation
    • Tool round-trip conversations by extracting stub tools from messages
    • Complete usage metadata fields (cached_tokens, reasoning_tokens) for all models
    • Increased receive timeout from 30s to 60s for large responses
    • Unified streaming and non-streaming to use Options.process pipeline
    • Uses model capabilities instead of hardcoded model IDs for reasoning support detection
  • Meta/Llama support refactored into reusable generic provider
    • Created ReqLLM.Providers.Meta for Meta's native prompt format
    • Bedrock Meta now delegates to generic provider for format conversion
    • Enables future Azure AI Foundry and Vertex AI support
    • Documents that most providers (Azure, Vertex AI, vLLM, Ollama) use OpenAI-compatible APIs
  • OpenAI provider with JSON Schema response format support for GPT-5 models
  • Streaming error handling with HTTP status code validation
    • Proper error propagation for 4xx/5xx responses
    • Prevents error JSON from being passed to SSE parser
  • Model metadata tests with improved field mapping validation
  • Documentation across provider guides and API references

Fixed

  • Bedrock streaming binary protocol (AWS Event Stream) encoding in fixtures
    • Removed redundant "decoded" field that caused Jason.EncodeError
    • Fixtures now only store "b64" field for binary protocols (contains invalid UTF-8)
  • Bedrock thinking parameter removal for forced tool_choice scenarios
    • Extended thinking incompatible with object generation fixed via post-processing
    • Thinking parameter correctly removed when incompatible with forced tool_choice
  • Bedrock tool round-trip conversations now work correctly
    • Extracts stub tools from messages when tools required but not provided
    • Bedrock requires tools definition even for multi-turn tool conversations
    • Supports both ReqLLM.Tool structs and minimal stub tools for validation
  • Bedrock usage metrics now include all required fields
    • Meta Llama models provide complete usage data (cached_tokens, reasoning_tokens)
    • OpenAI OSS models provide complete usage data
  • Model compatibility task now uses normalize_model_id callback for registry lookups
    • Fixes inference profile ID recognition (e.g., global.anthropic.claude-sonnet-4-5)
  • Missing :compiled_schema in object streaming options (KeyError fix across all providers)
  • Nil tool names in streaming deltas now properly guarded
  • Tool.Inspect protocol crash when inspecting tools with JSON Schema (map) parameter schemas
  • HTTP/2 flow control bug with large request bodies (>64KB)
    • Changed default Finch pool from [:http2, :http1] to [:http1]
    • Added validation to prevent HTTP/2 with large payloads
  • ArgumentError when retry function returns {:delay, ms} (Req 0.5.15+ compatibility)
  • Validation errors now use correct Error struct fields (reason vs errors)
  • Dialyzer type mismatches in decode_response/2

Changed

  • Removed JidoKeys dependency, simplified to dotenvy for .env file loading
    • API keys now loaded from .env files at startup
    • Precedence: runtime options > application config > system environment
  • Upgraded dependencies:
    • ex_aws_auth from ~> 1.0 to ~> 1.3
    • ex_doc from 0.38.4 to 0.39.1
    • zoi from 0.7.4 to 0.8.1
    • credo to 1.7.13
  • Refactored Bedrock provider to use modern ex_aws_auth features
    • Migrated to AWSAuth.Credentials struct for credential management
    • Replaced manual request signing with AWSAuth.Req plugin (removed ~40 lines of code)
    • Updated Finch streaming to use credential-based signing API
    • Session tokens now handled automatically by ex_aws_auth
    • Simplified STS AssumeRole implementation using credential-based API
  • Comprehensive test timeout increased from 180s to 300s for slow models (e.g., Claude Opus 4.1)
  • Formatter line length standardized to 98 characters
  • Quokka dependency pinned to specific version (2.11.2)

Removed

  • Outdated test fixtures for deprecated models (Claude 3.5 Sonnet variants, OpenAI o1/o3/o4 variants)
  • Over 85,000 lines of stale fixture data cleaned up

Infrastructure

  • CI workflow updates for Elixir 1.18/1.19 on OTP 27/28
  • Enhanced GitHub Actions configuration with explicit version matrix
  • Added hex.pm best practices (changelog link, module grouping)
  • Improved documentation organization with provider-specific guides
  • Added Claude Opus 4.1 (us.anthropic.claude-opus-4-1-20250805-v1:0) to ModelMatrix

1.0.0-rc.7 - 2025-10-16

Changed

  • Updated Elixir compatibility to support 1.19
  • Replaced aws_auth GitHub dependency with ex_aws_auth from Hex for Hex publishing compatibility
  • Enhanced Dialyzer configuration with ignore_warnings option
  • Refactored request struct creation across providers using Req.new/2

Added

  • Provider normalize_model_id/1 callback for model identifier normalization
  • Amazon Bedrock support for inference profiles with region prefix stripping
  • ToolCall helper functions: function_name/1, json_arguments/1, arguments/1, find_args/2
  • New model definitions for Alibaba, Fireworks AI, GitHub Models, Moonshot AI, and Zhipu AI
  • Claude Haiku 4.5 model entries across multiple providers

Refactored

  • Removed normalization layer for tool calls, using ReqLLM.ToolCall structs directly
  • Simplified tool call extraction using find_args/2 across provider modules

1.0.0-rc.6 - 2025-02-15

Added

  • AWS Bedrock provider with streaming support and multi-model capabilities
    • Anthropic Claude models with native API delegation
    • OpenAI OSS models (gpt-oss-120b, gpt-oss-20b)
    • Meta Llama models with native prompt formatting
    • AWS Event Stream binary protocol parser
    • AWS Signature V4 authentication (OTP 27 compatible)
    • Converse API for unified tool calling across all Bedrock models
    • AWS STS AssumeRole support for temporary credentials
    • Extended thinking support via additionalModelRequestFields
    • Cross-region inference profiles (global prefix)
  • Z.AI provider with standard and coding endpoints
    • GLM-4.5, GLM-4.5-air, GLM-4.5-flash models (131K context)
    • GLM-4.6 (204K context, improved reasoning)
    • GLM-4.5v (vision model with image/video support)
    • Tool calling and reasoning capabilities
    • Separate endpoints for general chat and coding tasks
  • ToolCall struct for standardized tool call representation
  • Context.append/2 and Context.prepend/2 functions replacing push_* functions
  • Comprehensive example scripts (embeddings, context reuse, reasoning tokens, multimodal)
  • StreamServer support for raw fixture generation and reasoning token tracking

Enhanced

  • Google provider with native responseSchema for structured output
  • Google file/video attachment support with OpenAI-formatted data URIs
  • XAI provider with improved structured output test coverage
  • OpenRouter and Google model fixture coverage
  • Model compatibility task with migrate and failed_only options
  • Context handling to align with OpenAI's tool_calls API format
  • Tool result encoding for multi-turn conversations across all providers
  • max_tokens extraction from Model.new/3 to respect model defaults
  • Error handling for metadata-only providers with structured Splode errors
  • Provider implementations to delegate to shared helper functions

Fixed

  • get_provider/1 returning {:ok, nil} for metadata-only providers
  • Anthropic tool result encoding for multi-turn conversations (transform :tool role to :user)
  • Google structured output using native responseSchema without additionalProperties
  • Z.AI provider timeout and reasoning token handling
  • max_tokens not being respected from Model.new/3 across providers
  • File/video attachment support in Google provider (regression from b699102)
  • Tool call structure in Bedrock tests with compiler warnings
  • Model ID normalization converting dashes to underscores

Changed

  • Tool call architecture: tool calls now stored in message.tool_calls field instead of content parts
  • Tool result architecture: tool results use message.tool_call_id for correlation
  • Context API: replaced push_user/push_assistant/push_system with append/prepend
  • Streaming protocol: pluggable architecture via parse_stream_protocol/2 callback
  • Provider implementations: improved delegation patterns reducing code duplication

Infrastructure

  • Massive test fixture update across all providers
  • Enhanced fixture system with amazon_bedrock provider mapping
  • Sanitized credential handling in fixtures (x-amz-security-token)
  • :xmerl added to extra_applications for STS XML parsing
  • Documentation and template improvements

1.0.0-rc.5 - 2025-02-07

Added

  • New Cerebras provider implementation with OpenAI-compatible Chat Completions API
  • Context.from_json/1 for JSON deserialization enabling round-trip serialization
  • Schema :in type support for enums, ranges, and MapSets with JSON Schema generation
  • Embed and embed_many functions supporting single and multiple text inputs
  • New reasoning controls: reasoning_effort, thinking_visibility, and reasoning_token_budget
  • Usage tracking for cached_tokens and reasoning_tokens across all providers
  • Model compatibility validation task (mix mc) with fixture-based testing
  • URL sanitization in transcripts to redact sensitive parameters (api_key, token)
  • Comprehensive example scripts for embeddings and multimodal analysis

Enhanced

  • Major coverage test refresh with extensive fixture updates across all providers
  • Unified generation options schema delegating to ReqLLM.Provider.Options
  • Provider response handling with better error messages and compatibility
  • Google Gemini streaming reliability and thinking budget support for 2.5 models
  • OpenAI provider with structured output response_format option and legacy tool call decoding
  • Groq provider with improved streaming and state management
  • Model synchronization and compatibility testing infrastructure
  • Documentation with expanded getting-started.livemd guide and fixes.md

Fixed

  • Legacy parameter normalization (stop_sequences, thinking, reasoning)
  • Google provider usage calculation handling missing candidatesTokenCount
  • OpenAI response handling for structured output and reasoning models
  • Groq encoding and streaming response handling
  • Timeout issues in model compatibility testing
  • String splitting for model names using parts: 2 for consistent pattern extraction

Changed

  • Deprecated parameters removed from provider implementations for cleaner code
  • Model compatibility task output format streamlined
  • Supported models state management with last recorded timestamps
  • Sample models configuration replacing test model references

Infrastructure

  • Added Plug dependency for testing
  • Dev tooling with tidewave for project_eval in dev scenarios
  • Enhanced .gitignore to track script files
  • Model prefix matching in compatibility task for improved filtering

1.0.0-rc.4 - 2025-01-29

Added

  • Claude 4.5 model support
  • Tool call support for Google Gemini provider
  • Cost calculation to Response.usage()
  • Unified mix req_llm.gen command consolidating all AI generation tasks

Enhanced

  • Major streaming refactor from Req to Finch for production stability
  • Documentation for provider architecture and streaming requests

Fixed

  • Streaming race condition causing BadMapError
  • max_tokens translation to max_completion_tokens for OpenAI reasoning models
  • Google Gemini role conversion ('assistant' to 'model')
  • req_http_options passing to Req
  • Context.Codec encoding of tool_calls field for OpenAI compatibility

Removed

  • Context.Codec and Response.Codec protocols (architectural simplification)

1.0.0-rc.3 - 2025-01-22

Added

  • New Mix tasks for local testing and exploration:
    • generate_text, generate_object (structured output), and stream_object
    • All tasks support --log-level and --debug-dir for easier debugging; stream_text gains debug logging
  • New providers: Alibaba (China) and Z.AI Coding Plan
  • Google provider:
    • File content parts support (binary uploads via base64) for improved multimodal inputs
    • Added Gemini Embedding 001 support
  • Model capability discovery and validation to catch unsupported features early (e.g., streaming, tools, structured output, embeddings)
  • Streaming utilities to capture raw SSE chunks and save streaming fixtures
  • Schema validation utilities for structured outputs with clearer, actionable errors

Enhanced

  • Major provider refactor to a unified, codec-based architecture
    • More consistent request/response handling across providers and improved alignment with OpenAI semantics
  • Streaming reliability and performance improvements (better SSE parsing and handling)
  • Centralized model metadata handling for more accurate capabilities and configuration
  • Error handling and logging across the library for clearer diagnostics and easier troubleshooting
  • Embedding flow robustness and coverage

Fixed

  • More informative errors on invalid/partial provider responses and schema mismatches
  • Stability improvements in streaming and fixture handling across providers

Changed

  • jido_keys is now a required dependency (installed transitively; no code changes expected for most users)
  • Logging warnings standardized to Logger.warning

Internal

  • Testing infrastructure overhaul:
    • New timing-aware LLMFixture system, richer streaming/object/tool-calling fixtures, and broader provider coverage
    • Fake API key support for safer, more reliable test runs

Notes

  • No public API-breaking changes are expected; upgrades should be seamless for most users

1.0.0-rc.2 - 2025-01-15

Added

  • Model metadata guide with comprehensive documentation for managing AI model information
  • Local patching system for model synchronization, allowing custom model metadata overrides
  • .env.example file to guide API key setup and configuration
  • GitHub configuration files for automated dependency management and issue tracking
  • Test coverage reporting with ExCoveralls integration
  • Centralized ReqLLM.Keys module for unified API key management with clear precedence order

Fixed

  • BREAKING: Bang functions (generate_text!/3, stream_text!/3, generate_object!/4) now return bare values instead of {:ok, result} tuples (#9)
  • OpenAI o1 and o3 model parameter translation - automatic conversion of max_tokens to max_completion_tokens and removal of unsupported temperature parameter (#8, #11)
  • Mix task for streaming text updated to work with the new bang function patterns
  • Embedding method documentation updated from generate_embeddings/2 to embed_many/2

Enhanced

  • Provider architecture with new translate_options/3 callback for model-specific parameter handling
  • API key management system with centralized ReqLLM.Keys module supporting multiple source precedence
  • Documentation across README.md, guides, and usage-rules.md for improved clarity and accuracy
  • GitHub workflow and dependency management with Dependabot automation
  • Response decoder modules streamlined by removing unused Model aliases
  • Mix.exs configuration with improved Dialyzer setup and dependency organization

Technical Improvements

  • Added validation for conflicting provider parameters with validate_mutex!/3
  • Enhanced error handling for unsupported parameter translations
  • Comprehensive test coverage for new translation functionality
  • Model synchronization with local patch merge capabilities
  • Improved documentation structure and formatting across all guides

Infrastructure

  • Weekly automated dependency updates via Dependabot
  • Standardized pull request and issue templates
  • Enhanced CI workflow with streamlined checks
  • Test coverage configuration and reporting setup

1.0.0-rc.1 - 2025-01-13

Added

  • First public release candidate
  • Composable plugin architecture built on Req
  • Support for 45+ providers and 665+ models via models.dev sync
  • Typed data structures for all API interactions
  • Dual API layers: low-level Req plugin and high-level helpers
  • Built-in streaming support with typed StreamChunk responses
  • Automatic usage and cost tracking
  • Anthropic and OpenAI provider implementations
  • Context Codec protocol for provider wire format conversion
  • JidoKeys integration for secure API key management
  • Comprehensive test matrix with fixture and live testing support
  • Tool calling capabilities
  • Embeddings generation support (OpenAI)
  • Structured data generation with schema validation
  • Extensive documentation and guides

Features

  • ReqLLM.generate_text/3 and generate_text!/3 for text generation
  • ReqLLM.stream_text/3 and stream_text!/3 for streaming responses
  • ReqLLM.generate_object/4 and generate_object!/4 for structured output
  • Embedding generation support
  • Low-level Req plugin integration
  • Provider-agnostic model specification with "provider:model" syntax
  • Automatic model metadata loading and cost calculation
  • Tool definition and execution framework
  • Message and content part builders
  • Usage statistics and cost tracking on all responses
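
A short usage sketch of the high-level API described above (model strings, prompts, and the schema are illustrative):

    {:ok, text_response} = ReqLLM.generate_text("anthropic:claude-3-5-haiku", "Hello!")

    {:ok, stream_response} = ReqLLM.stream_text("openai:gpt-4o-mini", "Tell me a short story")

    {:ok, object_response} =
      ReqLLM.generate_object(
        "openai:gpt-4o-mini",
        "Extract the person from: Jane Doe, 34, engineer",
        [name: [type: :string, required: true], age: [type: :pos_integer]],
        max_tokens: 200
      )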

Technical

  • Elixir ~> 1.15 compatibility
  • OTP 24+ support
  • Apache-2.0 license
  • Comprehensive documentation with HexDocs
  • Quality tooling with Dialyzer, Credo, and formatter
  • LiveFixture testing framework for API mocking

1.3.0 - 2026-01-21

Added

  • provider: add Zenmux provider and playground (#342) by youfun

  • Implement reasoning signature retention for supported providers (#344) by ycastorium

  • feedback: add critical evaluation and recommendations for ReAct Agent implementation by mikehostetler

  • Adds support for Bearer tokens to Azure Foundry (#338) by ycastorium

  • provider: add vLLM provider for self-hosted OpenAI-compatible models (#202) by meanderingstream

  • Add Azure DeepSeek model support (#254) by shelvick

  • Add Azure DeepSeek and MAI-DS model family routing by shelvick

  • Add Azure DeepSeek models to supported_models.json by shelvick

  • Add Azure AI Foundry endpoint format support by shelvick

  • Add Azure OpenAI embedding model support by shelvick

  • support file URI for Google image_url content parts (#339) by brent-emb

  • add service_tier to openai provider options (#321) by Barna Kovacs

  • add configuration and prompt templates for Elixir Mix project phases by mikehostetler

  • change to typedstruct (#256) by mikehostetler

  • extend Context.normalize to handle tool_calls and tool result messages (#313) by mikehostetler

  • add StreamResponse.classify/1 and Response.Stream.summarize/1 (#311) by mikehostetler

  • Add thinking parameter support for Z.ai providers (#303) by George Guimarães

  • openrouter: add support for google/gemini-3-flash-preview (#298) by Itay Adler

Fixed

  • Correct encrypted? flag in Anthropic reasoning details extraction by mikehostetler

  • address PR review feedback by youfun

  • Fixed dialyzer issue by ycastorium

  • add zai_coding_plan provider support (#347) by mikehostetler

  • correct cache token handling for Anthropic API semantics (#316) by shelvick

  • Infer reasoning tokens from reasoning_content field by shelvick

  • Adds missing reasoning levels (#332) by ycastorium

  • Google Gemini thinking tokens now included in cost calculation (#336) by shelvick

  • thinkingTokenCount → thoughtsTokenCount in gemini.ex by shelvick

  • ensure add_reasoning_to_cost flag is preserved from providers by shelvick

  • allow hyphenated tool names for MCP server compatibility (#323) by Jon Ator

  • disallow leading/trailing hyphens in tool names by Jon Ator

  • revert typedstruct to typed_struct to resolve ecosystem conflicts (#315) by mikehostetler

  • Azure provider_options validation and ResponsesAPI finish_reason parsing (#266) by shelvick

  • azure: Accept OpenAI-specific provider_options for Azure OpenAI models by shelvick

  • openai: Extract finish_reason from correct path in response.incomplete events by shelvick

  • Correct cache token extraction and cost calculation (#309) by shelvick

  • Allow passing json arrays when using JsonSchema and fix Gemini 3 Json schema calls (#310) by Akash Khan

  • Allow using a json array as response schema by Akash Khan

  • Allow setting json schema for gemini 3 models by Akash Khan

  • Add thinking option to Z.ai provider schema (#304) by George Guimarães

  • Always set responseMimeType for gemini generate_object requests (#299) by Akash Khan

1.2.0 - 2025-12-23

Added

  • add image generation support (#293) by Victor

1.1.0 - 2025-12-21

Added

  • preserve cache_control metadata in OpenAI content encoding (#291) by Itay Adler

  • add load_dotenv config option to control .env file loading (#287) by mikehostetler

  • Support inline JSON credentials for Google Vertex AI (#260) by shelvick

  • anthropic: Add message caching support for conversation prefixes (#281) by shelvick

  • anthropic: Add offset support to message caching by shelvick

  • vertex: Add Google Search grounding support for Gemini models (#284) by shelvick

  • add AI PR review workflow by mikehostetler

  • change to typedstruct (#256) by JoeriDijkstra

  • Add Google Context Caching support for Gemini models (#193) by neilberkman

  • Add Google Vertex Gemini support by Neil Berkman

  • Add credential fallback for fixture recording (#218) by neilberkman

  • Integrate llm_db for model metadata (v1.1.0) (#212) by mikehostetler

  • req_llm: accept LLMDB.Model; remove runtime fields from Model struct by mikehostetler

  • allow task_type with google embeddings by Kasun Vithanage

  • add StreamResponse.process_stream/2 for real-time callbacks (#178) by Edgar Gomes

Fixed

  • Propagate streaming errors to process_stream result (#286) by mikehostetler

  • Add anthropic_cache_messages to Bedrock and Vertex schemas by shelvick

  • bedrock: Remove incorrect Converse API requirement for inference profiles by shelvick

  • vertex: Extract google_grounding from nested provider_options by shelvick

  • vertex: Remove incorrect camelCase transformation for grounding tools by shelvick

  • increase default timeout for OpenAI reasoning models (#252) by mikehostetler

  • merge consecutive tool results into single user message (#243) (#250) by mikehostetler

  • respect existing env vars when loading .env (#239) (#249) by mikehostetler

  • typespec on object generation to allow zoi schemas (#208) by Kasun Vithanage

Refactored

  • req_llm: move max_retries to request options by mikehostetler

  • req_llm: delegate model metadata to LLMDB; keep provider registry by mikehostetler