Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

1.4.0 - 2026-01-30

Added

  • Reasoning cost breakdown with reasoning_cost field in cost calculations (#394)
  • OpenRouter enhancements (#393, #374)
    • openrouter_usage and openrouter_plugins provider options
    • Native JSON schema structured output support
  • Google URL context provider option for grounding (#392)
  • Google auth header option for streaming requests (#382)
  • Cohere Embeddings on Bedrock support (#365)
  • Structured and multimodal tool outputs (#357)
  • OpenAI strict mode for JSON schema validation (#368)
  • OpenAI verbosity support for reasoning models (#354)
  • Comprehensive usage and billing infrastructure (#371)
  • Ollama usage guide documentation (#387)
  • Model base_url override in model configuration (#366)
  • Anthropic web search tool support for real-time web content access
    • web_search provider option enables Claude to search the web during conversations
    • Configurable options: max_uses, allowed_domains, blocked_domains, user_location
    • Automatic source citations in responses
    • Works with all supported Claude models (Sonnet 4.5, Sonnet 4, Haiku 4.5, Haiku 3.5, Opus 4.5, Opus 4.1, Opus 4)
    • Can be combined with regular function tools
    • Pricing: $10 per 1,000 searches plus standard token costs
    • Example: provider_options: [web_search: %{max_uses: 5, allowed_domains: ["wikipedia.org"]}]
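    A fuller usage sketch (the model string is illustrative; the option shape follows the example above):

        {:ok, response} =
          ReqLLM.generate_text(
            "anthropic:claude-sonnet-4-5",
            "What changed in the latest Elixir release?",
            provider_options: [
              web_search: %{max_uses: 5, allowed_domains: ["elixir-lang.org", "wikipedia.org"]}
            ]
          )
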
  • ReqLLM.ModelHelpers capability helper functions for type-safe model capability access
    • Centralized helpers like ReqLLM.ModelHelpers.json_schema?/1, ReqLLM.ModelHelpers.tools_strict?/1
    • Replaces scattered get_in(model.capabilities, ...) calls across providers
    • Fixes bug in Bedrock provider where reasoning capability was checked incorrectly (was checking capabilities.reasoning map instead of capabilities.reasoning.enabled boolean)
    • Provides single source of truth for capability access patterns
    • Compile-time generated functions from capability schema paths
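    A sketch of how calling code might branch on capabilities (only the two helper names come from this entry; the surrounding logic and the pre-resolved model variable are illustrative):

        # `model` is an already-resolved model struct
        strategy =
          cond do
            ReqLLM.ModelHelpers.json_schema?(model) -> :native_json_schema
            ReqLLM.ModelHelpers.tools_strict?(model) -> :strict_tool_call
            true -> :tool_call_workaround
          end
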
  • Amazon Bedrock service tier support for request prioritization
    • service_tier option with values: "priority", "default", "flex"
    • Priority tier provides faster responses at premium cost for mission-critical workloads
    • Flex tier offers cost-effective processing for non-urgent tasks
    • Supported on compatible Bedrock models (check AWS documentation for availability)
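    A usage sketch (the model string is illustrative, and passing service_tier through provider_options is an assumption):

        ReqLLM.generate_text(
          "bedrock:anthropic.claude-sonnet-4-5",
          "Summarize this incident report for the on-call channel",
          provider_options: [service_tier: "priority"]
        )
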
  • Google Vertex AI Gemini model support
    • Gemini 2.0 Flash, 2.5 Flash, 2.5 Flash Lite, and 2.5 Pro on Google Vertex AI
    • Delegates to native Google provider format with Vertex-specific quirks handled
    • Sanitizes function call IDs (Vertex API rejects them while direct Google API includes them)
    • Full support for extended thinking/reasoning, context caching, and all Gemini features
    • Complete fixture coverage for all Vertex Gemini models (46 fixtures: 10 for 2.0, 12 each for 2.5 variants)
  • Google Context Caching for Gemini models with up to 90% cost savings
    • ReqLLM.Providers.Google.CachedContent module for cache CRUD operations
    • Create, list, update, and delete cached content
    • Support for both Google AI Studio and Vertex AI (requires Gemini models)
    • cached_content provider option to reference existing caches
    • Minimum token requirements: 1,024 (Flash) / 4,096 (Pro)
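    Hypothetical call shapes (the module name and the cached_content option come from this entry; the function arguments and cache fields are assumptions):

        # `large_context` is a reusable prompt prefix that meets the token minimum
        {:ok, cache} =
          ReqLLM.Providers.Google.CachedContent.create(
            "google:gemini-2.5-pro",
            large_context,
            ttl: "3600s"
          )

        {:ok, response} =
          ReqLLM.generate_text(
            "google:gemini-2.5-pro",
            "Answer questions about the cached document",
            provider_options: [cached_content: cache.name]
          )
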
  • OAuth2 token caching for Google Vertex AI
    • Eliminates 60-180ms auth overhead on every request
    • Tokens cached for 55 minutes (5-minute safety margin before the 1-hour expiry)
    • GenServer serializes concurrent refresh requests to prevent duplicate fetches
    • Per-node cache (no distributed coordination needed)
    • 99.9% reduction in auth overhead for typical workloads
  • Real-time stream processing with ReqLLM.StreamResponse.process_stream/2
    • Process streams incrementally with real-time callbacks
    • on_result callback for content chunks (fires immediately as text arrives)
    • on_thinking callback for reasoning/thinking chunks (fires immediately)
    • Prevents double-stream consumption bugs through single-pass processing
    • Enables real-time streaming to UIs (Phoenix LiveView, websockets, etc.)
    • No upfront Enum.to_list - callbacks fire as chunks arrive from the stream
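    A streaming sketch (the callback names come from this entry; the keyword-list option format and the chunk arguments are assumptions):

        {:ok, stream_response} =
          ReqLLM.stream_text("anthropic:claude-sonnet-4-5", "Explain GenServers")

        ReqLLM.StreamResponse.process_stream(stream_response,
          on_result: fn text -> IO.write(text) end,
          on_thinking: fn thought -> IO.write(["\n[thinking] ", thought]) end
        )
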
  • Provider alias support via llm_db integration
    • Google Vertex AI Anthropic models now accessible via :google_vertex provider
    • google_vertex_anthropic provider aliased to google_vertex implementation
    • Enables single provider module to serve models from multiple llm_db providers
    • Complete fixture coverage for all Vertex Claude models (36 fixtures: 12 per model × 3 models)
  • provider_model_id support for AWS Bedrock inference profiles
    • Models can specify API-specific identifiers separate from canonical IDs
    • Enables Bedrock streaming and on-demand throughput with inference profile prefixes
    • Applied to Claude Haiku 4.5, Sonnet 4.5, Opus 4.1, Llama 3.3 70B models
  • Credential fallback for fixture recording in providers requiring cloud credentials
    • Automatic fallback to existing fixtures when credentials are missing during RECORD mode
    • Provider-specific credential detection via optional credential_missing?/1 callback
    • Implemented in AWS Bedrock, Google, and Google Vertex AI providers
    • Enables comprehensive test coverage without requiring all developers to configure cloud credentials
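    A sketch of the optional callback a cloud-backed provider implements (the callback name and arity come from this entry; the module, argument, and environment variable are illustrative):

        defmodule MyCloudProvider do
          # Report missing credentials so RECORD mode can fall back to existing fixtures
          def credential_missing?(_model) do
            System.get_env("EXAMPLE_CLOUD_API_KEY") in [nil, ""]
          end
        end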

Changed

  • Zoi schema refactor for data structures, replacing TypedStruct (#376)
  • AWS Event Stream parser documentation clarifying Bedrock specialization
    • Explains performance rationale for single-pass parsing and header specialization
    • Documents non-goals (S3 Select, Transcribe, Kinesis incompatibility)
  • Object generation detection updated to recognize tool-based workaround
    • supports_object_generation? now accepts models with tools.enabled = true
    • Enables object generation tests for Vertex Claude models using tool workaround

Fixed

  • Image-only attachments validation for OpenAI and xAI (#389)
  • translate_options changes now preserved in provider_options (#381)
  • StreamServer termination handled gracefully in FinchClient (#379)
  • Anthropic schema constraints stripped when unsupported (#378)
  • api_key added to internal keys to prevent leakage (#355)
  • Test helper tool_budget_for/1 pattern match regression from LLMDB integration
    • Fixed pattern match to use {:ok, model} instead of obsolete {:ok, {provider, id, model}}
    • Fixed field name from model.limit to model.limits
    • Regression introduced in v1.1.0 caused test fixtures to use incorrect maxOutputTokens values
    • Primarily affected reasoning-enabled models (Gemini 2.5 Pro) where 150 token default was insufficient
  • Google provider cached token extraction from API responses
    • Extracts cachedContentTokenCount from usageMetadata for both implicit and explicit caching
    • Converts to OpenAI-compatible prompt_tokens_details.cached_tokens format
    • Fixes cached tokens always showing as 0 even when caching was active
    • Affects both google and google-vertex providers using Gemini models
  • JSV schema validation now preserves original data types instead of returning cast values
    • Prevents unwanted type coercion (e.g., 1.0 → 1 for integer schemas)
    • Validation still enforces schema constraints, but returns original input data
  • JSV schema compilation performance improved with ETS-based caching
    • Compiled schemas cached globally to avoid redundant JSV.build!/1 calls
    • Configured with read_concurrency for fast concurrent access
  • Google Vertex AI provider guide missing from documentation
    • Added google_vertex.md to mix.exs extras and Providers group

1.0.0 - 2025-11-02

Added

  • Google Vertex AI provider with comprehensive Claude 4.x support
    • OAuth2 authentication with service accounts
    • Full Claude model support (Haiku 4.5, Sonnet 4.5, Opus 4.1)
    • Extended thinking and prompt caching capabilities
    • Complete fixtures for all Vertex AI Claude models
  • AWS Bedrock inference profile models with complete fixture coverage
    • Anthropic Claude inference profiles (Haiku 4.5, Sonnet 4.5, Opus 4.1)
    • OpenAI OSS models (gpt-oss-20b, gpt-oss-120b)
    • Meta Llama inference profiles
    • Cohere Command R and Command R Plus models
  • Provider base URL override capability via application config
    • Enables testing with mock services
    • Configured per-provider in application config
  • AWS Bedrock API key authentication support (introduced by AWS in July 2025)
    • Simple Bearer token authentication as alternative to IAM credentials
    • api_key provider option with AWS_BEARER_TOKEN_BEDROCK environment variable fallback
    • Short-term keys (up to 12 hours) recommended for production
    • Long-term keys available for exploration
    • Limitations: Cannot use with InvokeModelWithBidirectionalStream, Agents, or Data Automation
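    A usage sketch (the model string is illustrative; whether the explicit key is passed under provider_options or as a top-level option is an assumption):

        # Option A: rely on the AWS_BEARER_TOKEN_BEDROCK environment variable fallback
        {:ok, response} = ReqLLM.generate_text("bedrock:anthropic.claude-haiku-4-5", "Hello!")

        # Option B: pass the key explicitly
        {:ok, response} =
          ReqLLM.generate_text("bedrock:anthropic.claude-haiku-4-5", "Hello!",
            provider_options: [api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK")]
          )
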
  • Context tools persistence for AWS Bedrock multi-turn conversations
    • Tools automatically persist in context after first request
    • Bedrock-specific implementation with zero impact on other providers
  • Schema map-subtyped list support for complex nested structures
    • Properly handles {:list, {:map, schema}} type definitions
    • Generates correct JSON Schema for nested object arrays
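    An illustrative keyword schema exercising the new type (field names, the model string, and options are made up for the example):

        schema = [
          customer: [type: :string, required: true],
          line_items: [
            type:
              {:list,
               {:map,
                [
                  sku: [type: :string, required: true],
                  quantity: [type: :pos_integer, required: true]
                ]}},
            required: true
          ]
        ]

        {:ok, response} =
          ReqLLM.generate_object(
            "openai:gpt-4o-mini",
            "Extract the order from this email...",
            schema,
            max_tokens: 300
          )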

Enhanced

  • Google provider v1beta API as default version
    • Fixes streaming compatibility issues
    • Updated all test fixtures to use v1beta
  • Test configuration expanded with additional LLM providers
    • Enhanced catalog_allow settings for broader provider coverage
  • Documentation organization with refactored guides structure
    • Improved provider-specific documentation
    • Better task organization in mix.exs

Fixed

  • Streaming protocol callback renamed from decode_sse_event to decode_stream_event
    • More protocol-agnostic naming (supports SSE, AWS Event Stream, etc.)
    • Affects all providers implementing streaming
  • Groq UTF-8 boundary handling in streaming responses
    • Prevents crashes when UTF-8 characters split across chunk boundaries
  • Schema boolean encoding preventing invalid string coercion
    • Boolean values now correctly encoded in normalized schemas
  • OpenAI model list typo corrected in documentation
  • AWS Bedrock Anthropic inference profile model ID preservation
    • Added preserve_inference_profile?/1 callback to Anthropic Bedrock formatter
    • Ensures region prefixes (global., us.) are preserved in API requests
    • Fixes 400 "invalid model identifier" errors for inference profile models
  • AWS Bedrock Converse API usage field parsing
    • Fixed parse_usage/1 to include all required fields (reasoning_tokens, total_tokens, cached_tokens)
    • Fixes KeyError when accessing usage fields from Converse API responses
  • AWS Bedrock model ID normalization for metadata lookup
    • Fixed normalize_model_id/1 to always strip region prefixes for Registry lookups
    • Enables capabilities detection for inference profile models
    • Separates metadata lookup (always normalized) from API requests (preserve_inference_profile? controls)
  • AWS Bedrock provider model family support
    • Added Meta to @model_families for Llama models using Converse API
    • Added OpenAI to @model_families for gpt-oss models
    • Cohere Command R models use Converse API directly with full tool support (no custom formatter needed)

Notes

This is the first stable 1.0 release of ReqLLM, marking production readiness with comprehensive provider support, robust streaming, and extensive test coverage. The library now supports 15+ providers with 750+ models and includes advanced features like prompt caching, structured output, tool calling, and embeddings.

1.0.0-rc.8 - 2025-10-29

Added

  • Prompt caching support for Bedrock Anthropic models (Claude on AWS Bedrock)
    • Auto-switches to native API when caching enabled with tools for full cache control
    • Supports caching of system prompts and tools
    • Emits a warning when auto-switching (can be silenced by setting use_converse explicitly)
  • Structured output (:object operation) support for AWS Bedrock provider
    • Bedrock Anthropic sub-provider using tool-calling approach
    • Bedrock Converse API for unified structured output across all models
    • Bedrock OpenAI sub-provider (gpt-oss models)
  • Google Search grounding support for Google Gemini models via built-in tools
    • New google_grounding option to enable web search during generation
    • API versioning support (v1 and v1beta) for Google provider
    • Grounding metadata included in responses when available
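    A usage sketch (the model string is illustrative; whether google_grounding is passed through provider_options and its exact value shape are assumptions):

        ReqLLM.generate_text(
          "google:gemini-2.5-flash",
          "What are today's top headlines about Elixir?",
          provider_options: [google_grounding: true]
        )
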
  • JSON Schema validation using JSV library (supports draft 2020-12 and draft 7)
    • Client-side schema validation before sending to providers
    • Better error messages for invalid schemas (e.g., embedded JSON strings vs maps)
    • Validates raw JSON schemas via ReqLLM.Schema.validate/2
  • Model catalog feature for runtime model discovery
  • Configurable metadata_timeout option for long-running streams (default: 60s)
    • Application-level configuration support
    • Fixes metadata collection timeout errors on large documents
  • HTTP streaming in StreamServer with improved lifecycle management
  • Direct JSON schema pass-through support for complex object generation
  • Base URL override capability for testing with mock services
  • API key option in provider defaults with proper precedence handling
  • task_type parameter support for Google embeddings

Enhanced

  • Bedrock provider with comprehensive fixes and improvements
    • Streaming temperature/top_p conflict resolution via Options.process pipeline
    • Extended thinking (reasoning) support with proper reasoning_effort translation
    • Tool round-trip conversations by extracting stub tools from messages
    • Complete usage metadata fields (cached_tokens, reasoning_tokens) for all models
    • Increased receive timeout from 30s to 60s for large responses
    • Unified streaming and non-streaming to use Options.process pipeline
    • Uses model capabilities instead of hardcoded model IDs for reasoning support detection
  • Meta/Llama support refactored into reusable generic provider
    • Created ReqLLM.Providers.Meta for Meta's native prompt format
    • Bedrock Meta now delegates to generic provider for format conversion
    • Enables future Azure AI Foundry and Vertex AI support
    • Documents that most providers (Azure, Vertex AI, vLLM, Ollama) use OpenAI-compatible APIs
  • OpenAI provider with JSON Schema response format support for GPT-5 models
  • Streaming error handling with HTTP status code validation
    • Proper error propagation for 4xx/5xx responses
    • Prevents error JSON from being passed to SSE parser
  • Model metadata tests with improved field mapping validation
  • Documentation across provider guides and API references

Fixed

  • Bedrock streaming binary protocol (AWS Event Stream) encoding in fixtures
    • Removed redundant "decoded" field that caused Jason.EncodeError
    • Fixtures now only store "b64" field for binary protocols (contains invalid UTF-8)
  • Bedrock thinking parameter removal for forced tool_choice scenarios
    • Extended thinking incompatible with object generation fixed via post-processing
    • Thinking parameter correctly removed when incompatible with forced tool_choice
  • Bedrock tool round-trip conversations now work correctly
    • Extracts stub tools from messages when tools required but not provided
    • Bedrock requires tools definition even for multi-turn tool conversations
    • Supports both ReqLLM.Tool structs and minimal stub tools for validation
  • Bedrock usage metrics now include all required fields
    • Meta Llama models provide complete usage data (cached_tokens, reasoning_tokens)
    • OpenAI OSS models provide complete usage data
  • Model compatibility task now uses normalize_model_id callback for registry lookups
    • Fixes inference profile ID recognition (e.g., global.anthropic.claude-sonnet-4-5)
  • Missing :compiled_schema in object streaming options (KeyError fix across all providers)
  • Nil tool names in streaming deltas now properly guarded
  • Tool.Inspect protocol crash when inspecting tools with JSON Schema (map) parameter schemas
  • HTTP/2 flow control bug with large request bodies (>64KB)
    • Changed default Finch pool from [:http2, :http1] to [:http1]
    • Added validation to prevent HTTP/2 with large payloads
  • ArgumentError when retry function returns {:delay, ms} (Req 0.5.15+ compatibility)
  • Validation errors now use correct Error struct fields (reason vs errors)
  • Dialyzer type mismatches in decode_response/2

Changed

  • Removed JidoKeys dependency, simplified to dotenvy for .env file loading
    • API keys now loaded from .env files at startup
    • Precedence: runtime options > application config > system environment
  • Upgraded dependencies:
    • ex_aws_auth from ~> 1.0 to ~> 1.3
    • ex_doc from 0.38.4 to 0.39.1
    • zoi from 0.7.4 to 0.8.1
    • credo to 1.7.13
  • Refactored Bedrock provider to use modern ex_aws_auth features
    • Migrated to AWSAuth.Credentials struct for credential management
    • Replaced manual request signing with AWSAuth.Req plugin (removed ~40 lines of code)
    • Updated Finch streaming to use credential-based signing API
    • Session tokens now handled automatically by ex_aws_auth
    • Simplified STS AssumeRole implementation using credential-based API
  • Comprehensive test timeout increased from 180s to 300s for slow models (e.g., Claude Opus 4.1)
  • Formatter line length standardized to 98 characters
  • Quokka dependency pinned to specific version (2.11.2)

Removed

  • Outdated test fixtures for deprecated models (Claude 3.5 Sonnet variants, OpenAI o1/o3/o4 variants)
  • Over 85,000 lines of stale fixture data cleaned up

Infrastructure

  • CI workflow updates for Elixir 1.18/1.19 on OTP 27/28
  • Enhanced GitHub Actions configuration with explicit version matrix
  • Added hex.pm best practices (changelog link, module grouping)
  • Improved documentation organization with provider-specific guides
  • Added Claude Opus 4.1 (us.anthropic.claude-opus-4-1-20250805-v1:0) to ModelMatrix

1.0.0-rc.7 - 2025-10-16

Changed

  • Updated Elixir compatibility to support 1.19
  • Replaced aws_auth GitHub dependency with ex_aws_auth from Hex for Hex publishing compatibility
  • Enhanced Dialyzer configuration with ignore_warnings option
  • Refactored request struct creation across providers using Req.new/2

Added

  • Provider normalize_model_id/1 callback for model identifier normalization
  • Amazon Bedrock support for inference profiles with region prefix stripping
  • ToolCall helper functions: function_name/1, json_arguments/1, arguments/1, find_args/2
  • New model definitions for Alibaba, Fireworks AI, GitHub Models, Moonshot AI, and Zhipu AI
  • Claude Haiku 4.5 model entries across multiple providers

Refactored

  • Removed normalization layer for tool calls, using ReqLLM.ToolCall structs directly
  • Simplified tool call extraction using find_args/2 across provider modules

1.0.0-rc.6 - 2025-02-15

Added

  • AWS Bedrock provider with streaming support and multi-model capabilities
    • Anthropic Claude models with native API delegation
    • OpenAI OSS models (gpt-oss-120b, gpt-oss-20b)
    • Meta Llama models with native prompt formatting
    • AWS Event Stream binary protocol parser
    • AWS Signature V4 authentication (OTP 27 compatible)
    • Converse API for unified tool calling across all Bedrock models
    • AWS STS AssumeRole support for temporary credentials
    • Extended thinking support via additionalModelRequestFields
    • Cross-region inference profiles (global prefix)
  • Z.AI provider with standard and coding endpoints
    • GLM-4.5, GLM-4.5-air, GLM-4.5-flash models (131K context)
    • GLM-4.6 (204K context, improved reasoning)
    • GLM-4.5v (vision model with image/video support)
    • Tool calling and reasoning capabilities
    • Separate endpoints for general chat and coding tasks
  • ToolCall struct for standardized tool call representation
  • Context.append/2 and Context.prepend/2 functions replacing push_* functions
  • Comprehensive example scripts (embeddings, context reuse, reasoning tokens, multimodal)
  • StreamServer support for raw fixture generation and reasoning token tracking

Enhanced

  • Google provider with native responseSchema for structured output
  • Google file/video attachment support with OpenAI-formatted data URIs
  • XAI provider with improved structured output test coverage
  • OpenRouter and Google model fixture coverage
  • Model compatibility task with migrate and failed_only options
  • Context handling to align with OpenAI's tool_calls API format
  • Tool result encoding for multi-turn conversations across all providers
  • max_tokens extraction from Model.new/3 to respect model defaults
  • Error handling for metadata-only providers with structured Splode errors
  • Provider implementations to delegate to shared helper functions

Fixed

  • get_provider/1 returning {:ok, nil} for metadata-only providers
  • Anthropic tool result encoding for multi-turn conversations (transform :tool role to :user)
  • Google structured output using native responseSchema without additionalProperties
  • Z.AI provider timeout and reasoning token handling
  • max_tokens not being respected from Model.new/3 across providers
  • File/video attachment support in Google provider (regression from b699102)
  • Tool call structure in Bedrock tests with compiler warnings
  • Model ID normalization converting dashes to underscores

Changed

  • Tool call architecture: tool calls now stored in message.tool_calls field instead of content parts
  • Tool result architecture: tool results use message.tool_call_id for correlation
  • Context API: replaced push_user/push_assistant/push_system with append/prepend
  • Streaming protocol: pluggable architecture via parse_stream_protocol/2 callback
  • Provider implementations: improved delegation patterns reducing code duplication

Infrastructure

  • Massive test fixture update across all providers
  • Enhanced fixture system with amazon_bedrock provider mapping
  • Sanitized credential handling in fixtures (x-amz-security-token)
  • :xmerl added to extra_applications for STS XML parsing
  • Documentation and template improvements

1.0.0-rc.5 - 2025-02-07

Added

  • New Cerebras provider implementation with OpenAI-compatible Chat Completions API
  • Context.from_json/1 for JSON deserialization enabling round-trip serialization
  • Schema :in type support for enums, ranges, and MapSets with JSON Schema generation
  • Embed and embed_many functions supporting single and multiple text inputs
  • New reasoning controls: reasoning_effort, thinking_visibility, and reasoning_token_budget
  • Usage tracking for cached_tokens and reasoning_tokens across all providers
  • Model compatibility validation task (mix mc) with fixture-based testing
  • URL sanitization in transcripts to redact sensitive parameters (api_key, token)
  • Comprehensive example scripts for embeddings and multimodal analysis

Enhanced

  • Major coverage test refresh with extensive fixture updates across all providers
  • Unified generation options schema delegating to ReqLLM.Provider.Options
  • Provider response handling with better error messages and compatibility
  • Google Gemini streaming reliability and thinking budget support for 2.5 models
  • OpenAI provider with structured output response_format option and legacy tool call decoding
  • Groq provider with improved streaming and state management
  • Model synchronization and compatibility testing infrastructure
  • Documentation with expanded getting-started.livemd guide and fixes.md

Fixed

  • Legacy parameter normalization (stop_sequences, thinking, reasoning)
  • Google provider usage calculation handling missing candidatesTokenCount
  • OpenAI response handling for structured output and reasoning models
  • Groq encoding and streaming response handling
  • Timeout issues in model compatibility testing
  • String splitting for model names using parts: 2 for consistent pattern extraction

Changed

  • Deprecated parameters removed from provider implementations for cleaner code
  • Model compatibility task output format streamlined
  • Supported models state management with last recorded timestamps
  • Sample models configuration replacing test model references

Infrastructure

  • Added Plug dependency for testing
  • Dev tooling with tidewave for project_eval in dev scenarios
  • Enhanced .gitignore to track script files
  • Model prefix matching in compatibility task for improved filtering

1.0.0-rc.4 - 2025-01-29

Added

  • Claude 4.5 model support
  • Tool call support for Google Gemini provider
  • Cost calculation to Response.usage()
  • Unified mix req_llm.gen command consolidating all AI generation tasks

Enhanced

  • Major streaming refactor from Req to Finch for production stability
  • Documentation for provider architecture and streaming requests

Fixed

  • Streaming race condition causing BadMapError
  • max_tokens translation to max_completion_tokens for OpenAI reasoning models
  • Google Gemini role conversion ('assistant' to 'model')
  • req_http_options passing to Req
  • Context.Codec encoding of tool_calls field for OpenAI compatibility

Removed

  • Context.Codec and Response.Codec protocols (architectural simplification)

1.0.0-rc.3 - 2025-01-22

Added

  • New Mix tasks for local testing and exploration:
    • generate_text, generate_object (structured output), and stream_object
    • All tasks support --log-level and --debug-dir for easier debugging; stream_text gains debug logging
  • New providers: Alibaba (China) and Z.AI Coding Plan
  • Google provider:
    • File content parts support (binary uploads via base64) for improved multimodal inputs
    • Added Gemini Embedding 001 support
  • Model capability discovery and validation to catch unsupported features early (e.g., streaming, tools, structured output, embeddings)
  • Streaming utilities to capture raw SSE chunks and save streaming fixtures
  • Schema validation utilities for structured outputs with clearer, actionable errors

Enhanced

  • Major provider refactor to a unified, codec-based architecture
    • More consistent request/response handling across providers and improved alignment with OpenAI semantics
  • Streaming reliability and performance improvements (better SSE parsing and handling)
  • Centralized model metadata handling for more accurate capabilities and configuration
  • Error handling and logging across the library for clearer diagnostics and easier troubleshooting
  • Embedding flow robustness and coverage

Fixed

  • More informative errors on invalid/partial provider responses and schema mismatches
  • Stability improvements in streaming and fixture handling across providers

Changed

  • jido_keys is now a required dependency (installed transitively; no code changes expected for most users)
  • Logging warnings standardized to Logger.warning

Internal

  • Testing infrastructure overhaul:
    • New timing-aware LLMFixture system, richer streaming/object/tool-calling fixtures, and broader provider coverage
    • Fake API key support for safer, more reliable test runs

Notes

  • No public API-breaking changes are expected; upgrades should be seamless for most users

1.0.0-rc.2 - 2025-01-15

Added

  • Model metadata guide with comprehensive documentation for managing AI model information
  • Local patching system for model synchronization, allowing custom model metadata overrides
  • .env.example file to guide API key setup and configuration
  • GitHub configuration files for automated dependency management and issue tracking
  • Test coverage reporting with ExCoveralls integration
  • Centralized ReqLLM.Keys module for unified API key management with clear precedence order

Fixed

  • BREAKING: Bang functions (generate_text!/3, stream_text!/3, generate_object!/4) now return bare values instead of {:ok, result} tuples (#9)
  • OpenAI o1 and o3 model parameter translation - automatic conversion of max_tokens to max_completion_tokens and removal of unsupported temperature parameter (#8, #11)
  • Mix task for streaming text updated to work with the new bang function patterns
  • Embedding method documentation updated from generate_embeddings/2 to embed_many/2

Enhanced

  • Provider architecture with new translate_options/3 callback for model-specific parameter handling
  • API key management system with centralized ReqLLM.Keys module supporting multiple source precedence
  • Documentation across README.md, guides, and usage-rules.md for improved clarity and accuracy
  • GitHub workflow and dependency management with Dependabot automation
  • Response decoder modules streamlined by removing unused Model aliases
  • Mix.exs configuration with improved Dialyzer setup and dependency organization

Technical Improvements

  • Added validation for conflicting provider parameters with validate_mutex!/3
  • Enhanced error handling for unsupported parameter translations
  • Comprehensive test coverage for new translation functionality
  • Model synchronization with local patch merge capabilities
  • Improved documentation structure and formatting across all guides

Infrastructure

  • Weekly automated dependency updates via Dependabot
  • Standardized pull request and issue templates
  • Enhanced CI workflow with streamlined checks
  • Test coverage configuration and reporting setup

1.0.0-rc.1 - 2025-01-13

Added

  • First public release candidate
  • Composable plugin architecture built on Req
  • Support for 45+ providers and 665+ models via models.dev sync
  • Typed data structures for all API interactions
  • Dual API layers: low-level Req plugin and high-level helpers
  • Built-in streaming support with typed StreamChunk responses
  • Automatic usage and cost tracking
  • Anthropic and OpenAI provider implementations
  • Context Codec protocol for provider wire format conversion
  • JidoKeys integration for secure API key management
  • Comprehensive test matrix with fixture and live testing support
  • Tool calling capabilities
  • Embeddings generation support (OpenAI)
  • Structured data generation with schema validation
  • Extensive documentation and guides

Features

  • ReqLLM.generate_text/3 and generate_text!/3 for text generation
  • ReqLLM.stream_text/3 and stream_text!/3 for streaming responses
  • ReqLLM.generate_object/4 and generate_object!/4 for structured output
  • Embedding generation support
  • Low-level Req plugin integration
  • Provider-agnostic model specification with "provider:model" syntax
  • Automatic model metadata loading and cost calculation
  • Tool definition and execution framework
  • Message and content part builders
  • Usage statistics and cost tracking on all responses
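
A short usage sketch of the high-level API described above (model strings, prompts, and the schema are illustrative):

    {:ok, text_response} = ReqLLM.generate_text("anthropic:claude-3-5-haiku", "Hello!")

    {:ok, stream_response} = ReqLLM.stream_text("openai:gpt-4o-mini", "Tell me a short story")

    {:ok, object_response} =
      ReqLLM.generate_object(
        "openai:gpt-4o-mini",
        "Extract the person from: Jane Doe, 34, engineer",
        [name: [type: :string, required: true], age: [type: :pos_integer]],
        max_tokens: 200
      )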

Technical

  • Elixir ~> 1.15 compatibility
  • OTP 24+ support
  • Apache-2.0 license
  • Comprehensive documentation with HexDocs
  • Quality tooling with Dialyzer, Credo, and formatter
  • LiveFixture testing framework for API mocking

1.3.0 - 2026-01-21

Added

  • provider: add Zenmux provider and playground (#342) by youfun

  • Implement reasoning signature retention for supported providers (#344) by ycastorium

  • feedback: add critical evaluation and recommendations for ReAct Agent implementation by mikehostetler

  • Adds support for Bearer tokens to Azure Foundry (#338) by ycastorium

  • provider: add vLLM provider for self-hosted OpenAI-compatible models (#202) by meanderingstream

  • Add Azure DeepSeek model support (#254) by shelvick

  • Add Azure DeepSeek and MAI-DS model family routing by shelvick

  • Add Azure DeepSeek models to supported_models.json by shelvick

  • Add Azure AI Foundry endpoint format support by shelvick

  • Add Azure OpenAI embedding model support by shelvick

  • support file URI for Google image_url content parts (#339) by brent-emb

  • add service_tier to openai provider options (#321) by Barna Kovacs

  • add configuration and prompt templates for Elixir Mix project phases by mikehostetler

  • change to typedstruct (#256) by mikehostetler

  • extend Context.normalize to handle tool_calls and tool result messages (#313) by mikehostetler

  • add StreamResponse.classify/1 and Response.Stream.summarize/1 (#311) by mikehostetler

  • Add thinking parameter support for Z.ai providers (#303) by George Guimarães

  • openrouter: add support for google/gemini-3-flash-preview (#298) by Itay Adler

Fixed

  • Correct encrypted? flag in Anthropic reasoning details extraction by mikehostetler

  • address PR review feedback by youfun

  • Fixed dialyzer issue by ycastorium

  • add zai_coding_plan provider support (#347) by mikehostetler

  • correct cache token handling for Anthropic API semantics (#316) by shelvick

  • Infer reasoning tokens from reasoning_content field by shelvick

  • Adds missing reasoning levels (#332) by ycastorium

  • Google Gemini thinking tokens now included in cost calculation (#336) by shelvick

  • thinkingTokenCount → thoughtsTokenCount in gemini.ex by shelvick

  • ensure add_reasoning_to_cost flag is preserved from providers by shelvick

  • allow hyphenated tool names for MCP server compatibility (#323) by Jon Ator

  • disallow leading/trailing hyphens in tool names by Jon Ator

  • revert typedstruct to typed_struct to resolve ecosystem conflicts (#315) by mikehostetler

  • Azure provider_options validation and ResponsesAPI finish_reason parsing (#266) by shelvick

  • azure: Accept OpenAI-specific provider_options for Azure OpenAI models by shelvick

  • openai: Extract finish_reason from correct path in response.incomplete events by shelvick

  • Correct cache token extraction and cost calculation (#309) by shelvick

  • Allow passing json arrays when using JsonSchema and fix Gemini 3 Json schema calls (#310) by Akash Khan

  • Allow using a json array as response schema by Akash Khan

  • Allow setting json schema for gemini 3 models by Akash Khan

  • Add thinking option to Z.ai provider schema (#304) by George Guimarães

  • Always set responseMimeType for gemini generate_object requests (#299) by Akash Khan

1.2.0 - 2025-12-23

Added

  • add image generation support (#293) by Victor

1.1.0 - 2025-12-21

Added

  • preserve cache_control metadata in OpenAI content encoding (#291) by Itay Adler

  • add load_dotenv config option to control .env file loading (#287) by mikehostetler

  • Support inline JSON credentials for Google Vertex AI (#260) by shelvick

  • anthropic: Add message caching support for conversation prefixes (#281) by shelvick

  • anthropic: Add offset support to message caching by shelvick

  • vertex: Add Google Search grounding support for Gemini models (#284) by shelvick

  • add AI PR review workflow by mikehostetler

  • change to typedstruct (#256) by JoeriDijkstra

  • Add Google Context Caching support for Gemini models (#193) by neilberkman

  • Add Google Vertex Gemini support by Neil Berkman

  • Add credential fallback for fixture recording (#218) by neilberkman

  • Integrate llm_db for model metadata (v1.1.0) (#212) by mikehostetler

  • req_llm: accept LLMDB.Model; remove runtime fields from Model struct by mikehostetler

  • allow task_type with google embeddings by Kasun Vithanage

  • add StreamResponse.process_stream/2 for real-time callbacks (#178) by Edgar Gomes

Fixed

  • Propagate streaming errors to process_stream result (#286) by mikehostetler

  • Add anthropic_cache_messages to Bedrock and Vertex schemas by shelvick

  • bedrock: Remove incorrect Converse API requirement for inference profiles by shelvick

  • vertex: Extract google_grounding from nested provider_options by shelvick

  • vertex: Remove incorrect camelCase transformation for grounding tools by shelvick

  • increase default timeout for OpenAI reasoning models (#252) by mikehostetler

  • merge consecutive tool results into single user message (#243) (#250) by mikehostetler

  • respect existing env vars when loading .env (#239) (#249) by mikehostetler

  • typespec on object generation to allow zoi schemas (#208) by Kasun Vithanage

Refactored

  • req_llm: move max_retries to request options by mikehostetler

  • req_llm: delegate model metadata to LLMDB; keep provider registry by mikehostetler