# Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## 0.8.2 - 2025-12-07

### Fixed
- Prevented silent model fallback and double endpoint suffixing by normalizing/validating models and using auth-aware streaming URLs.
- Stopped double-prefixing Vertex model paths; preserved fully qualified publisher/project models.
- Removed `id` from function_response payloads to match the API contract and avoid INVALID_ARGUMENT errors.
- Guarded live files tests to skip gracefully unless Gemini API auth is present (the Files API is Gemini-only).

### Added

- Extended image_config to include Vertex-only fields (`output_mime_type`, `output_compression_quality`) and updated serialization/tests.
- Added regression tests for model path building and a live smoke test for Gemini 3 image preview (which writes to the gitignored `generated/` directory).
### Changed
- Increased system instruction live test timeout to reduce flakiness under Vertex rate limiting.
- Improved AFC live test assertions to report errors clearly.
## 0.8.1 - 2025-12-06

### Changed
- Updated README to reference v0.8.1 and generalize v0.8.x feature callouts.
## 0.8.0 - 2025-12-06

### Major Feature Release: Complete API Parity with Python SDK
This release brings the Elixir client to near-complete feature parity with the Python google-genai SDK, adding comprehensive support for Tunings (fine-tuning), FileSearchStores, Live/WebSocket API, Application Default Credentials (ADC), and Image/Video Generation APIs.
### Added

#### Tunings API - Model Fine-Tuning

- `Gemini.APIs.Tunings.tune/2`: Create fine-tuning jobs with supervised learning
- `Gemini.APIs.Tunings.get/2`: Get tuning job details and status
- `Gemini.APIs.Tunings.list/1`: List tuning jobs with pagination
- `Gemini.APIs.Tunings.list_all/1`: List all tuning jobs across pages
- `Gemini.APIs.Tunings.cancel/2`: Cancel running tuning jobs
- `Gemini.APIs.Tunings.wait_for_completion/2`: Wait for job completion with polling
- Complete type system: `TuningJob`, `CreateTuningJobConfig`, `SupervisedTuningSpec`, `HyperParameters`
- Hyperparameter support: epoch_count, learning_rate_multiplier, adapter_size

#### FileSearchStores API - Semantic Search

- `Gemini.APIs.FileSearchStores.create/2`: Create semantic search stores
- `Gemini.APIs.FileSearchStores.get/2`: Get store details and status
- `Gemini.APIs.FileSearchStores.delete/2`: Delete search stores
- `Gemini.APIs.FileSearchStores.list/1`: List stores with pagination
- `Gemini.APIs.FileSearchStores.list_all/1`: List all stores
- `Gemini.APIs.FileSearchStores.import_file/3`: Import files into stores
- `Gemini.APIs.FileSearchStores.upload_to_store/3`: Upload and index content
- `Gemini.APIs.FileSearchStores.wait_for_active/2`: Wait for a store to become active
#### Live/WebSocket API - Real-time Communication

- `Gemini.Live.Session`: GenServer for WebSocket session management
- `Gemini.Live.Session.connect/1`: Establish WebSocket connections
- `Gemini.Live.Session.send/2`: Send messages over WebSocket
- `Gemini.Live.Session.send_client_content/3`: Send content with turn completion
- `Gemini.Live.Session.send_realtime_input/3`: Send audio/video input streams
- `Gemini.Live.Session.send_tool_response/3`: Send tool/function responses
- `Gemini.Live.Session.close/1`: Gracefully close sessions
- `Gemini.Live.Message`: Message parsing and building utilities
- Real-time streaming: Bidirectional communication for interactive applications
- Audio/video input: Support for real-time media streams
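A full send/receive loop depends on the application, but a minimal round trip looks roughly like this sketch (the final boolean argument to `send_client_content/3` is an assumption based on the "turn completion" description above):

```elixir
# Minimal sketch of a Live session; exact message shapes are illustrative.
{:ok, session} =
  Gemini.Live.Session.start_link(model: "gemini-2.0-flash-exp", auth: :vertex_ai)

:ok = Gemini.Live.Session.connect(session)

# Send one user turn; the last argument is assumed to mark the turn complete.
:ok = Gemini.Live.Session.send_client_content(session, "Summarize today's agenda", true)

# ...receive and parse server messages via Gemini.Live.Message, then:
:ok = Gemini.Live.Session.close(session)
```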
#### Application Default Credentials (ADC) - GCP Native Auth

- `Gemini.Auth.ADC.load_credentials/0`: Automatic credential discovery
- `Gemini.Auth.ADC.get_access_token/2`: Get OAuth2 tokens from various sources
- `Gemini.Auth.ADC.refresh_token/1`: Refresh expired credentials
- `Gemini.Auth.ADC.get_project_id/1`: Extract the project ID from credentials
- Credential chain: GOOGLE_APPLICATION_CREDENTIALS → gcloud config → metadata server
- `Gemini.Auth.MetadataServer`: GCE/Cloud Run metadata server integration
- `Gemini.Auth.TokenCache`: ETS-based token caching with TTL and automatic refresh
- Zero-config GCP deployment: just deploy and it works
#### Image Generation API - Imagen Models

- `Gemini.APIs.Images.generate/3`: Generate images from text prompts
- `Gemini.APIs.Images.edit/5`: Edit images with inpainting and masks
- `Gemini.APIs.Images.upscale/3`: Upscale images (2x, 4x factors)
- `ImageGenerationConfig`: number_of_images, aspect_ratio, safety_filter_level, person_generation
- `EditImageConfig`: edit_mode, mask handling, reference images
- `UpscaleImageConfig`: upscale_factor, output settings
- Safety filtering: Configurable content safety levels
- Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4

#### Video Generation API - Veo Models

- `Gemini.APIs.Videos.generate/3`: Generate videos from prompts/images
- `Gemini.APIs.Videos.get_operation/2`: Check video generation status
- `Gemini.APIs.Videos.wait_for_completion/2`: Wait for video completion
- `Gemini.APIs.Videos.cancel/2`: Cancel video generation
- `Gemini.APIs.Videos.list_operations/1`: List video operations
- `VideoGenerationConfig`: duration_seconds, number_of_videos, aspect_ratio
- Long-running operations: Proper handling of async video generation
- Source options: Text prompts, images, or existing videos

#### New Documentation Guides

- `docs/guides/tunings.md` - Complete fine-tuning guide
- `docs/guides/file_search_stores.md` - Semantic search stores guide
- `docs/guides/live_api.md` - Real-time WebSocket guide
- `docs/guides/adc.md` - Application Default Credentials guide
- `docs/guides/image_generation.md` - Image generation guide
- `docs/guides/video_generation.md` - Video generation guide
### Technical Implementation

#### Architecture
- Vertex AI only: Tunings, FileSearchStores, Images, Videos are Vertex AI APIs
- WebSocket via `:gun`: HTTP/2 and WebSocket client for the Live API
- ETS-based caching: Thread-safe token caching with automatic refresh
- Long-running operations: Proper polling with exponential backoff
- TypedStruct patterns: Consistent type definitions across all new modules
#### Testing

- 200+ new tests for all new modules
- Live API tests: Tagged with `@moduletag :live_api` for integration testing
- Mox-based mocking: Proper HTTP mocking following supertester principles
- Zero `Process.sleep`: Proper OTP synchronization in all tests
#### Quality

- Zero compilation warnings maintained
- Complete `@spec` annotations for all public functions
- Comprehensive `@moduledoc` and `@doc` documentation
- Follows CODE_QUALITY.md standards
### Dependencies

- Added `:gun` ~> 2.1 for WebSocket support (Live API)
### Migration Notes

#### For Existing Users
All changes are additive - existing code continues to work unchanged. New APIs are available immediately:
```elixir
# Fine-tune a model
{:ok, job} = Gemini.APIs.Tunings.tune(%{
  base_model: "gemini-2.5-flash-001",
  tuned_model_display_name: "my-tuned-model",
  training_dataset_uri: "gs://bucket/training.jsonl"
}, auth: :vertex_ai)

# Create a semantic search store
{:ok, store} = Gemini.APIs.FileSearchStores.create(%{
  display_name: "Knowledge Base"
}, auth: :vertex_ai)

# Start a real-time session
{:ok, session} = Gemini.Live.Session.start_link(
  model: "gemini-2.0-flash-exp",
  auth: :vertex_ai
)

# Generate images
{:ok, images} = Gemini.APIs.Images.generate(
  "A sunset over mountains",
  %ImageGenerationConfig{aspect_ratio: "16:9"},
  auth: :vertex_ai
)

# Generate videos
{:ok, op} = Gemini.APIs.Videos.generate(
  "A cat playing piano",
  %VideoGenerationConfig{duration_seconds: 5},
  auth: :vertex_ai
)
```

#### ADC Auto-Discovery
With ADC support, credentials are automatically discovered:
```elixir
# On GCE/Cloud Run - no configuration needed!
{:ok, response} = Gemini.generate("Hello", auth: :vertex_ai)
```

Or with a service account key file:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```

```elixir
{:ok, response} = Gemini.generate("Hello", auth: :vertex_ai)
```

### Gap Analysis Update
This release addresses the critical gaps identified in the v0.7.3 gap analysis:
- Tunings Module - 100% missing → Now implemented
- FileSearchStores - 100% missing → Now implemented
- Live/WebSocket API - 100% missing → Now implemented
- ADC Support - Critical for GCP → Now implemented
- Image Generation - Imagen models → Now implemented
- Video Generation - Veo models → Now implemented
Estimated parity with Python SDK: ~95% (up from ~85% in v0.7.3)
## 0.7.3 - 2025-12-06

### Added

#### System Instruction Support

- `system_instruction` option: Set persistent system prompts that guide model behavior across conversations
- Supports multiple formats: string, Content struct, or map with parts
- Reduces token usage compared to inline instructions in conversation history
- Works with all content generation operations
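A minimal sketch of the string form:

```elixir
# system_instruction in its simplest (string) form.
{:ok, response} =
  Gemini.generate("What should I eat today?",
    system_instruction: "You are a nutritionist. Keep answers under 50 words."
  )
```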
#### Enhanced Function Calling Framework

- `Gemini.Types.Schema`: Complete JSON Schema type for defining function parameters
  - All standard types: string, integer, number, boolean, array, object
  - Support for enum, format, minimum/maximum, pattern constraints
  - Nested object and array schemas
  - API format conversion with `to_api_map/1` and `from_api_map/1`
- `Gemini.Tools.Executor`: Execute function calls from Gemini responses
  - Function registry pattern for managing implementations
  - Sequential execution with `execute_all/2`
  - Parallel execution with `execute_all_parallel/3` for I/O-bound operations
  - Automatic response building with `build_responses/2`
  - Comprehensive error handling
- `Gemini.Tools.AutomaticFunctionCalling`: Complete AFC loop implementation
  - Configurable with `max_calls`, `ignore_call_history`, `parallel_execution`
  - Extract function calls from responses with `extract_function_calls/1`
  - Check for function calls with `has_function_calls?/1`
  - Full AFC loop with `loop/8` for autonomous multi-step execution
  - Call history tracking
- Coordinator helpers: `extract_function_calls/1` and `has_function_calls?/1` convenience functions
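As an illustration, defining a parameter schema might look like the following sketch (the struct fields `type`, `properties`, `description`, `enum`, and `required` mirror standard JSON Schema and are assumed rather than quoted from the module docs):

```elixir
alias Gemini.Types.Schema

# Assumed struct shape, following standard JSON Schema conventions.
weather_params = %Schema{
  type: "object",
  properties: %{
    "location" => %Schema{type: "string", description: "City name"},
    "unit" => %Schema{type: "string", enum: ["celsius", "fahrenheit"]}
  },
  required: ["location"]
}

# Convert to the wire format expected by the API.
api_map = Schema.to_api_map(weather_params)
```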
#### Documentation

- New guide: `docs/guides/function_calling.md` - Complete function calling guide
- New guide: `docs/guides/system_instructions.md` - System instruction usage guide
- Added guides to Hex documentation
#### Comprehensive Gap Analysis
- Python SDK Comparison: Complete analysis of Python genai SDK (v1.53.0) vs Elixir gemini_ex implementation
- Executive Summary: High-level overview with severity classifications and recommendations
- Feature Parity Matrix: Detailed feature-by-feature comparison showing 55% current coverage
- Critical Gaps Document: In-depth analysis of 8 critical/high-priority gaps:
- Live/Real-time API (WebSocket) - Not implemented
- Tools/Function Calling - Types only, no execution
- Automatic Function Calling (AFC) - Not implemented
- System Instruction - Missing from request building
- Model Tuning API - Not implemented
- Grounding/Retrieval - Not implemented
- Code Execution Tool - Not implemented
- Image/Video Generation - Not implemented
- Implementation Priorities: Tiered roadmap with code examples for closing gaps
- Implementation Prompt: Detailed TDD-based prompt for implementing identified gaps
#### Documentation

- New gap analysis documents in `docs/20251206/gap_analysis/`:
  - `README.md` - Navigation index and methodology
  - `00_executive_summary.md` - High-level overview
  - `01_critical_gaps.md` - Detailed critical gaps
  - `02_feature_parity_matrix.md` - Complete feature comparison
  - `03_implementation_priorities.md` - Implementation roadmap
  - `IMPLEMENTATION_PROMPT.md` - TDD implementation guide
- Added gap analysis documents to Hex documentation:
  - New "Gap Analysis" group in documentation navigation
  - All gap analysis docs included in package
#### Technical
- Analysis conducted using 21 parallel subagent deep-dive reports
- Covers all major Python SDK components:
- Client structure, Models API, Chat sessions
- Authentication, Streaming, Files API
- Context caching, Batch processing
- Type definitions, Tools/Function calling
- Safety settings, Embeddings, Live API
- Multimodal, Grounding, Async patterns
- Model tuning, Permissions, Pagination
- Error handling, Request/Response transformation
#### Quantitative Findings
| Metric | Python SDK | Elixir Port | Coverage |
|---|---|---|---|
| Total Lines (types) | 18,205 | ~3,000 | ~16% |
| API Modules | 12 | 7 | 58% |
| Type Definitions | 200+ | ~50 | ~25% |
| Overall Parity | - | - | 55% |
#### Recommended Priority Actions
- System Instruction (2-4 hours) - Quick win, high impact
- Function Calling Types (1 week) - Foundation for AI agents
- Function Execution (1 week) - Enable tool integration
- Automatic FC Loop (1 week) - Complete the agent loop
- Live API (3 weeks) - WebSocket for real-time apps
## 0.7.2 - 2025-12-06

### Fixed

- Rate limiter race condition: Replaced `:global.trans/2` with an ETS-based spinlock using `:ets.insert_new/2` for proper single-node mutex semantics in both the `ConcurrencyGate` and `State` modules
- TOCTOU race in lock cleanup: Use `:ets.delete_object/2` instead of `:ets.delete/2` to atomically delete only if the PID still matches, preventing lock theft
- ETS table options: Changed lock tables to use `write_concurrency: true` instead of `read_concurrency: true` for write-heavy workloads
- Test synchronization: Removed flaky `Process.sleep` from the atomic reservation test; it now awaits the non-blocking task2 before releasing task1 for deterministic synchronization
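For reference, the `:ets.insert_new/2` spinlock pattern works roughly like this generic sketch (not the library's actual module):

```elixir
defmodule SpinLock do
  @table :spin_locks

  # The table must be created once by an owning process, e.g.:
  #   :ets.new(:spin_locks, [:named_table, :public, :set, write_concurrency: true])
  def with_lock(key, fun) do
    acquire(key)

    try do
      fun.()
    after
      # delete_object/2 removes the row only if it still equals {key, self()},
      # so a stale owner cannot steal a lock that was re-acquired elsewhere.
      :ets.delete_object(@table, {key, self()})
    end
  end

  defp acquire(key) do
    # insert_new/2 is atomic: exactly one contender wins the lock.
    unless :ets.insert_new(@table, {key, self()}) do
      Process.sleep(5)
      acquire(key)
    end
  end
end
```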
### Changed
- Lock acquisition retry sleep increased from 1ms to 5ms to reduce CPU usage under contention
## 0.7.1 - 2025-12-05

### Added

- Atomic token budget reservation (`try_reserve_budget/3`) with safety multiplier, reconciliation, and telemetry events (`budget_reserved`, `budget_rejected`)
- Shared retry window gating with jittered release plus telemetry hooks (`retry_window_set`/`hit`/`release`)
- Model use-case aliases (`cache_context`, `report_section`, `fast_path`) resolved through `Gemini.Config.model_for_use_case/2` with documented token minima
- Streaming now goes through the rate limiter (UnifiedManager): permits are held for the duration of the stream, budget is reserved up front, and telemetry is emitted for stream start/completion/error/stop
### Fixed

- Concurrency gate TOCTOU race hardened with serialized permit acquisition; the default `non_blocking` remains `false` for server workloads
- Rate limiter now pre-flight rejects over-budget bursts before dispatching requests and returns surplus budget after responses
## 0.7.0 - 2025-12-05

### Major Feature Release: Complete API Parity
This release brings the Elixir client to near-complete feature parity with the Python google-genai SDK, adding comprehensive support for Files, Batches, Operations, and Documents APIs.
### Added

#### Files API - Complete File Management

- `Gemini.APIs.Files.upload/2`: Upload files with resumable protocol, progress tracking, and automatic MIME detection
- `Gemini.APIs.Files.upload_data/2`: Upload binary data directly (requires the `mime_type` option)
- `Gemini.APIs.Files.get/2`: Retrieve file metadata by name
- `Gemini.APIs.Files.list/1`: List files with pagination support
- `Gemini.APIs.Files.list_all/1`: Automatically paginate through all files
- `Gemini.APIs.Files.delete/2`: Delete uploaded files
- `Gemini.APIs.Files.wait_for_processing/2`: Poll until a file is ready for use
- `Gemini.APIs.Files.download/2`: Download generated file content

#### Batches API - Bulk Processing with 50% Cost Savings

- `Gemini.APIs.Batches.create/2`: Create batch content generation jobs
- `Gemini.APIs.Batches.create_embeddings/2`: Create batch embedding jobs
- `Gemini.APIs.Batches.get/2`: Get batch job status
- `Gemini.APIs.Batches.list/1`: List batch jobs with pagination
- `Gemini.APIs.Batches.list_all/1`: List all batch jobs
- `Gemini.APIs.Batches.cancel/2`: Cancel running batch jobs
- `Gemini.APIs.Batches.delete/2`: Delete batch jobs
- `Gemini.APIs.Batches.wait/2`: Wait for batch completion with a progress callback
- `Gemini.APIs.Batches.get_responses/1`: Extract inlined responses from completed batches
- Support for file-based, inlined, GCS, and BigQuery input sources

#### Operations API - Long-Running Task Management

- `Gemini.APIs.Operations.get/2`: Get operation status
- `Gemini.APIs.Operations.list/1`: List operations with pagination
- `Gemini.APIs.Operations.list_all/1`: List all operations
- `Gemini.APIs.Operations.cancel/2`: Cancel running operations
- `Gemini.APIs.Operations.delete/2`: Delete completed operations
- `Gemini.APIs.Operations.wait/2`: Wait with configurable polling
- `Gemini.APIs.Operations.wait_with_backoff/2`: Wait with exponential backoff

#### Documents API - RAG Store Document Management

- `Gemini.APIs.Documents.get/2`: Get document metadata
- `Gemini.APIs.Documents.list/2`: List documents in a RAG store
- `Gemini.APIs.Documents.list_all/2`: List all documents
- `Gemini.APIs.Documents.delete/2`: Delete documents
- `Gemini.APIs.Documents.wait_for_processing/2`: Wait for document processing
- `Gemini.APIs.RagStores.get/2`: Get RAG store metadata
- `Gemini.APIs.RagStores.list/1`: List RAG stores
- `Gemini.APIs.RagStores.create/1`: Create new RAG stores
- `Gemini.APIs.RagStores.delete/2`: Delete RAG stores

#### Enhanced Enum Types - Comprehensive Type Safety

New enum modules in `Gemini.Types.Enums` with `to_api/1` and `from_api/1` converters:

- `HarmCategory` - 12 harm category values
- `HarmBlockThreshold` - 6 threshold levels
- `HarmProbability` - 5 probability levels
- `BlockedReason` - 7 block reasons
- `FinishReason` - 12 finish reasons
- `TaskType` - 9 embedding task types
- `FunctionCallingMode` - 3 function calling modes
- `DynamicRetrievalMode` - 3 retrieval modes
- `ThinkingLevel` - 3 thinking budget levels
- `CodeExecutionOutcome` - 4 execution outcomes
- `ExecutableCodeLanguage` - 2 code languages
- `GroundingAttributionConfidence` - 4 confidence levels
- `AspectRatio` - 4 image aspect ratios
- `ImageSize` - 3 image size options
- `VoiceName` - 6 voice options for TTS
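The converters follow the usual snake_case-atom to SCREAMING_CASE-string mapping; a sketch with assumed value pairs (the exact atoms are not listed in this changelog):

```elixir
alias Gemini.Types.Enums.HarmBlockThreshold

# Assumed atom/string pair for illustration.
"BLOCK_LOW_AND_ABOVE" = HarmBlockThreshold.to_api(:block_low_and_above)
:block_low_and_above = HarmBlockThreshold.from_api("BLOCK_LOW_AND_ABOVE")
```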
#### New Documentation Guides

- `docs/guides/files.md` - Complete Files API guide
- `docs/guides/batches.md` - Batch processing guide
- `docs/guides/operations.md` - Long-running operations guide
### Technical Implementation

#### Architecture
- Resumable upload protocol with 8MB chunks and automatic retry
- Consistent polling patterns with configurable timeouts and progress callbacks
- TypedStruct patterns with `@derive Jason.Encoder` for all new types
- Full multi-auth support (`:gemini` and `:vertex_ai`) across all new APIs
#### Testing
- 94 new tests for Files, Operations, Batches, and Documents APIs
- Unit tests for all type parsing and helper functions
- Live API test infrastructure for integration testing
- Test fixtures for file uploads
#### Quality

- Zero compilation warnings
- Complete `@spec` annotations for all public functions
- Comprehensive `@moduledoc` and `@doc` documentation
- Follows CODE_QUALITY.md standards
### Changed
- Updated README.md with new API sections and examples
- Version bump from 0.6.4 to 0.7.0
### Migration Notes

#### For Existing Users
All changes are additive - existing code continues to work unchanged. New APIs are available immediately:
```elixir
# Upload and use a file
{:ok, file} = Gemini.APIs.Files.upload("image.png")
{:ok, ready} = Gemini.APIs.Files.wait_for_processing(file.name)
{:ok, response} = Gemini.generate([
  "Describe this image",
  %{file_uri: ready.uri, mime_type: ready.mime_type}
])

# Create a batch job
{:ok, batch} = Gemini.APIs.Batches.create("gemini-2.0-flash",
  file_name: "files/input123",
  display_name: "My Batch"
)
{:ok, completed} = Gemini.APIs.Batches.wait(batch.name)

# Track long-running operations
{:ok, op} = Gemini.APIs.Operations.get("operations/abc123")
{:ok, completed} = Gemini.APIs.Operations.wait_with_backoff(op.name)
```

## 0.6.4 - 2025-12-05
### Added

#### Response Type Enhancements

- `UsageMetadata` now includes:
  - `thoughts_token_count` - Token count for thinking models (Gemini 2.0+)
  - `tool_use_prompt_token_count` - Tokens used in tool/function prompts
  - `prompt_tokens_details` - Per-modality breakdown of prompt tokens
  - `cache_tokens_details` - Per-modality breakdown of cached tokens
  - `response_tokens_details` - Per-modality breakdown of response tokens
  - `tool_use_prompt_tokens_details` - Per-modality breakdown of tool prompt tokens
  - `traffic_type` - Billing traffic type (ON_DEMAND, PROVISIONED_THROUGHPUT)
- `GenerateContentResponse` now includes:
  - `response_id` - Unique response identifier for tracking
  - `model_version` - Actual model version used (e.g., "gemini-2.0-flash-exp-001")
  - `create_time` - Response creation timestamp
- `Candidate` now includes:
  - `finish_message` - Human-readable message explaining the stop reason
  - `avg_logprobs` - Average log probability score
- `PromptFeedback` now includes:
  - `block_reason_message` - Human-readable block explanation
- `Part` now includes:
  - `file_data` - URI-based file references (alternative to inline_data)
  - `function_response` - Function call response data
  - `thought` - Boolean flag for thinking model thought parts
- `SafetyRating` now includes:
  - `probability_score` - Numeric harm probability (0.0-1.0)
  - `severity` - Harm severity level
  - `severity_score` - Numeric severity score
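Reading the new fields might look like this sketch (the `usage_metadata` access path is assumed from the struct names above):

```elixir
{:ok, response} = Gemini.generate("Prove that 17 is prime")

# Assumed field access; struct and field names follow the list above.
usage = response.usage_metadata
IO.puts("thinking tokens: #{usage.thoughts_token_count}")
IO.puts("model version:   #{response.model_version}")
```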
#### Request Type Enhancements

- `GenerationConfig` now includes:
  - `seed` - Deterministic generation seed for reproducible outputs
  - `response_modalities` - Control output modalities (TEXT, IMAGE, AUDIO)
  - `speech_config` - Audio output configuration with voice selection
  - `media_resolution` - Input media resolution control (LOW, MEDIUM, HIGH)

#### New Types

- `ModalityTokenCount` - Per-modality token breakdown
- `TrafficType` - Billing traffic type enum
- `Modality` - Response modality enum (TEXT, IMAGE, AUDIO)
- `MediaResolution` - Input media resolution enum
- `FileData` - URI-based file data struct
- `FunctionResponse` - Function call response struct
- `SpeechConfig`, `VoiceConfig`, `PrebuiltVoiceConfig` - Audio output configuration
### Changed
- Response parsing now handles all new fields from Gemini API
- GenerationConfig encoding includes new fields when present
### Fixed
- Token usage now correctly reports thinking tokens separately from output tokens
## 0.6.3 - 2025-12-05

### Added

- Concurrency gate is now partitionable via `concurrency_key` (e.g., per-tenant or per-location) instead of a single global queue per model.
- Concurrency permit wait is configurable via `permit_timeout_ms`; the default is now `:infinity` (no queue drop). Per-call overrides are supported.
- Per-request timeout overrides for HTTP and streaming; the global default HTTP/stream timeout was raised to 120_000ms.
- Streaming knobs: `max_backoff_ms`, `connect_timeout`, and a configurable cleanup delay for ManagerV2 (`config :gemini_ex, :streaming, cleanup_delay_ms: ...`).
- Configurable context cache TTL defaults via `config :gemini_ex, :context_cache, default_ttl_seconds: ...`.
- Configurable retry delay fallback via `config :gemini_ex, :rate_limiter, default_retry_delay_ms: ...`.
- Permit leak protection: holders are monitored and reclaimed if the process dies without releasing.
### Changed

- Default HTTP/stream timeout increased from 30_000ms to 120_000ms.
- Concurrency gate uses the configurable `permit_timeout_ms` (default `:infinity`) instead of a fixed 60s timeout.
### Fixed

- Streaming client no longer leaks `:persistent_term` state; SSE parse errors now surface instead of being silently dropped.
- Streaming backoff ceiling and connect timeout are tunable; SSE parsing failures return errors.
## 0.6.2 - 2025-12-05

### Fixed

- Eliminated a recursive retry loop on `:over_budget` blocking calls; blocking now waits once for the current window to end, then retries through the normal pipeline.
- Over-budget `retry_at` is now set to the window end in non-blocking mode instead of `nil`.
- Requests whose estimated tokens exceed the configured budget return immediately with `request_too_large: true` instead of hanging.

### Added

- `estimated_cached_tokens` option for proactive budgeting with cached contexts; cached token usage (`cachedContentTokenCount`) is now included in recorded input tokens.
- Telemetry for over-budget waits/errors now includes token estimates and wait metadata.
- `max_budget_wait_ms` config/option to cap how long blocking over-budget calls will sleep before returning a `rate_limited` error with `retry_at`.
### Documentation

- README and rate limiting guide updated with over-budget behavior, `estimated_cached_tokens`, and cached context budgeting notes.
## 0.6.1 - 2025-12-04

**Potentially breaking (upgrade note):** Token estimation now runs automatically and budget checks fall back to profile defaults. Apps that never set `:estimated_input_tokens` or `:token_budget_per_window` can now receive local `:over_budget` errors. To preserve 0.6.0 behavior, set `token_budget_per_window: nil` (globally or per-call), or disable the rate limiter.
### Added

#### Proactive Rate Limiting Enhancements (ADR Implementation)

##### Auto Token Estimation (ADR-0001)

- Automatic input token estimation at the Coordinator boundary before request normalization
- Token estimates passed to the rate limiter via the `:estimated_input_tokens` option
- Safe handling of API maps (`%{contents: [...]}`) in `Tokens.estimate/1` - returns 0 for unknown shapes instead of raising
- Supports both atom keys (`:contents`) and string keys (`"contents"`)
##### Token Budget Configuration (ADR-0002)

- New `token_budget_per_window` config field with conservative defaults
- New `window_duration_ms` config field (default: 60,000ms)
- Budget checking falls back to `config.token_budget_per_window` when not set in per-request opts
- `State.record_usage/4` now accepts a configurable window duration via opts
##### Enhanced 429 Propagation (ADR-0003)

- Retry state now captures `quota_dimensions` and `quota_value` from 429 responses
- Enhanced quota metric extraction from nested error details
##### Tier-Based Rate Limit Profiles (ADR-0004)

- New `:free_tier` profile - Conservative, for 15 RPM / 1M TPM (32,000 token budget)
- New `:paid_tier_1` profile - Standard production, 500 RPM / 4M TPM (1,000,000 token budget)
- New `:paid_tier_2` profile - High throughput, 1000 RPM / 8M TPM (2,000,000 token budget)
- Updated `:dev` and `:prod` profiles with token budgets
- Profile type expanded to include all tier options
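Selecting a tier is a one-line config change; a sketch combining the new fields with the 0.5.0 config shape:

```elixir
config :gemini_ex, :rate_limiter,
  profile: :free_tier,
  # Optional overrides on top of the profile defaults:
  token_budget_per_window: 32_000,
  window_duration_ms: 60_000
```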
### Changed

- `RateLimiter.Config` struct now includes `token_budget_per_window` and `window_duration_ms` fields
- `Manager.check_token_budget/3` now falls back to config defaults
- `Manager.record_usage_from_response/3` passes window duration from config to State
- Updated `docs/guides/rate_limiting.md` with comprehensive tier documentation
### Documentation
- Added Quick Start section with tier profile selection table
- Expanded Profiles section with all tier configurations
- Enhanced Token Budgeting section explaining automatic estimation
- Added Fine-Tuning section for concurrency vs token budget guidance
## 0.6.0 - 2025-12-04

### Added

#### Context Caching Enhancements

- Cache creation now supports the `system_instruction` parameter for setting system-level instructions that apply to all cached content usage
- Cache creation now supports the `tools` parameter for caching function declarations alongside content
- Cache creation now supports the `tool_config` parameter for configuring function calling behavior in cached contexts
- Cache creation now supports `fileUri` in content parts for caching files stored in Google Cloud Storage (gs:// URIs)
- Cache creation now supports the `kms_key_name` parameter for customer-managed encryption keys (Vertex AI only)
- Resource name normalization for Vertex AI automatically expands short cache names like "cachedContents/abc" to fully qualified paths like "projects/{project}/locations/{location}/cachedContents/abc"
- Model name normalization for Vertex AI automatically expands model names to full publisher paths
- Top-level cache API delegates added to the main Gemini module:
  - `Gemini.create_cache/2` - Create cached content
  - `Gemini.list_caches/1` - List all cached contents
  - `Gemini.get_cache/2` - Retrieve cached content by name
  - `Gemini.update_cache/2` - Update cache TTL or expiration
  - `Gemini.delete_cache/2` - Delete cached content
- `CachedContentUsageMetadata` struct expanded with Vertex AI specific fields: `audio_duration_seconds`, `image_count`, `text_count`, and `video_duration_seconds`
- Model validation warning when using models that may not support explicit caching (models without version suffixes)
- Live test covering `system_instruction` with `fileUri` caching
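A sketch of cache creation with the new options (the `model:` and `ttl:` option names are assumptions; `cached_content:` in generate requests is documented under 0.5.1 below):

```elixir
# Assumed option names, shown for illustration.
{:ok, cache} =
  Gemini.create_cache(
    "Very long reference document...",
    model: "gemini-2.0-flash-001",
    system_instruction: "Answer only from the cached document.",
    ttl: "3600s"
  )

{:ok, response} =
  Gemini.generate("Summarize section 2", cached_content: cache.name)
```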
#### Auth-Aware Model Configuration System

- Model registry organized by API compatibility:
  - Universal models work identically in both Gemini API and Vertex AI
  - Gemini API models include convenience aliases like the `-latest` suffix
  - Vertex AI models include EmbeddingGemma variants
- `Config.default_model/0` automatically selects the appropriate model based on detected authentication:
  - Gemini API: `gemini-flash-lite-latest`
  - Vertex AI: `gemini-2.0-flash-lite`
- `Config.default_embedding_model/0` selects the embedding model by auth:
  - Gemini API: `gemini-embedding-001` (3072 dimensions)
  - Vertex AI: `embeddinggemma` (768 dimensions)
- `Config.default_model_for/1` and `Config.default_embedding_model_for/1` for explicit API type selection
- `Config.models_for/1` returns all models available for a specific API
- `Config.model_available?/2` checks if a model key works with an API
- `Config.model_api/1` returns the API compatibility of a model key
- `Config.current_api_type/0` returns the detected auth type
- Embedding configuration system with per-model settings:
  - `Config.embedding_config/1` returns the full config for embedding models
  - `Config.uses_prompt_prefix?/1` checks if a model uses prompt prefixes
  - `Config.embedding_prompt_prefix/2` generates task-specific prefixes
  - `Config.default_embedding_dimensions/1` returns model default dims
  - `Config.needs_normalization?/2` checks if manual normalization is needed
- EmbeddingGemma support with automatic prompt prefix formatting for task types (retrieval_query becomes "task: search result | query: ")
#### Test Infrastructure

- `Gemini.Test.ModelHelpers` module for centralized model references
- `Gemini.Test.AuthHelpers` module for shared auth detection logic
- Helper functions: `auth_available?/0`, `gemini_api_available?/0`, `vertex_api_available?/0`, `default_model/0`, `embedding_model/0`, `thinking_model/0`, `caching_model/0`, `universal_model/0`
### Changed

- `Auth.build_headers/2` now returns `{:ok, headers}` or `{:error, reason}` instead of always returning headers, enabling proper error propagation
- `Gemini.configure/2` now stores config under the `:gemini` app environment to align with `Config.auth_config/0`, which reads from both the `:gemini` and `:gemini_ex` namespaces
- `EmbedContentRequest.new/2` automatically formats text with prompt prefixes when using EmbeddingGemma models on Vertex AI
- All example scripts updated to use `Config.default_model()` instead of hardcoded model strings
- All tests updated to use auth-aware model selection via ModelHelpers
- Config module default model comment updated to explain auto-detection
### Fixed

- Vertex AI Cache Endpoints: Cache operations now build fully qualified paths (`projects/{project}/locations/{location}/cachedContents`) instead of calling `/cachedContents` directly, which was causing 404 errors
- Config Alignment: `Gemini.configure/2` now properly feeds config to `Config.auth_config/0` by using the correct app environment key
- Service Account Auth: Removed placeholder tokens that masked real authentication failures; errors now propagate properly with descriptive messages
- JWT Token Exchange: Fixed the OAuth2 JWT payload to include scope in the JWT claims, as required by Google's jwt-bearer grant type specification
- Content Formatting: Part formatting now handles function calls, function responses, thought signatures, file data, and media resolution correctly instead of leaving them in snake_case struct format
- Empty Env Vars: Environment variable reading now treats empty strings as unset, preventing configuration issues with `GEMINI_API_KEY=""`
- `ContextCache.create/2`: Now accepts string content directly in addition to lists, matching the README documentation examples
- Model Prefix Handling: Model name normalization no longer double-prefixes when callers pass the `models/...` format
### Documentation
- README updated with enhanced context caching examples showing system_instruction, fileUri, and model selection
- README includes new Model Configuration System section explaining auth-aware defaults and API differences
- README includes embedding model differences table
- Config module documentation expanded with model registry explanation
- Implementation plan documents added in docs/20251204/
## 0.5.2 - 2025-12-03

### Fixed

- Fixed a regression where 429 responses lost their `http_status`, causing the rate limiter to misclassify them as permanent errors. API errors now preserve status and RetryInfo details, so automatic backoff/RetryInfo delays are honored by default.
## 0.5.1 - 2025-12-03

### Added

#### Gemini 3 Pro Support
Full support for Google's Gemini 3 model family with new API features:
- `thinking_level` parameter - New thinking control for Gemini 3 models
  - `GenerationConfig.thinking_level(:low)` - Fast responses, minimal reasoning
  - `GenerationConfig.thinking_level(:high)` - Deep reasoning (default for Gemini 3)
  - Note: `:medium` is not currently supported by the API
  - Cannot be used with `thinking_budget` in the same request (the API returns 400)
- `gemini-3-pro-image-preview` model - Image generation support
  - Generate images from text prompts
  - Configurable aspect ratios: "16:9", "1:1", "4:3", "3:4", "9:16"
  - Output resolutions: "2K" or "4K"
  - `GenerationConfig.image_config(aspect_ratio: "16:9", image_size: "4K")`
- `media_resolution` parameter - Fine-grained vision processing control
  - `:low` - 280 tokens for images, 70 for video frames
  - `:medium` - 560 tokens for images, 70 for video frames
  - `:high` - 1120 tokens for images, 280 for video frames
  - `Part.inline_data_with_resolution(data, mime_type, :high)`
  - `Part.with_resolution(existing_part, :high)`
- `thought_signature` field - Reasoning context preservation
  - Maintains reasoning context across API calls
  - Required for multi-turn function calling in Gemini 3
  - `Part.with_thought_signature(part, signature)` - the SDK handles this automatically in chat sessions
  - NEW: Automatic extraction via `Gemini.extract_thought_signatures/1`
  - NEW: Automatic echoing in `Chat.add_model_response/2`
- Context Caching API - Cache long context for improved performance
  - `Gemini.APIs.ContextCache.create/2` - Create cached content
  - `Gemini.APIs.ContextCache.list/1` - List cached contents
  - `Gemini.APIs.ContextCache.get/2` - Get a specific cache
  - `Gemini.APIs.ContextCache.update/2` - Update cache TTL
  - `Gemini.APIs.ContextCache.delete/2` - Delete a cache
  - Use with `cached_content: "cachedContents/id"` in generate requests
  - Minimum 4096 tokens required for caching
- New example: `examples/gemini_3_demo.exs` - Comprehensive Gemini 3 features demonstration

#### Updated Validation

- `Gemini.Validation.ThinkingConfig` now validates Gemini 3's `thinking_level`
- Prevents combining `thinking_level` and `thinking_budget` (API constraint)
- Warns that the `:medium` thinking level is not supported
### Changed

#### Embeddings Documentation Updates

- Fixed EMBEDDINGS.md: Corrected code examples and removed outdated/confusing information
  - Fixed an incorrect module reference (`Coordinator.EmbedContentResponse` → `EmbedContentResponse`)
  - Removed the confusing legacy model section (there's only `gemini-embedding-001` now)
  - Updated the model comparison to reflect the current API (a single model with MRL support)
  - Updated the async batch section with working code examples (it was previously marked as "planned")
  - Added a deprecation notice for `embedding-001`, `embedding-gecko-001`, and `gemini-embedding-exp-03-07` (October 2025)
- Updated `embed_content_request.ex`: Removed a deprecated model reference from the documentation
### Fixed

- Documentation now accurately reflects the current Gemini Embeddings API specification (June 2025)
- Clarified that `gemini-embedding-001` is the only recommended model with full MRL support
### Migration Notes

#### For Gemini 3 Users
```elixir
# Use thinking_level instead of thinking_budget for Gemini 3
config = GenerationConfig.thinking_level(:low)   # Fast
config = GenerationConfig.thinking_level(:high)  # Deep reasoning (default)

# Image generation
config = GenerationConfig.image_config(aspect_ratio: "16:9", image_size: "4K")
{:ok, response} = Coordinator.generate_content(
  "Generate an image of a sunset",
  model: "gemini-3-pro-image-preview",
  generation_config: config
)

# Media resolution for vision tasks
Part.inline_data_with_resolution(image_data, "image/jpeg", :high)
```

#### Temperature Recommendation
For Gemini 3, keep temperature at 1.0 (the default). Lower temperatures may cause looping or degraded performance on complex reasoning tasks.
## 0.5.0 - 2025-12-03

### Added

#### Rate Limiting System (Default ON)
A comprehensive rate limiting, retry, and concurrency management system that is enabled by default:
- RateLimitManager - Central coordinator that wraps all outbound requests
  - ETS-based state tracking keyed by `{model, location, metric}`
  - Tracks `retry_until` timestamps from 429 RetryInfo responses
  - Token usage sliding windows for budget estimation
  - Configurable via application config or per-request options
- ConcurrencyGate - Per-model concurrency limiting
  - Default limit of 4 concurrent requests per model
  - Configurable with `max_concurrency_per_model` (nil/0 disables)
  - Optional adaptive mode: adjusts concurrency based on 429 responses
  - Non-blocking mode returns immediately if no permits are available
- RetryManager - Intelligent retry with backoff
  - Honors 429 RetryInfo.retryDelay from API responses
  - Exponential backoff with jitter for 5xx/transient errors
  - Configurable max attempts (default: 3)
  - Coordinates with the rate limiter to avoid double retries
- TokenBudget - Preflight token estimation
  - Tracks actual usage from responses
  - Blocks/queues when over the configured budget
  - Sliding window tracking per model/location
#### Telemetry Events

New telemetry events for rate limit monitoring (consistent with the existing `[:gemini, ...]` namespace):

- `[:gemini, :rate_limit, :request, :start]` - Request submitted
- `[:gemini, :rate_limit, :request, :stop]` - Request completed
- `[:gemini, :rate_limit, :wait]` - Waiting for retry window
- `[:gemini, :rate_limit, :error]` - Rate limit error
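Handlers attach with the standard `:telemetry` API; the measurement and metadata shapes are not specified here, so this sketch just inspects them:

```elixir
:telemetry.attach(
  "gemini-rate-limit-logger",
  [:gemini, :rate_limit, :request, :stop],
  fn _event, measurements, metadata, _config ->
    IO.inspect({measurements, metadata}, label: "rate_limit stop")
  end,
  nil
)
```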
#### Structured Errors
New structured error types:
- `{:error, {:rate_limited, retry_at, details}}` - Rate limited with retry info
- `{:error, {:transient_failure, attempts, original_error}}` - Transient failure after retries
#### Configuration Options

```elixir
config :gemini_ex, :rate_limiter,
  max_concurrency_per_model: 4, # nil/0 disables
  max_attempts: 3,
  base_backoff_ms: 1000,
  jitter_factor: 0.25,
  non_blocking: false,
  disable_rate_limiter: false,
  adaptive_concurrency: false,
  adaptive_ceiling: 8,
  profile: :prod # :dev | :prod | :custom
```

#### Per-Request Options
- `disable_rate_limiter: true` - Bypass all rate limiting
- `non_blocking: true` - Return immediately if rate limited
- `max_concurrency_per_model: N` - Override concurrency
- `estimated_input_tokens: N` - For budget checking
- `token_budget_per_window: N` - Max tokens per window
#### Documentation

- New rate limiting guide: `docs/guides/rate_limiting.md`
- Comprehensive module documentation for all rate limiter components
- Updated README with rate limiting section
### Changed
- HTTP client now routes all requests through rate limiter by default
- Supervisor now starts RateLimitManager on application boot
### Technical Notes
- Streaming Safe: Rate limiter only gates request submission; open streams are not interrupted
- Coordinate Retry Layers: Retry logic coordinates between rate limiter and HTTP client to avoid double retries
- Test Infrastructure: Added Bypass-based fake Gemini endpoint for testing rate limit behavior
### Migration Guide
Rate limiting is enabled by default. To disable:
```elixir
# Per-request
Gemini.generate("Hello", disable_rate_limiter: true)

# Globally (not recommended)
config :gemini_ex, :rate_limiter, disable_rate_limiter: true
```

The new structured errors are backward compatible - existing error handling will continue to work, but you can now pattern match on rate limit specifics:
```elixir
case Gemini.generate("Hello") do
  {:ok, response} -> handle_success(response)
  {:error, {:rate_limited, retry_at, _}} -> schedule_retry(retry_at)
  {:error, other} -> handle_error(other)
end
```

## 0.4.0 - 2025-11-06
### Added

- Structured Outputs Enhancement - Full support for the Gemini API November 2025 updates
  - `property_ordering` field in `GenerationConfig` for Gemini 2.0 model support
  - `structured_json/2` convenience helper for structured output setup
  - `property_ordering/2` helper for explicit property ordering
  - `temperature/2` helper for setting temperature values
  - Support for new JSON Schema keywords:
    - `anyOf` - Union types and conditional structures
    - `$ref` - Recursive schema definitions
    - `minimum`/`maximum` - Numeric value constraints
    - `additionalProperties` - Control over extra properties
    - `type: "null"` - Nullable field definitions
    - `prefixItems` - Tuple-like array structures
- Comprehensive integration tests for structured outputs
- Working examples demonstrating all new features
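For instance, requesting structured JSON might look like this sketch (the `structured_json/2` argument order and the plain-map schema shape are assumptions):

```elixir
schema = %{
  type: "object",
  properties: %{
    name: %{type: "string"},
    age: %{anyOf: [%{type: "integer"}, %{type: "null"}]}
  }
}

# Assumed signature: config first, schema second.
config = GenerationConfig.structured_json(%GenerationConfig{}, schema)

{:ok, response} =
  Gemini.generate("Extract the person mentioned in this text: ...",
    generation_config: config
  )
```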
### Improved
- Enhanced documentation for structured outputs use cases
- Better code examples in README and API reference
- Expanded test coverage for generation config options
### Notes

- Gemini 2.5+ models preserve schema key order automatically
- Gemini 2.0 models require the explicit `property_ordering` field
- All changes are backward compatible - no breaking changes
## 0.3.1 - 2025-10-15

### Major Feature: Async Batch Embedding API (Phase 4)
This release adds production-scale async batch embedding support with 50% cost savings compared to the interactive API. Process thousands to millions of embeddings asynchronously with Long-Running Operation (LRO) support, state tracking, and priority management.
### Added

#### Async Batch Embedding API
- `async_batch_embed_contents/2`: Submit large batches asynchronously for background processing
  - 50% cost savings vs the interactive embedding API
  - Suitable for RAG system indexing, knowledge base building, and large-scale retrieval
  - Returns immediately with a batch ID for polling
  - Support for inline requests with metadata tracking
- `get_batch_status/1`: Poll batch job status with progress tracking
  - Real-time progress metrics via `EmbedContentBatchStats`
  - State transitions: PENDING → PROCESSING → COMPLETED/FAILED
  - Track successful, failed, and pending request counts
- `get_batch_embeddings/1`: Retrieve results from completed batch jobs
  - Extract embeddings from inline responses
  - Support for file-based output detection
  - Automatic filtering of successful responses
- `await_batch_completion/2`: Convenience polling with configurable intervals
  - Automatic polling until completion or timeout
  - Progress callback support for monitoring
  - Configurable poll interval and timeout
#### Complete Type System

- `BatchState`: Job state enum (`:unspecified`, `:pending`, `:processing`, `:completed`, `:failed`, `:cancelled`)
- `EmbedContentBatchStats`: Request tracking with progress metrics
  - `progress_percentage/1`: Calculate completion percentage
  - `success_rate/1` and `failure_rate/1`: Quality metrics
  - `is_complete?/1`: Completion check
- Request types:
  - `InlinedEmbedContentRequest`: Single request with metadata
  - `InlinedEmbedContentRequests`: Container for multiple requests
  - `InputEmbedContentConfig`: Union type for file vs inline input
  - `EmbedContentBatch`: Complete batch job request with priority
- Response types:
  - `InlinedEmbedContentResponse`: Single response with success/error
  - `InlinedEmbedContentResponses`: Container with helper functions
  - `EmbedContentBatchOutput`: Union type for file vs inline output
  - `EmbedContentBatch`: Complete batch status with lifecycle tracking
#### Comprehensive Test Coverage
- 41 new unit tests for batch types (BatchState, BatchStats)
- Full TDD approach with test-first implementation
- 425 total tests passing (up from 384 in v0.3.0)
- Zero compilation warnings maintained
### Technical Implementation

#### Production Features
- Long-Running Operations (LRO): Full async job lifecycle support
- Priority-based Processing: Control batch execution order with priority field
- Progress Tracking: Real-time stats on successful, failed, and pending requests
- Multi-auth Support: Works with both Gemini API and Vertex AI
- Type Safety: Complete `@spec` annotations for all new functions
- Error Handling: Comprehensive error messages and recovery paths
#### Performance & Cost
- 50% cost savings: Async batch API offers half the cost of interactive embedding
- Scalability: Process millions of embeddings efficiently
- Production-ready: Designed for large-scale RAG systems and knowledge bases
- Flexible polling: Configurable intervals (default 5s) with timeout (default 10min)
### Usage Examples
```elixir
# Submit an async batch for background processing
{:ok, batch} = Gemini.async_batch_embed_contents(
  ["Text 1", "Text 2", "Text 3"],
  display_name: "My Knowledge Base",
  task_type: :retrieval_document,
  output_dimensionality: 768
)

# Poll for status
{:ok, updated_batch} = Gemini.get_batch_status(batch.name)

# Check progress
if updated_batch.batch_stats do
  progress = updated_batch.batch_stats |> EmbedContentBatchStats.progress_percentage()
  IO.puts("Progress: #{Float.round(progress, 1)}%")
end

# Wait for completion (convenience function)
{:ok, completed_batch} = Gemini.await_batch_completion(
  batch.name,
  poll_interval: 10_000,  # 10 seconds
  timeout: 1_800_000,     # 30 minutes
  on_progress: fn b ->
    progress = EmbedContentBatchStats.progress_percentage(b.batch_stats)
    IO.puts("Progress: #{Float.round(progress, 1)}%")
  end
)

# Retrieve embeddings
{:ok, embeddings} = Gemini.get_batch_embeddings(completed_batch)
IO.puts("Retrieved #{length(embeddings)} embeddings")
```

### Changed
- Enhanced `Coordinator` module: Added async batch embedding functions alongside the existing sync APIs
- Type system expansion: New types in the `Gemini.Types.Request` and `Gemini.Types.Response` namespaces
### Migration Notes

#### For v0.3.0 Users

- All existing synchronous embedding APIs remain unchanged and fully compatible
- The new async batch API is additive - no breaking changes
- Use the async batch API for:
  - Large-scale embedding generation (1000s-millions of texts)
  - Background processing with 50% cost savings
  - RAG system indexing and knowledge base building
  - Non-time-critical embedding workflows
- Continue using the sync API (`embed_content/2`, `batch_embed_contents/2`) for:
  - Real-time embedding needs
  - Small batches (<100 texts)
  - Interactive workflows requiring immediate results
### Future Enhancements
- File-based batch input/output support (GCS integration)
- Batch cancellation and deletion APIs
- Enhanced progress monitoring with estimated completion times
### Related Documentation

- API Specification: `oldDocs/docs/spec/GEMINI-API-07-EMBEDDINGS_20251014.md` (lines 129-442)
- Implementation Plan: `EMBEDDING_IMPLEMENTATION_PLAN.md` (Phase 4 section)
## 0.3.0 - 2025-10-14

### Major Feature: Complete Embedding Support with MRL
This release adds comprehensive text embedding functionality with Matryoshka Representation Learning (MRL), enabling powerful semantic search, RAG systems, classification, and more.
### Added

#### Embedding API with Normalization & Distance Metrics

- `ContentEmbedding.normalize/1`: L2 normalization to unit length (required for non-3072 dimensions per the API spec)
- `ContentEmbedding.norm/1`: Calculate the L2 norm of embedding vectors
- `ContentEmbedding.euclidean_distance/2`: Euclidean distance metric for similarity
- `ContentEmbedding.dot_product/2`: Dot product similarity (equals cosine for normalized embeddings)
- Enhanced `cosine_similarity/2`: Improved documentation with normalization requirements
#### Production-Ready Use Case Examples

- `examples/use_cases/mrl_normalization_demo.exs`: Comprehensive MRL demonstration
  - Quality vs storage tradeoffs across dimensions (128-3072)
  - MTEB benchmark comparison table
  - Normalization requirements and effects
  - Distance metrics comparison (cosine, euclidean, dot product)
  - Best practices for dimension selection
- `examples/use_cases/rag_demo.exs`: Complete RAG pipeline implementation
  - Build and index a knowledge base with the RETRIEVAL_DOCUMENT task type
  - Embed queries with the RETRIEVAL_QUERY task type
  - Retrieve top-K relevant documents using semantic similarity
  - Generate contextually aware responses
  - Side-by-side comparison with a non-RAG baseline
- `examples/use_cases/search_reranking.exs`: Semantic reranking for search
  - E-commerce product search example
  - Compare keyword vs semantic ranking
  - Hybrid ranking strategy (keyword + semantic weighted)
  - Handles synonyms and conceptual relevance
- `examples/use_cases/classification.exs`: K-NN classification
  - Few-shot learning with minimal training examples
  - Customer support ticket categorization
  - Confidence scoring and accuracy evaluation
  - Dynamic category addition without retraining
#### Enhanced Documentation

- Complete MRL documentation in `examples/EMBEDDINGS.md`:
  - Matryoshka Representation Learning explanation
  - MTEB benchmark scores table (128d to 3072d)
  - Normalization requirements and best practices
  - Model comparison table
  - Critical normalization warnings
  - Distance metrics usage guide
- README.md embeddings section:
  - Quick start guide for embeddings
  - MRL concepts and dimension selection
  - Task types for better quality
  - Batch embedding examples
  - Links to advanced use case examples
#### Comprehensive Test Coverage

- 26 unit tests for the `ContentEmbedding` module:
  - Normalization accuracy (L2 norm = 1.0)
  - Distance metrics validation
  - Edge cases and error handling
  - Zero vector handling
- 20 integration tests for the embedding coordinator:
  - Single and batch embedding workflows
  - Task type variations
  - Output dimensionality control
  - Error scenarios
### Technical Implementation

#### Key Features
- MRL Support: Flexible dimensions (128-3072) with minimal quality loss
  - 768d: 67.99 MTEB (25% storage, -0.26% loss) - RECOMMENDED
  - 1536d: 68.17 MTEB (50% storage, same as 3072d!)
  - 3072d: 68.17 MTEB (100% storage, pre-normalized)
- Critical Normalization: Only 3072-dimensional embeddings are pre-normalized by the API
  - All other dimensions MUST be normalized before computing similarity
  - Cosine similarity focuses on direction (semantic meaning), not magnitude
  - Non-normalized embeddings have varying magnitudes that distort calculations
- Production Quality: 384 tests passing (100% success rate)
- Type Safety: Complete `@spec` annotations for all new functions
- Code Quality: Zero compilation warnings maintained
#### Performance Characteristics
- Storage Efficiency: 768d offers 75% storage savings with <0.3% quality loss
- Quality Benchmarks: MTEB scores prove minimal degradation across dimensions
- Real-time Processing: Efficient normalization and distance calculations
### Changed
- Updated README.md: Added embeddings section in features list and comprehensive usage guide
- Enhanced EMBEDDINGS.md: Complete rewrite with MRL documentation and advanced examples
- Model Recommendations: Updated to highlight `gemini-embedding-001` with MRL support
### Migration Notes

#### For New Users
```elixir
# Generate an embedding with the recommended 768 dimensions
{:ok, response} = Gemini.embed_content(
  "Your text",
  model: "gemini-embedding-001",
  output_dimensionality: 768
)

# IMPORTANT: Normalize before computing similarity!
alias Gemini.Types.Response.ContentEmbedding
normalized = ContentEmbedding.normalize(response.embedding)
similarity = ContentEmbedding.cosine_similarity(normalized, other_normalized)
```

#### Dimension Selection Guide
- 768d: Best for most applications (storage/quality balance)
- 1536d: High quality at 50% storage (same MTEB as 3072d)
- 3072d: Maximum quality, pre-normalized (largest storage)
- 512d or lower: Extreme efficiency (>1% quality loss)
### Future Roadmap

- v0.4.0 (Planned): Async Batch Embedding API
  - Long-running operations (LRO) support
  - 50% cost savings vs interactive embedding
  - Batch state tracking and priority support
### Related Documentation

- Comprehensive Guide: `examples/EMBEDDINGS.md`
- MRL Demo: `examples/use_cases/mrl_normalization_demo.exs`
- RAG Example: `examples/use_cases/rag_demo.exs`
- API Specification: `oldDocs/docs/spec/GEMINI-API-07-EMBEDDINGS_20251014.md`
## 0.2.3 - 2025-10-08

### Fixed

- CRITICAL: Double-encoding bug in multimodal content - Fixed confusing base64 encoding behavior (Issue #11 comment from @jaimeiniesta)
  - Problem: When users passed `Base.encode64(image_data)` with `type: "base64"`, the data was encoded AGAIN internally, causing double-encoding
  - Symptom: Users had to pass raw (non-encoded) data despite specifying `type: "base64"`, which was confusing and counterintuitive
  - Root cause: `Blob.new/2` always called `Base.encode64()`, even when the data was already base64-encoded
  - Fix: When `source: %{type: "base64", data: ...}` is specified, the data is now treated as already base64-encoded
  - Impact:
    - Users can now pass `Base.encode64(data)` as expected (documentation examples now work correctly)
    - API behavior matches user expectations: `type: "base64"` means the data IS base64-encoded
    - Applies to both the Anthropic-style format (`%{type: "image", source: %{type: "base64", ...}}`) and the Gemini SDK style (`%{inline_data: %{data: ..., mime_type: ...}}`)
    - Breaking change for workarounds: If you were passing raw (non-encoded) data as a workaround, you must now pass properly base64-encoded data
  - Special thanks to @jaimeiniesta for reporting this confusing behavior!
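After the fix, callers encode exactly once; a minimal sketch:

```elixir
image = File.read!("photo.jpg")

content = %{
  type: "image",
  # Data tagged type: "base64" is now sent as-is - encode exactly once.
  source: %{type: "base64", data: Base.encode64(image)}
}

{:ok, response} = Gemini.generate(["What is in this photo?", content])
```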
### Changed

- Enhanced `normalize_single_content/1` to preserve base64 data without re-encoding when `type: "base64"`
- Enhanced `normalize_part/1` to preserve base64 data in `inline_data` maps
- Updated tests to verify correct base64 handling
- Added demonstration script: `examples/fixed_double_encoding_demo.exs`
## 0.2.2 - 2025-10-07

### Added

- Flexible multimodal content input - Accept multiple intuitive input formats for images and text (Closes #11)
  - Support for the Anthropic-style format: `%{type: "text", text: "..."}` and `%{type: "image", source: %{type: "base64", data: "..."}}`
  - Support for map format with explicit role and parts: `%{role: "user", parts: [...]}`
  - Support for simple string inputs: `"What is this?"`
  - Support for mixed formats in a single request
  - Automatic MIME type detection from image magic bytes (PNG, JPEG, GIF, WebP)
  - Graceful fallback to an explicit MIME type or the JPEG default
- Thinking budget configuration - Control thinking token usage for cost optimization (Closes #9, supersedes #10)
  - `GenerationConfig.thinking_budget/2` - Set the thinking token budget (0 to disable, -1 for dynamic, or a fixed amount)
  - `GenerationConfig.include_thoughts/2` - Enable thought summaries in responses
  - `GenerationConfig.thinking_config/3` - Set both budget and thoughts in one call
  - `Gemini.Validation.ThinkingConfig` module - Model-aware budget validation
  - Support for all Gemini 2.5 series models (Pro, Flash, Flash Lite)
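A sketch of the thinking budget helpers (piping a `GenerationConfig` struct as the first argument is an assumption):

```elixir
config =
  %GenerationConfig{}
  |> GenerationConfig.thinking_budget(0)      # 0 disables thinking
  |> GenerationConfig.include_thoughts(false)

{:ok, response} = Gemini.generate("2 + 2?", generation_config: config)
```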
### Fixed

- Multimodal content handling - Users can now pass images and text in natural, intuitive formats
  - Previously: Only specific `Content` structs were accepted, causing a `FunctionClauseError`
  - Now: Accepts flexible formats and automatically normalizes them
  - Backward compatible: All existing code continues to work
- CRITICAL: Thinking budget field names - Fixed PR #10's critical bug that prevented the thinking budget from working
  - Previously: Sent `thinking_budget` (snake_case), which the API silently ignored, so users were still charged
  - Now: Sends `thinkingBudget` (camelCase) as required by the official API, which actually disables thinking
  - Added `includeThoughts` support that was missing from PR #10
  - Added model-specific budget validation (Pro: 128-32K, Flash: 0-24K, Lite: 0 or 512-24K)
  - Note: This supersedes PR #10 with a correct, fully tested implementation
### Changed

- Enhanced `Coordinator.generate_content/2` to accept flexible content formats
- Added an automatic content normalization layer
- Added `convert_thinking_config_to_api/1` to properly convert field names to camelCase
- `GenerationConfig.ThinkingConfig` is now a typed struct (not a plain map)
## [Unreleased]
## 0.2.1 - 2025-08-08

### Added

- ALTAR Integration Documentation: Added detailed documentation for the `ALTAR` protocol integration, explaining the architecture and benefits of the new type-safe, production-grade tool-calling foundation.
- ALTAR Version Update: Bumped the ALTAR dependency to v0.1.2.
## 0.2.0 - 2025-08-07

### Major Feature: Automatic Tool Calling
This release introduces a complete, production-grade tool-calling (function calling) feature set, providing a seamless, Python-SDK-like experience for building powerful AI agents. The implementation is architected on top of the robust, type-safe ALTAR protocol for maximum reliability and future scalability.
### Added

#### Automatic Tool Execution Engine

- New Public API: `Gemini.generate_content_with_auto_tools/2` orchestrates the entire multi-turn tool-calling loop. The library now automatically detects when a model wants to call a tool, executes it, sends the result back, and returns the final, synthesized text response.
- Recursive Orchestrator: A resilient, private orchestrator manages the conversation, preventing infinite loops with a configurable `:turn_limit`.
- Streaming Support: `Gemini.stream_generate_with_auto_tools/2` provides a fully automated tool-calling experience for streaming. A new `ToolOrchestrator` GenServer manages the complex, multi-stage stream, ensuring the end user only receives the final text chunks.
#### Manual Tool Calling Foundation (For Advanced Users)

- New `Gemini.Tools` facade: Provides a clean, high-level API (`register/2`, `execute_calls/1`) for developers who need full control over the tool-calling loop.
- Parallel Execution: `Gemini.Tools.execute_calls/1` uses `Task.async_stream` to execute multiple tool calls from the model in parallel, improving performance.
- Robust Error Handling: Individual tool failures are captured as a valid `ToolResult` and do not crash the calling process.
#### Architectural Foundation (ALTAR Integration)

- ALTAR Dependency: The project now builds upon the `altar` library, using its robust Data Model (ADM) and Local Execution Runtime (LATER).
- Supervised Registry: `gemini_ex` now starts and supervises its own named `Altar.LATER.Registry` process (`Gemini.Tools.Registry`), providing a stable, application-wide endpoint for tool management.
- Formalized `Gemini.Chat` Module: Chat history management has been completely refactored into a new `Gemini.Chat` struct and module, providing immutable, type-safe handling of complex multi-turn histories that include `function_call` and `function_response` turns.
Changed
- `Part` Struct: The `Gemini.Types.Part` struct was updated to include a `function_call` field, enabling type-safe parsing of model responses.
- Response Parsing: The core response parser in `Gemini.Generate` has been significantly enhanced to safely deserialize `functionCall` parts from the API, validating them against the `Altar.ADM` contract.
- Chat History: The `Gemini.send_message/2` function has been refactored to use the new, more powerful `Gemini.Chat` module.
Fixed
- CRITICAL: Tool Response Role: The role for `functionResponse` turns sent to the API is now correctly set to `"tool"` (was `"user"`), ensuring API compatibility.
- Architectural Consistency: Removed an erroneous `function_response` field from the `Part` struct. `functionResponse` parts are now correctly handled as raw maps, consistent with the library's design.
- Test Consistency: Updated all relevant tests to use `camelCase` string keys when asserting against API-formatted data structures, improving test accuracy.
Documentation & Examples
- New Example (`auto_tool_calling_demo.exs`): A comprehensive script demonstrating how to register multiple tools and use the new automatic execution APIs for both standard and streaming requests.
- New Example (`manual_tool_calling_demo.exs`): A clear demonstration of the advanced, step-by-step manual tool-calling loop.
0.1.1 - 2025-08-03
Fixed
Generation Config Bug Fix
- Critical Fix: Fixed `GenerationConfig` options being dropped in the `Gemini.APIs.Coordinator` module
  - Previously, only 4 basic options (`temperature`, `max_output_tokens`, `top_p`, `top_k`) were supported
  - Now supports all 12 `GenerationConfig` fields, including `response_schema`, `response_mime_type`, `stop_sequences`, etc.
  - Fixed the inconsistency between the `Gemini.Generate` and `Gemini.APIs.Coordinator` modules
  - Both modules now handle generation config options identically
Enhanced Generation Config Support
- Complete Field Coverage: Added support for all missing `GenerationConfig` fields:
  - `response_schema` - For structured JSON output
  - `response_mime_type` - For controlling the output format
  - `stop_sequences` - For custom stop sequences
  - `candidate_count` - For multiple response candidates
  - `presence_penalty` - For controlling topic repetition
  - `frequency_penalty` - For controlling word repetition
  - `response_logprobs` - For response probability logging
  - `logprobs` - For token probability information
Improved Request Building
- Struct Priority: `GenerationConfig` structs now take precedence over individual keyword options
- Key Conversion: Proper snake_case to camelCase conversion for all API fields (see the sketch after this list)
- Nil Filtering: Automatic filtering of nil values to reduce request payload size
- Backward Compatibility: Existing code using individual options continues to work unchanged
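The key-conversion and nil-filtering steps amount to roughly the following sketch. This is illustrative only, not the library's exact implementation (the real helpers, `convert_to_camel_case/1` and `struct_to_api_map/1`, are listed under Code Quality below):

```elixir
# Illustrative sketch of struct -> API map conversion: drop nil fields,
# then camelize the remaining snake_case keys.
defmodule ConfigSketch do
  def to_api_map(config) when is_struct(config) do
    config
    |> Map.from_struct()
    |> Enum.reject(fn {_key, value} -> is_nil(value) end)
    |> Map.new(fn {key, value} -> {camelize(key), value} end)
  end

  defp camelize(key) do
    [head | rest] = key |> Atom.to_string() |> String.split("_")
    head <> Enum.map_join(rest, &String.capitalize/1)
  end
end

# ConfigSketch.to_api_map(%Gemini.Types.GenerationConfig{max_output_tokens: 100})
# #=> %{"maxOutputTokens" => 100}
```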
Testing
Comprehensive Test Coverage
- 70 New Tests: Added extensive test suite covering all generation config scenarios
- Bug Reproduction: Tests that demonstrate the original bug and verify the fix
- Field Coverage: Individual tests for each of the 12 generation config fields
- Integration Testing: End-to-end tests with real API request structure validation
- Regression Prevention: Tests ensure the bug cannot reoccur in future versions
Test Categories Added
- Individual option handling tests
- GenerationConfig struct handling tests
- Mixed option scenarios (struct + individual options)
- Edge case handling (nil values, invalid types)
- API request structure validation
- Backward compatibility verification
Technical Improvements
Code Quality
- Helper Functions: Added `convert_to_camel_case/1` and `struct_to_api_map/1` utilities
- Error Handling: Improved validation and error messages for generation config
- Documentation: Enhanced inline documentation for generation config handling
- Type Safety: Maintained strict type checking while expanding functionality
Performance
- Request Optimization: Reduced API request payload size by filtering nil values
- Processing Efficiency: Streamlined generation config building process
- Memory Usage: More efficient handling of large GenerationConfig structs
Documentation
Updated Examples
- Enhanced examples to demonstrate new generation config capabilities
- Added response schema examples for structured output
- Updated documentation to reflect consistent behavior across modules
Migration Notes
For Existing Users
No breaking changes - all existing code continues to work. However, you can now use previously unsupported options:
```elixir
# These options now work in all modules:
{:ok, response} = Gemini.generate("Explain AI", [
  response_schema: %{"type" => "object", "properties" => %{"summary" => %{"type" => "string"}}},
  response_mime_type: "application/json",
  stop_sequences: ["END", "STOP"],
  presence_penalty: 0.5,
  frequency_penalty: 0.3
])
```
```elixir
# GenerationConfig structs now work consistently:
config = %Gemini.Types.GenerationConfig{
  temperature: 0.7,
  response_schema: %{"type" => "object"},
  max_output_tokens: 1000
}

{:ok, response} = Gemini.generate("Hello", generation_config: config)
```
0.1.0 - 2025-07-20
Major Release - Production-Ready Multi-Auth Implementation
This is a significant milestone release featuring a complete unified implementation with concurrent multi-authentication support, enhanced examples, and production-ready telemetry system.
Added
Multi-Authentication Coordinator
- Concurrent Auth Support: Enable simultaneous usage of Gemini API and Vertex AI authentication strategies
- Per-request Auth Selection: Choose authentication method on a per-request basis
- Authentication Strategy Routing: Automatic credential resolution and header generation
- Enhanced Configuration: Improved config system with better environment variable detection
Unified Streaming Manager
- Multi-auth Streaming: Streaming support across both authentication strategies (see the sketch below)
- Advanced Stream Management: Preserves the excellent SSE parsing while adding auth routing
- Stream Lifecycle Control: Complete stream state management (start, pause, resume, stop)
- Event Subscription System: Enhanced event handling with proper filtering
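A rough sketch of the multi-auth streaming flow. `stream_generate` itself is listed under API Enhancements below, but the return shape and the subscription call are assumptions for illustration:

```elixir
# Assumed shapes: start a stream with explicit auth routing, then subscribe
# the calling process to its events (start/chunk/stop).
{:ok, stream_id} = Gemini.stream_generate("Tell me a story", auth: :gemini)

# Hypothetical subscription step -- the manager dispatches SSE events
# to subscribed processes as they arrive.
:ok = Gemini.subscribe_stream(stream_id)
```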
Comprehensive Examples Suite
- `telemetry_showcase.exs`: Complete telemetry system demonstration with 7 event types
- Enhanced `demo.exs`: Updated with better chat sessions and API key masking
- Enhanced `streaming_demo.exs`: Real-time streaming with authentication detection
- Enhanced `multi_auth_demo.exs`: Concurrent authentication strategies with proper error handling
- Enhanced `demo_unified.exs`: Multi-auth architecture showcase
- Enhanced `live_api_test.exs`: Comprehensive API testing for both auth methods
Advanced Telemetry System
- 7 Event Types: request start/stop/exception, stream start/chunk/stop/exception
- Helper Functions: Stream ID generation, content classification, metadata building
- Performance Monitoring: Live measurement and analysis capabilities
- Configuration Management: Telemetry enable/disable controls
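Consuming these events uses the standard `:telemetry` API. In this sketch, the event name `[:gemini, :request, :stop]` is an assumption inferred from the seven event types listed above:

```elixir
# Attach a handler for request completion (event name assumed, see above).
:telemetry.attach(
  "gemini-request-logger",
  [:gemini, :request, :stop],
  fn _event, measurements, metadata, _config ->
    IO.inspect({measurements, metadata}, label: "gemini request finished")
  end,
  nil
)
```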
API Enhancements
- Backward Compatibility Functions: Added missing functions (`model_exists?`, `stream_generate`, `start_link`)
- Response Normalization: Proper key conversion (`totalTokens` → `total_tokens`, `displayName` → `display_name`)
- Enhanced Error Handling: Better error formatting and recovery
- Content Extraction: Support for both struct and raw streaming data formats
Changed
Architecture Improvements
- Type System: Resolved module conflicts and compilation warnings
- Configuration: Updated the default model to `gemini-flash-lite-latest`
- Code Quality: Zero compilation warnings achieved across the entire codebase
- Documentation: Updated model references and improved examples
Example Organization
- Removed Legacy Examples: Cleaned up `simple_test.exs`, `simple_telemetry_test.exs`, and `telemetry_demo.exs`
- Consistent Execution Pattern: All examples run via `mix run examples/[name].exs`
- Better Error Handling: Graceful credential failure handling with informative messages
- Security: API key masking in output for better security
Documentation Updates
- README Enhancement: Added comprehensive examples section with detailed descriptions
- Model Updates: Updated references to the latest Gemini models (Gemini 3 Pro Preview, 2.5 Flash/Flash-Lite) and new defaults
- Configuration Examples: Improved auth setup documentation
- Usage Patterns: Better code examples and patterns
Fixed
Critical Fixes
- Type Module Conflicts: Resolved duplicate module definitions preventing compilation
- Chat Session Context: Fixed `send_message` to properly handle `[Content.t()]` arrays
- Streaming Debug: Fixed undefined variables in demo scripts
- Response Parsing: Enhanced `build_generate_request` to support multiple content formats
Minor Improvements
- Function Coverage: Implemented all missing backward compatibility functions
- Token Counting: Fixed response key normalization for proper token count extraction
- Stream Management: Improved stream event collection and display
- Error Messages: Better error formatting and user-friendly messages
Technical Implementation
Production Architecture
- 154 Tests Passing: Complete test coverage with zero failures
- Multi-auth Foundation: Robust concurrent authentication system
- Advanced Streaming: Real-time SSE with 30-117ms chunk delivery
- Type Safety: Complete `@spec` annotations and proper error handling
- Zero Warnings: Clean compilation across the entire codebase
Dependencies
- Maintained stable dependency versions for production reliability
- Enhanced configuration system compatibility
- Improved telemetry integration
Migration Guide
For Existing Users
```elixir
# Old single-auth pattern (still works)
{:ok, response} = Gemini.generate("Hello")

# New multi-auth capability
{:ok, gemini_response} = Gemini.generate("Hello", auth: :gemini)
{:ok, vertex_response} = Gemini.generate("Hello", auth: :vertex_ai)
```
Configuration Updates
```elixir
# Enhanced configuration with auto-detection
config :gemini_ex,
  default_model: "gemini-flash-lite-latest", # Updated default
  timeout: 30_000,
  telemetry_enabled: true # New telemetry controls
```
Performance
- Real-time Streaming: 30-117ms chunk delivery performance
- Concurrent Authentication: Simultaneous multi-strategy usage
- Zero Compilation Warnings: Optimized build performance
- Memory Efficient: Enhanced streaming with proper backpressure
Security
- Credential Masking: API keys masked in all output for security
- Multi-auth Isolation: Secure credential separation between strategies
- Error Handling: No sensitive data in error messages
0.0.3 - 2025-07-07
Fixed
- API Response Parsing: Fixed an issue where `usage_metadata` was always nil on successful `Gemini.generate/2` calls (#3)
  - The Gemini API returns camelCase keys like `"usageMetadata"`, which were not being converted to snake_case atoms
  - Updated the `atomize_key` function in the coordinator to properly convert camelCase strings to snake_case atoms
  - Now properly populates `usage_metadata` with token count information
- Chat Sessions: Fixed conversation context not being maintained between messages (see the sketch below)
  - The `send_message` function was only sending the new message, not the full conversation history
  - Now builds the complete conversation history, with proper role assignments, before each API call
  - Ensures the AI maintains context and remembers information from previous messages
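Conceptually, the fix rebuilds the full `contents` list on every call, along these lines (an illustrative sketch; the map shapes are assumed):

```elixir
# Send the whole alternating-role history plus the new user message,
# instead of the new message alone.
history = [
  {"user", "Hi, my name is Ada."},
  {"model", "Nice to meet you, Ada!"}
]
new_message = "What is my name?"

contents =
  Enum.map(history, fn {role, text} ->
    %{role: role, parts: [%{text: text}]}
  end) ++ [%{role: "user", parts: [%{text: new_message}]}]
```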
0.0.2 - 2025-06-09
Fixed
- Documentation Rendering: Fixed mermaid diagram rendering errors on hex docs by removing emoji characters from diagram labels
- Package Links: Removed redundant "Documentation" link in hex package configuration, keeping only "Online documentation"
- Configuration References: Updated TELEMETRY_IMPLEMENTATION.md to reference `:gemini_ex` instead of `:gemini` for correct application configuration
Changed
- Improved hex docs compatibility for better rendering of documentation diagrams
- Enhanced documentation consistency across all markdown files
0.0.1 - 2025-06-09
Added
Core Features
- Dual Authentication System: Support for both Gemini API keys and Vertex AI OAuth/Service Accounts
- Advanced Streaming: Production-grade Server-Sent Events (SSE) streaming with real-time processing
- Comprehensive API Coverage: Full support for Gemini API endpoints including content generation, model listing, and token counting
- Type Safety: Complete TypeScript-style type definitions with runtime validation
- Error Handling: Detailed error types with recovery suggestions and proper HTTP status code mapping
- Built-in Telemetry: Comprehensive observability with metrics and event tracking
- Chat Sessions: Multi-turn conversation management with state persistence
- Multimodal Support: Text, image, audio, and video content processing
Authentication
- Multi-strategy authentication coordinator with automatic strategy selection
- Environment variable and application configuration support
- Per-request authentication override capabilities
- Secure credential management with validation
- Support for Google Cloud Service Account JSON files
- OAuth2 Bearer token generation for Vertex AI
Streaming Architecture
- Unified streaming manager with state management
- Real-time SSE parsing with event dispatching
- Configurable buffer management and backpressure handling
- Stream lifecycle management (start, pause, resume, stop)
- Event subscription system with filtering capabilities
- Comprehensive error recovery and retry mechanisms
HTTP Client
- Dual HTTP client system (standard and streaming)
- Request/response interceptors for middleware support
- Automatic retry logic with exponential backoff
- Connection pooling and timeout management
- Request validation and response parsing
- Content-Type negotiation and encoding support
Type System
- Comprehensive type definitions for all API structures
- Runtime type validation with descriptive error messages
- Request and response schema validation
- Content type definitions for multimodal inputs
- Model capability and configuration types
- Error type hierarchy with actionable information
Configuration
- Hierarchical configuration system (runtime > environment > application)
- Environment variable detection and parsing
- Application configuration validation
- Default value management
- Configuration hot-reloading support
Utilities
- Content extraction helpers
- Response transformation utilities
- Validation helpers
- Debugging and logging utilities
- Performance monitoring tools
Technical Implementation
Architecture
- Layered architecture with clear separation of concerns
- Behavior-driven design for pluggable components
- GenServer-based application supervision tree
- Concurrent request processing with actor model
- Event-driven streaming with backpressure management
Dependencies
- `req` ~> 0.4.0 for HTTP client functionality
- `jason` ~> 1.4 for JSON encoding/decoding
- `typed_struct` ~> 0.3.0 for type definitions
- `joken` ~> 2.6 for JWT handling in Vertex AI authentication
- `telemetry` ~> 1.2 for observability and metrics
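In `mix.exs` form, this dependency set corresponds to:

```elixir
# Runtime dependencies as they would appear in mix.exs:
defp deps do
  [
    {:req, "~> 0.4.0"},          # HTTP client
    {:jason, "~> 1.4"},          # JSON encoding/decoding
    {:typed_struct, "~> 0.3.0"}, # type definitions
    {:joken, "~> 2.6"},          # JWT handling for Vertex AI auth
    {:telemetry, "~> 1.2"}       # observability and metrics
  ]
end
```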
Development Tools
- `ex_doc` for comprehensive documentation generation
- `credo` for code quality analysis
- `dialyxir` for static type analysis
Documentation
- Complete API reference documentation
- Architecture documentation with Mermaid diagrams
- Authentication system technical specification
- Getting started guide with examples
- Advanced usage patterns and best practices
- Error handling and troubleshooting guide
Security
- Secure credential storage and transmission
- Input validation and sanitization
- Rate limiting and throttling support
- SSL/TLS enforcement for all communications
- No sensitive data logging
Performance
- Optimized for high-throughput scenarios
- Memory-efficient streaming implementation
- Connection reuse and pooling
- Minimal latency overhead
- Concurrent request processing