ReqLLM.Providers.Google.CachedContent (ReqLLM v1.3.0)
Shared functionality for Google's Context Caching API.
Both Google AI Studio and Vertex AI support explicit context caching through the CachedContent API, but only for Gemini models; Claude models on Vertex AI do not support context caching. This module provides the shared logic for creating and managing cached content.
Overview
Context caching lets you cache large content (system instructions, documents, videos, etc.) once and reference it across multiple requests, reducing both cost and latency. The feature is available for Gemini 2.0 and 2.5 models.
Minimum Requirements
- Gemini 2.5 Flash: 1,024 tokens minimum
- Gemini 2.5 Pro: 4,096 tokens minimum
Cost Savings
- Gemini 2.5: 90% discount on cached tokens
- Gemini 2.0: 75% discount on cached tokens
- Storage costs apply based on TTL
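To make the arithmetic concrete: with the 90% discount, a request that reuses 10,000 cached tokens on a Gemini 2.5 model is billed for those tokens at 10% of the standard input rate, so the savings repeat on every request that references the cache, offset by the time-based storage cost accrued over the cache's TTL.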
Complete Workflow Example
# Step 1: Create cached content with a large document
large_document = File.read!("large_document.txt")
{:ok, cache} = ReqLLM.Providers.Google.CachedContent.create(
  provider: :google,
  model: "gemini-2.5-flash",
  api_key: System.get_env("GOOGLE_API_KEY"),
  contents: [
    %{role: "user", parts: [%{text: large_document}]}
  ],
  system_instruction: "You are a helpful assistant that answers questions about the provided document.",
  ttl: "3600s",
  display_name: "Document Analysis Cache"
)
# Step 2: Use the cache in multiple requests (90% discount on cached tokens!)
{:ok, response1} = ReqLLM.generate_text(
  "google:gemini-2.5-flash",
  "What is the main topic of the document?",
  provider_options: [cached_content: cache.name]
)

{:ok, response2} = ReqLLM.generate_text(
  "google:gemini-2.5-flash",
  "Summarize the key points.",
  provider_options: [cached_content: cache.name]
)
# Step 3: Check token usage (note the cached_tokens field)
IO.inspect(response1.usage)
# %{input_tokens: 50, cached_tokens: 10000, output_tokens: 100, ...}
# Step 4: Extend cache lifetime if needed
{:ok, updated_cache} = ReqLLM.Providers.Google.CachedContent.update(
  provider: :google,
  name: cache.name,
  api_key: System.get_env("GOOGLE_API_KEY"),
  ttl: "7200s"
)
# Step 5: Clean up when done
:ok = ReqLLM.Providers.Google.CachedContent.delete(
  provider: :google,
  name: cache.name,
  api_key: System.get_env("GOOGLE_API_KEY")
)
Vertex AI Example
# Vertex AI uses full resource paths for cache names
{:ok, cache} = ReqLLM.Providers.Google.CachedContent.create(
  provider: :google_vertex,
  model: "gemini-2.5-flash",
  service_account_json: System.get_env("GOOGLE_APPLICATION_CREDENTIALS"),
  project_id: "my-project",
  region: "us-central1",
  contents: [%{role: "user", parts: [%{text: large_document}]}],
  system_instruction: "You are a helpful assistant.",
  ttl: "3600s"
)
# Use in requests
{:ok, response} = ReqLLM.generate_text(
  "google-vertex:gemini-2.5-flash",
  "Question about the document?",
  provider_options: [
    cached_content: cache.name,
    service_account_json: System.get_env("GOOGLE_APPLICATION_CREDENTIALS"),
    project_id: "my-project"
  ]
)
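Note that on Vertex AI the value in cache.name is a full resource path (in the underlying API it has the form projects/{project}/locations/{region}/cachedContents/{id}), so pass it to cached_content unchanged rather than extracting a bare ID.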
Summary
Functions
- create(opts) - Creates a new cached content resource.
- delete(opts) - Deletes a cached content resource.
- get(opts) - Gets details about a specific cached content resource.
- list(opts) - Lists all cached content resources.
- update(opts) - Updates the TTL of an existing cached content resource.
Functions
create(opts)
Creates a new cached content resource.
Options
- :provider - Either :google (AI Studio) or :google_vertex (Vertex AI)
- :model - Model identifier (e.g., "gemini-2.5-flash")
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI, defaults to "us-central1")
- :contents - List of content to cache (messages format)
- :system_instruction - Optional system instruction to cache
- :tools - Optional tools to cache
- :tool_config - Optional tool configuration
- :ttl - Time-to-live duration (e.g., "3600s", defaults to "3600s")
- :display_name - Optional display name for the cache
Returns
{:ok, cache_info} where cache_info contains:
- :name - The cache resource name/ID to use in requests
- :create_time - When the cache was created
- :update_time - When the cache was last updated
- :expire_time - When the cache will expire
- :usage_metadata - Token counts for the cached content
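For illustration only, an AI Studio cache_info might look like the map below. The field values are hypothetical, and the exact value types (strings vs. parsed structs, and the inner shape of :usage_metadata) depend on how the API response is decoded:

%{
  name: "cachedContents/abc123",
  create_time: "2025-06-01T12:00:00Z",
  update_time: "2025-06-01T12:00:00Z",
  expire_time: "2025-06-01T13:00:00Z",
  usage_metadata: %{"totalTokenCount" => 10000}
}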
Examples
# Google AI Studio
{:ok, cache} = create(
  provider: :google,
  model: "gemini-2.5-flash",
  api_key: "your-api-key",
  contents: [%{role: "user", parts: [%{text: "Content to cache"}]}],
  ttl: "3600s"
)
# Vertex AI
{:ok, cache} = create(
  provider: :google_vertex,
  model: "gemini-2.5-flash",
  service_account_json: "/path/to/service-account.json",
  project_id: "your-project",
  region: "us-central1",
  contents: [%{role: "user", parts: [%{text: "Content to cache"}]}],
  ttl: "3600s"
)
delete(opts)
Deletes a cached content resource.
Options
- :provider - Either :google or :google_vertex
- :name - The cache resource name/ID
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
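Examples
Mirroring Step 5 of the workflow above (delete returns :ok on success):

# Google AI Studio
:ok = delete(
  provider: :google,
  name: cache.name,
  api_key: "your-api-key"
)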
get(opts)
Gets details about a specific cached content resource.
Options
- :provider - Either :google or :google_vertex
- :name - The cache resource name/ID
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
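Examples
A sketch assuming get returns the same {:ok, cache_info} shape documented for create:

# Google AI Studio: look up a cache created earlier
{:ok, cache_info} = get(
  provider: :google,
  name: cache.name,
  api_key: "your-api-key"
)
# e.g. check when it will expire
cache_info.expire_time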
list(opts)
Lists all cached content resources.
Options
- :provider - Either :google or :google_vertex
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
- :page_size - Number of results per page (optional)
- :page_token - Token for pagination (optional)
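Examples
A sketch assuming list returns {:ok, result}; the exact shape of result, and where a next-page token appears, is not documented here:

# Google AI Studio: fetch up to 20 caches per page
{:ok, result} = list(
  provider: :google,
  api_key: "your-api-key",
  page_size: 20
)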
update(opts)
Updates the TTL of an existing cached content resource.
Options
- :provider - Either :google or :google_vertex
- :name - The cache resource name/ID
- :ttl - New time-to-live duration (e.g., "7200s")
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
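Examples
Mirroring Step 4 of the workflow above:

# Google AI Studio: extend the cache's lifetime to 7200 seconds
{:ok, updated_cache} = update(
  provider: :google,
  name: cache.name,
  ttl: "7200s",
  api_key: "your-api-key"
)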