ReqLLM.Providers.Google.CachedContent (ReqLLM v1.3.0)
Shared functionality for Google's Context Caching API.
Both Google AI Studio and Vertex AI support explicit context caching through the CachedContent API, but only for Gemini models; Claude models on Vertex AI do not support context caching. This module provides the shared logic for creating and managing cached content.
Overview
Context caching lets you cache large content (system instructions, documents, videos, etc.) once and reference it across multiple requests, reducing both cost and latency. The feature is available for Gemini 2.0 and 2.5 models.
Minimum Requirements
- Gemini 2.5 Flash: 1,024 tokens minimum
- Gemini 2.5 Pro: 4,096 tokens minimum
Cost Savings
- Gemini 2.5: 90% discount on cached tokens
- Gemini 2.0: 75% discount on cached tokens
- Storage costs apply based on TTL
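To make the arithmetic concrete: with the 90% discount, a request that reuses 10,000 cached tokens on a Gemini 2.5 model is billed for those tokens at 10% of the standard input rate, so the savings repeat on every request that references the cache, offset by the time-based storage cost accrued over the cache's TTL.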
Complete Workflow Example
# Step 1: Create cached content with a large document
large_document = File.read!("large_document.txt")
{:ok, cache} = ReqLLM.Providers.Google.CachedContent.create(
  provider: :google,
  model: "gemini-2.5-flash",
  api_key: System.get_env("GOOGLE_API_KEY"),
  contents: [
    %{role: "user", parts: [%{text: large_document}]}
  ],
  system_instruction: "You are a helpful assistant that answers questions about the provided document.",
  ttl: "3600s",
  display_name: "Document Analysis Cache"
)
# Step 2: Use the cache in multiple requests (90% discount on cached tokens!)
{:ok, response1} = ReqLLM.generate_text(
  "google:gemini-2.5-flash",
  "What is the main topic of the document?",
  provider_options: [cached_content: cache.name]
)

{:ok, response2} = ReqLLM.generate_text(
  "google:gemini-2.5-flash",
  "Summarize the key points.",
  provider_options: [cached_content: cache.name]
)
# Step 3: Check token usage (note the cached_tokens field)
IO.inspect(response1.usage)
# %{input_tokens: 50, cached_tokens: 10000, output_tokens: 100, ...}
# Step 4: Extend cache lifetime if needed
{:ok, updated_cache} = ReqLLM.Providers.Google.CachedContent.update(
  provider: :google,
  name: cache.name,
  api_key: System.get_env("GOOGLE_API_KEY"),
  ttl: "7200s"
)
# Step 5: Clean up when done
:ok = ReqLLM.Providers.Google.CachedContent.delete(
  provider: :google,
  name: cache.name,
  api_key: System.get_env("GOOGLE_API_KEY")
)
Vertex AI Example
# Vertex AI uses full resource paths for cache names
{:ok, cache} = ReqLLM.Providers.Google.CachedContent.create(
  provider: :google_vertex,
  model: "gemini-2.5-flash",
  service_account_json: System.get_env("GOOGLE_APPLICATION_CREDENTIALS"),
  project_id: "my-project",
  region: "us-central1",
  contents: [%{role: "user", parts: [%{text: large_document}]}],
  system_instruction: "You are a helpful assistant.",
  ttl: "3600s"
)
# Use in requests
{:ok, response} = ReqLLM.generate_text(
  "google-vertex:gemini-2.5-flash",
  "Question about the document?",
  provider_options: [
    cached_content: cache.name,
    service_account_json: System.get_env("GOOGLE_APPLICATION_CREDENTIALS"),
    project_id: "my-project"
  ]
)
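Note that on Vertex AI the value in cache.name is a full resource path (in the underlying API it has the form projects/{project}/locations/{region}/cachedContents/{id}), so pass it to cached_content unchanged rather than extracting a bare ID.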
Summary
Functions
- create(opts) - Creates a new cached content resource.
- delete(opts) - Deletes a cached content resource.
- get(opts) - Gets details about a specific cached content resource.
- list(opts) - Lists all cached content resources.
- update(opts) - Updates the TTL of an existing cached content resource.
Functions
create(opts)
Creates a new cached content resource.
Options
- :provider - Either :google (AI Studio) or :google_vertex (Vertex AI)
- :model - Model identifier (e.g., "gemini-2.5-flash")
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI, defaults to "us-central1")
- :contents - List of content to cache (messages format)
- :system_instruction - Optional system instruction to cache
- :tools - Optional tools to cache
- :tool_config - Optional tool configuration
- :ttl - Time-to-live duration (e.g., "3600s", defaults to "3600s")
- :display_name - Optional display name for the cache
Returns
{:ok, cache_info} where cache_info contains:
- :name - The cache resource name/ID to use in requests
- :create_time - When the cache was created
- :update_time - When the cache was last updated
- :expire_time - When the cache will expire
- :usage_metadata - Token counts for the cached content
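For illustration only, an AI Studio cache_info might look like the map below. The field values are hypothetical, and the exact value types (strings vs. parsed structs, and the inner shape of :usage_metadata) depend on how the API response is decoded:

%{
  name: "cachedContents/abc123",
  create_time: "2025-06-01T12:00:00Z",
  update_time: "2025-06-01T12:00:00Z",
  expire_time: "2025-06-01T13:00:00Z",
  usage_metadata: %{"totalTokenCount" => 10000}
}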
Examples
# Google AI Studio
{:ok, cache} = create(
  provider: :google,
  model: "gemini-2.5-flash",
  api_key: "your-api-key",
  contents: [%{role: "user", parts: [%{text: "Content to cache"}]}],
  ttl: "3600s"
)
# Vertex AI
{:ok, cache} = create(
  provider: :google_vertex,
  model: "gemini-2.5-flash",
  service_account_json: "/path/to/service-account.json",
  project_id: "your-project",
  region: "us-central1",
  contents: [%{role: "user", parts: [%{text: "Content to cache"}]}],
  ttl: "3600s"
)
delete(opts)
Deletes a cached content resource.
Options
- :provider - Either :google or :google_vertex
- :name - The cache resource name/ID
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
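Examples
Mirroring Step 5 of the workflow above (delete returns :ok on success):

# Google AI Studio
:ok = delete(
  provider: :google,
  name: cache.name,
  api_key: "your-api-key"
)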
get(opts)
Gets details about a specific cached content resource.
Options
- :provider - Either :google or :google_vertex
- :name - The cache resource name/ID
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
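Examples
A sketch assuming get returns the same {:ok, cache_info} shape documented for create:

# Google AI Studio: look up a cache created earlier
{:ok, cache_info} = get(
  provider: :google,
  name: cache.name,
  api_key: "your-api-key"
)
# e.g. check when it will expire
cache_info.expire_time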
list(opts)
Lists all cached content resources.
Options
- :provider - Either :google or :google_vertex
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
- :page_size - Number of results per page (optional)
- :page_token - Token for pagination (optional)
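Examples
A sketch assuming list returns {:ok, result}; the exact shape of result, and where a next-page token appears, is not documented here:

# Google AI Studio: fetch up to 20 caches per page
{:ok, result} = list(
  provider: :google,
  api_key: "your-api-key",
  page_size: 20
)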
update(opts)
Updates the TTL of an existing cached content resource.
Options
- :provider - Either :google or :google_vertex
- :name - The cache resource name/ID
- :ttl - New time-to-live duration (e.g., "7200s")
- :api_key - API key (for Google AI Studio)
- :service_account_json - Service account JSON path (for Vertex AI)
- :project_id - GCP project ID (for Vertex AI)
- :region - GCP region (for Vertex AI)
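Examples
Mirroring Step 4 of the workflow above:

# Google AI Studio: extend the cache's lifetime to 7200 seconds
{:ok, updated_cache} = update(
  provider: :google,
  name: cache.name,
  ttl: "7200s",
  api_key: "your-api-key"
)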