Google Gemini provider adapter — Layer B. See spec §6.4, §7.1, §20, §32.1 (bundled adapters).
Phase 16.1 ships the non-streaming ALLM.Adapter callback set against
the Generative Language API at
https://generativelanguage.googleapis.com/v1beta. Streaming
(ALLM.StreamAdapter) lands in Phase 16.2; tools / vision / image-out
in Phases 16.3/16.4/16.5.
This module implements:
- `generate/2` — fires `POST /v1beta/models/{model}:generateContent` via `Req`, wrapped in `ALLM.Retry.run/3` with the default retry policy (Decision #16 — Gemini's 429 / 500 / 503 / 504 are already covered by spec §6.1's default retryable set; no Gemini-specific wrapper is needed).
- `prepare_request/2` — returns an unfired `%Req.Request{}` with the API key injected as `x-goog-api-key` (Decision #2).
- `translate_options/2` — identity (Decision #18). Gemini's camelCase rename and `generationConfig` nesting happens inside `to_generation_config/1` at request-build time.
Single translator (Decision #1)
Gemini exposes one chat endpoint, generateContent, that covers both
text and image generation — image generation is selected by toggling
generationConfig.responseModalities. The request-builder
(to_gemini_request_body/2) is therefore a single function shared
across the chat adapter and (in Phase 16.5) the image adapter. This
amortizes the PHASE_10 dual-translator drift class to zero.
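As a sketch of what the shared builder means on the wire, the hypothetical `generationConfig` fragments below show the single toggle between text and image output; the field values here are illustrative and not taken from the adapter:

```elixir
# Hypothetical generationConfig fragments (Decision #1): the same
# generateContent body serves text and image generation, and only
# responseModalities differs between the two.
text_config = %{"maxOutputTokens" => 256, "temperature" => 0.7}
image_config = Map.put(text_config, "responseModalities", ["TEXT", "IMAGE"])

IO.inspect(image_config["responseModalities"])
```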
Auth header (Decision #2/#3)
The API key flows on the x-goog-api-key request header, not the
documented ?key=... query parameter. Both forms are equivalent
server-side; the header form keeps the API key out of HTTP access
logs and metrics. The same header is reused for the streaming
endpoint (Decision #3).
Wire field map (per spec §35.7 + GEMINI_DESIGN.md)
| Concern | Gemini wire field |
|---|---|
| Endpoint host | https://generativelanguage.googleapis.com/v1beta |
| Method (chat non-streaming) | POST /models/{model}:generateContent |
| Auth header | x-goog-api-key: $key |
| Roles | user, model (:assistant → "model") |
| System prompt | top-level systemInstruction.parts[].text |
| Generation params | nested under generationConfig.{maxOutputTokens, temperature, topP, topK, stopSequences, responseMimeType, responseSchema} |
| finish_reason | candidates[0].finishReason (UPPER_SNAKE_CASE; mapping table below) |
| Prompt-blocked path | promptFeedback.blockReason (top-level, no candidates) |
| Usage location | usageMetadata.{promptTokenCount, candidatesTokenCount, totalTokenCount} |
| Error envelope | {"error": {"code", "status", "message"}} |
Finish-reason mapping (Decision #14)
Gemini's enum has 19 documented values. ALLM's
Response.finish_reason is a closed 6-atom union; the raw string is
preserved at Response.raw_finish_reason for non-canonical rows.
| Gemini finishReason | ALLM Response.finish_reason |
|---|---|
| STOP | :stop |
| MAX_TOKENS | :length |
| SAFETY | :content_filter |
| RECITATION | :content_filter |
| LANGUAGE | :content_filter |
| BLOCKLIST | :content_filter |
| PROHIBITED_CONTENT | :content_filter |
| SPII | :content_filter |
| IMAGE_SAFETY | :content_filter |
| IMAGE_PROHIBITED_CONTENT | :content_filter |
| IMAGE_RECITATION | :content_filter |
| IMAGE_OTHER | :other |
| NO_IMAGE | :other |
| MALFORMED_FUNCTION_CALL | :error |
| UNEXPECTED_TOOL_CALL | :error |
| TOO_MANY_TOOL_CALLS | :error |
| MISSING_THOUGHT_SIGNATURE | :error |
| MALFORMED_RESPONSE | :error |
| OTHER / FINISH_REASON_UNSPECIFIED / unknown | :other |
Empty-candidates branches (Decisions #9 + #10)
- `promptFeedback.blockReason` with empty candidates → `{:ok, %Response{finish_reason: :content_filter, content: ""}}`. The block reason is preserved at `metadata.error.reason = "blocked:<BLOCK_REASON>"`.
- Empty candidates with no `promptFeedback.blockReason` → `{:error, %AdapterError{reason: :malformed_response}}`.
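A minimal sketch of the two branches, using plain maps as stand-ins for `%Response{}` and `%AdapterError{}`; the function name `decode_empty/1` is hypothetical:

```elixir
defmodule EmptyCandidatesSketch do
  # Decision #9: a prompt block is still a successful call; the content
  # filter is a finish reason, and the block reason survives in metadata.
  def decode_empty(%{"promptFeedback" => %{"blockReason" => reason}}) do
    {:ok,
     %{finish_reason: :content_filter, content: "",
       metadata: %{error: %{reason: "blocked:" <> reason}}}}
  end

  # Decision #10: empty candidates with no block reason is malformed.
  def decode_empty(_body), do: {:error, %{reason: :malformed_response}}
end
```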
Usage decoding (Decision #11)
usageMetadata.candidatesTokenCount is canonical;
usageMetadata.responseTokenCount is read as a defensive fallback
when candidatesTokenCount is absent. If both are missing,
Usage.output_tokens is left at nil and a one-time
Logger.warning/1 fires per call.
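The fallback chain can be sketched as follows; the helper name is hypothetical, and the real adapter additionally fires the one-time `Logger.warning/1` on the both-missing path:

```elixir
defmodule UsageSketch do
  # Decision #11: candidatesTokenCount is canonical, responseTokenCount
  # is the defensive fallback, and a missing count yields nil (which the
  # adapter leaves at Usage.output_tokens).
  def output_tokens(%{} = usage_metadata) do
    usage_metadata["candidatesTokenCount"] || usage_metadata["responseTokenCount"]
  end
end
```

Note that `||` is safe here even for a zero count, since only `nil` and `false` are falsy in Elixir.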
Error envelope mapping (Decision #15)
Maps Google's {error: {code, status, message}} envelope onto
%AdapterError{reason: ...}:
| HTTP | Google status | AdapterError.reason |
|---|---|---|
| 400 | INVALID_ARGUMENT (no context-length marker) | :invalid_request |
| 400 | INVALID_ARGUMENT ("exceeds the maximum number of tokens" substring) | :context_length_exceeded |
| 401 | UNAUTHENTICATED | :authentication_failed |
| 403 | PERMISSION_DENIED | :authentication_failed |
| 404 | NOT_FOUND | :invalid_request |
| 429 | RESOURCE_EXHAUSTED | :rate_limited |
| 500 | INTERNAL | :provider_unavailable |
| 503 | UNAVAILABLE | :provider_unavailable |
| 504 | DEADLINE_EXCEEDED | :provider_unavailable |
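The table above can be sketched as a status/message dispatch; the module and function names are hypothetical, and the real adapter wraps the result in `%AdapterError{}`:

```elixir
defmodule ErrorEnvelopeSketch do
  # Decision #15: 400 splits on the context-length marker substring.
  def reason(400, message) do
    if String.contains?(message, "exceeds the maximum number of tokens"),
      do: :context_length_exceeded,
      else: :invalid_request
  end

  def reason(401, _), do: :authentication_failed
  def reason(403, _), do: :authentication_failed
  def reason(404, _), do: :invalid_request
  def reason(429, _), do: :rate_limited
  def reason(status, _) when status in [500, 503, 504], do: :provider_unavailable
end
```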
Retry policy (Decision #16)
No Gemini-specific retry-policy wrapper. The default policy at
lib/allm/retry.ex already retries HTTP 429, 500, 502, 503, 504,
and :timeout / :network_error. Streaming never retries
(spec §6.1).
Key resolution
Keys never appear on the engine. prepare_request/2 and generate/2
call ALLM.Keys.fetch!(:gemini, opts) at request-build time. The
:gemini provider atom is not in ALLM.Keys's @env_var_table;
the unknown-provider fallback at
lib/allm/keys.ex:189-194 returns "GEMINI_API_KEY".
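A hypothetical sketch of that fallback; the table entries and the derivation rule (upcased provider atom plus `"_API_KEY"`) are assumptions for illustration, not the shipped `ALLM.Keys` code:

```elixir
defmodule KeysFallbackSketch do
  # :gemini is deliberately absent, mirroring ALLM.Keys's @env_var_table.
  @env_var_table %{openai: "OPENAI_API_KEY", anthropic: "ANTHROPIC_API_KEY"}

  # Unknown providers fall through to a derived env-var name.
  def env_var(provider) do
    Map.get_lazy(@env_var_table, provider, fn ->
      String.upcase(Atom.to_string(provider)) <> "_API_KEY"
    end)
  end
end
```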
Summary
Functions
- `generate/2` — Execute a non-streaming generateContent request synchronously.
- `parse_finish_reason/1` — Map a Gemini finishReason string to ALLM's closed Response.finish_reason enum, returning {atom, raw_string_or_nil} per Decision #14.
- `prepare_request/2` — Build an unfired %Req.Request{} with the resolved API key injected as x-goog-api-key: <key> (Decision #2).
- `stream/2` — Open an SSE stream against streamGenerateContent?alt=sse.
- `to_gemini_request_body/2` — Compose the JSON request body for generateContent from a canonical %Request{}. Pure function; no I/O.
- `to_gemini_tool_config/1` — Translate an ALLM canonical tool_choice to Gemini's functionCallingConfig map.
- `to_gemini_tools/1` — Translate a list of canonical %ALLM.Tool{}s to Gemini's functionDeclarations shape.
- `translate_options/2` — Identity translator (Decision #18). Gemini accepts ALLM's canonical :max_tokens, :temperature, :top_p, etc.; the camelCase rename and generationConfig nesting happens in to_generation_config/1 at request-build time, not here.
Functions
@spec generate( ALLM.Request.t(), keyword() ) :: {:ok, ALLM.Response.t()} | {:error, ALLM.Error.AdapterError.t() | ALLM.Error.ValidationError.t()}
Execute a non-streaming generateContent request synchronously.
Wraps the HTTP call in ALLM.Retry.run/3 with the spec §6.1 default
policy (Decision #16). Returns {:ok, %Response{}} on 2xx success or
{:error, %AdapterError{}} on every failure shape.
Empty-candidates handling (Decisions #9 + #10)
- `promptFeedback.blockReason` with empty candidates → `{:ok, %Response{finish_reason: :content_filter, content: ""}}` (a successful HTTP response is a successful call from the adapter's perspective; the content filter is a finish reason).
- Empty candidates with no `promptFeedback.blockReason` → `{:error, %AdapterError{reason: :malformed_response}}`.
Error reasons (Decision #15)
| HTTP | AdapterError.reason |
|---|---|
| 400 generic | :invalid_request |
| 400 ctx-window | :context_length_exceeded |
| 401 / 403 | :authentication_failed |
| 404 | :invalid_request |
| 429 | :rate_limited |
| 500 / 503 / 504 | :provider_unavailable |
| network drop | :network_error |
| malformed body | :malformed_response |
Examples
iex> ALLM.Keys.put(:gemini, "AIza-doctest-gen")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> {:error, %ALLM.Error.AdapterError{reason: :authentication_failed}} =
...> ALLM.Providers.Gemini.generate(req,
...> retry: false,
...> adapter_opts: [plug: fn conn ->
...> conn
...> |> Plug.Conn.put_resp_content_type("application/json")
...> |> Plug.Conn.resp(401, ~s({"error":{"code":401,"status":"UNAUTHENTICATED","message":"bad"}}))
...> end]
...> )
iex> ALLM.Keys.delete(:gemini)
:ok
@spec parse_finish_reason(String.t() | nil) :: {ALLM.Response.finish_reason() | nil, String.t() | nil}
Map a Gemini finishReason string to ALLM's closed
Response.finish_reason enum, returning {atom, raw_string_or_nil}
per Decision #14.
STOP collapses to {:stop, nil} (the canonical "natural completion"
row); every other row preserves the raw string at index 1 so callers
can recover provider fidelity from Response.raw_finish_reason.
Examples
iex> ALLM.Providers.Gemini.parse_finish_reason("STOP")
{:stop, nil}
iex> ALLM.Providers.Gemini.parse_finish_reason("MAX_TOKENS")
{:length, "MAX_TOKENS"}
iex> ALLM.Providers.Gemini.parse_finish_reason("SAFETY")
{:content_filter, "SAFETY"}
iex> ALLM.Providers.Gemini.parse_finish_reason("OTHER")
{:other, "OTHER"}
iex> ALLM.Providers.Gemini.parse_finish_reason(nil)
{nil, nil}
@spec prepare_request( ALLM.Request.t(), keyword() ) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.AdapterError.t()}
Build an unfired %Req.Request{} with the resolved API key injected
as x-goog-api-key: <key> (Decision #2).
Per ALLM.Keys.fetch!/2, this function raises
%ALLM.Error.EngineError{reason: :missing_key} when no key resolver
yields a value.
Honors opts[:request_timeout] (forwarded as Req's
:receive_timeout) and opts[:adapter_opts][:endpoint] (URL host
override, primarily for testing).
Examples
iex> ALLM.Keys.put(:gemini, "AIza-doctest-prep")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gemini-2.5-flash")
iex> {:ok, %Req.Request{} = http} = ALLM.Providers.Gemini.prepare_request(req, [])
iex> Req.Request.get_header(http, "x-goog-api-key")
["AIza-doctest-prep"]
iex> ALLM.Keys.delete(:gemini)
:ok
@spec stream( ALLM.Request.t(), keyword() ) :: {:ok, Enumerable.t()} | {:error, ALLM.Error.AdapterError.t() | ALLM.Error.ValidationError.t()}
Open an SSE stream against streamGenerateContent?alt=sse.
Returns {:ok, enumerable} on success — the enumerable is lazy; the
HTTP request fires on the first reduce. Returns {:error, %AdapterError{}}
only for synchronous pre-flight failures (key-resolution failure raises
%EngineError{} directly per the Keys.fetch!/2 contract; that is
surfaced through the existing with-chain at the call site).
Per CLAUDE.md mid-stream-error invariant, HTTP-shaped errors observed
AFTER the consumer starts reducing are folded into a terminating
{:error, _} event — the call-site tuple stays {:ok, stream}. This
includes 4xx status codes received before the first SSE event (the
{:status, code} Finch frame folds via handle_finch_payload/2).
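An illustrative consumer for that invariant; the event shapes here (`{:delta, text}`, a bare `{:error, reason}`) are simplified stand-ins for the adapter's real event types:

```elixir
defmodule StreamConsumerSketch do
  # The call site always holds {:ok, stream}; failures observed mid-stream
  # arrive as a terminating {:error, _} event inside the enumerable.
  def collect(events) do
    Enum.reduce_while(events, {:ok, ""}, fn
      {:delta, text}, {:ok, acc} -> {:cont, {:ok, acc <> text}}
      {:error, reason}, {:ok, acc} -> {:halt, {:error, reason, acc}}
      _other, state -> {:cont, state}
    end)
  end
end
```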
Decision references
- Decision #1 — request body byte-equal to `generate/2`'s. Only the URL path differs (`:streamGenerateContent?alt=sse` vs `:generateContent`).
- Decision #3 — `?alt=sse` is the ONLY required query parameter; auth still flows via `x-goog-api-key`.
- Decision #12 — `usageMetadata` may appear on intermediate chunks; the chunk-mapper emits `{:raw_chunk, {:usage, _}}` on every appearance and `StreamCollector.apply_event/2` overwrites.
- Decision #13 — stream terminates on Finch's `:done` payload, not a `data: [DONE]` lookahead. The synthetic `:message_completed` event is built from accumulated state.
Options
- `:stream_timeout` (default 60_000 ms) — receive-loop after-clause between chunks.
- `:finch_module` (default `Finch`) — test injection seam.
- `:finch_name` (default `ALLM.Finch`).
- `:finch_stub_ref` — opaque ref forwarded to the Finch shim (used only by `ALLM.Test.FinchStub`).
- `:adapter_opts[:endpoint]` — endpoint override (testing).
@spec to_gemini_request_body( ALLM.Request.t(), keyword() ) :: map()
Compose the JSON request body for generateContent from a canonical
%Request{}. Pure function; no I/O.
Performs system-message extraction (hoist into top-level
systemInstruction), role mapping (:assistant → "model"), and
generationConfig composition.
Phase 16.1 surface only — tools (16.3) and image-out (16.5) extend
this builder via opts flags without changing the text-only path.
Examples
iex> req = ALLM.Request.new(
...> [%ALLM.Message{role: :system, content: "Be concise."},
...> %ALLM.Message{role: :user, content: "Hi"}],
...> model: "gemini-2.5-flash", max_tokens: 256
...> )
iex> body = ALLM.Providers.Gemini.to_gemini_request_body(req, [])
iex> {body["systemInstruction"], length(body["contents"]), body["generationConfig"]["maxOutputTokens"]}
{%{"parts" => [%{"text" => "Be concise."}]}, 1, 256}
@spec to_gemini_tool_config(ALLM.Request.tool_choice() | {:tool, String.t()}) :: map()
Translate an ALLM canonical tool_choice to Gemini's
functionCallingConfig map.
| ALLM canonical | Gemini wire |
|---|---|
| :auto | %{"mode" => "AUTO"} |
| :required | %{"mode" => "ANY"} |
| :none | %{"mode" => "NONE"} |
| {:tool, "name"} | %{"mode" => "ANY", "allowedFunctionNames" => ["name"]} |
| "name" (string) | shorthand for {:tool, "name"} |
Map shapes (%{"mode" => "AUTO"}, etc.) are passed through verbatim
so callers can hand-craft Gemini-specific extensions.
Examples
iex> ALLM.Providers.Gemini.to_gemini_tool_config(:auto)
%{"mode" => "AUTO"}
iex> ALLM.Providers.Gemini.to_gemini_tool_config({:tool, "set_color"})
%{"mode" => "ANY", "allowedFunctionNames" => ["set_color"]}
@spec to_gemini_tools([ALLM.Tool.t()]) :: [map()]
Translate a list of canonical %ALLM.Tool{}s to Gemini's
functionDeclarations shape.
Gemini's `tools` field is an array of %{functionDeclarations: [...]}
objects, not a flat array of declarations. Each declaration carries
:name, :description, and :parameters (the JSON-Schema field; the name
matches OpenAI's parameters key but sits directly on the declaration
rather than on a function sub-map, and differs from Anthropic's input_schema).
Examples
iex> tool = ALLM.Tool.new(name: "get_weather", description: "weather", schema: %{"type" => "object"})
iex> ALLM.Providers.Gemini.to_gemini_tools([tool])
[%{"name" => "get_weather", "description" => "weather", "parameters" => %{"type" => "object"}}]
@spec translate_options( keyword(), ALLM.Request.t() ) :: keyword()
Identity translator (Decision #18). Gemini accepts ALLM's canonical
:max_tokens, :temperature, :top_p, etc. — the camelCase rename
and generationConfig nesting happens in to_generation_config/1 at
request-build time, not here.
Examples
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> ALLM.Providers.Gemini.translate_options([max_tokens: 100, temperature: 0.7], req)
[max_tokens: 100, temperature: 0.7]