ALLM.Providers.Gemini (allm v0.3.0)


Google Gemini provider adapter — Layer B. See spec §6.4, §7.1, §20, §32.1 (bundled adapters).

Phase 16.1 ships the non-streaming ALLM.Adapter callback set against the Generative Language API at https://generativelanguage.googleapis.com/v1beta. Streaming (ALLM.StreamAdapter) lands in Phase 16.2; tools / vision / image-out in Phases 16.3/16.4/16.5.

This module implements:

  • generate/2 — fires POST /v1beta/models/{model}:generateContent via Req, wrapped in ALLM.Retry.run/3 with the default retry policy (Decision #16 — Gemini's 429 / 500 / 503 / 504 are already covered by spec §6.1's default retryable set; no Gemini-specific wrapper is needed).
  • prepare_request/2 — returns an unfired %Req.Request{} with the API key injected as x-goog-api-key (Decision #2).
  • translate_options/2 — identity (Decision #18). Gemini's camelCase rename and generationConfig nesting happen inside to_generation_config/1 at request-build time.

Single translator (Decision #1)

Gemini exposes one chat endpoint, generateContent, that covers both text and image generation — image generation is selected by toggling generationConfig.responseModalities. The request-builder (to_gemini_request_body/2) is therefore a single function shared across the chat adapter and (in Phase 16.5) the image adapter. This amortizes the PHASE_10 dual-translator drift class to zero.

Auth header (Decision #2/#3)

The API key flows on the x-goog-api-key request header, not the documented ?key=... query parameter. Both forms are equivalent server-side; the header form keeps the API key out of HTTP access logs and metrics. The same header is reused for the streaming endpoint (Decision #3).

Wire field map (per spec §35.7 + GEMINI_DESIGN.md)

  • Endpoint host → https://generativelanguage.googleapis.com/v1beta
  • Method (chat non-streaming) → POST /models/{model}:generateContent
  • Auth header → x-goog-api-key: $key
  • Roles → user, model (:assistant → "model")
  • System prompt → top-level systemInstruction.parts[].text
  • Generation params → nested under generationConfig.{maxOutputTokens, temperature, topP, topK, stopSequences, responseMimeType, responseSchema}
  • finish_reason → candidates[0].finishReason (UPPER_SNAKE_CASE; mapping table below)
  • Prompt-blocked path → promptFeedback.blockReason (top-level, no candidates)
  • Usage location → usageMetadata.{promptTokenCount, candidatesTokenCount, totalTokenCount}
  • Error envelope → {"error": {"code", "status", "message"}}
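
Putting the rows above together, a minimal generateContent body has the following shape (a hedged sketch — the field values here are illustrative, only the nesting follows the table):

```elixir
# Illustrative generateContent body assembled from the wire fields above.
# Values are made up; only the shape is taken from the table.
body = %{
  "systemInstruction" => %{"parts" => [%{"text" => "Be concise."}]},
  "contents" => [
    %{"role" => "user", "parts" => [%{"text" => "Hi"}]}
  ],
  "generationConfig" => %{
    "maxOutputTokens" => 256,
    "temperature" => 0.7,
    "topP" => 0.95
  }
}
```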

Finish-reason mapping (Decision #14)

Gemini's enum has 19 documented values. ALLM's Response.finish_reason is a closed 6-atom union; the raw string is preserved at Response.raw_finish_reason for non-canonical rows.

  • STOP → :stop
  • MAX_TOKENS → :length
  • SAFETY → :content_filter
  • RECITATION → :content_filter
  • LANGUAGE → :content_filter
  • BLOCKLIST → :content_filter
  • PROHIBITED_CONTENT → :content_filter
  • SPII → :content_filter
  • IMAGE_SAFETY → :content_filter
  • IMAGE_PROHIBITED_CONTENT → :content_filter
  • IMAGE_RECITATION → :content_filter
  • IMAGE_OTHER → :other
  • NO_IMAGE → :other
  • MALFORMED_FUNCTION_CALL → :error
  • UNEXPECTED_TOOL_CALL → :error
  • TOO_MANY_TOOL_CALLS → :error
  • MISSING_THOUGHT_SIGNATURE → :error
  • MALFORMED_RESPONSE → :error
  • OTHER / FINISH_REASON_UNSPECIFIED / unknown → :other

Empty-candidates branches (Decisions #9 + #10)

  • promptFeedback.blockReason with empty candidates → {:ok, %Response{finish_reason: :content_filter, content: ""}}. The block reason is preserved at metadata.error.reason = "blocked:<BLOCK_REASON>".
  • Empty candidates with no promptFeedback.blockReason → {:error, %AdapterError{reason: :malformed_response}}.
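
As a hedged sketch (the module name and return shapes below are illustrative stand-ins, not the adapter's real structs), the two branches can be expressed as:

```elixir
# Sketch of the two empty-candidates branches described above.
# A present blockReason is a successful call with a content-filter finish;
# an absent blockReason with no candidates is a malformed response.
defmodule EmptyCandidates do
  def decode(%{"promptFeedback" => %{"blockReason" => reason}}) do
    {:ok, %{finish_reason: :content_filter, content: "", error_reason: "blocked:" <> reason}}
  end

  def decode(_body_without_block_reason) do
    {:error, %{reason: :malformed_response}}
  end
end
```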

Usage decoding (Decision #11)

usageMetadata.candidatesTokenCount is canonical; usageMetadata.responseTokenCount is read as a defensive fallback when candidatesTokenCount is absent. If both are missing, Usage.output_tokens is left at nil and a one-time Logger.warning/1 fires per call.
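
The canonical-then-fallback read order can be sketched with plain clause ordering (module name illustrative; the real adapter also emits the Logger warning, omitted here):

```elixir
# candidatesTokenCount is canonical, so its clause comes first;
# responseTokenCount is only consulted when the canonical key is absent.
defmodule UsageFallback do
  def output_tokens(%{"candidatesTokenCount" => n}), do: n
  def output_tokens(%{"responseTokenCount" => n}), do: n
  def output_tokens(_usage_metadata), do: nil
end
```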

Error envelope mapping (Decision #15)

Maps Google's {error: {code, status, message}} envelope onto %AdapterError{reason: ...}:

  • 400 INVALID_ARGUMENT (no marker) → :invalid_request
  • 400 INVALID_ARGUMENT ("exceeds the maximum number of tokens" substring) → :context_length_exceeded
  • 401 UNAUTHENTICATED → :authentication_failed
  • 403 PERMISSION_DENIED → :authentication_failed
  • 404 NOT_FOUND → :invalid_request
  • 429 RESOURCE_EXHAUSTED → :rate_limited
  • 500 INTERNAL → :provider_unavailable
  • 503 UNAVAILABLE → :provider_unavailable
  • 504 DEADLINE_EXCEEDED → :provider_unavailable
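
The mapping is a straight pattern match except for the 400 row, which needs the message-substring probe. A hedged sketch (module and clause names illustrative; statuses outside the table are omitted):

```elixir
# Sketch of the envelope-to-reason mapping above. Only the 400 row
# inspects the message; every other row keys on the HTTP status alone.
defmodule ErrorReasonSketch do
  @ctx_marker "exceeds the maximum number of tokens"

  def reason(400, "INVALID_ARGUMENT", message) do
    if String.contains?(message, @ctx_marker),
      do: :context_length_exceeded,
      else: :invalid_request
  end

  def reason(status, _google_status, _message) do
    case status do
      s when s in [401, 403] -> :authentication_failed
      404 -> :invalid_request
      429 -> :rate_limited
      s when s in [500, 503, 504] -> :provider_unavailable
    end
  end
end
```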

Retry policy (Decision #16)

No Gemini-specific retry-policy wrapper. The default policy at lib/allm/retry.ex already retries HTTP 429, 500, 502, 503, 504, and :timeout / :network_error. Streaming never retries (spec §6.1).

Key resolution

Keys never appear on the engine. prepare_request/2 and generate/2 call ALLM.Keys.fetch!(:gemini, opts) at request-build time. The :gemini provider atom is not in ALLM.Keys's @env_var_table; the unknown-provider fallback at lib/allm/keys.ex:189-194 returns "GEMINI_API_KEY".
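
The unknown-provider fallback amounts to deriving the env-var name from the provider atom, which can be sketched as (illustrative, not Keys' actual code):

```elixir
# Derive the fallback env-var name from the provider atom:
# :gemini -> "GEMINI_API_KEY".
fallback_env_var = fn provider ->
  String.upcase(Atom.to_string(provider)) <> "_API_KEY"
end
```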

Summary

Functions

  • generate/2 — Execute a non-streaming generateContent request synchronously.

  • parse_finish_reason/1 — Map a Gemini finishReason string to ALLM's closed Response.finish_reason enum, returning {atom, raw_string_or_nil} per Decision #14.

  • prepare_request/2 — Build an unfired %Req.Request{} with the resolved API key injected as x-goog-api-key: <key> (Decision #2).

  • stream/2 — Open an SSE stream against streamGenerateContent?alt=sse.

  • to_gemini_request_body/2 — Compose the JSON request body for generateContent from a canonical %Request{}. Pure function; no I/O.

  • to_gemini_tool_config/1 — Translate an ALLM canonical tool_choice to Gemini's functionCallingConfig map.

  • to_gemini_tools/1 — Translate a list of canonical %ALLM.Tool{}s to Gemini's functionDeclarations shape.

  • translate_options/2 — Identity translator (Decision #18). Gemini accepts ALLM's canonical :max_tokens, :temperature, :top_p, etc. — the camelCase rename and generationConfig nesting happen in to_generation_config/1 at request-build time, not here.

Functions

generate(request, opts)

Execute a non-streaming generateContent request synchronously.

Wraps the HTTP call in ALLM.Retry.run/3 with the spec §6.1 default policy (Decision #16). Returns {:ok, %Response{}} on 2xx success or {:error, %AdapterError{}} on every failure shape.

Empty-candidates handling (Decisions #9 + #10)

  • promptFeedback.blockReason with empty candidates → {:ok, %Response{finish_reason: :content_filter, content: ""}} (a successful HTTP response is a successful call from the adapter's perspective; the content filter is a finish reason).
  • Empty candidates with no promptFeedback.blockReason → {:error, %AdapterError{reason: :malformed_response}}.

Error reasons (Decision #15)

  • 400 generic → :invalid_request
  • 400 ctx-window → :context_length_exceeded
  • 401 / 403 → :authentication_failed
  • 404 → :invalid_request
  • 429 → :rate_limited
  • 500 / 503 / 504 → :provider_unavailable
  • network drop → :network_error
  • malformed body → :malformed_response

Examples

iex> ALLM.Keys.put(:gemini, "AIza-doctest-gen")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> {:error, %ALLM.Error.AdapterError{reason: :authentication_failed}} =
...>   ALLM.Providers.Gemini.generate(req,
...>     retry: false,
...>     adapter_opts: [plug: fn conn ->
...>       conn
...>       |> Plug.Conn.put_resp_content_type("application/json")
...>       |> Plug.Conn.resp(401, ~s({"error":{"code":401,"status":"UNAUTHENTICATED","message":"bad"}}))
...>     end]
...>   )
iex> ALLM.Keys.delete(:gemini)
:ok

parse_finish_reason(other)

@spec parse_finish_reason(String.t() | nil) ::
  {ALLM.Response.finish_reason() | nil, String.t() | nil}

Map a Gemini finishReason string to ALLM's closed Response.finish_reason enum, returning {atom, raw_string_or_nil} per Decision #14.

STOP collapses to {:stop, nil} (the canonical "natural completion" row); every other row preserves the raw string at index 1 so callers can recover provider fidelity from Response.raw_finish_reason.

Examples

iex> ALLM.Providers.Gemini.parse_finish_reason("STOP")
{:stop, nil}

iex> ALLM.Providers.Gemini.parse_finish_reason("MAX_TOKENS")
{:length, "MAX_TOKENS"}

iex> ALLM.Providers.Gemini.parse_finish_reason("SAFETY")
{:content_filter, "SAFETY"}

iex> ALLM.Providers.Gemini.parse_finish_reason("OTHER")
{:other, "OTHER"}

iex> ALLM.Providers.Gemini.parse_finish_reason(nil)
{nil, nil}

prepare_request(request, opts)

@spec prepare_request(
  ALLM.Request.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.AdapterError.t()}

Build an unfired %Req.Request{} with the resolved API key injected as x-goog-api-key: <key> (Decision #2).

Per ALLM.Keys.fetch!/2, this function raises %ALLM.Error.EngineError{reason: :missing_key} when no key resolver yields a value.

Honors opts[:request_timeout] (forwarded as Req's :receive_timeout) and opts[:adapter_opts][:endpoint] (URL host override, primarily for testing).

Examples

iex> ALLM.Keys.put(:gemini, "AIza-doctest-prep")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gemini-2.5-flash")
iex> {:ok, %Req.Request{} = http} = ALLM.Providers.Gemini.prepare_request(req, [])
iex> Req.Request.get_header(http, "x-goog-api-key")
["AIza-doctest-prep"]
iex> ALLM.Keys.delete(:gemini)
:ok

stream(request, opts)

Open an SSE stream against streamGenerateContent?alt=sse.

Returns {:ok, enumerable} on success — the enumerable is lazy; the HTTP request fires on the first reduce. Returns {:error, %AdapterError{}} only for synchronous pre-flight failures (key-resolution failure raises %EngineError{} directly per the Keys.fetch!/2 contract; that is surfaced through the existing with-chain at the call site).

Per CLAUDE.md mid-stream-error invariant, HTTP-shaped errors observed AFTER the consumer starts reducing are folded into a terminating {:error, _} event — the call-site tuple stays {:ok, stream}. This includes 4xx status codes received before the first SSE event (the {:status, code} Finch frame folds via handle_finch_payload/2).

Decision references

  • Decision #1 — request body byte-equal to generate/2's. Only the URL path differs (:streamGenerateContent?alt=sse vs :generateContent).
  • Decision #3 — ?alt=sse is the ONLY required query parameter; auth still flows via x-goog-api-key.
  • Decision #12 — usageMetadata may appear on intermediate chunks; the chunk-mapper emits {:raw_chunk, {:usage, _}} on every appearance and StreamCollector.apply_event/2 overwrites.
  • Decision #13 — stream terminates on Finch's :done payload, not a data: [DONE] lookahead. The synthetic :message_completed event is built from accumulated state.

Options

  • :stream_timeout (default 60_000 ms) — receive-loop after-clause between chunks.
  • :finch_module (default Finch) — test injection seam.
  • :finch_name (default ALLM.Finch).
  • :finch_stub_ref — opaque ref forwarded to the Finch shim (used only by ALLM.Test.FinchStub).
  • :adapter_opts[:endpoint] — endpoint override (testing).

to_gemini_request_body(request, opts)

@spec to_gemini_request_body(
  ALLM.Request.t(),
  keyword()
) :: map()

Compose the JSON request body for generateContent from a canonical %Request{}. Pure function; no I/O.

Performs system-message extraction (hoist into top-level systemInstruction), role mapping (:assistant → "model"), and generationConfig composition.

Phase 16.1 surface only — tools (16.3) and image-out (16.5) extend this builder via opts flags without changing the text-only path.

Examples

iex> req = ALLM.Request.new(
...>   [%ALLM.Message{role: :system, content: "Be concise."},
...>    %ALLM.Message{role: :user, content: "Hi"}],
...>   model: "gemini-2.5-flash", max_tokens: 256
...> )
iex> body = ALLM.Providers.Gemini.to_gemini_request_body(req, [])
iex> {body["systemInstruction"], length(body["contents"]), body["generationConfig"]["maxOutputTokens"]}
{%{"parts" => [%{"text" => "Be concise."}]}, 1, 256}

to_gemini_tool_config(name)

@spec to_gemini_tool_config(ALLM.Request.tool_choice() | {:tool, String.t()}) :: map()

Translate an ALLM canonical tool_choice to Gemini's functionCallingConfig map.

ALLM canonicalGemini wire
:auto%{"mode" => "AUTO"}
:required%{"mode" => "ANY"}
:none%{"mode" => "NONE"}
{:tool, "name"}%{"mode" => "ANY", "allowedFunctionNames" => ["name"]}
"name" (string)shorthand for {:tool, "name"}

Map shapes (%{"mode" => "AUTO"}, etc.) are passed through verbatim so callers can hand-craft Gemini-specific extensions.

Examples

iex> ALLM.Providers.Gemini.to_gemini_tool_config(:auto)
%{"mode" => "AUTO"}

iex> ALLM.Providers.Gemini.to_gemini_tool_config({:tool, "set_color"})
%{"mode" => "ANY", "allowedFunctionNames" => ["set_color"]}

to_gemini_tools(tools)

@spec to_gemini_tools([ALLM.Tool.t()]) :: [map()]

Translate a list of canonical %ALLM.Tool{}s to Gemini's functionDeclarations shape.

Gemini's tools is an array of %{functionDeclarations: [...]} objects, not a flat array of declarations. Each declaration carries :name, :description, and :parameters (Gemini's name for the JSON-Schema field — distinct from OpenAI's parameters key on the tool's function sub-map and Anthropic's input_schema).

Examples

iex> tool = ALLM.Tool.new(name: "get_weather", description: "weather", schema: %{"type" => "object"})
iex> ALLM.Providers.Gemini.to_gemini_tools([tool])
[%{"name" => "get_weather", "description" => "weather", "parameters" => %{"type" => "object"}}]

translate_options(opts, request)

@spec translate_options(
  keyword(),
  ALLM.Request.t()
) :: keyword()

Identity translator (Decision #18). Gemini accepts ALLM's canonical :max_tokens, :temperature, :top_p, etc. — the camelCase rename and generationConfig nesting happen in to_generation_config/1 at request-build time, not here.

Examples

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> ALLM.Providers.Gemini.translate_options([max_tokens: 100, temperature: 0.7], req)
[max_tokens: 100, temperature: 0.7]