ALLM.Providers.Gemini (allm v0.3.0)


Google Gemini provider adapter — Layer B. See spec §6.4, §7.1, §20, §32.1 (bundled adapters).

Phase 16.1 ships the non-streaming ALLM.Adapter callback set against the Generative Language API at https://generativelanguage.googleapis.com/v1beta. Streaming (ALLM.StreamAdapter) lands in Phase 16.2; tools / vision / image-out in Phases 16.3/16.4/16.5.

This module implements:

  • generate/2 — fires POST /v1beta/models/{model}:generateContent via Req, wrapped in ALLM.Retry.run/3 with the default retry policy (Decision #16 — Gemini's 429 / 500 / 503 / 504 are already covered by spec §6.1's default retryable set; no Gemini-specific wrapper is needed).
  • prepare_request/2 — returns an unfired %Req.Request{} with the API key injected as x-goog-api-key (Decision #2).
  • translate_options/2 — identity (Decision #18). Gemini's camelCase rename and generationConfig nesting happen inside to_generation_config/1 at request-build time.

Single translator (Decision #1)

Gemini exposes one chat endpoint, generateContent, that covers both text and image generation — image generation is selected by toggling generationConfig.responseModalities. The request-builder (to_gemini_request_body/2) is therefore a single function shared across the chat adapter and (in Phase 16.5) the image adapter. This amortizes the PHASE_10 dual-translator drift class to zero.

Auth header (Decision #2/#3)

The API key flows on the x-goog-api-key request header, not the documented ?key=... query parameter. Both forms are equivalent server-side; the header form keeps the API key out of HTTP access logs and metrics. The same header is reused for the streaming endpoint (Decision #3).

Wire field map (per spec §35.7 + GEMINI_DESIGN.md)

  • Endpoint host → https://generativelanguage.googleapis.com/v1beta
  • Method (chat non-streaming) → POST /models/{model}:generateContent
  • Auth header → x-goog-api-key: $key
  • Roles → user, model (:assistant → "model")
  • System prompt → top-level systemInstruction.parts[].text
  • Generation params → nested under generationConfig.{maxOutputTokens, temperature, topP, topK, stopSequences, responseMimeType, responseSchema}
  • finish_reason → candidates[0].finishReason (UPPER_SNAKE_CASE; mapping table below)
  • Prompt-blocked path → promptFeedback.blockReason (top-level, no candidates)
  • Usage location → usageMetadata.{promptTokenCount, candidatesTokenCount, totalTokenCount}
  • Error envelope → {"error": {"code", "status", "message"}}
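
Putting the rows above together, a minimal generateContent body has the following shape (a hedged sketch — the field values here are illustrative, only the nesting follows the table):

```elixir
# Illustrative generateContent body assembled from the wire fields above.
# Values are made up; only the shape is taken from the table.
body = %{
  "systemInstruction" => %{"parts" => [%{"text" => "Be concise."}]},
  "contents" => [
    %{"role" => "user", "parts" => [%{"text" => "Hi"}]}
  ],
  "generationConfig" => %{
    "maxOutputTokens" => 256,
    "temperature" => 0.7,
    "topP" => 0.95
  }
}
```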

Finish-reason mapping (Decision #14)

Gemini's enum has 19 documented values. ALLM's Response.finish_reason is a closed 6-atom union; the raw string is preserved at Response.raw_finish_reason for non-canonical rows.

  • STOP → :stop
  • MAX_TOKENS → :length
  • SAFETY → :content_filter
  • RECITATION → :content_filter
  • LANGUAGE → :content_filter
  • BLOCKLIST → :content_filter
  • PROHIBITED_CONTENT → :content_filter
  • SPII → :content_filter
  • IMAGE_SAFETY → :content_filter
  • IMAGE_PROHIBITED_CONTENT → :content_filter
  • IMAGE_RECITATION → :content_filter
  • IMAGE_OTHER → :other
  • NO_IMAGE → :other
  • MALFORMED_FUNCTION_CALL → :error
  • UNEXPECTED_TOOL_CALL → :error
  • TOO_MANY_TOOL_CALLS → :error
  • MISSING_THOUGHT_SIGNATURE → :error
  • MALFORMED_RESPONSE → :error
  • OTHER / FINISH_REASON_UNSPECIFIED / unknown → :other

Empty-candidates branches (Decisions #9 + #10)

  • promptFeedback.blockReason with empty candidates → {:ok, %Response{finish_reason: :content_filter, content: ""}}. The block reason is preserved at metadata.error.reason = "blocked:<BLOCK_REASON>".
  • Empty candidates with no promptFeedback.blockReason → {:error, %AdapterError{reason: :malformed_response}}.
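
As a hedged sketch (the module name and return shapes below are illustrative stand-ins, not the adapter's real structs), the two branches can be expressed as:

```elixir
# Sketch of the two empty-candidates branches described above.
# A present blockReason is a successful call with a content-filter finish;
# an absent blockReason with no candidates is a malformed response.
defmodule EmptyCandidates do
  def decode(%{"promptFeedback" => %{"blockReason" => reason}}) do
    {:ok, %{finish_reason: :content_filter, content: "", error_reason: "blocked:" <> reason}}
  end

  def decode(_body_without_block_reason) do
    {:error, %{reason: :malformed_response}}
  end
end
```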

Usage decoding (Decision #11)

usageMetadata.candidatesTokenCount is canonical; usageMetadata.responseTokenCount is read as a defensive fallback when candidatesTokenCount is absent. If both are missing, Usage.output_tokens is left at nil and a one-time Logger.warning/1 fires per call.
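
The canonical-then-fallback read order can be sketched with plain clause ordering (module name illustrative; the real adapter also emits the Logger warning, omitted here):

```elixir
# candidatesTokenCount is canonical, so its clause comes first;
# responseTokenCount is only consulted when the canonical key is absent.
defmodule UsageFallback do
  def output_tokens(%{"candidatesTokenCount" => n}), do: n
  def output_tokens(%{"responseTokenCount" => n}), do: n
  def output_tokens(_usage_metadata), do: nil
end
```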

Error envelope mapping (Decision #15)

Maps Google's {error: {code, status, message}} envelope onto %AdapterError{reason: ...}:

  • 400 INVALID_ARGUMENT (no marker) → :invalid_request
  • 400 INVALID_ARGUMENT ("exceeds the maximum number of tokens" substring) → :context_length_exceeded
  • 401 UNAUTHENTICATED → :authentication_failed
  • 403 PERMISSION_DENIED → :authentication_failed
  • 404 NOT_FOUND → :invalid_request
  • 429 RESOURCE_EXHAUSTED → :rate_limited
  • 500 INTERNAL → :provider_unavailable
  • 503 UNAVAILABLE → :provider_unavailable
  • 504 DEADLINE_EXCEEDED → :provider_unavailable
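
The mapping is a straight pattern match except for the 400 row, which needs the message-substring probe. A hedged sketch (module and clause names illustrative; statuses outside the table are omitted):

```elixir
# Sketch of the envelope-to-reason mapping above. Only the 400 row
# inspects the message; every other row keys on the HTTP status alone.
defmodule ErrorReasonSketch do
  @ctx_marker "exceeds the maximum number of tokens"

  def reason(400, "INVALID_ARGUMENT", message) do
    if String.contains?(message, @ctx_marker),
      do: :context_length_exceeded,
      else: :invalid_request
  end

  def reason(status, _google_status, _message) do
    case status do
      s when s in [401, 403] -> :authentication_failed
      404 -> :invalid_request
      429 -> :rate_limited
      s when s in [500, 503, 504] -> :provider_unavailable
    end
  end
end
```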

Retry policy (Decision #16)

No Gemini-specific retry-policy wrapper. The default policy at lib/allm/retry.ex already retries HTTP 429, 500, 502, 503, 504, and :timeout / :network_error. Streaming never retries (spec §6.1).

Key resolution

Keys never appear on the engine. prepare_request/2 and generate/2 call ALLM.Keys.fetch!(:gemini, opts) at request-build time. The :gemini provider atom is not in ALLM.Keys's @env_var_table; the unknown-provider fallback at lib/allm/keys.ex:189-194 returns "GEMINI_API_KEY".
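
The unknown-provider fallback amounts to deriving the env-var name from the provider atom, which can be sketched as (illustrative, not Keys' actual code):

```elixir
# Derive the fallback env-var name from the provider atom:
# :gemini -> "GEMINI_API_KEY".
fallback_env_var = fn provider ->
  String.upcase(Atom.to_string(provider)) <> "_API_KEY"
end
```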

Summary

Functions

  • generate/2 — Execute a non-streaming generateContent request synchronously.

  • parse_finish_reason/1 — Map a Gemini finishReason string to ALLM's closed Response.finish_reason enum, returning {atom, raw_string_or_nil} per Decision #14.

  • prepare_request/2 — Build an unfired %Req.Request{} with the resolved API key injected as x-goog-api-key: <key> (Decision #2).

  • stream/2 — Open an SSE stream against streamGenerateContent?alt=sse.

  • to_gemini_request_body/2 — Compose the JSON request body for generateContent from a canonical %Request{}. Pure function; no I/O.

  • to_gemini_tool_config/1 — Translate an ALLM canonical tool_choice to Gemini's functionCallingConfig map.

  • to_gemini_tools/1 — Translate a list of canonical %ALLM.Tool{}s to Gemini's functionDeclarations shape.

  • translate_options/2 — Identity translator (Decision #18). Gemini accepts ALLM's canonical :max_tokens, :temperature, :top_p, etc. — the camelCase rename and generationConfig nesting happen in to_generation_config/1 at request-build time, not here.

Functions

generate(request, opts)

Execute a non-streaming generateContent request synchronously.

Wraps the HTTP call in ALLM.Retry.run/3 with the spec §6.1 default policy (Decision #16). Returns {:ok, %Response{}} on 2xx success or {:error, %AdapterError{}} on every failure shape.

Empty-candidates handling (Decisions #9 + #10)

  • promptFeedback.blockReason with empty candidates → {:ok, %Response{finish_reason: :content_filter, content: ""}} (a successful HTTP response is a successful call from the adapter's perspective; the content filter is a finish reason).
  • Empty candidates with no promptFeedback.blockReason → {:error, %AdapterError{reason: :malformed_response}}.

Error reasons (Decision #15)

  • 400 generic → :invalid_request
  • 400 ctx-window → :context_length_exceeded
  • 401 / 403 → :authentication_failed
  • 404 → :invalid_request
  • 429 → :rate_limited
  • 500 / 503 / 504 → :provider_unavailable
  • network drop → :network_error
  • malformed body → :malformed_response

Examples

iex> ALLM.Keys.put(:gemini, "AIza-doctest-gen")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> {:error, %ALLM.Error.AdapterError{reason: :authentication_failed}} =
...>   ALLM.Providers.Gemini.generate(req,
...>     retry: false,
...>     adapter_opts: [plug: fn conn ->
...>       conn
...>       |> Plug.Conn.put_resp_content_type("application/json")
...>       |> Plug.Conn.resp(401, ~s({"error":{"code":401,"status":"UNAUTHENTICATED","message":"bad"}}))
...>     end]
...>   )
iex> ALLM.Keys.delete(:gemini)
:ok

parse_finish_reason(other)

@spec parse_finish_reason(String.t() | nil) ::
  {ALLM.Response.finish_reason() | nil, String.t() | nil}

Map a Gemini finishReason string to ALLM's closed Response.finish_reason enum, returning {atom, raw_string_or_nil} per Decision #14.

STOP collapses to {:stop, nil} (the canonical "natural completion" row); every other row preserves the raw string at index 1 so callers can recover provider fidelity from Response.raw_finish_reason.

Examples

iex> ALLM.Providers.Gemini.parse_finish_reason("STOP")
{:stop, nil}

iex> ALLM.Providers.Gemini.parse_finish_reason("MAX_TOKENS")
{:length, "MAX_TOKENS"}

iex> ALLM.Providers.Gemini.parse_finish_reason("SAFETY")
{:content_filter, "SAFETY"}

iex> ALLM.Providers.Gemini.parse_finish_reason("OTHER")
{:other, "OTHER"}

iex> ALLM.Providers.Gemini.parse_finish_reason(nil)
{nil, nil}

prepare_request(request, opts)

@spec prepare_request(
  ALLM.Request.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.AdapterError.t()}

Build an unfired %Req.Request{} with the resolved API key injected as x-goog-api-key: <key> (Decision #2).

Per ALLM.Keys.fetch!/2, this function raises %ALLM.Error.EngineError{reason: :missing_key} when no key resolver yields a value.

Honors opts[:request_timeout] (forwarded as Req's :receive_timeout) and opts[:adapter_opts][:endpoint] (URL host override, primarily for testing).

Examples

iex> ALLM.Keys.put(:gemini, "AIza-doctest-prep")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gemini-2.5-flash")
iex> {:ok, %Req.Request{} = http} = ALLM.Providers.Gemini.prepare_request(req, [])
iex> Req.Request.get_header(http, "x-goog-api-key")
["AIza-doctest-prep"]
iex> ALLM.Keys.delete(:gemini)
:ok

stream(request, opts)

Open an SSE stream against streamGenerateContent?alt=sse.

Returns {:ok, enumerable} on success — the enumerable is lazy; the HTTP request fires on the first reduce. Returns {:error, %AdapterError{}} only for synchronous pre-flight failures (key-resolution failure raises %EngineError{} directly per the Keys.fetch!/2 contract; that is surfaced through the existing with-chain at the call site).

Per CLAUDE.md mid-stream-error invariant, HTTP-shaped errors observed AFTER the consumer starts reducing are folded into a terminating {:error, _} event — the call-site tuple stays {:ok, stream}. This includes 4xx status codes received before the first SSE event (the {:status, code} Finch frame folds via handle_finch_payload/2).

Decision references

  • Decision #1 — request body byte-equal to generate/2's. Only the URL path differs (:streamGenerateContent?alt=sse vs :generateContent).
  • Decision #3 — ?alt=sse is the ONLY required query parameter; auth still flows via x-goog-api-key.
  • Decision #12 — usageMetadata may appear on intermediate chunks; the chunk-mapper emits {:raw_chunk, {:usage, _}} on every appearance and StreamCollector.apply_event/2 overwrites.
  • Decision #13 — stream terminates on Finch's :done payload, not a data: [DONE] lookahead. The synthetic :message_completed event is built from accumulated state.

Options

  • :stream_timeout (default 60_000 ms) — receive-loop after-clause between chunks.
  • :finch_module (default Finch) — test injection seam.
  • :finch_name (default ALLM.Finch).
  • :finch_stub_ref — opaque ref forwarded to the Finch shim (used only by ALLM.Test.FinchStub).
  • :adapter_opts[:endpoint] — endpoint override (testing).

to_gemini_request_body(request, opts)

@spec to_gemini_request_body(
  ALLM.Request.t(),
  keyword()
) :: map()

Compose the JSON request body for generateContent from a canonical %Request{}. Pure function; no I/O.

Performs system-message extraction (hoist into top-level systemInstruction), role mapping (:assistant → "model"), and generationConfig composition.

Phase 16.1 surface only — tools (16.3) and image-out (16.5) extend this builder via opts flags without changing the text-only path.

Examples

iex> req = ALLM.Request.new(
...>   [%ALLM.Message{role: :system, content: "Be concise."},
...>    %ALLM.Message{role: :user, content: "Hi"}],
...>   model: "gemini-2.5-flash", max_tokens: 256
...> )
iex> body = ALLM.Providers.Gemini.to_gemini_request_body(req, [])
iex> {body["systemInstruction"], length(body["contents"]), body["generationConfig"]["maxOutputTokens"]}
{%{"parts" => [%{"text" => "Be concise."}]}, 1, 256}

to_gemini_tool_config(name)

@spec to_gemini_tool_config(ALLM.Request.tool_choice() | {:tool, String.t()}) :: map()

Translate an ALLM canonical tool_choice to Gemini's functionCallingConfig map.

ALLM canonicalGemini wire
:auto%{"mode" => "AUTO"}
:required%{"mode" => "ANY"}
:none%{"mode" => "NONE"}
{:tool, "name"}%{"mode" => "ANY", "allowedFunctionNames" => ["name"]}
"name" (string)shorthand for {:tool, "name"}

Map shapes (%{"mode" => "AUTO"}, etc.) are passed through verbatim so callers can hand-craft Gemini-specific extensions.

Examples

iex> ALLM.Providers.Gemini.to_gemini_tool_config(:auto)
%{"mode" => "AUTO"}

iex> ALLM.Providers.Gemini.to_gemini_tool_config({:tool, "set_color"})
%{"mode" => "ANY", "allowedFunctionNames" => ["set_color"]}

to_gemini_tools(tools)

@spec to_gemini_tools([ALLM.Tool.t()]) :: [map()]

Translate a list of canonical %ALLM.Tool{}s to Gemini's functionDeclarations shape.

Gemini's tools is an array of %{functionDeclarations: [...]} objects, not a flat array of declarations. Each declaration carries :name, :description, and :parameters (Gemini's name for the JSON-Schema field — distinct from OpenAI's parameters key on the tool's function sub-map and Anthropic's input_schema).

Examples

iex> tool = ALLM.Tool.new(name: "get_weather", description: "weather", schema: %{"type" => "object"})
iex> ALLM.Providers.Gemini.to_gemini_tools([tool])
[%{"name" => "get_weather", "description" => "weather", "parameters" => %{"type" => "object"}}]

translate_options(opts, request)

@spec translate_options(
  keyword(),
  ALLM.Request.t()
) :: keyword()

Identity translator (Decision #18). Gemini accepts ALLM's canonical :max_tokens, :temperature, :top_p, etc. — the camelCase rename and generationConfig nesting happen in to_generation_config/1 at request-build time, not here.

Examples

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> ALLM.Providers.Gemini.translate_options([max_tokens: 100, temperature: 0.7], req)
[max_tokens: 100, temperature: 0.7]