ALLM.Providers.Gemini.Images (allm v0.3.0)


Google Gemini native image-out adapter — implements ALLM.ImageAdapter against generateContent with responseModalities: ["TEXT", "IMAGE"] on the Gemini-native image preview models (gemini-3.1-flash-image-preview / "Nano Banana 2", gemini-3-pro-image-preview / "Nano Banana Pro"). See spec §35.3, §35.7 and steering/GEMINI_DESIGN.md Phase 16.5.

Layer B — runtime. Consumed through the ALLM.generate_image/3 façade. Keys resolve via ALLM.Keys.fetch!(:gemini, opts) at request-build time per spec §6.4 — no key ever lives on the engine.

Single translator (Decision #7)

Image generation is generateContent with responseModalities toggled to ["TEXT", "IMAGE"]. The request body is built by ALLM.Providers.Gemini.to_gemini_request_body/2 (the same translator the chat adapter uses). The image adapter then overrides generationConfig.responseModalities and adds generationConfig.imageConfig.aspectRatio from the Decision #19 size-mapping table. The :edit operation reuses Phase 16.4's part_to_block/1 for source-image translation by synthesizing a user-role message with [%TextPart{}, %ImagePart{}, ...] content.

Aspect-ratio mapping (Decision #19)

| ALLM ImageRequest.size | Gemini imageConfig.aspectRatio |
| --- | --- |
| "1024x1024", "512x512", "256x256", any square | "1:1" |
| "1792x1024", any 16:9 | "16:9" |
| "1024x1792", any 9:16 | "9:16" |
| "1024x768", any 4:3 | "4:3" |
| "768x1024", any 3:4 | "3:4" |
| nil | omit imageConfig (Gemini default) |
| anything else | {:error, %ImageAdapterError{reason: :invalid_request}} |

Pixel sizing (imageSize: "1K"|"2K"|"4K") is not exposed in v0.2's ImageRequest.size field; deferred. Aspect-ratio is the only knob.
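The table above can be sketched as a small lookup. This is a minimal illustration, not the adapter's actual code: the module name AspectRatioSketch is hypothetical, and the way the two named non-exact sizes ("1792x1024", "1024x1792") are matched literally is an assumption made so the sketch agrees with the table.

```elixir
defmodule AspectRatioSketch do
  # Illustrative stand-in for the Decision #19 table; not the library's code.

  # "1792x1024" and "1024x1792" are not exact 16:9 / 9:16 ratios, so this
  # sketch matches them literally (an assumption).
  @named %{{1792, 1024} => "16:9", {1024, 1792} => "9:16"}

  # Ratios accepted after reducing width and height by their GCD.
  @ratios %{
    {1, 1} => "1:1",
    {16, 9} => "16:9",
    {9, 16} => "9:16",
    {4, 3} => "4:3",
    {3, 4} => "3:4"
  }

  def to_aspect_ratio(nil), do: :omit

  def to_aspect_ratio(size) when is_binary(size) do
    # Parse "WxH" strictly; anything else is an invalid size.
    with [w, h] <- String.split(size, "x"),
         {w, ""} <- Integer.parse(w),
         {h, ""} <- Integer.parse(h) do
      to_aspect_ratio({w, h})
    else
      _ -> {:error, :invalid_size}
    end
  end

  def to_aspect_ratio({w, h}) when is_integer(w) and is_integer(h) and w > 0 and h > 0 do
    g = Integer.gcd(w, h)
    reduced = {div(w, g), div(h, g)}

    case Map.get(@named, {w, h}) || Map.get(@ratios, reduced) do
      nil -> {:error, :invalid_size}
      ratio -> {:ok, ratio}
    end
  end

  def to_aspect_ratio(_other), do: {:error, :invalid_size}
end
```

Reducing by the GCD keeps the comparison on integers, which avoids floating-point rounding when deciding whether a size is "any 4:3" or "any 3:4".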

Operation gate (Decision #6)

supported_operations/0 returns [:generate, :edit]. :variation is rejected with :unsupported_operation BEFORE any HTTP I/O per ImageAdapter invariant 4.

Test-injection escape hatch

opts[:adapter_opts][:image_script], when present, delegates to ALLM.Providers.FakeImages.generate/2 BEFORE any pre-flight gate runs. Mirrors the OpenAI.Images precedent at lib/allm/providers/openai/images.ex:251 (Phase 14.3 Decision #20).

Shared response decoder (Cross-function invariant)

Response bodies are decoded via ALLM.Providers.Gemini.Decode.candidate_parts/1, the same helper Gemini.generate/2 calls (see lib/allm/providers/gemini.ex:991, post-Phase-16.5 refactor). The image adapter consumes the image_parts element of the returned tuple, while the chat adapter consumes text and tool_calls; both walk the parts list once. See the cross-function invariants in steering/GEMINI_DESIGN.md, lines 217-219.
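A single-pass walk of this kind might look like the sketch below. It is only an illustration: the module name CandidatePartsSketch and the three-element {text, tool_calls, image_parts} return shape are assumptions, and the real helper may differ. The "text", "functionCall", and "inlineData" keys follow the Gemini REST response format.

```elixir
defmodule CandidatePartsSketch do
  # Hypothetical single-pass decoder over the first candidate's parts list.
  # The return shape {text, tool_calls, image_parts} is assumed for illustration.
  def candidate_parts(%{"candidates" => [%{"content" => %{"parts" => parts}} | _]}) do
    {text, tools, images} =
      Enum.reduce(parts, {[], [], []}, fn
        %{"text" => t}, {tx, tc, im} ->
          {[t | tx], tc, im}

        %{"functionCall" => call}, {tx, tc, im} ->
          {tx, [call | tc], im}

        %{"inlineData" => %{"mimeType" => mime, "data" => b64}}, {tx, tc, im} ->
          {tx, tc, [%{mime_type: mime, base64: b64} | im]}

        # Unknown part kinds are skipped rather than treated as errors.
        _other, acc ->
          acc
      end)

    # Accumulators were built in reverse; restore document order.
    {text |> Enum.reverse() |> Enum.join(), Enum.reverse(tools), Enum.reverse(images)}
  end

  # Bodies without candidates decode to empty results.
  def candidate_parts(_body), do: {"", [], []}
end
```

With one reducer, an image caller can take only the third element and a chat caller the first two, without either side re-scanning the parts.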

Summary

Functions

endpoint_for(model)

Return the Gemini endpoint path (relative to the API base URL) for the image-generation operation.

generate(request, opts)

Execute an image-generation or edit request synchronously.

prepare_request(request, opts)

Return an unfired Req.Request configured exactly as generate/2 would fire it.

resolve_image_bytes(image, opts)

Resolve an %Image{} source to raw bytes. Mirrors the OpenAI seam at lib/allm/providers/openai/images.ex:858.

supported_operations()

Return the closed list of operations Gemini's image adapter supports.

to_aspect_ratio(s)

Map ImageRequest.size to Gemini's imageConfig.aspectRatio per Decision #19. Returns {:ok, ratio} with the aspect-ratio string, :omit for nil, or {:error, :invalid_size} for an unmappable size.

to_image_request_body(request, opts)

Build the JSON request body for an image request.

Functions

endpoint_for(model)

(since 0.3.0)
@spec endpoint_for(String.t()) :: String.t()

Return the Gemini endpoint path (relative to the API base URL) for the image-generation operation.

Both :generate and :edit route through generateContent (the request body shape differs, the URL path does not). :variation is rejected pre-flight by gate_operation/2.

Examples

iex> ALLM.Providers.Gemini.Images.endpoint_for("gemini-3.1-flash-image-preview")
"/models/gemini-3.1-flash-image-preview:generateContent"
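Because the returned path is relative, a caller joins it to an API base URL. The sketch below shows one way that could work; the module name EndpointSketch is hypothetical, and the base URL is the public Gemini v1beta endpoint, assumed here rather than taken from the adapter's configuration.

```elixir
defmodule EndpointSketch do
  # Assumed base URL; the adapter's actual default is not shown in this page.
  @base_url "https://generativelanguage.googleapis.com/v1beta"

  # Same path shape as the documented example: both :generate and :edit
  # use generateContent, so the operation does not affect the path.
  def endpoint_for(model) when is_binary(model), do: "/models/#{model}:generateContent"

  # Join the relative path onto the base URL.
  def url_for(model), do: @base_url <> endpoint_for(model)
end
```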

generate(request, opts)

@spec generate(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, ALLM.ImageResponse.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Execute an image-generation or edit request synchronously.

Pre-flight gates (per ImageAdapter invariant 4)

Before any HTTP I/O, generate/2 checks (in order):

  1. Test-injection escape hatch. When opts[:adapter_opts][:image_script] is non-nil, the call delegates to ALLM.Providers.FakeImages.generate/2.
  2. Operation gate. request.operation must be in supported_operations(). Failure → :unsupported_operation with metadata: %{operation: op}.
  3. Aspect-ratio gate. request.size, when non-nil, must map to one of "1:1" | "16:9" | "9:16" | "4:3" | "3:4". Failure → :invalid_request.

Key resolution (ALLM.Keys.fetch!/2) runs AFTER the gates — a request rejected pre-flight does not require a valid key.
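The ordering above can be sketched as follows. This is a simplified model, not the adapter's code: PreflightSketch, its return shapes, and the injected fetch_key function (standing in for ALLM.Keys.fetch!/2) are all hypothetical, and the request is a plain map rather than an %ImageRequest{}.

```elixir
defmodule PreflightSketch do
  # Illustrative gate ordering only; names and return shapes are assumptions.
  @supported [:generate, :edit]
  @sizes %{
    "1024x1024" => "1:1",
    "1792x1024" => "16:9",
    "1024x1792" => "9:16",
    "1024x768" => "4:3",
    "768x1024" => "3:4"
  }

  # `fetch_key` is a zero-arity function so callers can observe when it runs.
  def preflight(request, opts, fetch_key) do
    case get_in(opts, [:adapter_opts, :image_script]) do
      nil ->
        # Gates run in order; the key is fetched only if both pass.
        with :ok <- gate_operation(request.operation),
             :ok <- gate_size(request.size) do
          {:ok, fetch_key.()}
        end

      script ->
        # Escape hatch: short-circuits before any gate runs.
        {:scripted, script}
    end
  end

  defp gate_operation(op) when op in @supported, do: :ok
  defp gate_operation(op), do: {:error, {:unsupported_operation, %{operation: op}}}

  defp gate_size(nil), do: :ok

  defp gate_size(size) do
    if Map.has_key?(@sizes, size), do: :ok, else: {:error, :invalid_request}
  end
end
```

Passing the key lookup in as a function makes the "gates first, key last" property easy to check in a test: a rejected request never invokes it.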

Request-id / metadata round-trip (invariants 5 + 6)

opts[:request_id] is reflected onto response.request_id. request.metadata round-trips onto response.metadata unchanged.

prepare_request(request, opts)

@spec prepare_request(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Return an unfired Req.Request configured exactly as generate/2 would fire it.

Same gate ordering as generate/2. Returns {:error, %ImageAdapterError{}} for any pre-flight failure.

resolve_image_bytes(image, opts)

(since 0.3.0)
@spec resolve_image_bytes(
  ALLM.Image.t(),
  keyword()
) :: {:ok, binary(), String.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Resolve an %Image{} source to raw bytes. Mirrors the OpenAI seam at lib/allm/providers/openai/images.ex:858.

For Gemini, this helper exists for parity with the OpenAI image-adapter testing surface. The actual :edit request build delegates source translation to Gemini.part_to_block/1 (Phase 16.4) via the chat translator, which handles :binary, :base64, and :file sources; :url is rejected by Gemini.reject_unsupported_image_sources/1.
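The source kinds named above could be resolved roughly as in this sketch. It uses tagged tuples as a stand-in because the fields of %Image{} are not shown on this page; the module name ImageBytesSketch, the :invalid_request reason, and the reading of the third :ok element as a MIME type are all assumptions.

```elixir
defmodule ImageBytesSketch do
  # Stand-in for an %Image{} source: {kind, payload, mime} tuples (hypothetical).

  # Raw bytes pass through unchanged.
  def resolve({:binary, bytes, mime}) when is_binary(bytes), do: {:ok, bytes, mime}

  # Base64 payloads are decoded; malformed input is an error.
  def resolve({:base64, data, mime}) do
    case Base.decode64(data) do
      {:ok, bytes} -> {:ok, bytes, mime}
      :error -> {:error, :invalid_request}
    end
  end

  # File sources are read from disk.
  def resolve({:file, path, mime}) do
    case File.read(path) do
      {:ok, bytes} -> {:ok, bytes, mime}
      {:error, _posix} -> {:error, :invalid_request}
    end
  end

  # Remote URLs are rejected here, mirroring the :url restriction on the
  # :edit path described above.
  def resolve({:url, _url, _mime}), do: {:error, :invalid_request}
end
```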

supported_operations()

@spec supported_operations() :: [:generate | :edit]

Return the closed list of operations Gemini's image adapter supports.

Per Decision #6 — [:generate, :edit]. :variation is not supported by the Gemini-native image models and is rejected pre-flight.

Examples

iex> ALLM.Providers.Gemini.Images.supported_operations()
[:generate, :edit]

to_aspect_ratio(s)

(since 0.3.0)
@spec to_aspect_ratio(ALLM.ImageRequest.size() | nil) ::
  {:ok, String.t()} | :omit | {:error, :invalid_size}

Map ImageRequest.size to Gemini's imageConfig.aspectRatio per Decision #19. Returns {:ok, ratio} with the aspect-ratio string, :omit for nil, or {:error, :invalid_size} for an unmappable size.

Square sizes ("NxN" or {n, n}) collapse to "1:1". Non-square sizes use exact ratio comparison rather than substring matching so "768x1024" (3:4) and "1024x1792" (~9:16) are disambiguated.

Examples

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio("1024x1024")
{:ok, "1:1"}

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio({1792, 1024})
{:ok, "16:9"}

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio(nil)
:omit

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio("999x111")
{:error, :invalid_size}

to_image_request_body(request, opts)

(since 0.3.0)
@spec to_image_request_body(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, map()} | {:error, ALLM.Error.ImageAdapterError.t()}

Build the JSON request body for an image request.

Synthesizes a chat-equivalent %Request{} (a single user message whose content is the prompt for :generate, or [%TextPart{}, %ImagePart{}, ...] for :edit) and delegates to Gemini.to_gemini_request_body/2 per Decision #7. It then sets generationConfig.responseModalities to ["TEXT", "IMAGE"] and, when the size maps to a known aspect ratio, adds generationConfig.imageConfig.aspectRatio. When :n is greater than 1, it also sets generationConfig.candidateCount to n.

Returns {:error, %ImageAdapterError{reason: :invalid_request}} for unmappable sizes per Decision #19's closed table.
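The generationConfig overrides described in this section can be sketched as a post-processing step on an already-built chat body. This is an illustration under assumptions: ImageBodySketch and apply_image_config/3 are hypothetical names, and string keys are assumed for the body map.

```elixir
defmodule ImageBodySketch do
  # Illustrative post-processing of a chat-style request body; not the
  # library's code. `aspect` is {:ok, ratio} or :omit.
  def apply_image_config(body, aspect, n \\ 1) do
    config =
      body
      |> Map.get("generationConfig", %{})
      |> Map.put("responseModalities", ["TEXT", "IMAGE"])
      |> put_aspect(aspect)
      |> put_count(n)

    Map.put(body, "generationConfig", config)
  end

  # Only a mapped size adds imageConfig; :omit leaves Gemini's default.
  defp put_aspect(config, {:ok, ratio}),
    do: Map.put(config, "imageConfig", %{"aspectRatio" => ratio})

  defp put_aspect(config, :omit), do: config

  # candidateCount is only set when more than one image is requested.
  defp put_count(config, n) when is_integer(n) and n > 1,
    do: Map.put(config, "candidateCount", n)

  defp put_count(config, _n), do: config
end
```

Keeping the overrides in one place means the chat translator stays unaware of image-specific fields, which is the point of the single-translator decision.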