ALLM.Providers.OpenAI.Images (allm v0.3.0)


OpenAI Images provider adapter — implements ALLM.ImageAdapter against OpenAI's /v1/images/generations, /v1/images/edits, and /v1/images/variations endpoints. See spec §35.7.

Layer B — runtime. Constructed via ALLM.Engine.new(image_adapter: ALLM.Providers.OpenAI.Images, model: "dall-e-2") and consumed through the ALLM.generate_image/3 · edit_image/4 · image_variations/3 façade (Phase 14.2). Keys resolve via ALLM.Keys.fetch!(:openai, opts) at request-build time per spec §6.4 — no key ever lives on the engine.

Status

The JSON :generate HTTP path is wired for dall-e-2, dall-e-3, and gpt-image-1. The gpt-image-1 path applies forced-base64 normalization (gpt-image-1 ignores response_format at the wire), token-based usage (input_tokens / output_tokens), and the output_format → :mime_type mapping per Decision #19. The multipart :edit HTTP path is wired for dall-e-2 and gpt-image-1, including URL-source eager-download per Decision #8. The :variation path is wired for dall-e-2 via the same multipart machinery as :edit: variation drops the prompt and mask fields and otherwise mirrors :edit's wire shape.

Model × Operation matrix

| Model | :generate | :edit | :variation | Wire format | Usage shape |
| --- | --- | --- | --- | --- | --- |
| dall-e-2 | yes | yes | yes | url or b64_json per caller | images = length(data) |
| dall-e-3 | yes | no | no | url or b64_json per caller | images = length(data) |
| gpt-image-1 | yes | yes | no | b64_json ALWAYS (forced) | images + input_tokens + output_tokens |

Cells marked "no" produce {:error, %ImageAdapterError{reason: :unsupported_operation, metadata: %{operation: op, model: model}}} BEFORE any HTTP I/O. Unknown models (any string not in the matrix) fall through to the provider — see Decision #3 of steering/PHASE_15_image_layer_6.md.
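The matrix lookup described above can be pictured as a small data-driven gate. This is a minimal sketch, not the adapter's actual code: the module name MatrixSketch is hypothetical, the argument order of gate_model_op/2 is an assumption, and a plain map stands in for %ImageAdapterError{}.

```elixir
defmodule MatrixSketch do
  # Hypothetical module: the Model x Operation matrix expressed as plain data.
  @matrix %{
    "dall-e-2" => [:generate, :edit, :variation],
    "dall-e-3" => [:generate],
    "gpt-image-1" => [:generate, :edit]
  }

  # Returns :ok for an allowed pair or an unknown model (which falls through
  # to the provider). A plain map stands in for %ImageAdapterError{} here.
  def gate_model_op(model, op) do
    case Map.fetch(@matrix, model) do
      {:ok, ops} ->
        if op in ops do
          :ok
        else
          {:error, %{reason: :unsupported_operation, metadata: %{operation: op, model: model}}}
        end

      :error ->
        :ok
    end
  end
end
```

Keeping the matrix as data means a "no" cell is rejected without any network call, while a model string the table does not know is passed through untouched.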

gpt-image-1 specifics

  • Body fields. When request.model == "gpt-image-1", to_json_body/2 OMITS response_format, includes quality / background per the wire-field map, and includes output_format from request.options[:output_format] ("png" | "jpeg" | "webp"). When :output_format is absent the adapter OMITS the field and the OpenAI API applies its server-side default of "png". Per CLAUDE.md "Adapters MUST document any default they inject for a Layer-A nil field that the wire requires" — gpt-image-1's output_format does NOT need an adapter-side default because the provider-default and the project's response :mime_type default both resolve to PNG.
  • Response decode. gpt-image-1 always returns b64_json per image. For :binary callers the adapter Base64-decodes server-side; for :base64 callers the b64 is forwarded verbatim. :url callers are rejected pre-flight (Decision #6).
  • Response :mime_type. Driven by request.options[:output_format] via mime_type_for_output_format/1: "png" | :png → "image/png"; "jpeg" | :jpeg | :jpg → "image/jpeg"; "webp" | :webp → "image/webp"; absent → "image/png".
  • Token usage. ImageUsage.input_tokens / output_tokens from body.usage; ImageUsage.images = length(data) as elsewhere. body.usage.input_tokens_details (when present) lands on response.metadata[:usage_details] without overwriting caller keys.
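The output-format mapping in the list above is small enough to write out as function clauses. The following is a sketch under assumptions: the module name MimeSketch is hypothetical, and "absent" is modelled as nil; only the function name and the mapping itself come from the text.

```elixir
defmodule MimeSketch do
  # Hypothetical stand-in for the adapter's mime_type_for_output_format/1.
  # Accepts either the string or the atom spelling of each format.
  def mime_type_for_output_format(format) when format in ["png", :png], do: "image/png"
  def mime_type_for_output_format(format) when format in ["jpeg", :jpeg, :jpg], do: "image/jpeg"
  def mime_type_for_output_format(format) when format in ["webp", :webp], do: "image/webp"

  # Assumption: an absent :output_format arrives as nil and resolves to PNG,
  # matching the provider-side default described in the text.
  def mime_type_for_output_format(nil), do: "image/png"
end
```

A value outside the documented set raises FunctionClauseError in this sketch; how the real adapter treats such values is not stated in this page.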

Multipart vs JSON dispatch (Decision #7)

The :generate operation uses an application/json body via Req.new(json: ...) and OpenAIHeaders.json_headers/2. The :edit and :variation operations require an actual image upload, so they use multipart/form-data via Req.new(form_multipart: ...) and OpenAIHeaders.multipart_headers/2 (which elides content-type so Req's :form_multipart step stamps it with the boundary).

Both paths flow through the same Retry.run/3 integration and share the same decode_response/4 and to_image_adapter_error/4 helpers — the response shape is identical to :generate (a data: [...] array of url/b64_json items, optional usage on gpt-image-1).

URL-source resolution (Decision #8)

:edit / :variation requests carrying Image.from_url/1 images are eagerly fetched at request-build time. The Req.get/2 call honors a 30 s default receive_timeout (override via opts[:request_timeout]), a 5-redirect cap, and a 25 MB body-size cap. Failure modes (closed):

  • Non-2xx HTTP status → :invalid_request with metadata: %{url: u, status: status}.
  • Non-image content-type (must prefix-match ~r{^image/(png|jpeg|jpg|webp|gif)$}) → :invalid_request with metadata: %{url: u, content_type: ct}.
  • Body > 25 MB → :invalid_request with metadata: %{url: u, size: bytes}.
  • More than 5 redirects (Req.TooManyRedirectsError) → :invalid_request with metadata: %{url: u}.
  • Timeout / network error (Req.TransportError, etc.) → :network_error with metadata: %{url: u} and the underlying exception on :cause.
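The first three failure modes above depend only on the fetched response, so they can be sketched as a pure classifier. This is illustrative only: UrlFetchSketch and classify/4 are hypothetical names, the order of the checks is an assumption, and 25 MB is assumed to mean 25 * 1024 * 1024 bytes (the text does not say whether the cap is binary or decimal). Redirect and transport failures are raised by the HTTP client and are not modelled here.

```elixir
defmodule UrlFetchSketch do
  # Assumption: the 25 MB cap is interpreted as 25 MiB.
  @max_bytes 25 * 1024 * 1024

  # Classifies an already-fetched response. Error tuples carry the reason atom
  # and the metadata map described in the list above.
  def classify(url, status, content_type, body)
      when is_binary(content_type) and is_binary(body) do
    cond do
      status not in 200..299 ->
        {:error, :invalid_request, %{url: url, status: status}}

      # The regex is copied verbatim from the documented content-type check.
      not Regex.match?(~r{^image/(png|jpeg|jpg|webp|gif)$}, content_type) ->
        {:error, :invalid_request, %{url: url, content_type: content_type}}

      byte_size(body) > @max_bytes ->
        {:error, :invalid_request, %{url: url, size: byte_size(body)}}

      true ->
        {:ok, body, content_type}
    end
  end
end
```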

URL fetches are stubbable in tests via Req.Test.stub/2; pass the same adapter_opts: [plug: {Req.Test, stub}] used for the API stub and the fetch will route through the stub.

Test-injection escape hatch

Per Decision #20, generate/2 honors opts[:adapter_opts][:image_script] as a documented test-only short-circuit: when present, the call delegates to ALLM.Providers.FakeImages.generate/2 BEFORE any pre-flight gate runs and returns its result verbatim. This is the same pattern ALLM.Test.ImageAdapterConformance uses to script adapters under test. Production callers do not populate this key.

Retry integration

HTTP error closures return {:retry, delay_ms, error} for 429 + Retry-After, 5xx, and timeouts; ALLM.Retry.run/3 is wired against the engine's policy. Phase 14.3 augmented the retry vocabulary with the four image-error atoms (:rate_limited, :provider_unavailable, :timeout, :network_error) at the façade call site.

URL-mode expiry warning

Per Decision #5, OpenAI documents that image URLs returned via response_format: :url expire ~60 minutes after creation. Callers persisting Image{source: {:url, _}} beyond that window should download the bytes themselves before persisting, or request :base64 / :binary upfront. The adapter does NOT proactively materialize URL-mode responses to bytes.

Closed-enum mapping table caveat

:context_length_exceeded is reserved in ImageAdapterError's closed enum but is NOT actively mapped — long-prompt rejections from OpenAI surface as :invalid_request per Decision #21. :unsupported_feature is not produced by this adapter (Decision #22).

Summary

Functions

endpoint_for(atom)

Return the OpenAI endpoint path (relative to the API base URL) for an image operation.

generate(request, opts)

Execute an image-generation request synchronously against OpenAI.

prepare_request(request, opts)

Return an unfired Req.Request configured exactly as generate/2 would fire it.

resolve_image_bytes(image, opts)

Resolve an %Image{} source to raw bytes for multipart upload.

supported_operations()

Return the per-module union of operations the adapter can ever perform.

to_multipart_body(request, opts)

Build a multipart/form-data field list for :edit / :variation.

Functions

endpoint_for(atom)

(since 0.3.0)
@spec endpoint_for(ALLM.ImageRequest.operation()) :: String.t()

Return the OpenAI endpoint path (relative to the API base URL) for an image operation.

Examples

iex> ALLM.Providers.OpenAI.Images.endpoint_for(:generate)
"/images/generations"

iex> ALLM.Providers.OpenAI.Images.endpoint_for(:edit)
"/images/edits"

iex> ALLM.Providers.OpenAI.Images.endpoint_for(:variation)
"/images/variations"

generate(request, opts)

@spec generate(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, ALLM.ImageResponse.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Execute an image-generation request synchronously against OpenAI.

Pre-flight gates

Before any HTTP I/O, generate/2 checks four gates in order (per Invariant 1 of the design):

  1. Operation gate. request.operation in supported_operations(). Failure → :unsupported_operation.
  2. Model gate. When request.model is in the known matrix (dall-e-2, dall-e-3, gpt-image-1), the operation must be allowed for that model. Failure → :unsupported_operation with metadata: %{operation: op, model: model}. Unknown models fall through.
  3. gpt-image-1 + :url rejection. When request.model == "gpt-image-1" and request.response_format == :url, the request is rejected with :invalid_request because gpt-image-1 only returns base64 (Decision #6).
  4. URL-source resolution. :edit / :variation requests with {:url, _} source images are eagerly fetched at request-build time; the failure modes are listed under URL-source resolution (Decision #8).

Key resolution (ALLM.Keys.fetch!/2) runs AFTER the gates per Invariant 2 — a request that's going to be rejected pre-flight does not require a valid API key.
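The ordering guarantee (gates first, key lookup last) maps naturally onto a `with` chain. The sketch below is a simplified illustration, not the adapter's implementation: GateOrderSketch and preflight/2 are hypothetical, only gates 1 and 3 are modelled, bare reason atoms stand in for %ImageAdapterError{}, and a zero-arity function stands in for ALLM.Keys.fetch!/2.

```elixir
defmodule GateOrderSketch do
  @supported [:generate, :edit, :variation]

  # `fetch_key` is a hypothetical zero-arity stand-in for the key lookup.
  # It is only invoked once every gate has returned :ok.
  def preflight(%{operation: op, model: model, response_format: fmt}, fetch_key) do
    with :ok <- operation_gate(op),
         :ok <- url_gate(model, fmt) do
      {:ok, fetch_key.()}
    end
  end

  # Gate 1: the operation must be one the module can ever perform.
  defp operation_gate(op) when op in @supported, do: :ok
  defp operation_gate(_op), do: {:error, :unsupported_operation}

  # Gate 3: gpt-image-1 only returns base64, so :url is rejected up front.
  defp url_gate("gpt-image-1", :url), do: {:error, :invalid_request}
  defp url_gate(_model, _fmt), do: :ok
end
```

Because `with` stops at the first non-matching result, a rejected request never reaches the key lookup, which is the behaviour Invariant 2 describes.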

Adapter-injected defaults

When request.size is nil, the adapter OMITS the size field from the wire body and lets OpenAI apply its server-side default ("1024x1024" for dall-e-3 / gpt-image-1; "1024x1024" for dall-e-2). Per the wire-field map row, nil → omit. Other size shapes encode as: {w, h} → "<w>x<h>"; :auto → "auto"; binary → passthrough.
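The size encoding rules can be sketched as four function clauses. The names SizeSketch and put_size/2 are hypothetical, and the string key "size" on a plain map is an assumption about how the body is assembled.

```elixir
defmodule SizeSketch do
  # nil: leave the body untouched so the provider default applies.
  def put_size(body, nil), do: body

  # :auto encodes as the literal string "auto".
  def put_size(body, :auto), do: Map.put(body, "size", "auto")

  # {w, h} encodes as "<w>x<h>".
  def put_size(body, {w, h}) when is_integer(w) and is_integer(h),
    do: Map.put(body, "size", "#{w}x#{h}")

  # A binary is passed through unchanged.
  def put_size(body, size) when is_binary(size), do: Map.put(body, "size", size)
end
```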

Test-injection short-circuit (Decision #20 / Invariant 0)

When opts[:adapter_opts][:image_script] is non-nil, generate/2 delegates to ALLM.Providers.FakeImages.generate/2 with the same opts BEFORE any pre-flight gate runs. This is the documented test-only escape hatch the conformance suite uses; production callers do not populate this key.

Response-format normalization (Decision #5)

The provider response carries either url: or b64_json: per image. The adapter materializes the caller's requested form:

  • caller asked :url → response carries {:url, url} source.
  • caller asked :base64 → response carries {:base64, b64} source.
  • caller asked :binary → adapter Base64-decodes the b64_json field server-side and produces {:binary, bytes} source.
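The three cases above amount to a small pattern match over the per-image item in the provider's data array. A minimal sketch, assuming the item is a map with string keys "url" / "b64_json"; SourceSketch, materialize/2, and the :invalid_base64 reason are hypothetical.

```elixir
defmodule SourceSketch do
  # :url callers get the provider URL back as-is.
  def materialize(%{"url" => url}, :url), do: {:ok, {:url, url}}

  # :base64 callers get the provider's base64 string forwarded verbatim.
  def materialize(%{"b64_json" => b64}, :base64), do: {:ok, {:base64, b64}}

  # :binary callers get decoded bytes; Base.decode64/1 returns :error on
  # malformed input, which this sketch surfaces as a hypothetical reason atom.
  def materialize(%{"b64_json" => b64}, :binary) do
    case Base.decode64(b64) do
      {:ok, bytes} -> {:ok, {:binary, bytes}}
      :error -> {:error, :invalid_base64}
    end
  end
end
```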

For dall-e-2 / dall-e-3 the response :mime_type defaults to "image/png". For gpt-image-1 the MIME type is driven by request.options[:output_format] per Decision #19: "png" | :png → "image/png"; "jpeg" | :jpeg | :jpg → "image/jpeg"; "webp" | :webp → "image/webp". When :output_format is absent the default is "image/png" (matching OpenAI's server-side default). The adapter OMITS the output_format field from the wire body when :output_format is absent and lets the provider default apply.

Request-id preservation (Invariant 3)

opts[:request_id] is reflected onto response.request_id unchanged. The OpenAI response's x-request-id header is preserved separately on response.metadata[:openai_request_id] (Decision #18).

Retry contract

Wraps the HTTP call in ALLM.Retry.run(opts[:retry] || :default, ...). The closure parses Retry-After (seconds form), returns {:retry, delay_ms, error} for 429/5xx/:timeout/:network_error, {:ok, response} for 2xx, and {:error, error} for everything else.
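The shape of the retry closure's return values can be sketched as a status classifier. Everything here beyond the three tuple shapes is an assumption: RetrySketch and classify/3 are hypothetical, headers are modelled as a map of lowercase string keys, the 1-second fallback delay is invented for illustration, and the pairing of 429 with :rate_limited, 5xx with :provider_unavailable, and other statuses with :invalid_request is a guess at the mapping. Bare atoms stand in for %ImageAdapterError{}.

```elixir
defmodule RetrySketch do
  # Assumption: fallback delay used when Retry-After is missing or unparseable.
  @default_delay_ms 1_000

  def classify(status, _headers, body) when status in 200..299, do: {:ok, body}

  def classify(429, headers, _body),
    do: {:retry, retry_after_ms(headers), :rate_limited}

  def classify(status, headers, _body) when status in 500..599,
    do: {:retry, retry_after_ms(headers), :provider_unavailable}

  # Assumption: every other status is treated as non-retryable.
  def classify(_status, _headers, _body), do: {:error, :invalid_request}

  # Parses the seconds form of Retry-After into milliseconds.
  defp retry_after_ms(headers) do
    with value when is_binary(value) <- Map.get(headers, "retry-after"),
         {seconds, ""} <- Integer.parse(value) do
      seconds * 1_000
    else
      _ -> @default_delay_ms
    end
  end
end
```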

prepare_request(request, opts)

@spec prepare_request(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Return an unfired Req.Request configured exactly as generate/2 would fire it.

Mirrors the chat-adapter prepare_request/2 shape at lib/allm/providers/openai.ex:411-435. The :generate, :edit, and :variation operations are all supported; :variation shares the multipart machinery with :edit (variation drops prompt / mask).

When opts[:adapter_opts][:image_script] is set, prepare_request/2 returns the same stub error rather than delegating to FakeImages — the script path has no Req.Request analogue, so prepare_request/2 intentionally diverges from generate/2 (which DOES delegate to FakeImages.generate/2 under the script key per Invariant 0).

resolve_image_bytes(image, opts)

(since 0.3.0)
@spec resolve_image_bytes(
  ALLM.Image.t(),
  keyword()
) ::
  {:ok, binary(), String.t(), String.t()}
  | {:error, ALLM.Error.ImageAdapterError.t()}

Resolve an %Image{} source to raw bytes for multipart upload.

Returns {:ok, bytes, mime_type, filename} on success or a typed %ImageAdapterError{} for URL-source failures (non-2xx, non-image content-type, oversized body, too many redirects, timeout / network error) and base64 / file decode failures.

Filename is always "image.png" for {:binary, _} / {:base64, _} / {:url, _} sources (OpenAI ignores the filename for content-type resolution); {:file, path} sources use Path.basename(path) so the uploaded filename matches the local file for human readability.
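The filename rule reduces to two clauses over the source tuple. FilenameSketch and filename_for/1 are hypothetical names used only for this sketch; the source-tuple shapes are taken from the paragraph above.

```elixir
defmodule FilenameSketch do
  # {:file, path} sources reuse the local file's basename.
  def filename_for({:file, path}) when is_binary(path), do: Path.basename(path)

  # Every other source kind uploads under a fixed placeholder name.
  def filename_for({kind, _payload}) when kind in [:binary, :base64, :url], do: "image.png"
end
```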

supported_operations()

@spec supported_operations() :: [:generate | :edit | :variation]

Return the per-module union of operations the adapter can ever perform.

Per Phase 14.1 Decision #3, this is a per-MODULE list, not a per-call function of model. Per-model gating lives in gate_model_op/2.

Examples

iex> ALLM.Providers.OpenAI.Images.supported_operations()
[:generate, :edit, :variation]

to_multipart_body(request, opts)

(since 0.3.0)
@spec to_multipart_body(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, [{String.t(), term()}]} | {:error, ALLM.Error.ImageAdapterError.t()}

Build a multipart/form-data field list for :edit / :variation.

Returns {:ok, [{name, content}, ...]} ready to hand to Req.new(..., form_multipart: form). Plain fields are {name, value} 2-tuples; file fields (:image, :mask) use Req's {body, opts} shape: {name, {bytes, filename: "image.png", content_type: <mime>}}. See deps/req/lib/req/steps.ex:446-468 for the encoding contract.

All fields are emitted as strings (multipart fields are always strings on the wire); integer / atom values are stringified.
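To make the field-list shape concrete, the sketch below builds one by hand. MultipartSketch and fields/3 are hypothetical, the choice of plain fields and their order is an assumption, and the list is only constructed here, not sent.

```elixir
defmodule MultipartSketch do
  # Builds a field list in the documented shape: plain fields as
  # {name, value} string pairs, the file field as {name, {bytes, opts}}.
  def fields(plain, image_bytes, mime) when is_binary(image_bytes) do
    # Integers and atoms are stringified, since multipart values are strings.
    plain_fields = for {name, value} <- plain, do: {to_string(name), to_string(value)}

    plain_fields ++
      [{"image", {image_bytes, filename: "image.png", content_type: mime}}]
  end
end
```

For example, `MultipartSketch.fields([model: "dall-e-2", n: 2], bytes, "image/png")` yields `{"n", "2"}` rather than `{"n", 2}`, which is the stringification the paragraph above describes.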

URL-source images on :edit / :variation are eagerly fetched per Decision #8. Failure modes (closed): non-2xx, non-image content-type, body > 25 MB, > 5 redirects, timeout / network error. Each maps to a typed %ImageAdapterError{} with metadata describing the URL and the failure detail. The Req.get/2 call honors opts[:adapter_opts][:plug] so URL fetches are stubbable in tests via Req.Test.stub/2.