OpenAI provider adapter — Layer B. See spec §6.4, §7.1, §20, §32.1.
Implements both OpenAI HTTP endpoints:
- generate/2 — fires POST /v1/chat/completions or POST /v1/responses via Req, wrapped in ALLM.Retry.run/3 for 429/5xx retries with Retry-After parsing.
- prepare_request/2 — returns an unfired %Req.Request{} with the API key already injected as Authorization: Bearer <key>.
- translate_options/2 — endpoint-aware: max_tokens rename per design Decision #6 (:max_completion_tokens for gpt-4o*/gpt-4.1*/gpt-5* on Chat Completions, :max_output_tokens on Responses, passthrough for older models). Also handles reasoning controls per Decision #5.
- requires_structured_finalize?/1 — capability declaration consumed by ALLM.Capability.preflight/2 (Decision #14); returns true when a request combines tools and a json_schema response_format.
Endpoint dispatch (Decision #1)
dispatch_endpoint/2 selects between :chat_completions and :responses
by (in order): explicit opts[:endpoint], explicit
adapter_opts[:endpoint], the @endpoint_dispatch model-family regex
table (gpt-5* and o[1-9]* → :responses; gpt-4*/gpt-3.5* →
:chat_completions), and a default fallback of :chat_completions.
Phase 10.6 lifts the prior unsupported-feature guard for :responses;
gpt-5* and o-series models now route to the Responses API end-to-end.
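A minimal sketch of that resolution order, assuming the regex table carries exactly the families named above:

```elixir
defmodule DispatchSketch do
  # Sketch only: the real @endpoint_dispatch attribute may carry more entries.
  @endpoint_dispatch [
    {~r/^gpt-5/, :responses},
    {~r/^o[1-9]/, :responses},
    {~r/^gpt-4/, :chat_completions},
    {~r/^gpt-3\.5/, :chat_completions}
  ]

  def dispatch_endpoint(model, opts) do
    adapter_opts = Keyword.get(opts, :adapter_opts, [])

    cond do
      # 1. Explicit caller opt wins.
      opts[:endpoint] in [:responses, :chat_completions] ->
        opts[:endpoint]

      # 2. Then the engine-level adapter opt.
      adapter_opts[:endpoint] in [:responses, :chat_completions] ->
        adapter_opts[:endpoint]

      # 3. Then the model-family regex table, first match wins.
      is_binary(model) ->
        Enum.find_value(@endpoint_dispatch, :chat_completions, fn {re, endpoint} ->
          if Regex.match?(re, model), do: endpoint
        end)

      # 4. Default fallback.
      true ->
        :chat_completions
    end
  end
end
```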
Reasoning controls (Decision #5)
:reasoning_effort (:none | :low | :medium | :high | :xhigh),
:reasoning_summary (:auto | :concise | :detailed), and :verbosity
(:low | :medium | :high) are routed by translate_options/2:
- On :responses: nested under reasoning: %{effort: ..., summary: ...} (effort + summary share one sub-map); :verbosity passes through as a bare key.
- On :chat_completions for gpt-5*: :reasoning_effort and :verbosity pass through as bare keys; :reasoning_summary is stripped (Chat Completions does not surface it).
- On :chat_completions for non-reasoning models: reasoning keys are silently stripped with a Logger.debug/1 line.
Unknown effort/summary/verbosity atoms raise ArgumentError.
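As a sketch of the :responses nesting (the function and maybe_put/3 helper names here are illustrative, not the module's actual privates):

```elixir
defmodule ReasoningSketch do
  # Illustrative only: merges effort/summary into one reasoning sub-map,
  # mirroring the :responses behavior described above.
  def nest_reasoning(opts) do
    {effort, opts} = Keyword.pop(opts, :reasoning_effort)
    {summary, opts} = Keyword.pop(opts, :reasoning_summary)

    reasoning =
      %{}
      |> maybe_put(:effort, effort)
      |> maybe_put(:summary, summary)

    if map_size(reasoning) == 0, do: opts, else: Keyword.put(opts, :reasoning, reasoning)
  end

  defp maybe_put(map, _key, nil), do: map
  defp maybe_put(map, key, atom), do: Map.put(map, key, Atom.to_string(atom))
end

# ReasoningSketch.nest_reasoning(reasoning_effort: :medium)
# #=> [reasoning: %{effort: "medium"}]
```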
Status mapping for Responses API (Decision #19)
| Responses status | incomplete_details.reason | Response.finish_reason |
|---|---|---|
"completed" | n/a | :stop |
"incomplete" | "max_output_tokens" | :length |
"incomplete" | "content_filter" | :content_filter |
"incomplete" | other | :other |
When status is "incomplete", the raw reason is preserved on
Response.metadata.incomplete_details.reason. Response.metadata.reasoning
carries effort / summary from the response body's reasoning block.
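A minimal sketch of that mapping, assuming a helper with this shape:

```elixir
defmodule FinishReasonSketch do
  # Mirrors the status table above; the function name is illustrative.
  def finish_reason("completed", _incomplete_details), do: :stop
  def finish_reason("incomplete", %{"reason" => "max_output_tokens"}), do: :length
  def finish_reason("incomplete", %{"reason" => "content_filter"}), do: :content_filter
  def finish_reason("incomplete", _other), do: :other
end
```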
Key resolution
Keys never appear on the engine. prepare_request/2 and generate/2 call
ALLM.Keys.fetch!(:openai, opts) at request-build time per spec §6.4.
Per design Decision #16, prepare_request/2 raises
%ALLM.Error.EngineError{reason: :missing_key} when no key resolver
yields a value — a programmer error best surfaced loudly rather than
threaded through every with chain.
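For example, assuming a key registered via ALLM.Keys.put/2 (as the doctests below do):

```elixir
# Register a key; ALLM.Keys.fetch!(:openai, opts) picks it up at
# request-build time, never at engine-start time.
ALLM.Keys.put(:openai, System.fetch_env!("OPENAI_API_KEY"))

req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
{:ok, %Req.Request{}} = ALLM.Providers.OpenAI.prepare_request(req, [])

# With no resolver yielding a value, the same call raises
# %ALLM.Error.EngineError{reason: :missing_key}.
```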
Retry contract
generate/2 wraps the HTTP call in ALLM.Retry.run(opts[:retry] || :default, …).
The closure parses Retry-After (both seconds and HTTP-date formats),
returns {:retry, delay_ms, error} for 429/5xx/:timeout, {:ok, response}
for 2xx, and {:error, error} for everything else (e.g. 4xx that aren't
rate-limit). Streaming does NOT retry per spec §6.1.
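A sketch of how the closure could classify results, under simplified assumptions (the real closure also parses the HTTP-date form of Retry-After; helper and error shapes here are illustrative):

```elixir
defmodule RetryClosureSketch do
  # Translates an HTTP result into the ALLM.Retry contract described above.
  def classify({:ok, %{status: status} = resp}) when status in 200..299,
    do: {:ok, resp}

  def classify({:ok, %{status: status} = resp}) when status == 429 or status in 500..599,
    do: {:retry, retry_after_ms(resp.headers), {:http_error, status}}

  def classify({:error, :timeout}),
    do: {:retry, 1_000, :timeout}

  # 4xx that aren't rate-limit, and every other failure shape.
  def classify({:ok, %{status: status}}),
    do: {:error, {:http_error, status}}

  # Seconds form only; the real parser also handles HTTP-dates.
  defp retry_after_ms(headers) do
    case List.keyfind(headers, "retry-after", 0) do
      {_, seconds} -> String.to_integer(seconds) * 1_000
      nil -> 1_000
    end
  end
end
```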
Finch transport defaults
Streaming (Phase 10.3) uses Finch.async_request/3 against the singleton
ALLM.Finch started by ALLM.Application with protocol: :http1 per
spec §7.2. Engines that want a custom Finch ref inject via
adapter_opts: [finch_name: MyApp.Finch].
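For example, assuming req was built with ALLM.Request.new/2:

```elixir
# Default: the singleton ALLM.Finch started by ALLM.Application.
{:ok, stream} = ALLM.Providers.OpenAI.stream(req, api_key: "sk-...")

# Custom pool: inject a Finch ref started by the host application.
{:ok, stream} =
  ALLM.Providers.OpenAI.stream(req,
    api_key: "sk-...",
    adapter_opts: [finch_name: MyApp.Finch]
  )
```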
Capability declarations
requires_structured_finalize?/1 returns true when a request combines
tools != [] AND response_format = %{type: :json_schema, ...} —
OpenAI's API does not support that combination natively, so
ALLM.Capability.preflight/2 rewrites the request with
structured_finalize: true and ALLM.Chat.run/3 runs a two-pass tool
loop + final-shape pass (Phase 10.4).
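The predicate itself is a small structural check; a minimal sketch, assuming the canonical %ALLM.Request{} field names shown in the doctests below:

```elixir
defmodule PredicateSketch do
  # Tools AND a json_schema response_format is the only combination
  # that triggers the two-pass structured-finalize path.
  def requires_structured_finalize?(%ALLM.Request{tools: tools, response_format: rf}) do
    tools != [] and match?(%{type: :json_schema}, rf)
  end
end
```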
response_format translation
to_openai_response_format/2 (called from to_openai_request_body/3)
translates the canonical %Request{}.response_format to OpenAI's wire
shape. Per design Decision #17, the encoding is endpoint-aware:
| ALLM canonical | :chat_completions wire | :responses wire |
|---|---|---|
| nil | omitted (nil) | omitted (nil) |
| :text | omitted (nil) | {:text, %{format: %{type: "text"}}} |
| %{type: :json_object} | {:response_format, %{type: "json_object"}} | {:text, %{format: %{type: "json_object"}}} |
| %{type: :json_schema, name:, schema:, strict:} | {:response_format, %{type: "json_schema", json_schema: %{name:, schema:, strict:}}} | {:text, %{format: %{type: "json_schema", name:, schema:, strict:}}} |
The function returns either nil (omit the field) OR a
{wire_key, wire_value} 2-tuple where wire_key is the JSON body key
the caller must merge into the request body (:response_format for
Chat Completions; :text for Responses). See spec §5.4.
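A caller-side sketch of that merge, assuming body and endpoint come from the surrounding request-build step:

```elixir
# nil means "omit the field"; the 2-tuple names the wire key to merge.
body =
  case ALLM.Providers.OpenAI.to_openai_response_format(endpoint, request.response_format) do
    nil -> body
    {wire_key, wire_value} -> Map.put(body, wire_key, wire_value)
  end
```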
Summary
Types
endpoint() — Endpoint atom; chosen by dispatch_endpoint/2.
Functions
dispatch_endpoint/2 — Resolve the endpoint for a model + opts pair (Decision #1).
generate/2 — Execute a non-streaming OpenAI request synchronously.
prepare_request/2 — Build an unfired %Req.Request{} with the resolved API key injected as Authorization: Bearer <key> (Decision #16).
requires_structured_finalize?/1 — Capability declaration consumed by ALLM.Capability.preflight/2 (Decision #14).
stream/2 — Open a streaming Chat Completions request against the OpenAI provider.
to_openai_response_format/2 — Endpoint-aware translation of a canonical response_format shape to OpenAI's wire format. See spec §5.4 and design Decision #17.
translate_options/2 — Endpoint-aware translation of caller opts to OpenAI wire keys.
Types
@type endpoint() :: :responses | :chat_completions
Endpoint atom; chosen by dispatch_endpoint/2.
Functions
Resolve the endpoint for a model + opts pair (Decision #1).
Resolution order:
- Explicit opts[:endpoint] (if :responses or :chat_completions).
- Explicit adapter_opts[:endpoint] (same shape).
- @endpoint_dispatch regex table — first match wins.
- Default fallback: :chat_completions.
Examples
iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-4o", [])
:chat_completions
iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-5.5", [])
:responses
iex> ALLM.Providers.OpenAI.dispatch_endpoint("o3", [])
:responses
iex> ALLM.Providers.OpenAI.dispatch_endpoint(nil, [])
:chat_completions
iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-4o", endpoint: :responses)
:responses
iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-5.5", adapter_opts: [endpoint: :chat_completions])
:chat_completions
@spec generate(ALLM.Request.t(), keyword()) ::
        {:ok, ALLM.Response.t()}
        | {:error, ALLM.Error.AdapterError.t() | ALLM.Error.ValidationError.t()}
Execute a non-streaming OpenAI request synchronously.
Wraps the HTTP call in ALLM.Retry.run/3; the closure parses
Retry-After headers and returns {:retry, delay_ms, error} for
429/5xx/:timeout. Returns {:ok, %Response{}} on 2xx success or
{:error, %AdapterError{}} on every failure shape.
Routes models matching gpt-5* or o[1-9]* to the Responses API
(POST /v1/responses); other models route to Chat Completions
(POST /v1/chat/completions). Both endpoints return canonical
%Response{} shapes so callers do not need to know which wire ran.
Vision input (Phase 17.1)
[%ALLM.TextPart{}, %ALLM.ImagePart{}] content lists translate to
OpenAI's content-block wire shape automatically. URL-source images
pass through verbatim; binary/base64/file sources resolve to a
data:<mime>;base64,... URI via ALLM.Image.to_data_uri/1.
ImagePart.detail (:auto | :low | :high) maps to the wire string
via Atom.to_string/1 and is always emitted (Decision #7 Q2). System
messages remain text-only — an %ImagePart{} in a system role is
hard-rejected as %ValidationError{reason: :invalid_message} before
any HTTP call. Per-image MIME / 20 MB size validation runs in
pre-flight via ALLM.Providers.Support.ImageMime.
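A sketch of a multimodal request, assuming :text is the %ALLM.TextPart{} field name:

```elixir
# URL-source image: passes through verbatim on the wire.
img = ALLM.Image.from_url("https://example.com/chart.png")

msg = %ALLM.Message{
  role: :user,
  content: [
    # :text field name assumed for %ALLM.TextPart{}.
    %ALLM.TextPart{text: "Describe this chart."},
    %ALLM.ImagePart{image: img, detail: :low}
  ]
}

req = ALLM.Request.new([msg], model: "gpt-4o-mini")
{:ok, response} = ALLM.Providers.OpenAI.generate(req, api_key: "sk-...")
```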
Examples
iex> ALLM.Keys.put(:openai, "sk-doctest-gen")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
iex> {:error, %ALLM.Error.AdapterError{reason: :authentication_failed}} =
...> ALLM.Providers.OpenAI.generate(req,
...> retry: false,
...> adapter_opts: [plug: fn conn ->
...> conn
...> |> Plug.Conn.put_resp_content_type("application/json")
...> |> Plug.Conn.resp(401, ~s({"error":{"message":"bad"}}))
...> end]
...> )
iex> ALLM.Keys.delete(:openai)
:ok
iex> # Vision pre-flight rejects an ImagePart in a system message.
iex> img = ALLM.Image.from_url("https://example.com/x.png")
iex> sys = %ALLM.Message{role: :system, content: [%ALLM.ImagePart{image: img}]}
iex> req = ALLM.Request.new([sys, %ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
iex> {:error, %ALLM.Error.ValidationError{reason: :invalid_message}} =
...> ALLM.Providers.OpenAI.generate(req, api_key: "sk-x")
iex> :ok
:ok
@spec prepare_request(ALLM.Request.t(), keyword()) ::
        {:ok, Req.Request.t()} | {:error, ALLM.Error.AdapterError.t()}
Build an unfired %Req.Request{} with the resolved API key injected as
Authorization: Bearer <key> (Decision #16).
Per design Decision #16: this function raises
%ALLM.Error.EngineError{reason: :missing_key} when no key resolver
yields a value (via ALLM.Keys.fetch!/2). Returns
{:error, %AdapterError{}} only for non-key failures (e.g. an o-series
model routed to :responses).
Examples
iex> ALLM.Keys.put(:openai, "sk-doctest-prep")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
iex> {:ok, %Req.Request{} = http} = ALLM.Providers.OpenAI.prepare_request(req, [])
iex> {Req.Request.get_header(http, "authorization"), http.url.path}
{["Bearer sk-doctest-prep"], "/v1/chat/completions"}
iex> ALLM.Keys.delete(:openai)
:ok
@spec requires_structured_finalize?(ALLM.Request.t()) :: boolean()
Capability declaration consumed by ALLM.Capability.preflight/2
(Decision #14).
Returns true when a request combines tools and a json_schema response
format — the only combination that requires the structured-finalize
two-pass dance (Phase 10.4).
Examples
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}])
iex> ALLM.Providers.OpenAI.requires_structured_finalize?(req)
false
iex> tool = ALLM.Tool.new(name: "t", description: "d", schema: %{})
iex> rf = %{type: :json_schema, name: "p", schema: %{}, strict: true}
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], tools: [tool], response_format: rf)
iex> ALLM.Providers.OpenAI.requires_structured_finalize?(req)
true
@spec stream(ALLM.Request.t(), keyword()) ::
        {:ok, Enumerable.t()}
        | {:error, ALLM.Error.AdapterError.t() | ALLM.Error.ValidationError.t()}
Open a streaming Chat Completions request against the OpenAI provider.
Returns {:ok, lazy_enumerable} on successful pre-flight; the underlying
Finch.async_request/3 does NOT fire until the consumer reduces. Returns
{:error, %AdapterError{}} synchronously when pre-flight fails (key
missing, o-series model, invalid request, etc.). Streaming never wraps in
ALLM.Retry.run/3 per spec §6.1 — partial output may already have been
delivered before any failure surfaces.
Per CLAUDE.md and spec §10.1, mid-stream failures emit a terminal
{:error, _} event into the enumerable; the consumer's reducer (typically
ALLM.StreamCollector) folds it into Response.finish_reason: :error.
The call-site tuple stays {:ok, stream}.
Event sequence
Happy-path streams emit, in order:
    {:message_started, %{message: %ALLM.Message{role: :assistant, content: ""}}}
    {:text_delta, %{id: id, delta: "..."}}        # one or more
    {:tool_call_delta, %{...}}                    # zero or more (interleaved with text)
    {:tool_call_completed, %{...}}                # one per tool call (synthesized at stream end)
    {:message_completed, %{message: msg, finish_reason: reason}}

The leading :message_started is a bookend — ALLM.StreamCollector folds
it as a no-op. Mid-stream errors append a terminal {:error, _} event in
place of (or after) :message_completed.
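A hand-rolled consumer sketch (real callers typically reduce the stream through ALLM.StreamCollector instead):

```elixir
{:ok, stream} = ALLM.Providers.OpenAI.stream(req, api_key: "sk-...")

# Fold each documented event shape; the terminal {:error, _} event
# arrives in-band rather than as a raised exception.
Enum.each(stream, fn
  {:message_started, _} -> :ok
  {:text_delta, %{delta: delta}} -> IO.write(delta)
  {:tool_call_delta, _} -> :ok
  {:tool_call_completed, _} -> :ok
  {:message_completed, %{finish_reason: reason}} -> IO.puts("\ndone: #{inspect(reason)}")
  {:error, error} -> IO.puts("\nstream failed: #{inspect(error)}")
end)
```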
Options
- :api_key / :adapter_opts[:plug] — see prepare_request/2.
- :stream_timeout — milliseconds to wait between consecutive Finch messages. Default 60_000. Exceeding it emits a terminal {:error, %AdapterError{reason: :timeout}} event.
- :finch_name — the registered Finch name (default ALLM.Finch).
- :finch_module — the module used to call async_request/3 and cancel_async_request/1. Defaults to Finch. Tests inject ALLM.Test.FinchStub here.
- :finch_stub_ref — when :finch_module is ALLM.Test.FinchStub, this ref selects the per-test stub state.
Examples
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
iex> {:ok, stream} = ALLM.Providers.OpenAI.stream(req, api_key: "sk-x")
iex> match?(%Stream{}, stream)
true
@spec to_openai_response_format(endpoint(), ALLM.Request.response_format()) :: {atom(), map()} | nil
Endpoint-aware translation of a canonical response_format shape to
OpenAI's wire format. See spec §5.4 and design Decision #17.
Returns either nil (omit the field on the wire) OR a
{wire_key, wire_value} 2-tuple where wire_key is the JSON body key
to merge into the request body (:response_format on Chat Completions,
:text on Responses).
Raises FunctionClauseError on any other canonical shape — defense in
depth: ALLM.Validate.request/1 should have rejected the shape upstream.
Examples
iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, nil)
nil
iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, %{type: :json_object})
{:response_format, %{type: "json_object"}}
iex> rf = %{type: :json_schema, name: "g", schema: %{type: "object"}, strict: true}
iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, rf)
{:response_format, %{type: "json_schema", json_schema: %{name: "g", schema: %{type: "object"}, strict: true}}}
iex> ALLM.Providers.OpenAI.to_openai_response_format(:responses, :text)
{:text, %{format: %{type: "text"}}}
@spec translate_options(keyword(), ALLM.Request.t()) :: keyword()
Endpoint-aware translation of caller opts to OpenAI wire keys.
:max_tokens rename matrix (Decision #6)
| Endpoint | Model regex | Output key |
|---|---|---|
| :responses | any | :max_output_tokens |
| :chat_completions | ~r/^gpt-(4o|4\.1|5)/ | :max_completion_tokens |
| :chat_completions | anything else | :max_tokens (passthrough) |
Reasoning controls (Decision #5)
:reasoning_effort ([:none, :low, :medium, :high, :xhigh]),
:reasoning_summary ([:auto, :concise, :detailed]), and
:verbosity ([:low, :medium, :high]) are routed by endpoint:
- :responses — :reasoning_effort and :reasoning_summary merge into a single reasoning: %{effort: ..., summary: ...} sub-map; :verbosity passes through as verbosity: "<atom>".
- :chat_completions for gpt-5* — :reasoning_effort and :verbosity pass through as bare reasoning_effort: "<atom>" and verbosity: "<atom>"; :reasoning_summary is stripped (Chat Completions does not surface it).
- :chat_completions for non-reasoning models — all three keys are stripped with a Logger.debug/1 line.
Unknown effort/summary/verbosity atoms raise ArgumentError.
All other opts pass through unchanged.
Examples
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
iex> ALLM.Providers.OpenAI.translate_options([max_tokens: 100], req)
[max_completion_tokens: 100]
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-3.5-turbo")
iex> ALLM.Providers.OpenAI.translate_options([max_tokens: 100], req)
[max_tokens: 100]
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-5.5")
iex> ALLM.Providers.OpenAI.translate_options([reasoning_effort: :medium], req)
[reasoning: %{effort: "medium"}]