ALLM.Providers.OpenAI (allm v0.3.0)


OpenAI provider adapter — Layer B. See spec §6.4, §7.1, §20, §32.1.

Covers both OpenAI HTTP endpoints, Chat Completions and Responses, through four public entry points:

  • generate/2 — fires POST /v1/chat/completions OR POST /v1/responses via Req, wrapped in ALLM.Retry.run/3 for 429/5xx retries with Retry-After parsing.
  • prepare_request/2 — returns an unfired %Req.Request{} with the API key already injected as Authorization: Bearer <key>.
  • translate_options/2 — endpoint-aware :max_tokens rename per design Decision #6 (:max_completion_tokens for gpt-4o*/gpt-4.1*/gpt-5* on Chat Completions, :max_output_tokens on Responses, passthrough for older models). Also handles reasoning controls per Decision #5.
  • requires_structured_finalize?/1 — capability declaration consumed by ALLM.Capability.preflight/2 (Decision #14); returns true when a request combines tools and a json_schema response_format.

Endpoint dispatch (Decision #1)

dispatch_endpoint/2 selects between :chat_completions and :responses by checking, in order: explicit opts[:endpoint], explicit adapter_opts[:endpoint], the @endpoint_dispatch model-family regex table (gpt-5* and o[1-9]* map to :responses; gpt-4* and gpt-3.5* map to :chat_completions), and finally a default of :chat_completions.

Phase 10.6 lifts the prior unsupported-feature guard for :responses; gpt-5* and o-series models now route to the Responses API end-to-end.
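
A standalone sketch of that resolution order (the real @endpoint_dispatch table and clause structure inside the adapter may differ):

```elixir
endpoint_dispatch = [
  {~r/^gpt-5/, :responses},
  {~r/^o[1-9]/, :responses},
  {~r/^gpt-(4|3\.5)/, :chat_completions}
]

resolve = fn model, opts ->
  explicit = opts[:endpoint] || Keyword.get(opts[:adapter_opts] || [], :endpoint)

  cond do
    explicit in [:responses, :chat_completions] ->
      explicit

    is_binary(model) ->
      Enum.find_value(endpoint_dispatch, :chat_completions, fn {re, ep} ->
        if Regex.match?(re, model), do: ep
      end)

    true ->
      :chat_completions
  end
end

resolve.("gpt-5.5", [])                   #=> :responses
resolve.("gpt-4o", endpoint: :responses)  #=> :responses
resolve.(nil, [])                         #=> :chat_completions
```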

Reasoning controls (Decision #5)

:reasoning_effort (:none | :low | :medium | :high | :xhigh), :reasoning_summary (:auto | :concise | :detailed), and :verbosity (:low | :medium | :high) are routed by translate_options/2:

  • On :responses, both nest under reasoning: %{effort: ..., summary: ...} (effort and summary share one sub-map); :verbosity passes through as a bare key.
  • On :chat_completions for gpt-5*, :reasoning_effort and :verbosity pass through as bare keys; :reasoning_summary is stripped (Chat Completions does not surface it).
  • On :chat_completions for non-reasoning models, all reasoning keys are stripped, with a Logger.debug/1 line noting the drop.

Unknown effort/summary/verbosity atoms raise ArgumentError.
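
Shape-wise, the three cases look like this (illustrative comments only; they mirror the doctests under translate_options/2 below):

```elixir
opts = [reasoning_effort: :high, reasoning_summary: :auto, verbosity: :low]

# On :responses — effort and summary share one sub-map:
#   [reasoning: %{effort: "high", summary: "auto"}, verbosity: "low"]
#
# On :chat_completions with gpt-5* — bare keys, summary stripped:
#   [reasoning_effort: "high", verbosity: "low"]
#
# On :chat_completions with a non-reasoning model — all three stripped:
#   []
```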

Status mapping for Responses API (Decision #19)

| Responses status | incomplete_details.reason | Response.finish_reason |
| ---------------- | ------------------------- | ---------------------- |
| "completed"      | n/a                       | :stop                  |
| "incomplete"     | "max_output_tokens"       | :length                |
| "incomplete"     | "content_filter"          | :content_filter        |
| "incomplete"     | other                     | :other                 |

When status is "incomplete", the raw reason is preserved on Response.metadata.incomplete_details.reason. Response.metadata.reasoning carries effort / summary from the response body's reasoning block.
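
A minimal sketch of that mapping as an anonymous function (the adapter's internal helper name and shape are assumptions):

```elixir
map_finish_reason = fn
  "completed", _reason -> :stop
  "incomplete", "max_output_tokens" -> :length
  "incomplete", "content_filter" -> :content_filter
  "incomplete", _other -> :other
end

map_finish_reason.("incomplete", "content_filter")
#=> :content_filter
```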

Key resolution

Keys never appear on the engine. prepare_request/2 and generate/2 call ALLM.Keys.fetch!(:openai, opts) at request-build time per spec §6.4. Per design Decision #16, prepare_request/2 raises %ALLM.Error.EngineError{reason: :missing_key} when no key resolver yields a value — a programmer error best surfaced loudly rather than threaded through every with chain.
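
Concretely, either shape works (a sketch; the literal key values are placeholders):

```elixir
req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")

# Resolver-based: the key lives in ALLM.Keys and is fetched at request-build time.
ALLM.Keys.put(:openai, System.fetch_env!("OPENAI_API_KEY"))
{:ok, _response} = ALLM.Providers.OpenAI.generate(req, [])

# Per-call override, as used in the doctests below:
{:ok, _response} = ALLM.Providers.OpenAI.generate(req, api_key: "sk-live-key")
```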

Retry contract

generate/2 wraps the HTTP call in ALLM.Retry.run(opts[:retry] || :default, …). The closure parses Retry-After (both seconds and HTTP-date formats), returns {:retry, delay_ms, error} for 429/5xx/:timeout, {:ok, response} for 2xx, and {:error, error} for everything else (e.g. 4xx that aren't rate-limit). Streaming does NOT retry per spec §6.1.
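
A shape-only sketch of that classification (the 1_000 ms delay is a stand-in; the real closure derives it from Retry-After):

```elixir
classify = fn
  {:ok, %{status: s} = resp} when s in 200..299 -> {:ok, resp}
  {:ok, %{status: s} = resp} when s == 429 or s in 500..599 -> {:retry, 1_000, resp}
  {:ok, resp} -> {:error, resp}                # e.g. 401/403: no retry
  {:error, :timeout} -> {:retry, 1_000, :timeout}
end

classify.({:ok, %{status: 503}})
#=> {:retry, 1000, %{status: 503}}
```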

Finch transport defaults

Streaming (Phase 10.3) uses Finch.async_request/3 against the singleton ALLM.Finch started by ALLM.Application with protocol: :http1 per spec §7.2. Engines that want a custom Finch ref inject via adapter_opts: [finch_name: MyApp.Finch].
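
For example (a sketch; MyApp.Finch is a placeholder, and the pool option shape assumes the protocol: :http1 form quoted above):

```elixir
# In your application's supervision tree:
children = [
  {Finch, name: MyApp.Finch, pools: %{default: [protocol: :http1]}}
]

# Then point the adapter at the custom pool:
request = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")

{:ok, _stream} =
  ALLM.Providers.OpenAI.stream(request,
    api_key: "sk-placeholder",
    adapter_opts: [finch_name: MyApp.Finch]
  )
```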

Capability declarations

requires_structured_finalize?/1 returns true when a request combines tools != [] AND response_format = %{type: :json_schema, ...} — OpenAI's API does not support that combination natively, so ALLM.Capability.preflight/2 rewrites the request with structured_finalize: true and ALLM.Chat.run/3 runs a two-pass tool loop + final-shape pass (Phase 10.4).
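
Sketched from the description above (that structured_finalize lives as a top-level field on the rewritten request is taken from this paragraph's wording; the actual rewrite lives in ALLM.Capability):

```elixir
request =
  if ALLM.Providers.OpenAI.requires_structured_finalize?(request) do
    # Preflight's rewrite: ALLM.Chat.run/3 then performs the two-pass dance.
    %{request | structured_finalize: true}
  else
    request
  end
```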

response_format translation

to_openai_response_format/2 (called from to_openai_request_body/3) translates the canonical %Request{}.response_format to OpenAI's wire shape. Per design Decision #17, the encoding is endpoint-aware:

| ALLM canonical | :chat_completions wire | :responses wire |
| -------------- | ---------------------- | --------------- |
| nil | omitted (nil) | omitted (nil) |
| :text | omitted (nil) | {:text, %{format: %{type: "text"}}} |
| %{type: :json_object} | {:response_format, %{type: "json_object"}} | {:text, %{format: %{type: "json_object"}}} |
| %{type: :json_schema, name:, schema:, strict:} | {:response_format, %{type: "json_schema", json_schema: %{name:, schema:, strict:}}} | {:text, %{format: %{type: "json_schema", name:, schema:, strict:}}} |

The function returns either nil (omit the field) OR a {wire_key, wire_value} 2-tuple where wire_key is the JSON body key the caller must merge into the request body (:response_format for Chat Completions; :text for Responses). See spec §5.4.
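
A sketch of that caller-side merge (body and endpoint stand in for the locals inside to_openai_request_body/3):

```elixir
body =
  case ALLM.Providers.OpenAI.to_openai_response_format(endpoint, request.response_format) do
    nil -> body                                           # omit the field entirely
    {wire_key, wire_value} -> Map.put(body, wire_key, wire_value)
  end
```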

Summary

Types

  • endpoint() — Endpoint atom; chosen by dispatch_endpoint/2.

Functions

  • dispatch_endpoint/2 — Resolve the endpoint for a model + opts pair (Decision #1).
  • generate/2 — Execute a non-streaming OpenAI request (Chat Completions or Responses) synchronously.
  • prepare_request/2 — Build an unfired %Req.Request{} with the resolved API key injected as Authorization: Bearer <key> (Decision #16).
  • requires_structured_finalize?/1 — Capability declaration consumed by ALLM.Capability.preflight/2 (Decision #14).
  • stream/2 — Open a streaming Chat Completions request against the OpenAI provider.
  • to_openai_response_format/2 — Endpoint-aware translation of a canonical response_format shape to OpenAI's wire format (spec §5.4, design Decision #17).
  • translate_options/2 — Endpoint-aware translation of caller opts to OpenAI wire keys.

Types

endpoint()

@type endpoint() :: :responses | :chat_completions

Endpoint atom; chosen by dispatch_endpoint/2.

Functions

dispatch_endpoint(model, opts)

@spec dispatch_endpoint(
  String.t() | nil,
  keyword()
) :: endpoint()

Resolve the endpoint for a model + opts pair (Decision #1).

Resolution order:

  1. Explicit opts[:endpoint] (if :responses or :chat_completions).
  2. Explicit adapter_opts[:endpoint] (same shape).
  3. @endpoint_dispatch regex table — first match wins.
  4. Default fallback: :chat_completions.

Examples

iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-4o", [])
:chat_completions

iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-5.5", [])
:responses

iex> ALLM.Providers.OpenAI.dispatch_endpoint("o3", [])
:responses

iex> ALLM.Providers.OpenAI.dispatch_endpoint(nil, [])
:chat_completions

iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-4o", endpoint: :responses)
:responses

iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-5.5", adapter_opts: [endpoint: :chat_completions])
:chat_completions

generate(request, opts)

Execute a non-streaming OpenAI request (Chat Completions or Responses) synchronously.

Wraps the HTTP call in ALLM.Retry.run/3; the closure parses Retry-After headers and returns {:retry, delay_ms, error} for 429/5xx/:timeout. Returns {:ok, %Response{}} on 2xx success or {:error, %AdapterError{}} on every failure shape.

Routes models matching gpt-5* or o[1-9]* to the Responses API (POST /v1/responses); other models route to Chat Completions (POST /v1/chat/completions). Both endpoints return canonical %Response{} shapes so callers do not need to know which wire ran.

Vision input (Phase 17.1)

[%ALLM.TextPart{}, %ALLM.ImagePart{}] content lists translate to OpenAI's content-block wire shape automatically. URL-source images pass through verbatim; binary/base64/file sources resolve to a data:<mime>;base64,... URI via ALLM.Image.to_data_uri/1. ImagePart.detail (:auto | :low | :high) maps to the wire string via Atom.to_string/1 and is always emitted (Decision #7 Q2). System messages remain text-only — an %ImagePart{} in a system role is hard-rejected as %ValidationError{reason: :invalid_message} before any HTTP call. Per-image MIME / 20 MB size validation runs in pre-flight via ALLM.Providers.Support.ImageMime.
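
For example, building a mixed text + image user message (a sketch; the :text field name on %ALLM.TextPart{} is an assumption, the rest follows this section):

```elixir
img = ALLM.Image.from_url("https://example.com/cat.png")

msg = %ALLM.Message{
  role: :user,
  content: [
    %ALLM.TextPart{text: "Describe this image."},  # :text field name assumed
    %ALLM.ImagePart{image: img, detail: :low}      # detail is always emitted on the wire
  ]
}

req = ALLM.Request.new([msg], model: "gpt-4o-mini")
```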

Examples

iex> ALLM.Keys.put(:openai, "sk-doctest-gen")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
iex> {:error, %ALLM.Error.AdapterError{reason: :authentication_failed}} =
...>   ALLM.Providers.OpenAI.generate(req,
...>     retry: false,
...>     adapter_opts: [plug: fn conn ->
...>       conn
...>       |> Plug.Conn.put_resp_content_type("application/json")
...>       |> Plug.Conn.resp(401, ~s({"error":{"message":"bad"}}))
...>     end]
...>   )
iex> ALLM.Keys.delete(:openai)
:ok

iex> # Vision pre-flight rejects an ImagePart in a system message.
iex> img = ALLM.Image.from_url("https://example.com/x.png")
iex> sys = %ALLM.Message{role: :system, content: [%ALLM.ImagePart{image: img}]}
iex> req = ALLM.Request.new([sys, %ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
iex> {:error, %ALLM.Error.ValidationError{reason: :invalid_message}} =
...>   ALLM.Providers.OpenAI.generate(req, api_key: "sk-x")
iex> :ok
:ok

prepare_request(request, opts)

@spec prepare_request(
  ALLM.Request.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.AdapterError.t()}

Build an unfired %Req.Request{} with the resolved API key injected as Authorization: Bearer <key> (Decision #16).

Per design Decision #16: this function raises %ALLM.Error.EngineError{reason: :missing_key} when no key resolver yields a value (via ALLM.Keys.fetch!/2). Returns {:error, %AdapterError{}} only for non-key failures (e.g. an o-series model routed to :responses).

Examples

iex> ALLM.Keys.put(:openai, "sk-doctest-prep")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
iex> {:ok, %Req.Request{} = http} = ALLM.Providers.OpenAI.prepare_request(req, [])
iex> {Req.Request.get_header(http, "authorization"), http.url.path}
{["Bearer sk-doctest-prep"], "/v1/chat/completions"}
iex> ALLM.Keys.delete(:openai)
:ok

requires_structured_finalize?(request)

@spec requires_structured_finalize?(ALLM.Request.t()) :: boolean()

Capability declaration consumed by ALLM.Capability.preflight/2 (Decision #14).

Returns true when a request combines tools and a json_schema response format — the only combination that requires the structured-finalize two-pass dance (Phase 10.4).

Examples

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}])
iex> ALLM.Providers.OpenAI.requires_structured_finalize?(req)
false

iex> tool = ALLM.Tool.new(name: "t", description: "d", schema: %{})
iex> rf = %{type: :json_schema, name: "p", schema: %{}, strict: true}
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], tools: [tool], response_format: rf)
iex> ALLM.Providers.OpenAI.requires_structured_finalize?(req)
true

stream(request, opts)

Open a streaming Chat Completions request against the OpenAI provider.

Returns {:ok, lazy_enumerable} on successful pre-flight; the underlying Finch.async_request/3 does NOT fire until the consumer reduces. Returns {:error, %AdapterError{}} synchronously when pre-flight fails (key missing, o-series model, invalid request, etc.). Streaming never wraps in ALLM.Retry.run/3 per spec §6.1 — partial output may already have been delivered before any failure surfaces.

Per CLAUDE.md and spec §10.1, mid-stream failures emit a terminal {:error, _} event into the enumerable; the consumer's reducer (typically ALLM.StreamCollector) folds it into Response.finish_reason: :error. The call-site tuple stays {:ok, stream}.

Event sequence

Happy-path streams emit, in order:

{:message_started, %{message: %ALLM.Message{role: :assistant, content: ""}}}
{:text_delta, %{id: id, delta: "..."}}      # one or more
{:tool_call_delta, %{...}}                  # zero or more (interleaved with text)
{:tool_call_completed, %{...}}              # one per tool call (synthesized at stream end)
{:message_completed, %{message: msg, finish_reason: reason}}

The leading :message_started is a bookend — ALLM.StreamCollector folds it as a no-op. Mid-stream errors append a terminal {:error, _} event in place of (or after) :message_completed.
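
A hand-rolled reducer over those event shapes (most callers use ALLM.StreamCollector instead; this just makes the sequence concrete):

```elixir
req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
{:ok, stream} = ALLM.Providers.OpenAI.stream(req, api_key: "sk-placeholder")

Enum.each(stream, fn
  {:message_started, _} -> :ok                         # bookend; no-op
  {:text_delta, %{delta: delta}} -> IO.write(delta)
  {:tool_call_delta, _} -> :ok
  {:tool_call_completed, _} -> :ok
  {:message_completed, %{finish_reason: r}} -> IO.puts("\nfinish: #{inspect(r)}")
  {:error, err} -> IO.puts("\nstream error: #{inspect(err)}")
end)
```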

Options

  • :api_key / :adapter_opts[:plug] — see prepare_request/2.
  • :stream_timeout — milliseconds to wait between consecutive Finch messages. Default 60_000. Exceeding it emits a terminal {:error, %AdapterError{reason: :timeout}} event.
  • :finch_name — the registered Finch name (default ALLM.Finch).
  • :finch_module — the module used to call async_request/3 and cancel_async_request/1. Defaults to Finch. Tests inject ALLM.Test.FinchStub here.
  • :finch_stub_ref — when :finch_module is ALLM.Test.FinchStub, this ref selects the per-test stub state.

Examples

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
iex> {:ok, stream} = ALLM.Providers.OpenAI.stream(req, api_key: "sk-x")
iex> match?(%Stream{}, stream)
true

to_openai_response_format(arg1, arg2)

@spec to_openai_response_format(endpoint(), ALLM.Request.response_format()) ::
  {atom(), map()} | nil

Endpoint-aware translation of a canonical response_format shape to OpenAI's wire format. See spec §5.4 and design Decision #17.

Returns either nil (omit the field on the wire) OR a {wire_key, wire_value} 2-tuple where wire_key is the JSON body key to merge into the request body (:response_format on Chat Completions, :text on Responses).

Raises FunctionClauseError on any other canonical shape — defense in depth: ALLM.Validate.request/1 should have rejected the shape upstream.

Examples

iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, nil)
nil

iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, %{type: :json_object})
{:response_format, %{type: "json_object"}}

iex> rf = %{type: :json_schema, name: "g", schema: %{type: "object"}, strict: true}
iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, rf)
{:response_format, %{type: "json_schema", json_schema: %{name: "g", schema: %{type: "object"}, strict: true}}}

iex> ALLM.Providers.OpenAI.to_openai_response_format(:responses, :text)
{:text, %{format: %{type: "text"}}}

translate_options(opts, request)

@spec translate_options(
  keyword(),
  ALLM.Request.t()
) :: keyword()

Endpoint-aware translation of caller opts to OpenAI wire keys.

:max_tokens rename matrix (Decision #6)

| Endpoint          | Model regex              | Output key                |
| ----------------- | ------------------------ | ------------------------- |
| :responses        | any                      | :max_output_tokens        |
| :chat_completions | ~r/^gpt-(4o\|4\.1\|5)/   | :max_completion_tokens    |
| :chat_completions | anything else            | :max_tokens (passthrough) |

Reasoning controls (Decision #5)

:reasoning_effort ([:none, :low, :medium, :high, :xhigh]), :reasoning_summary ([:auto, :concise, :detailed]), and :verbosity ([:low, :medium, :high]) are routed by endpoint:

  • On :responses, :reasoning_effort and :reasoning_summary merge into a single reasoning: %{effort: ..., summary: ...} sub-map; :verbosity passes through as verbosity: "<atom>".
  • On :chat_completions for gpt-5*, :reasoning_effort and :verbosity pass through as bare reasoning_effort: "<atom>" and verbosity: "<atom>"; :reasoning_summary is stripped (Chat Completions does not surface it).
  • On :chat_completions for non-reasoning models, all three keys are stripped with a Logger.debug/1 line.

Unknown effort/summary/verbosity atoms raise ArgumentError.

All other opts pass through unchanged.

Examples

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
iex> ALLM.Providers.OpenAI.translate_options([max_tokens: 100], req)
[max_completion_tokens: 100]

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-3.5-turbo")
iex> ALLM.Providers.OpenAI.translate_options([max_tokens: 100], req)
[max_tokens: 100]

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-5.5")
iex> ALLM.Providers.OpenAI.translate_options([reasoning_effort: :medium], req)
[reasoning: %{effort: "medium"}]
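
One branch the doctests above do not exercise: reasoning keys offered to a non-reasoning Chat Completions model are dropped. A sketch, not a doctest:

```elixir
req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
ALLM.Providers.OpenAI.translate_options([reasoning_effort: :medium], req)
# Per the bullets above, all reasoning keys are stripped for non-reasoning
# models, so the expected result here is [].
```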