# `ALLM.Providers.OpenAI`
[🔗](https://github.com/cykod/ALLM/blob/v0.3.0/lib/allm/providers/openai.ex#L1)

OpenAI provider adapter — Layer B. See spec §6.4, §7.1, §20, §32.1.

Covers both OpenAI HTTP endpoints (Chat Completions and Responses) behind a single adapter surface:

  * `generate/2` — fires `POST /v1/chat/completions` OR `POST /v1/responses`
    via `Req`, wrapped in `ALLM.Retry.run/3` for 429/5xx retries with
    `Retry-After` parsing.
  * `prepare_request/2` — returns an unfired `%Req.Request{}` with the API
    key already injected as `Authorization: Bearer <key>`.
  * `translate_options/2` — endpoint-aware `:max_tokens` rename per design
    Decision #6 (`:max_completion_tokens` for `gpt-4o*`/`gpt-4.1*`/`gpt-5*`
    on Chat Completions, `:max_output_tokens` on Responses, passthrough for
    older models). Also handles reasoning controls per Decision #5.
  * `requires_structured_finalize?/1` — capability declaration consumed by
    `ALLM.Capability.preflight/2` (Decision #14); returns `true` when a
    request combines tools and a `json_schema` response_format.

## Endpoint dispatch (Decision #1)

`dispatch_endpoint/2` selects between `:chat_completions` and `:responses`
by (in order): explicit `opts[:endpoint]`, explicit
`adapter_opts[:endpoint]`, the `@endpoint_dispatch` model-family regex
table (`gpt-5*` and `o[1-9]*` → `:responses`; `gpt-4*`/`gpt-3.5*` →
`:chat_completions`), and a default fallback of `:chat_completions`.

Phase 10.6 lifts the prior unsupported-feature guard for `:responses`;
`gpt-5*` and o-series models now route to the Responses API end-to-end.
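
A minimal sketch of what the `@endpoint_dispatch` table could look like, assuming the model families named above and first-match-wins ordering (not the shipped source):

```elixir
# Hypothetical @endpoint_dispatch table; first match wins.
@endpoint_dispatch [
  {~r/^gpt-5/, :responses},
  {~r/^o[1-9]/, :responses},
  {~r/^gpt-4/, :chat_completions},
  {~r/^gpt-3\.5/, :chat_completions}
]
```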

## Reasoning controls (Decision #5)

`:reasoning_effort` (`:none | :low | :medium | :high | :xhigh`),
`:reasoning_summary` (`:auto | :concise | :detailed`), and `:verbosity`
(`:low | :medium | :high`) are routed by `translate_options/2`:

  * On `:responses`: nested under `reasoning: %{effort: ..., summary: ...}`
    (effort + summary share one sub-map); `verbosity:` passes through as a
    bare key.
  * On `:chat_completions` for `gpt-5*`: `:reasoning_effort` and
    `:verbosity` pass through as bare keys; `:reasoning_summary` is
    stripped (Chat Completions does not surface it).
  * On `:chat_completions` for non-reasoning models: all three keys are
    stripped, with a `Logger.debug/1` line noting the drop.

Unknown effort/summary/verbosity atoms raise `ArgumentError`.
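
An illustrative input/output pair for the `:responses` path, inferred from the rules above (key order in the returned list is an assumption; this is not a doctest):

```elixir
req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-5.5")

ALLM.Providers.OpenAI.translate_options(
  [reasoning_effort: :high, reasoning_summary: :concise, verbosity: :low],
  req
)
#=> [reasoning: %{effort: "high", summary: "concise"}, verbosity: "low"]
```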

## Status mapping for Responses API (Decision #19)

| Responses status | `incomplete_details.reason` | `Response.finish_reason` |
|------------------|------------------------------|--------------------------|
| `"completed"`    | n/a                          | `:stop`                  |
| `"incomplete"`   | `"max_output_tokens"`        | `:length`                |
| `"incomplete"`   | `"content_filter"`           | `:content_filter`        |
| `"incomplete"`   | other                        | `:other`                 |

When status is `"incomplete"`, the raw reason is preserved on
`Response.metadata.incomplete_details.reason`. `Response.metadata.reasoning`
carries `effort` / `summary` from the response body's `reasoning` block.
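
A plausible shape for that mapping, mirroring the table row for row (the clause layout is a sketch, not the module's actual private function):

```elixir
# Hypothetical helper implementing the Decision #19 table above.
defp finish_reason("completed", _incomplete_details), do: :stop
defp finish_reason("incomplete", %{"reason" => "max_output_tokens"}), do: :length
defp finish_reason("incomplete", %{"reason" => "content_filter"}), do: :content_filter
defp finish_reason("incomplete", _incomplete_details), do: :other
```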

## Key resolution

Keys never appear on the engine. `prepare_request/2` and `generate/2` call
`ALLM.Keys.fetch!(:openai, opts)` at request-build time per spec §6.4.
Per design Decision #16, `prepare_request/2` raises
`%ALLM.Error.EngineError{reason: :missing_key}` when no key resolver
yields a value — a programmer error best surfaced loudly rather than
threaded through every `with` chain.
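
A typical setup under that contract (the env-var name is illustrative):

```elixir
# Resolve once at boot; the adapter re-fetches at request-build time,
# so the key never lives on an engine struct.
ALLM.Keys.put(:openai, System.fetch_env!("OPENAI_API_KEY"))

request =
  ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")

{:ok, http_req} = ALLM.Providers.OpenAI.prepare_request(request, [])
```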

## Retry contract

`generate/2` wraps the HTTP call in `ALLM.Retry.run(opts[:retry] || :default, …)`.
The closure parses `Retry-After` (both seconds and HTTP-date formats),
returns `{:retry, delay_ms, error}` for 429/5xx/`:timeout`, `{:ok, response}`
for 2xx, and `{:error, error}` for everything else (e.g. 4xx that aren't
rate-limit). Streaming does NOT retry per spec §6.1.
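
In outline, the closure honors this contract (helper names here are hypothetical; only the return shapes come from the text above):

```elixir
fn ->
  case do_http(req) do
    # 2xx: parse into a canonical %Response{}.
    {:ok, %{status: status} = resp} when status in 200..299 ->
      {:ok, to_canonical_response(resp)}

    # 429/5xx: retryable; Retry-After supplies the delay when present.
    {:ok, %{status: status} = resp} when status == 429 or status in 500..599 ->
      {:retry, parse_retry_after(resp), to_adapter_error(resp)}

    # Transport timeout: retryable; the nil delay (defer to policy) is an assumption.
    {:error, %{reason: :timeout} = err} ->
      {:retry, nil, to_adapter_error(err)}

    # Anything else (non-rate-limit 4xx, transport errors): terminal.
    other ->
      {:error, to_adapter_error(other)}
  end
end
```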

## Finch transport defaults

Streaming (Phase 10.3) uses `Finch.async_request/3` against the singleton
`ALLM.Finch` started by `ALLM.Application` with `protocol: :http1` per
spec §7.2. Engines that want a custom Finch ref inject via
`adapter_opts: [finch_name: MyApp.Finch]`.
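
For an engine running its own pool, a supervision sketch (`MyApp.Finch` is a placeholder; the pool options mirror the prose above, and exact Finch option names vary by Finch version):

```elixir
children = [
  # HTTP/1 to match the singleton's spec §7.2 default.
  {Finch, name: MyApp.Finch, pools: %{default: [protocol: :http1]}}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Then, per call, given a built request:
ALLM.Providers.OpenAI.stream(request, adapter_opts: [finch_name: MyApp.Finch])
```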

## Capability declarations

`requires_structured_finalize?/1` returns `true` when a request combines
`tools != []` AND `response_format = %{type: :json_schema, ...}` —
OpenAI's API does not support that combination natively, so
`ALLM.Capability.preflight/2` rewrites the request with
`structured_finalize: true` and `ALLM.Chat.run/3` runs a two-pass tool
loop + final-shape pass (Phase 10.4).
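
Conceptually, the rewrite could look like this (the `structured_finalize` struct field and update mechanics are assumptions; only the declaration call is this module's real API):

```elixir
request =
  if ALLM.Providers.OpenAI.requires_structured_finalize?(request) do
    # Chat.run/3 then takes the two-pass path: tool loop, then final-shape pass.
    %{request | structured_finalize: true}
  else
    request
  end
```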

## response_format translation

`to_openai_response_format/2` (called from `to_openai_request_body/3`)
translates the canonical `%Request{}.response_format` to OpenAI's wire
shape. Per design Decision #17, the encoding is endpoint-aware:

| ALLM canonical | `:chat_completions` wire | `:responses` wire |
|----------------|--------------------------|-------------------|
| `nil` | omitted (`nil`) | omitted (`nil`) |
| `:text` | omitted (`nil`) | `{:text, %{format: %{type: "text"}}}` |
| `%{type: :json_object}` | `{:response_format, %{type: "json_object"}}` | `{:text, %{format: %{type: "json_object"}}}` |
| `%{type: :json_schema, name:, schema:, strict:}` | `{:response_format, %{type: "json_schema", json_schema: %{name:, schema:, strict:}}}` | `{:text, %{format: %{type: "json_schema", name:, schema:, strict:}}}` |

The function returns either `nil` (omit the field) OR a
`{wire_key, wire_value}` 2-tuple where `wire_key` is the JSON body key
the caller must merge into the request body (`:response_format` for
Chat Completions; `:text` for Responses). See spec §5.4.
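
A caller-side sketch of that merge, assuming `endpoint` and `wire_messages` are already in scope (the real body-building helper is internal):

```elixir
body = %{model: request.model, messages: wire_messages}

body =
  case ALLM.Providers.OpenAI.to_openai_response_format(endpoint, request.response_format) do
    nil -> body
    {wire_key, wire_value} -> Map.put(body, wire_key, wire_value)
  end
```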

# `endpoint`

```elixir
@type endpoint() :: :responses | :chat_completions
```

Endpoint atom; chosen by `dispatch_endpoint/2`.

# `dispatch_endpoint`

```elixir
@spec dispatch_endpoint(
  String.t() | nil,
  keyword()
) :: endpoint()
```

Resolve the endpoint for a model + opts pair (Decision #1).

Resolution order:

  1. Explicit `opts[:endpoint]` (if `:responses` or `:chat_completions`).
  2. Explicit `adapter_opts[:endpoint]` (same shape).
  3. `@endpoint_dispatch` regex table — first match wins.
  4. Default fallback: `:chat_completions`.

## Examples

    iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-4o", [])
    :chat_completions

    iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-5.5", [])
    :responses

    iex> ALLM.Providers.OpenAI.dispatch_endpoint("o3", [])
    :responses

    iex> ALLM.Providers.OpenAI.dispatch_endpoint(nil, [])
    :chat_completions

    iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-4o", endpoint: :responses)
    :responses

    iex> ALLM.Providers.OpenAI.dispatch_endpoint("gpt-5.5", adapter_opts: [endpoint: :chat_completions])
    :chat_completions

# `generate`

```elixir
@spec generate(
  ALLM.Request.t(),
  keyword()
) ::
  {:ok, ALLM.Response.t()}
  | {:error, ALLM.Error.AdapterError.t() | ALLM.Error.ValidationError.t()}
```

Execute a non-streaming OpenAI request synchronously.

Wraps the HTTP call in `ALLM.Retry.run/3`; the closure parses
`Retry-After` headers and returns `{:retry, delay_ms, error}` for
429/5xx/`:timeout`. Returns `{:ok, %Response{}}` on 2xx success,
`{:error, %AdapterError{}}` on HTTP/transport failures, or
`{:error, %ValidationError{}}` when pre-flight validation rejects
the request.

Routes models matching `gpt-5*` or `o[1-9]*` to the Responses API
(`POST /v1/responses`); other models route to Chat Completions
(`POST /v1/chat/completions`). Both endpoints return canonical
`%Response{}` shapes so callers do not need to know which wire ran.

## Vision input (Phase 17.1)

`[%ALLM.TextPart{}, %ALLM.ImagePart{}]` content lists translate to
OpenAI's content-block wire shape automatically. URL-source images
pass through verbatim; binary/base64/file sources resolve to a
`data:<mime>;base64,...` URI via `ALLM.Image.to_data_uri/1`.
`ImagePart.detail` (`:auto | :low | :high`) maps to the wire string
via `Atom.to_string/1` and is always emitted (Decision #7 Q2). System
messages remain text-only — an `%ImagePart{}` in a system role is
hard-rejected as `%ValidationError{reason: :invalid_message}` before
any HTTP call. Per-image MIME / 20 MB size validation runs in
pre-flight via `ALLM.Providers.Support.ImageMime`.
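
A hedged construction example (`ALLM.Image.from_url/1` and `%ALLM.ImagePart{}`'s `:image` and `:detail` fields all appear in this page's own examples; the `:text` field on `%ALLM.TextPart{}` is an assumption):

```elixir
img = ALLM.Image.from_url("https://example.com/cat.png")

msg = %ALLM.Message{
  role: :user,
  content: [
    # The :text field name on TextPart is assumed, not confirmed by this page.
    %ALLM.TextPart{text: "Describe this image."},
    %ALLM.ImagePart{image: img, detail: :low}
  ]
}

request = ALLM.Request.new([msg], model: "gpt-4o-mini")
```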

## Examples

    iex> ALLM.Keys.put(:openai, "sk-doctest-gen")
    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
    iex> {:error, %ALLM.Error.AdapterError{reason: :authentication_failed}} =
    ...>   ALLM.Providers.OpenAI.generate(req,
    ...>     retry: false,
    ...>     adapter_opts: [plug: fn conn ->
    ...>       conn
    ...>       |> Plug.Conn.put_resp_content_type("application/json")
    ...>       |> Plug.Conn.resp(401, ~s({"error":{"message":"bad"}}))
    ...>     end]
    ...>   )
    iex> ALLM.Keys.delete(:openai)
    :ok

    iex> # Vision pre-flight rejects an ImagePart in a system message.
    iex> img = ALLM.Image.from_url("https://example.com/x.png")
    iex> sys = %ALLM.Message{role: :system, content: [%ALLM.ImagePart{image: img}]}
    iex> req = ALLM.Request.new([sys, %ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
    iex> {:error, %ALLM.Error.ValidationError{reason: :invalid_message}} =
    ...>   ALLM.Providers.OpenAI.generate(req, api_key: "sk-x")
    iex> :ok
    :ok

# `prepare_request`

```elixir
@spec prepare_request(
  ALLM.Request.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.AdapterError.t()}
```

Build an unfired `%Req.Request{}` with the resolved API key injected as
`Authorization: Bearer <key>` (Decision #16).

This function **raises**
`%ALLM.Error.EngineError{reason: :missing_key}` when no key resolver
yields a value (via `ALLM.Keys.fetch!/2`). Returns
`{:error, %AdapterError{}}` only for non-key failures (e.g. an o-series
model routed to `:responses`).

## Examples

    iex> ALLM.Keys.put(:openai, "sk-doctest-prep")
    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gpt-4o-mini")
    iex> {:ok, %Req.Request{} = http} = ALLM.Providers.OpenAI.prepare_request(req, [])
    iex> {Req.Request.get_header(http, "authorization"), http.url.path}
    {["Bearer sk-doctest-prep"], "/v1/chat/completions"}
    iex> ALLM.Keys.delete(:openai)
    :ok

# `requires_structured_finalize?`

```elixir
@spec requires_structured_finalize?(ALLM.Request.t()) :: boolean()
```

Capability declaration consumed by `ALLM.Capability.preflight/2`
(Decision #14).

Returns `true` when a request combines tools and a json_schema response
format — the only combination that requires the structured-finalize
two-pass dance (Phase 10.4).

## Examples

    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}])
    iex> ALLM.Providers.OpenAI.requires_structured_finalize?(req)
    false

    iex> tool = ALLM.Tool.new(name: "t", description: "d", schema: %{})
    iex> rf = %{type: :json_schema, name: "p", schema: %{}, strict: true}
    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], tools: [tool], response_format: rf)
    iex> ALLM.Providers.OpenAI.requires_structured_finalize?(req)
    true

# `stream`

```elixir
@spec stream(
  ALLM.Request.t(),
  keyword()
) ::
  {:ok, Enumerable.t()}
  | {:error, ALLM.Error.AdapterError.t() | ALLM.Error.ValidationError.t()}
```

Open a streaming Chat Completions request against the OpenAI provider.

Returns `{:ok, lazy_enumerable}` on successful pre-flight; the underlying
`Finch.async_request/3` does NOT fire until the consumer reduces. Returns
`{:error, %AdapterError{}}` synchronously when pre-flight fails (key
missing, o-series model, invalid request, etc.). Streaming never wraps in
`ALLM.Retry.run/3` per spec §6.1 — partial output may already have been
delivered before any failure surfaces.

Per CLAUDE.md and spec §10.1, mid-stream failures emit a terminal
`{:error, _}` event into the enumerable; the consumer's reducer (typically
`ALLM.StreamCollector`) folds it into `Response.finish_reason: :error`.
The call-site tuple stays `{:ok, stream}`.

## Event sequence

Happy-path streams emit, in order:

    {:message_started, %{message: %ALLM.Message{role: :assistant, content: ""}}}
    {:text_delta, %{id: id, delta: "..."}}      # one or more
    {:tool_call_delta, %{...}}                  # zero or more (interleaved with text)
    {:tool_call_completed, %{...}}              # one per tool call (synthesized at stream end)
    {:message_completed, %{message: msg, finish_reason: reason}}

The leading `:message_started` is a bookend — `ALLM.StreamCollector` folds
it as a no-op. Mid-stream errors append a terminal `{:error, _}` event in
place of (or after) `:message_completed`.
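
A minimal consumer touching only the event shapes documented above, given a built `request` and resolved `key` (most callers hand the stream to `ALLM.StreamCollector` instead):

```elixir
{:ok, stream} = ALLM.Providers.OpenAI.stream(request, api_key: key)

for event <- stream do
  case event do
    {:text_delta, %{delta: delta}} -> IO.write(delta)
    {:message_completed, %{finish_reason: reason}} -> IO.puts("\n[#{reason}]")
    {:error, error} -> IO.puts("\nstream failed: #{inspect(error)}")
    # :message_started bookend and tool-call events: ignored here.
    _other -> :ok
  end
end
```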

## Options

  * `:api_key` / `adapter_opts[:plug]` — see `prepare_request/2`.
  * `:stream_timeout` — milliseconds to wait between consecutive Finch
    messages. Default `60_000`. Exceeding it emits a terminal
    `{:error, %AdapterError{reason: :timeout}}` event.
  * `:finch_name` — the registered Finch name (default `ALLM.Finch`).
  * `:finch_module` — the module used to call `async_request/3` and
    `cancel_async_request/1`. Defaults to `Finch`. Tests inject
    `ALLM.Test.FinchStub` here.
  * `:finch_stub_ref` — when `:finch_module` is `ALLM.Test.FinchStub`,
    this ref selects the per-test stub state.

## Examples

    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
    iex> {:ok, stream} = ALLM.Providers.OpenAI.stream(req, api_key: "sk-x")
    iex> match?(%Stream{}, stream)
    true

# `to_openai_response_format`

```elixir
@spec to_openai_response_format(endpoint(), ALLM.Request.response_format()) ::
  {atom(), map()} | nil
```

Endpoint-aware translation of a canonical `response_format` shape to
OpenAI's wire format. See spec §5.4 and design Decision #17.

Returns either `nil` (omit the field on the wire) OR a
`{wire_key, wire_value}` 2-tuple where `wire_key` is the JSON body key
to merge into the request body (`:response_format` on Chat Completions,
`:text` on Responses).

Raises `FunctionClauseError` on any other canonical shape — defense in
depth: `ALLM.Validate.request/1` should have rejected the shape upstream.

## Examples

    iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, nil)
    nil

    iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, %{type: :json_object})
    {:response_format, %{type: "json_object"}}

    iex> rf = %{type: :json_schema, name: "g", schema: %{type: "object"}, strict: true}
    iex> ALLM.Providers.OpenAI.to_openai_response_format(:chat_completions, rf)
    {:response_format, %{type: "json_schema", json_schema: %{name: "g", schema: %{type: "object"}, strict: true}}}

    iex> ALLM.Providers.OpenAI.to_openai_response_format(:responses, :text)
    {:text, %{format: %{type: "text"}}}

# `translate_options`

```elixir
@spec translate_options(
  keyword(),
  ALLM.Request.t()
) :: keyword()
```

Endpoint-aware translation of caller opts to OpenAI wire keys.

## `:max_tokens` rename matrix (Decision #6)

| Endpoint | Model regex | Output key |
|----------|-------------|------------|
| `:responses` | any | `:max_output_tokens` |
| `:chat_completions` | `~r/^gpt-(4o\|4\.1\|5)/` | `:max_completion_tokens` |
| `:chat_completions` | anything else | `:max_tokens` (passthrough) |

## Reasoning controls (Decision #5)

`:reasoning_effort` (`[:none, :low, :medium, :high, :xhigh]`),
`:reasoning_summary` (`[:auto, :concise, :detailed]`), and
`:verbosity` (`[:low, :medium, :high]`) are routed by endpoint:

  * `:responses` — `:reasoning_effort` and `:reasoning_summary` merge into
    a single `reasoning: %{effort: ..., summary: ...}` sub-map; `:verbosity`
    passes through as `verbosity: "<atom>"`.
  * `:chat_completions` for `gpt-5*` — `:reasoning_effort` and
    `:verbosity` pass through as bare `reasoning_effort: "<atom>"` and
    `verbosity: "<atom>"`. `:reasoning_summary` is stripped (Chat
    Completions does not surface it).
  * `:chat_completions` for non-reasoning models — all three keys are
    stripped with a `Logger.debug/1` line.

Unknown effort/summary/verbosity atoms raise `ArgumentError`.

All other opts pass through unchanged.

## Examples

    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-4o-mini")
    iex> ALLM.Providers.OpenAI.translate_options([max_tokens: 100], req)
    [max_completion_tokens: 100]

    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-3.5-turbo")
    iex> ALLM.Providers.OpenAI.translate_options([max_tokens: 100], req)
    [max_tokens: 100]

    iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gpt-5.5")
    iex> ALLM.Providers.OpenAI.translate_options([reasoning_effort: :medium], req)
    [reasoning: %{effort: "medium"}]

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
