Internal — use ALLM.generate/3 instead. See spec §17.
Layer C — stateless non-streaming entry point. run/3 delegates to
ALLM.StreamRunner.run/3, folds the returned stream through
ALLM.StreamCollector, and wraps the final %Response{} in
{:ok, _}.
v0.2 — generate/3 always streams under the hood
In v0.2 every public Layer-C entry point (ALLM.generate/3,
ALLM.step/3, ALLM.chat/3) routes through this module, which then
delegates to ALLM.StreamRunner.run/3. The non-streaming public API
is therefore a stream-collector reduction of the streaming path; the
adapter's c:ALLM.Adapter.generate/3 callback is never invoked
from the public façade in v0.2. Consequence: ALLM.Retry's
[:allm, :adapter, :retry] telemetry — which spec §6.1 prohibits on
streaming calls — does not fire from ALLM.generate/3. The retry
surface is exercised in v0.2 by direct adapter calls
(ALLM.Providers.Fake.generate/2); real-provider Phase 10/11
adapters reuse ALLM.Retry.run/3 from their generate/2
callbacks. See ALLM.Retry @moduledoc for the full caveat and
review Finding #3.
Stream-first (spec §3)
Non-streaming generation is a reducer over the streaming path:
{:ok, stream} = ALLM.StreamRunner.run(engine, request, opts)
stream
|> Enum.reduce(ALLM.StreamCollector.new(), &ALLM.StreamCollector.apply_event(&2, &1))
|> ALLM.StreamCollector.to_response()
This is the same algorithm consumers can run manually against
ALLM.stream_generate/3; it exists here so generate/3 has one
canonical code path and stream-equivalence (spec §3's first consequence)
is preserved by construction.
Pre-flight vs. mid-stream errors (Non-obvious Decision #4)
- Pre-flight: StreamRunner.run/3 returns {:error, struct} synchronously (no stream opened). run/3 bubbles the error up verbatim.
- Mid-stream: the adapter opened a stream and then emitted a terminal {:error, struct} event. StreamCollector.apply_event/2 folds the error into %Response{finish_reason: :error, metadata: %{error: struct}}. run/3 still returns {:ok, response}; the caller inspects response.finish_reason == :error to detect it.
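A caller-side sketch of the two error paths above. ErrorPathsSketch.classify/1 is a hypothetical helper, not part of ALLM, and plain maps stand in for %Response{} and the error structs so the block runs without the library. Only the finish_reason and metadata.error fields named above are assumed.

```elixir
defmodule ErrorPathsSketch do
  # Hypothetical helper (not part of ALLM): classifies the result of a
  # run/3-style call. Plain maps stand in for %Response{} and error structs.

  # Pre-flight failure: no stream was opened, so the error arrives as-is.
  def classify({:error, error}), do: {:preflight_error, error}

  # Mid-stream failure: the call still returns {:ok, _}; the error has been
  # folded into the response under metadata.error.
  def classify({:ok, %{finish_reason: :error, metadata: %{error: error}}}),
    do: {:midstream_error, error}

  # Any other finish reason is treated as a normally completed response.
  def classify({:ok, response}), do: {:completed, response}
end
```

Clause order matters here: the finish_reason: :error clause has to come before the catch-all {:ok, response} clause, otherwise a mid-stream failure would be reported as a completed response.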
Usage carve-out
StreamRunner.run/3's include_raw_chunks: false filter preserves
{:raw_chunk, {:usage, _}} events regardless of the caller's filter
preference (Non-obvious Decision #9), so the collector always sees
usage and populates response.usage — no Runner-side override needed.
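The carve-out can be pictured as a filter that drops raw chunks but lets usage chunks through. The module below is a hypothetical stand-in, not StreamRunner's actual code; the only event shape taken from this document is {:raw_chunk, {:usage, _}}, and the other tuples are illustrative.

```elixir
defmodule UsageCarveOutSketch do
  # Hypothetical stand-in for the filter described above; not ALLM's code.
  # Event shapes other than {:raw_chunk, {:usage, _}} are illustrative.

  # With include_raw_chunks: true nothing is dropped.
  def filter(events, true), do: events

  # With include_raw_chunks: false raw chunks are dropped lazily,
  # except the ones that carry usage.
  def filter(events, false), do: Stream.reject(events, &droppable?/1)

  # Usage chunks always survive so a collector can populate usage.
  defp droppable?({:raw_chunk, {:usage, _}}), do: false
  # Every other raw chunk is removed.
  defp droppable?({:raw_chunk, _}), do: true
  # Non-raw events are never filtered.
  defp droppable?(_event), do: false
end
```

Because the usage clause is checked first, a downstream reducer sees usage events whichever value the caller passed for the flag.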
Functions
@spec run(ALLM.Engine.t(), ALLM.Request.t(), keyword()) :: {:ok, ALLM.Response.t()} | {:error, ALLM.Error.EngineError.t() | ALLM.Error.AdapterError.t() | ALLM.Error.ValidationError.t()}
Dispatch a non-streaming request by reducing the streaming adapter's
output via ALLM.StreamCollector.
Returns {:ok, %Response{}} on a successfully-completed stream (a
mid-stream {:error, _} still returns {:ok, _} with
response.finish_reason == :error — see module doc) or
{:error, struct} on a synchronous pre-flight failure.
Examples
iex> engine = ALLM.Engine.new(
...> adapter: ALLM.Providers.Fake,
...> adapter_opts: [script: [{:text, "hi"}, {:finish, :stop}]]
...> )
iex> req = ALLM.request([ALLM.user("say hi")])
iex> {:ok, response} = ALLM.Runner.run(engine, req)
iex> {response.output_text, response.finish_reason}
{"hi", :stop}