ALLM.Runner (allm v0.3.0)

Internal — use ALLM.generate/3 instead. See spec §17.

Layer C — stateless non-streaming entry point. run/3 delegates to ALLM.StreamRunner.run/3, folds the returned stream through ALLM.StreamCollector, and wraps the final %Response{} in {:ok, _}.

v0.2 — generate/3 always streams under the hood

In v0.2, every public Layer-C entry point (ALLM.generate/3, ALLM.step/3, ALLM.chat/3) routes through this module, which then delegates to ALLM.StreamRunner.run/3. The non-streaming public API is therefore a stream-collector reduction of the streaming path, and the adapter's ALLM.Adapter.generate/3 callback is never invoked from the public façade in v0.2. Consequence: ALLM.Retry's [:allm, :adapter, :retry] telemetry, which spec §6.1 prohibits on streaming calls, does not fire from ALLM.generate/3. The retry surface is exercised in v0.2 by direct adapter calls (ALLM.Providers.Fake.generate/2); real-provider Phase 10/11 adapters reuse ALLM.Retry.run/3 from their generate/2 callbacks. See the ALLM.Retry @moduledoc for the full caveat and review Finding #3.

Stream-first (spec §3)

Non-streaming generation is a reducer over the streaming path:

{:ok, stream} = ALLM.StreamRunner.run(engine, request, opts)
stream
|> Enum.reduce(ALLM.StreamCollector.new(), &ALLM.StreamCollector.apply_event(&2, &1))
|> ALLM.StreamCollector.to_response()

This is the same algorithm consumers can run manually against ALLM.stream_generate/3; it exists here so generate/3 has one canonical code path and stream-equivalence (spec §3's first consequence) is preserved by construction.
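To make the reducer idea concrete without depending on the library, here is a minimal sketch of a collector fold in plain Elixir. CollectorSketch is a hypothetical stand-in for ALLM.StreamCollector, and the event shapes ({:text, chunk} and {:finish, reason}) are assumptions chosen for illustration; the real event vocabulary may differ.

```elixir
defmodule CollectorSketch do
  # Hypothetical stand-in for ALLM.StreamCollector. The accumulator fields
  # and the event shapes below are assumptions, not the library's API.
  def new, do: %{output_text: "", finish_reason: nil}

  # Append each text chunk to the accumulated output.
  def apply_event(acc, {:text, chunk}), do: %{acc | output_text: acc.output_text <> chunk}

  # Record the terminal finish reason.
  def apply_event(acc, {:finish, reason}), do: %{acc | finish_reason: reason}

  # Ignore event kinds this sketch does not model.
  def apply_event(acc, _other), do: acc

  # Non-streaming collection is just a fold over the event stream.
  def collect(stream) do
    Enum.reduce(stream, new(), &apply_event(&2, &1))
  end
end

events = [{:text, "h"}, {:text, "i"}, {:finish, :stop}]
response = CollectorSketch.collect(events)
IO.inspect({response.output_text, response.finish_reason})
# prints {"hi", :stop}
```

Because the fold accepts any enumerable, the same function works on an eager list or a lazy stream, which is what lets the streaming and non-streaming paths share one code path.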

Pre-flight vs. mid-stream errors (Non-obvious Decision #4)

  • Pre-flight: StreamRunner.run/3 returns {:error, struct} synchronously (no stream opened). run/3 bubbles the error up verbatim.
  • Mid-stream: the adapter opened a stream and then emitted a terminal {:error, struct} event. StreamCollector.apply_event/2 folds the error into %Response{finish_reason: :error, metadata: %{error: struct}}. run/3 still returns {:ok, response}; the caller inspects response.finish_reason == :error to detect it.
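A caller therefore has three outcomes to distinguish, not two. The sketch below shows one way to separate them by pattern matching on the result of run/3. RunnerResultSketch and its classify/1 function are hypothetical helpers, not part of ALLM; the field names finish_reason and metadata.error are taken from the description above, and plain maps stand in for %Response{} so the example runs on its own.

```elixir
defmodule RunnerResultSketch do
  # Hypothetical helper, not part of ALLM.

  # Pre-flight failure: no stream was ever opened.
  def classify({:error, reason}), do: {:preflight_error, reason}

  # Mid-stream failure: still an {:ok, _} tuple, flagged by finish_reason.
  # Clause order matters: this must come before the generic {:ok, _} clause.
  def classify({:ok, %{finish_reason: :error, metadata: %{error: reason}}}),
    do: {:midstream_error, reason}

  # Anything else is a normally completed response.
  def classify({:ok, response}), do: {:completed, response}
end

# Plain maps stand in for %Response{}; a map pattern also matches structs.
IO.inspect(RunnerResultSketch.classify({:error, :invalid_request}))
# prints {:preflight_error, :invalid_request}

failed = %{finish_reason: :error, metadata: %{error: :connection_closed}}
IO.inspect(RunnerResultSketch.classify({:ok, failed}))
# prints {:midstream_error, :connection_closed}
```

Matching only on {:ok, _} versus {:error, _} would silently treat a mid-stream failure as success, which is why the finish_reason check is part of the pattern.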

Usage carve-out

StreamRunner.run/3's include_raw_chunks: false filter preserves {:raw_chunk, {:usage, _}} events regardless of the caller's filter preference (Non-obvious Decision #9), so the collector always sees usage and populates response.usage — no Runner-side override needed.
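As a rough illustration of the carve-out, the following sketch filters raw chunks out of an event stream while always letting usage chunks through. RawChunkFilterSketch is a hypothetical module, not StreamRunner's actual implementation; the {:raw_chunk, {:usage, _}} shape comes from the text above, while the other event shapes and the false default for include_raw_chunks are assumptions.

```elixir
defmodule RawChunkFilterSketch do
  # Hypothetical filter, not StreamRunner's real code.
  def filter(events, opts \\ []) do
    # Assumed default: raw chunks are dropped unless explicitly requested.
    include_raw? = Keyword.get(opts, :include_raw_chunks, false)

    Stream.filter(events, fn
      # Usage chunks always pass, whatever the caller asked for.
      {:raw_chunk, {:usage, _}} -> true
      # Other raw chunks pass only when the caller opted in.
      {:raw_chunk, _} -> include_raw?
      # Non-raw events are never filtered here.
      _event -> true
    end)
  end
end

events = [
  {:raw_chunk, "provider bytes"},
  {:text, "hi"},
  {:raw_chunk, {:usage, %{input_tokens: 3, output_tokens: 1}}}
]

events
|> RawChunkFilterSketch.filter(include_raw_chunks: false)
|> Enum.to_list()
|> IO.inspect()
# prints [text: "hi", raw_chunk: {:usage, %{input_tokens: 3, output_tokens: 1}}]
```

The usage clause is listed first so that it wins over the general raw-chunk clause; a downstream collector then sees usage data even when every other raw chunk has been dropped.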

Functions

run(engine, request, opts \\ [])

Dispatch a non-streaming request by reducing the streaming adapter's output via ALLM.StreamCollector.

Returns {:ok, %Response{}} on a successfully-completed stream (a mid-stream {:error, _} still returns {:ok, _} with response.finish_reason == :error — see module doc) or {:error, struct} on a synchronous pre-flight failure.

Examples

iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "hi"}, {:finish, :stop}]]
...> )
iex> req = ALLM.request([ALLM.user("say hi")])
iex> {:ok, response} = ALLM.Runner.run(engine, req)
iex> {response.output_text, response.finish_reason}
{"hi", :stop}