Streaming works with both built-in adapters:

  • Mint (default) — works out of the box, no extra dependencies needed. Note that Mint opens a new connection per request and does not pool connections.
  • Finch — also supported; use it if you need connection pooling or already have Finch in your stack.

To use Finch for streaming, add the dependency, start it in your supervision tree, and configure the adapter:

# mix.exs: add Finch to your deps
{:finch, "~> 0.18"}

# application.ex: start Finch in your supervision tree's children list
{Finch, name: MyApp.Finch}

# config/config.exs: point the Tesla adapter at your Finch pool
config :llm_composer, :tesla_adapter, {Tesla.Adapter.Finch, name: MyApp.Finch}

Enable streaming by setting stream_response: true in your LlmComposer.Settings, as in the example below. The resulting LlmResponse.t() has its :stream field populated with an Enumerable of raw provider events; pass it through parse_stream_response/2 to get normalized LlmComposer.StreamChunk structs.

Basic Usage

Application.put_env(:llm_composer, :google, api_key: "<your google api key>")

settings = %LlmComposer.Settings{
  providers: [
    {LlmComposer.Providers.Google, [model: "gemini-2.5-flash"]}
  ],
  system_prompt: "You are a helpful assistant.",
  stream_response: true
}

{:ok, res} = LlmComposer.run_completion(settings, [
  %LlmComposer.Message{type: :user, content: "How did the Roman Empire grow so big?"}
])

res.stream
|> LlmComposer.parse_stream_response(res.provider)
|> Enum.each(fn chunk ->
  IO.write(chunk.text || "")
end)

parse_stream_response/2

LlmComposer.parse_stream_response/2 normalizes the raw provider stream into %LlmComposer.StreamChunk{} values, making chunk handling consistent across providers:

res.stream
|> LlmComposer.parse_stream_response(res.provider)
|> Enum.each(fn chunk ->
  case chunk.type do
    :text_delta -> IO.write(chunk.text)
    :done -> IO.puts("\n[done]")
    :error -> IO.puts("\n[error] #{inspect(chunk.metadata)}")
    _ -> :ok
  end
end)

Key fields on each chunk:

Field       Description
:provider   Source provider (:open_ai, :google, :open_router, etc.)
:type       Event category (see chunk types below)
:text       Incremental text when available
:usage      Normalized token counts when exposed by the provider
:raw        Original decoded payload for advanced/debug handling
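
The :raw field is useful when a provider sends events the normalizer does not recognize. A small debugging sketch, pattern matching on the struct fields listed above:

res.stream
|> LlmComposer.parse_stream_response(res.provider)
|> Enum.each(fn
  # Log the original decoded payload for events LlmComposer could not classify.
  %LlmComposer.StreamChunk{type: :unknown, raw: raw} -> IO.inspect(raw, label: "unhandled event")
  _chunk -> :ok
end)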

Token Tracking in Streaming Mode

When streaming is enabled, LlmComposer does not populate the LlmResponse token fields (:input_tokens, :output_tokens, etc.). There are two ways to track tokens:

  1. Calculate tokens externally — use a library like tiktoken for OpenAI-compatible providers before sending the request.
  2. Read from stream events — some providers (OpenRouter, OpenAI Responses) include token counts in their :usage or :done chunk events. Read chunk.usage when chunk.type == :usage, as sketched after this list.
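
A minimal sketch of the second approach, printing text deltas while capturing the usage report (the exact shape of the chunk.usage map is provider-dependent):

usage =
  res.stream
  |> LlmComposer.parse_stream_response(res.provider)
  |> Enum.reduce(nil, fn chunk, acc ->
    case chunk.type do
      # Print incremental text as it arrives.
      :text_delta ->
        IO.write(chunk.text)
        acc

      # Keep the token report when the provider emits one mid-stream.
      :usage ->
        chunk.usage

      _ ->
        acc
    end
  end)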

StreamChunk Fields

LlmComposer.StreamChunk.t() carries all information about a single streaming event:

Field               Type                Description
:provider           atom()              Provider that emitted this chunk
:type               atom()              Event type (see below)
:text               String.t() | nil    Text delta for this chunk
:reasoning          String.t() | nil    Reasoning delta (reasoning models)
:reasoning_details  list()              Structured reasoning blocks
:tool_calls         list()              Partial tool call fragments
:usage              map() | nil         Token counts when reported mid-stream
:cost_info          CostInfo.t() | nil  Cost info when :track_costs is enabled
:metadata           map()               Provider-specific extra data
:raw                any()               Original decoded payload

Chunk Types

Type              Description
:text_delta       Incremental text content
:reasoning_delta  Incremental reasoning content
:tool_call_delta  Partial tool/function call
:usage            Token usage report
:done             Stream finished successfully
:error            Stream encountered an error
:unknown          Unrecognized event (safe to skip)
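
A sketch that handles every chunk type, assuming a reasoning-capable model; the shape of :tool_calls fragments is provider-specific:

res.stream
|> LlmComposer.parse_stream_response(res.provider)
|> Enum.each(fn chunk ->
  case chunk.type do
    :text_delta -> IO.write(chunk.text)
    :reasoning_delta -> IO.write(chunk.reasoning)
    :tool_call_delta -> IO.inspect(chunk.tool_calls, label: "tool call fragment")
    :usage -> IO.inspect(chunk.usage, label: "usage")
    :done -> IO.puts("\n[done]")
    :error -> IO.puts("\n[error] #{inspect(chunk.metadata)}")
    :unknown -> :ok
  end
end)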

Assembling the Full Text

Concatenate the text deltas to recover the complete response:

full_text =
  res.stream
  |> LlmComposer.parse_stream_response(res.provider)
  |> Stream.filter(&(&1.type == :text_delta))
  |> Enum.map_join("", & &1.text)
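
To print deltas as they arrive and still keep the full text, add a Stream.each/2 pass. Note that the underlying HTTP stream is typically single-pass, so enumerate it only once:

full_text =
  res.stream
  |> LlmComposer.parse_stream_response(res.provider)
  |> Stream.filter(&(&1.type == :text_delta))
  # Side effect: write each delta to stdout as it streams in.
  |> Stream.each(&IO.write(&1.text))
  |> Enum.map_join("", & &1.text)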

Streaming and Retries

Streaming is not compatible with Tesla's retry middleware. When stream_response: true is set, the retry middleware is removed automatically. See the Configuration guide for retry options.
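
If you need retry behavior with streaming, retry the whole request at the application level. A hypothetical wrapper (MyApp.StreamRetry is not part of LlmComposer) that retries failures to establish the stream, not failures mid-stream:

defmodule MyApp.StreamRetry do
  def run_with_retry(settings, messages, attempts \\ 3) do
    case LlmComposer.run_completion(settings, messages) do
      {:ok, res} ->
        {:ok, res}

      # Retry only when the request itself fails; mid-stream failures
      # surface later as :error chunks and are not retried here.
      {:error, _reason} when attempts > 1 ->
        run_with_retry(settings, messages, attempts - 1)

      error ->
        error
    end
  end
end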