Telemetry

View Source

ReqLLM emits native :telemetry events for both Req-backed requests and Finch-backed streaming. Every event for a logical request shares the same request_id, so you can correlate request lifecycle, reasoning lifecycle, and token usage without provider-specific parsing.

Use these events for billing, tenant attribution, latency tracking, reasoning observability, and low-level integrations that cannot rely on wrapping Req directly.

Event Families

  • [:req_llm, :request, :start] fires once when a request begins.
  • [:req_llm, :request, :stop] fires once when a request completes, including streaming completion and cancellation.
  • [:req_llm, :request, :exception] fires once when a request fails.
  • [:req_llm, :reasoning, :start] fires when the effective request enables provider reasoning.
  • [:req_llm, :reasoning, :update] fires on reasoning milestones, not every chunk.
  • [:req_llm, :reasoning, :stop] fires when a reasoning request finishes, is cancelled, or errors.
  • [:req_llm, :token_usage] remains as a compatibility event for token and cost tracking.

Request lifecycle events always include a reasoning map in metadata, even for operations that do not support reasoning. In those cases, the snapshot is explicit about reasoning being disabled or unsupported.

Measurements

  • request.start, reasoning.start, and reasoning.update emit %{system_time: integer}.
  • request.stop, request.exception, and reasoning.stop emit %{duration: integer, system_time: integer}.

duration is in native monotonic time units and should be converted with System.convert_time_unit/3 if you want milliseconds.

Request Metadata

Every request lifecycle event includes these core metadata fields:

  • request_id
  • operation
  • mode
  • provider
  • model
  • transport
  • reasoning
  • request_summary
  • response_summary
  • http_status
  • finish_reason
  • usage

When payload capture is enabled, request lifecycle events also include request_payload and response_payload.

Typical request metadata looks like this:

%{
  request_id: "2184",
  operation: :chat,
  mode: :stream,
  provider: :anthropic,
  model: %LLMDB.Model{},
  transport: :finch,
  reasoning: %{
    supported?: true,
    requested?: true,
    effective?: true,
    requested_mode: :enabled,
    requested_effort: :medium,
    requested_budget_tokens: 4096,
    effective_mode: :enabled,
    effective_effort: :medium,
    effective_budget_tokens: 4096,
    returned_content?: true,
    reasoning_tokens: 812,
    content_bytes: 1432,
    channel: :content_and_usage
  },
  request_summary: %{
    message_count: 1,
    text_bytes: 42,
    image_part_count: 0,
    tool_call_count: 0
  },
  response_summary: %{
    text_bytes: 318,
    thinking_bytes: 1432,
    tool_call_count: 0,
    image_count: 0,
    object?: false
  },
  http_status: 200,
  finish_reason: :stop,
  usage: %{
    input_tokens: 24,
    output_tokens: 133,
    total_tokens: 157,
    reasoning_tokens: 812
  }
}

request_summary and response_summary are compact by design. Their exact shape varies by operation:

  • Chat, object, and image requests summarize message count, text bytes, image parts, and tool calls.
  • Chat, object, and image responses summarize output text bytes, thinking bytes, tool calls, image count, and structured object presence.
  • Embeddings summarize input count, vector count, and dimensions.
  • Speech summarizes input text bytes and output audio size and format.
  • Transcription summarizes input audio size plus transcript text bytes, segment count, and duration.

Standardized Reasoning Metadata

The reasoning map is the provider-neutral contract for reasoning and thinking observability:

  • supported? says whether the operation and model support reasoning.
  • requested? reflects the original API options passed to ReqLLM.
  • effective? reflects the translated provider request after normalization.
  • requested_mode, requested_effort, and requested_budget_tokens capture the caller intent.
  • effective_mode, effective_effort, and effective_budget_tokens capture what the provider request actually used.
  • returned_content? indicates whether reasoning content was observed.
  • reasoning_tokens tracks normalized reasoning token usage when providers expose it.
  • content_bytes tracks the amount of reasoning content observed without exposing the content itself.
  • channel is one of :none, :usage_only, :content_only, or :content_and_usage.

Requested reasoning is normalized from the original ReqLLM options, such as:

  • reasoning_effort
  • thinking: %{type: "enabled", budget_tokens: ...}
  • provider_options: [google_thinking_budget: ...]
  • provider-specific reasoning budget and thinking toggles

Effective reasoning is normalized from the translated provider request so that OpenAI, Anthropic, Google, Vertex, and other providers can be compared through the same telemetry shape.

The normalizer currently covers these provider request shapes:

  • OpenAI-style reasoning effort fields such as reasoning.effort and reasoning_effort on OpenAI, OpenRouter, Groq, and xAI
  • Anthropic-style thinking fields such as thinking and additional_model_request_fields.thinking on Anthropic, Azure Claude, Bedrock Claude, and Vertex Claude
  • Google-style thinking budgets such as google_thinking_budget and generationConfig.thinkingConfig.thinkingBudget on Google Gemini and Vertex Gemini
  • Alibaba enable_thinking and thinking_budget
  • Zenmux reasoning.enable, reasoning.depth, and reasoning_effort
  • Z.AI thinking.type

Because requested is derived from the original ReqLLM call and effective is derived from the translated provider request, they can diverge when provider translation drops, disables, or rewrites a reasoning configuration.

When callers send conflicting reasoning controls, ReqLLM telemetry resolves them conservatively. Explicit disable signals such as thinking: %{type: "disabled"}, reasoning_effort: :none, or zero-token budgets win over enable hints in the normalized requested snapshot.

Reasoning Milestones

Reasoning events never include raw thinking text. They are metadata-only, even when payload capture is enabled.

reasoning.update is emitted only for milestone transitions:

  • milestone: :content_started when the first reasoning content is observed
  • milestone: :usage_updated when reasoning token usage first appears or changes
  • milestone: :details_available when provider reasoning details become available

reasoning.start uses milestone: :request_started.

reasoning.stop uses the terminal outcome as its milestone, for example:

  • :stop
  • :length
  • :tool_calls
  • :cancelled
  • :incomplete
  • :error
  • :unknown

Token Usage Compatibility Event

[:req_llm, :token_usage] remains available for existing consumers and now fires for streaming as well as non-streaming requests.

Measurements include:

  • input_tokens
  • output_tokens
  • total_tokens
  • input_cost
  • output_cost
  • total_cost
  • reasoning_tokens

Metadata includes:

  • model
  • request_id
  • operation
  • mode
  • provider
  • transport

For new integrations, prefer [:req_llm, :request, :stop] as the source of truth because it includes duration, finish reason, summaries, and normalized reasoning metadata alongside usage.

Attaching Telemetry Handlers

defmodule MyApp.ReqLLMObserver do
  require Logger

  @events [
    [:req_llm, :request, :start],
    [:req_llm, :request, :stop],
    [:req_llm, :request, :exception],
    [:req_llm, :reasoning, :start],
    [:req_llm, :reasoning, :update],
    [:req_llm, :reasoning, :stop],
    [:req_llm, :token_usage]
  ]

  def attach do
    :telemetry.attach_many("my-app-req-llm", @events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event([:req_llm, :request, :stop], %{duration: duration}, metadata, _config) do
    duration_ms = System.convert_time_unit(duration, :native, :millisecond)

    Logger.info(
      "req_llm request=#{metadata.request_id} model=#{metadata.model.provider}:#{metadata.model.id} " <>
        "duration_ms=#{duration_ms} finish_reason=#{inspect(metadata.finish_reason)} " <>
        "total_tokens=#{metadata.usage && metadata.usage.total_tokens}"
    )
  end

  def handle_event([:req_llm, :reasoning, :update], _measurements, metadata, _config) do
    Logger.debug(
      "req_llm reasoning request=#{metadata.request_id} milestone=#{inspect(metadata.milestone)} " <>
        "channel=#{inspect(metadata.reasoning.channel)} tokens=#{metadata.reasoning.reasoning_tokens}"
    )
  end

  def handle_event([:req_llm, :token_usage], measurements, metadata, _config) do
    Logger.info(
      "req_llm usage request=#{metadata.request_id} total_tokens=#{measurements.total_tokens} " <>
        "total_cost=#{measurements.total_cost}"
    )
  end

  def handle_event(_event, _measurements, _metadata, _config), do: :ok
end

Payload Capture

By default, ReqLLM telemetry is metadata-only:

config :req_llm, telemetry: [payloads: :none]

You can opt into payload capture globally:

config :req_llm, telemetry: [payloads: :raw]

Or per request:

ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello", telemetry: [payloads: :raw])

ReqLLM.stream_text("openai:gpt-5-mini", "Hello", telemetry: [payloads: :raw])

Payload mode only affects request lifecycle events. Reasoning events stay metadata-only.

Raw payload mode is still sanitized:

  • reasoning and thinking text is redacted from payloads
  • tools are emitted as stable metadata only (name, description, strict, parameter_schema)
  • binary message parts such as images and files are summarized by byte size, media type, and filename instead of emitting raw bytes
  • unknown payload shapes are recursively sanitized so opaque binaries are summarized instead of passed through
  • speech telemetry reports audio size and format, not raw audio bytes
  • embedding telemetry reports vector counts and dimensions, not the vectors themselves
  • transcription telemetry stays structured and avoids opaque binary payloads

Use raw payload capture carefully in multi-tenant systems because request and response payloads may still contain user content, tool call arguments, and structured outputs.

OpenTelemetry Bridge

ReqLLM also includes a small OpenTelemetry bridge in ReqLLM.OpenTelemetry. It turns the normalized request lifecycle telemetry above into GenAI client spans without adding provider-specific instrumentation paths.

Attach it once during application startup:

case ReqLLM.OpenTelemetry.attach() do
  :ok -> :ok
  {:error, :opentelemetry_unavailable} -> :ok
end

The bridge uses:

  • gen_ai.provider.name
  • gen_ai.operation.name
  • gen_ai.request.model
  • gen_ai.output.type
  • gen_ai.response.finish_reasons
  • gen_ai.usage.input_tokens
  • gen_ai.usage.output_tokens
  • cache read and cache creation token attributes when available
  • error.type for failed requests

ReqLLM does not configure an SDK or exporter for you. To export traces, your host application still needs normal OpenTelemetry setup, such as :opentelemetry and an exporter dependency.

For advanced integrations, ReqLLM also exposes a dependency-free mapper in ReqLLM.Telemetry.OpenTelemetry. It builds span stubs from ReqLLM telemetry metadata without attaching handlers or depending on an OpenTelemetry SDK.

defmodule MyApp.ReqLLMOpenTelemetry do
  alias ReqLLM.Telemetry.OpenTelemetry

  @events [
    [:req_llm, :request, :start],
    [:req_llm, :request, :stop],
    [:req_llm, :request, :exception]
  ]

  def attach do
    :telemetry.attach_many("my-app-req-llm-otel", @events, &__MODULE__.handle_event/4, %{})
  end

  def handle_event([:req_llm, :request, :start], _measurements, metadata, _config) do
    stub = OpenTelemetry.request_start(metadata, content: :attributes)
    MyApp.Tracing.start_gen_ai_span(metadata.request_id, stub)
  end

  def handle_event([:req_llm, :request, :stop], _measurements, metadata, _config) do
    stub = OpenTelemetry.request_stop(metadata, content: :attributes)
    MyApp.Tracing.finish_gen_ai_span(metadata.request_id, stub)
  end

  def handle_event([:req_llm, :request, :exception], _measurements, metadata, _config) do
    stub = OpenTelemetry.request_exception(metadata, content: :attributes)
    MyApp.Tracing.finish_gen_ai_span(metadata.request_id, stub)
  end
end

The low-level mapper includes richer normalized GenAI metadata such as:

  • gen_ai.response.id
  • gen_ai.response.model
  • gen_ai.input.messages and gen_ai.output.messages when content capture is enabled
  • tool call and tool result payloads in message parts
  • exception event payloads for manual span finishing

Coverage Across APIs

These event families are emitted for:

If you need observability that covers both sync and streaming, attach to ReqLLM telemetry rather than Req middleware alone.

See Also