Usage & Billing

Overview

ReqLLM provides normalized usage tracking and best-effort cost calculation for API requests. Every response includes usage data that works consistently across providers, with detailed breakdowns for tokens, tools, and images when the provider exposes enough information.

Pricing Policy

ReqLLM currently targets "some assistance, no guarantees" for pricing.

In practice, that means:

  • response.usage is intended to be useful for product analytics, tenant attribution, dashboards, and rough billing estimates
  • token, tool, image, and caching costs are calculated from provider usage data plus model pricing metadata when those inputs exist
  • the resulting USD totals are not guaranteed to match provider invoices exactly

When exact billing matters, treat ReqLLM usage as a helpful estimate and reconcile against provider-side reporting. For the full contract, known gaps, and production guidance, see the Pricing Policy guide.

The Usage Structure

Every ReqLLM.Response includes a usage map with normalized metrics:

{:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello")

response.usage
#=> %{
#     # Token counts
#     input_tokens: 8,
#     output_tokens: 12,
#     total_tokens: 20,
#
#     # Cost summary (USD)
#     input_cost: 0.00024,
#     output_cost: 0.00036,
#     total_cost: 0.0006,
#
#     # Detailed cost breakdown
#     cost: %{
#       tokens: 0.0006,
#       tools: 0.0,
#       images: 0.0,
#       total: 0.0006
#     }
#   }
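
Because every response exposes the same usage shape, per-request costs roll up naturally for analytics. A minimal sketch, assuming a plain list of prompts (the list and accumulator are illustrative, not part of the API):

prompts = ["Hello", "Summarize this article", "Translate to French: good morning"]

total_cost =
  Enum.reduce(prompts, 0.0, fn prompt, acc ->
    {:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", prompt)
    # total_cost may be nil when pricing metadata is unavailable for a model
    acc + (response.usage.total_cost || 0.0)
  end)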

Token Usage

Standard Tokens

All providers report basic token counts:

Field            Description
input_tokens     Tokens in the request (prompt, context, tools)
output_tokens    Tokens generated by the model
total_tokens     Sum of input and output tokens
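
These fields can be pattern matched straight off the usage map; a minimal sketch:

%{input_tokens: input, output_tokens: output, total_tokens: total} = response.usage
IO.puts("#{input} in / #{output} out (#{total} total)")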

Reasoning Tokens

For reasoning models (OpenAI o1/o3/gpt-5, Anthropic extended thinking, Google thinking):

{:ok, response} = ReqLLM.generate_text("openai:o3-mini", prompt)

response.usage.reasoning_tokens
#=> 1250  # Tokens used for internal reasoning

The reasoning_tokens field tracks tokens used for chain-of-thought reasoning. These may be billed differently than standard tokens depending on the provider.
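
Since reasoning_tokens is only populated for reasoning models, guard the lookup when handling mixed model traffic; a minimal sketch:

case Map.get(response.usage, :reasoning_tokens) do
  nil -> IO.puts("no reasoning tokens reported")
  count -> IO.puts("reasoning tokens: #{count}")
end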

Cached Tokens

For providers that support prompt caching (Anthropic, OpenAI):

response.usage.cached_tokens
#=> 500  # Input tokens served from cache

response.usage.cache_creation_tokens
#=> 0    # Tokens used to create new cache entries

Cached tokens are typically billed at a reduced rate. See Anthropic Prompt Caching for details.
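
One way to gauge how well caching is working is to compare cached tokens against total input tokens. A minimal sketch (the hit-rate metric is illustrative, not a ReqLLM API):

usage = response.usage
cached = Map.get(usage, :cached_tokens) || 0

cache_hit_rate =
  if usage.input_tokens > 0, do: cached / usage.input_tokens, else: 0.0

IO.puts("cache hit rate: #{Float.round(cache_hit_rate * 100, 1)}%")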

Tool Usage

When using tools like web search, usage is tracked in tool_usage:

response.usage.tool_usage
#=> %{
#     web_search: %{count: 2, unit: "call"}
#   }

Each provider has slightly different web search tracking:

Provider    Unit                 Notes
Anthropic   "call"               $10 per 1,000 searches
OpenAI      "call"               Responses API models only
xAI         "call" or "source"   Varies by response format
Google      "query"              Grounding queries

Anthropic Example:

{:ok, response} = ReqLLM.generate_text(
  "anthropic:claude-sonnet-4-5",
  "What's happening in AI today?",
  provider_options: [web_search: %{max_uses: 5}]
)

response.usage.tool_usage.web_search
#=> %{count: 3, unit: "call"}

xAI Example:

{:ok, response} = ReqLLM.generate_text(
  "xai:grok-4-1-fast-reasoning",
  "Latest tech news",
  xai_tools: [%{type: "web_search"}]
)

response.usage.tool_usage.web_search
#=> %{count: 5, unit: "call"}

Google Grounding Example:

{:ok, response} = ReqLLM.generate_text(
  "google:gemini-3-flash-preview",
  "Current stock market trends",
  provider_options: [google_grounding: %{enable: true}]
)

response.usage.tool_usage.web_search
#=> %{count: 2, unit: "query"}
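
Because every provider reports the same %{count: ..., unit: ...} shape, a single reducer can total tool calls regardless of unit; a minimal sketch:

tool_usage = Map.get(response.usage, :tool_usage) || %{}

total_tool_calls =
  Enum.reduce(tool_usage, 0, fn {_tool, %{count: count}}, acc -> acc + count end)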

Image Usage

For image generation, usage is tracked in image_usage:

{:ok, response} = ReqLLM.generate_image("openai:gpt-image-1", prompt)

response.usage.image_usage
#=> %{
#     generated: %{count: 1, size_class: "1024x1024"}
#   }

Size Classes

Image costs vary by resolution:

Provider           Size Classes
OpenAI GPT Image   "1024x1024", "1536x1024", "1024x1536", "auto"
OpenAI DALL-E 3    "1024x1024", "1792x1024", "1024x1792"
Google             Based on aspect ratio

Multiple Images

{:ok, response} = ReqLLM.generate_image("openai:dall-e-2", prompt, n: 3)

response.usage.image_usage.generated
#=> %{count: 3, size_class: "1024x1024"}

Cost Breakdown

The cost map provides a detailed breakdown by category:

response.usage.cost
#=> %{
#     tokens: 0.001,    # Token-based costs (input + output)
#     tools: 0.02,      # Web search and tool costs
#     images: 0.04,     # Image generation costs
#     total: 0.061,     # Sum of all costs
#     line_items: [...]  # Per-component details
#   }
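
Since total is documented as the sum of the category costs, a quick consistency check can flag gaps in pricing metadata; a minimal sketch:

%{tokens: tokens, tools: tools, images: images, total: total} = response.usage.cost
abs(total - (tokens + tools + images)) < 1.0e-9
#=> true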

Line Items

For detailed billing analysis, line_items provides per-component costs:

response.usage.cost.line_items
#=> [
#     %{component: "token.input", cost: 0.0003, quantity: 100},
#     %{component: "token.output", cost: 0.0007, quantity: 50},
#     %{component: "tool.web_search", cost: 0.02, quantity: 2}
#   ]
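
Line items make it straightforward to roll spend up by component, for example to split token costs from tool costs; a minimal sketch:

cost_by_component =
  response.usage.cost.line_items
  |> Enum.group_by(& &1.component, & &1.cost)
  |> Map.new(fn {component, costs} -> {component, Enum.sum(costs)} end)

#=> %{"token.input" => 0.0003, "token.output" => 0.0007, "tool.web_search" => 0.02}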

Provider-Specific Notes

Anthropic

  • Web search: $10 per 1,000 searches
  • Prompt caching: Reduced rates for cached tokens
  • Extended thinking: Reasoning tokens tracked separately

OpenAI

  • Responses API: Web search available for o1, o3, gpt-5 models
  • Chat Completions API: No built-in web search
  • Image generation: Costs vary by model and size

xAI

  • Web search: Via xai_tools option
  • Deprecated: live_search is no longer supported
  • Units: May report as "call" or "source"

Google

  • Grounding: Search via google_grounding option
  • Units: Reports as "query"
  • Image generation: Gemini image models supported

Known Limits

ReqLLM does not currently guarantee support for every provider billing surface. In particular:

  • realtime audio/text billing is not modeled yet
  • video generation billing is not modeled yet
  • account-specific discounts, credits, taxes, and regional pricing are outside the public contract

Telemetry

ReqLLM emits three telemetry event families:

  • [:req_llm, :request, :start | :stop | :exception] for lifecycle timing, request and response summaries, usage, and standardized reasoning metadata

  • [:req_llm, :reasoning, :start | :update | :stop] for provider-neutral thinking and reasoning milestones

  • [:req_llm, :token_usage] for backwards-compatible token and cost tracking

For billing and tenant attribution, use [:req_llm, :request, :stop] as the source of truth. It includes duration in measurements plus request_id, usage, finish_reason, and normalized reasoning metadata in the event metadata. The token usage event remains useful if you only want token and cost totals.

When you audit reasoning-heavy workloads, prefer the normalized reasoning snapshot on the request lifecycle events over raw provider payloads. It captures both the originally requested reasoning settings and the effective translated request, so you can see when a provider rewrites or disables a reasoning configuration before you attribute cost or behavior to a tenant.

:telemetry.attach_many(
  "my-req-llm-billing",
  [
    [:req_llm, :request, :stop],
    [:req_llm, :request, :exception],
    [:req_llm, :token_usage]
  ],
  fn event, measurements, metadata, _config ->
    case event do
      [:req_llm, :request, :stop] ->
        duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

        IO.inspect(
          %{
            request_id: metadata.request_id,
            duration_ms: duration_ms,
            finish_reason: metadata.finish_reason,
            usage: metadata.usage,
            reasoning: metadata.reasoning
          },
          label: "Request"
        )

      [:req_llm, :request, :exception] ->
        IO.inspect(metadata, label: "Failed request")

      [:req_llm, :token_usage] ->
        IO.inspect(%{measurements: measurements, metadata: metadata}, label: "Usage")
    end
  end,
  nil
)

[:req_llm, :token_usage] remains available on every request, including streaming:

:telemetry.attach(
  "my-usage-handler",
  [:req_llm, :token_usage],
  fn _event, measurements, metadata, _config ->
    IO.inspect(measurements, label: "Usage")
    IO.inspect(metadata, label: "Metadata")
  end,
  nil
)

Event measurements include:

  • input_tokens, output_tokens, total_tokens
  • input_cost, output_cost, total_cost
  • reasoning_tokens (when applicable)
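
Building on those measurements, a running cost accumulator can hang off the token usage event. A minimal sketch using an Agent (the :llm_cost_agent name is illustrative):

{:ok, _agent} =
  Agent.start_link(fn -> %{requests: 0, total_cost: 0.0} end, name: :llm_cost_agent)

:telemetry.attach(
  "cost-accumulator",
  [:req_llm, :token_usage],
  fn _event, measurements, _metadata, _config ->
    # total_cost may be absent when pricing metadata is unavailable
    cost = measurements[:total_cost] || 0.0

    Agent.update(:llm_cost_agent, fn state ->
      %{state | requests: state.requests + 1, total_cost: state.total_cost + cost}
    end)
  end,
  nil
)

Agent.get(:llm_cost_agent, & &1)
#=> %{requests: 42, total_cost: 0.1873}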

See the Telemetry Guide for the full event contract, reasoning lifecycle, milestone semantics, and payload capture options.

Example: Complete Usage Tracking

defmodule UsageTracker do
  def track_request(model, prompt, opts \\ []) do
    {duration_us, result} = :timer.tc(fn ->
      ReqLLM.generate_text(model, prompt, opts)
    end)

    case result do
      {:ok, response} ->
        usage = response.usage

        IO.puts("""
        Request completed in #{duration_us / 1000}ms

        Tokens:
          Input: #{usage.input_tokens}
          Output: #{usage.output_tokens}
          Total: #{usage.total_tokens}
          #{if Map.get(usage, :reasoning_tokens), do: "Reasoning: #{usage.reasoning_tokens}", else: ""}

        Cost:
          Input: $#{format_cost(usage.input_cost)}
          Output: $#{format_cost(usage.output_cost)}
          Total: $#{format_cost(usage.total_cost)}

        #{format_tool_usage(Map.get(usage, :tool_usage))}
        #{format_image_usage(Map.get(usage, :image_usage))}
        """)

        {:ok, response}

      error ->
        error
    end
  end

  defp format_cost(nil), do: "n/a"
  defp format_cost(cost), do: :erlang.float_to_binary(cost, decimals: 6)

  defp format_tool_usage(nil), do: ""
  defp format_tool_usage(tool_usage) do
    Enum.map_join(tool_usage, "\n", fn {tool, %{count: count, unit: unit}} ->
      "Tool Usage: #{tool} = #{count} #{unit}(s)"
    end)
  end

  defp format_image_usage(nil), do: ""
  defp format_image_usage(%{generated: %{count: count, size_class: size}}) do
    "Image Usage: #{count} image(s) at #{size}"
  end
  defp format_image_usage(_), do: ""
end

See Also