Usage & Billing

Overview

ReqLLM provides normalized usage tracking and best-effort cost calculation for API requests. Every response includes usage data that works consistently across providers, with detailed breakdowns for tokens, tools, and images when the provider exposes enough information.

Pricing Policy

ReqLLM currently targets "some assistance, no guarantees" for pricing.

In practice, that means:

  • response.usage is intended to be useful for product analytics, tenant attribution, dashboards, and rough billing estimates
  • token, tool, image, and caching costs are calculated from provider usage data plus model pricing metadata when those inputs exist
  • the resulting USD totals are not guaranteed to match provider invoices exactly

When exact billing matters, treat ReqLLM usage as a helpful estimate and reconcile against provider-side reporting. For the full contract, known gaps, and production guidance, see the Pricing Policy guide.

The Usage Structure

Every ReqLLM.Response includes a usage map with normalized metrics:

{:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello")

response.usage
#=> %{
#     # Token counts
#     input_tokens: 8,
#     output_tokens: 12,
#     total_tokens: 20,
#
#     # Cost summary (USD)
#     input_cost: 0.00024,
#     output_cost: 0.00036,
#     total_cost: 0.0006,
#
#     # Detailed cost breakdown
#     cost: %{
#       tokens: 0.0006,
#       tools: 0.0,
#       images: 0.0,
#       total: 0.0006
#     }
#   }
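
Because every response exposes the same usage shape, per-request costs roll up naturally for analytics. A minimal sketch, assuming a plain list of prompts (the list and accumulator are illustrative, not part of the API):

prompts = ["Hello", "Summarize this article", "Translate to French: good morning"]

total_cost =
  Enum.reduce(prompts, 0.0, fn prompt, acc ->
    {:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", prompt)
    # total_cost may be nil when pricing metadata is unavailable for a model
    acc + (response.usage.total_cost || 0.0)
  end)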

Token Usage

Standard Tokens

All providers report basic token counts:

Field            Description
input_tokens     Tokens in the request (prompt, context, tools)
output_tokens    Tokens generated by the model
total_tokens     Sum of input and output tokens
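
These fields can be pattern matched straight off the usage map; a minimal sketch:

%{input_tokens: input, output_tokens: output, total_tokens: total} = response.usage
IO.puts("#{input} in / #{output} out (#{total} total)")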

Reasoning Tokens

For reasoning models (OpenAI o1/o3/gpt-5, Anthropic extended thinking, Google thinking):

{:ok, response} = ReqLLM.generate_text("openai:o3-mini", prompt)

response.usage.reasoning_tokens
#=> 1250  # Tokens used for internal reasoning

The reasoning_tokens field tracks tokens used for chain-of-thought reasoning. These may be billed differently than standard tokens depending on the provider.
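
Since reasoning_tokens is only populated for reasoning models, guard the lookup when handling mixed model traffic; a minimal sketch:

case Map.get(response.usage, :reasoning_tokens) do
  nil -> IO.puts("no reasoning tokens reported")
  count -> IO.puts("reasoning tokens: #{count}")
end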

Cached Tokens

For providers that support prompt caching (Anthropic, OpenAI):

response.usage.cached_tokens
#=> 500  # Input tokens served from cache

response.usage.cache_creation_tokens
#=> 0    # Tokens used to create new cache entries

Cached tokens are typically billed at a reduced rate. See Anthropic Prompt Caching for details.
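
One way to gauge how well caching is working is to compare cached tokens against total input tokens. A minimal sketch (the hit-rate metric is illustrative, not a ReqLLM API):

usage = response.usage
cached = Map.get(usage, :cached_tokens) || 0

cache_hit_rate =
  if usage.input_tokens > 0, do: cached / usage.input_tokens, else: 0.0

IO.puts("cache hit rate: #{Float.round(cache_hit_rate * 100, 1)}%")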

Tool Usage

When using tools like web search, usage is tracked in tool_usage:

response.usage.tool_usage
#=> %{
#     web_search: %{count: 2, unit: "call"}
#   }

Each provider has slightly different web search tracking:

Provider    Unit                 Notes
Anthropic   "call"               $10 per 1,000 searches
OpenAI      "call"               Responses API models only
xAI         "call" or "source"   Varies by response format
Google      "query"              Grounding queries

Anthropic Example:

{:ok, response} = ReqLLM.generate_text(
  "anthropic:claude-sonnet-4-5",
  "What's happening in AI today?",
  provider_options: [web_search: %{max_uses: 5}]
)

response.usage.tool_usage.web_search
#=> %{count: 3, unit: "call"}

xAI Example:

{:ok, response} = ReqLLM.generate_text(
  "xai:grok-4-1-fast-reasoning",
  "Latest tech news",
  xai_tools: [%{type: "web_search"}]
)

response.usage.tool_usage.web_search
#=> %{count: 5, unit: "call"}

Google Grounding Example:

{:ok, response} = ReqLLM.generate_text(
  "google:gemini-3-flash-preview",
  "Current stock market trends",
  provider_options: [google_grounding: %{enable: true}]
)

response.usage.tool_usage.web_search
#=> %{count: 2, unit: "query"}
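
Because every provider reports the same %{count: ..., unit: ...} shape, a single reducer can total tool calls regardless of unit; a minimal sketch:

tool_usage = Map.get(response.usage, :tool_usage) || %{}

total_tool_calls =
  Enum.reduce(tool_usage, 0, fn {_tool, %{count: count}}, acc -> acc + count end)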

Image Usage

For image generation, usage is tracked in image_usage:

{:ok, response} = ReqLLM.generate_image("openai:gpt-image-1", prompt)

response.usage.image_usage
#=> %{
#     generated: %{count: 1, size_class: "1024x1024"}
#   }

Size Classes

Image costs vary by resolution:

Provider           Size Classes
OpenAI GPT Image   "1024x1024", "1536x1024", "1024x1536", "auto"
OpenAI DALL-E 3    "1024x1024", "1792x1024", "1024x1792"
Google             Based on aspect ratio

Multiple Images

{:ok, response} = ReqLLM.generate_image("openai:dall-e-2", prompt, n: 3)

response.usage.image_usage.generated
#=> %{count: 3, size_class: "1024x1024"}

Cost Breakdown

The cost map provides a detailed breakdown by category:

response.usage.cost
#=> %{
#     tokens: 0.001,    # Token-based costs (input + output)
#     tools: 0.02,      # Web search and tool costs
#     images: 0.04,     # Image generation costs
#     total: 0.061,     # Sum of all costs
#     line_items: [...]  # Per-component details
#   }
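
Since total is documented as the sum of the category costs, a quick consistency check can flag gaps in pricing metadata; a minimal sketch:

%{tokens: tokens, tools: tools, images: images, total: total} = response.usage.cost
abs(total - (tokens + tools + images)) < 1.0e-9
#=> true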

Line Items

For detailed billing analysis, line_items provides per-component costs:

response.usage.cost.line_items
#=> [
#     %{component: "token.input", cost: 0.0003, quantity: 100},
#     %{component: "token.output", cost: 0.0007, quantity: 50},
#     %{component: "tool.web_search", cost: 0.02, quantity: 2}
#   ]
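
Line items make it straightforward to roll spend up by component, for example to split token costs from tool costs; a minimal sketch:

cost_by_component =
  response.usage.cost.line_items
  |> Enum.group_by(& &1.component, & &1.cost)
  |> Map.new(fn {component, costs} -> {component, Enum.sum(costs)} end)

#=> %{"token.input" => 0.0003, "token.output" => 0.0007, "tool.web_search" => 0.02}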

Provider-Specific Notes

Anthropic

  • Web search: $10 per 1,000 searches
  • Prompt caching: Reduced rates for cached tokens
  • Extended thinking: Reasoning tokens tracked separately

OpenAI

  • Responses API: Web search available for o1, o3, gpt-5 models
  • Chat Completions API: No built-in web search
  • Image generation: Costs vary by model and size

xAI

  • Web search: Via xai_tools option
  • Deprecated: live_search is no longer supported
  • Units: May report as "call" or "source"

Google

  • Grounding: Search via google_grounding option
  • Units: Reports as "query"
  • Image generation: Gemini image models supported

Known Limits

ReqLLM does not currently guarantee support for every provider billing surface. In particular:

  • realtime audio/text billing is not modeled yet
  • video generation billing is not modeled yet
  • account-specific discounts, credits, taxes, and regional pricing are outside the public contract

Telemetry

ReqLLM emits three telemetry event families:

  • [:req_llm, :request, :start | :stop | :exception] for lifecycle timing, request and response summaries, usage, and standardized reasoning metadata

  • [:req_llm, :reasoning, :start | :update | :stop] for provider-neutral thinking and reasoning milestones

  • [:req_llm, :token_usage] for backwards-compatible token and cost tracking

For billing and tenant attribution, use [:req_llm, :request, :stop] as the source of truth. It includes duration in measurements plus request_id, usage, finish_reason, and normalized reasoning metadata in the event metadata. The token usage event remains useful if you only want token and cost totals.

When you audit reasoning-heavy workloads, prefer the normalized reasoning snapshot on the request lifecycle events over raw provider payloads. It captures both the originally requested reasoning settings and the effective translated request, so you can see when a provider rewrites or disables a reasoning configuration before you attribute cost or behavior to a tenant.

:telemetry.attach_many(
  "my-req-llm-billing",
  [
    [:req_llm, :request, :stop],
    [:req_llm, :request, :exception],
    [:req_llm, :token_usage]
  ],
  fn event, measurements, metadata, _config ->
    case event do
      [:req_llm, :request, :stop] ->
        duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

        IO.inspect(
          %{
            request_id: metadata.request_id,
            duration_ms: duration_ms,
            finish_reason: metadata.finish_reason,
            usage: metadata.usage,
            reasoning: metadata.reasoning
          },
          label: "Request"
        )

      [:req_llm, :request, :exception] ->
        IO.inspect(metadata, label: "Failed request")

      [:req_llm, :token_usage] ->
        IO.inspect(%{measurements: measurements, metadata: metadata}, label: "Usage")
    end
  end,
  nil
)

[:req_llm, :token_usage] remains available on every request, including streaming:

:telemetry.attach(
  "my-usage-handler",
  [:req_llm, :token_usage],
  fn _event, measurements, metadata, _config ->
    IO.inspect(measurements, label: "Usage")
    IO.inspect(metadata, label: "Metadata")
  end,
  nil
)

Event measurements include:

  • input_tokens, output_tokens, total_tokens
  • input_cost, output_cost, total_cost
  • reasoning_tokens (when applicable)
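
Building on those measurements, a running cost accumulator can hang off the token usage event. A minimal sketch using an Agent (the :llm_cost_agent name is illustrative):

{:ok, _agent} =
  Agent.start_link(fn -> %{requests: 0, total_cost: 0.0} end, name: :llm_cost_agent)

:telemetry.attach(
  "cost-accumulator",
  [:req_llm, :token_usage],
  fn _event, measurements, _metadata, _config ->
    # total_cost may be absent when pricing metadata is unavailable
    cost = measurements[:total_cost] || 0.0

    Agent.update(:llm_cost_agent, fn state ->
      %{state | requests: state.requests + 1, total_cost: state.total_cost + cost}
    end)
  end,
  nil
)

Agent.get(:llm_cost_agent, & &1)
#=> %{requests: 42, total_cost: 0.1873}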

See the Telemetry Guide for the full event contract, reasoning lifecycle, milestone semantics, and payload capture options.

Example: Complete Usage Tracking

defmodule UsageTracker do
  def track_request(model, prompt, opts \\ []) do
    {duration_us, result} = :timer.tc(fn ->
      ReqLLM.generate_text(model, prompt, opts)
    end)

    case result do
      {:ok, response} ->
        usage = response.usage

        IO.puts("""
        Request completed in #{duration_us / 1000}ms

        Tokens:
          Input: #{usage.input_tokens}
          Output: #{usage.output_tokens}
          Total: #{usage.total_tokens}
          #{if Map.get(usage, :reasoning_tokens), do: "Reasoning: #{usage.reasoning_tokens}", else: ""}

        Cost:
          Input: $#{format_cost(usage.input_cost)}
          Output: $#{format_cost(usage.output_cost)}
          Total: $#{format_cost(usage.total_cost)}

        #{format_tool_usage(Map.get(usage, :tool_usage))}
        #{format_image_usage(Map.get(usage, :image_usage))}
        """)

        {:ok, response}

      error ->
        error
    end
  end

  defp format_cost(nil), do: "n/a"
  defp format_cost(cost), do: :erlang.float_to_binary(cost, decimals: 6)

  defp format_tool_usage(nil), do: ""
  defp format_tool_usage(tool_usage) do
    Enum.map_join(tool_usage, "\n", fn {tool, %{count: count, unit: unit}} ->
      "Tool Usage: #{tool} = #{count} #{unit}(s)"
    end)
  end

  defp format_image_usage(nil), do: ""
  defp format_image_usage(%{generated: %{count: count, size_class: size}}) do
    "Image Usage: #{count} image(s) at #{size}"
  end
  defp format_image_usage(_), do: ""
end

See Also