Pricing and Cost Tracking

Sycophant automatically calculates per-request costs using model pricing data from LLMDB. Costs are attached to the Usage struct on every response.

How It Works

After each request, the pipeline:

  1. Reads the model's pricing metadata from LLMDB
  2. Matches token counts against pricing components by ID
  3. Computes costs using each component's rate and unit size
  4. Attaches the results (and the full pricing reference) to response.usage

No configuration is needed. If LLMDB has pricing data for the model, costs are calculated automatically.
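
As a rough illustration of steps 2 and 3 above, the sketch below matches token counts to components by ID and applies tokens * rate / per. The module name CostSketch, the field-to-ID mapping, and the plain-map inputs are assumptions made for this example; they are not Sycophant's actual internals.

```elixir
defmodule CostSketch do
  # Hypothetical mapping from usage token fields to LLMDB component IDs.
  @component_ids %{
    input_tokens: "token.input",
    output_tokens: "token.output",
    cache_read_input_tokens: "token.cache_read",
    reasoning_tokens: "token.reasoning"
  }

  # A cost is nil when either the token count or the component is missing.
  def component_cost(nil, _component), do: nil
  def component_cost(_tokens, nil), do: nil
  def component_cost(tokens, %{rate: rate, per: per}), do: tokens * rate / per

  # Returns a map of token field => cost (or nil) for the given usage map.
  def costs(usage, components) do
    by_id = Map.new(components, fn component -> {component.id, component} end)

    Map.new(@component_ids, fn {field, id} ->
      {field, component_cost(Map.get(usage, field), Map.get(by_id, id))}
    end)
  end
end

# Stand-in components shaped like the Pricing.Component fields shown in this guide.
components = [
  %{id: "token.input", kind: "token", per: 1_000_000, rate: 3.0},
  %{id: "token.output", kind: "token", per: 1_000_000, rate: 15.0}
]

IO.inspect(CostSketch.costs(%{input_tokens: 1000, output_tokens: 1000}, components))
```

With these rates, 1,000 input tokens and 1,000 output tokens come to roughly 0.003 and 0.015 USD. Fields without a matching component or token count stay nil.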

Reading Costs

{:ok, response} = Sycophant.generate_text("anthropic:claude-sonnet-4-20250514",
  messages: [%{role: "user", content: "Hello"}]
)

response.usage.input_cost      #=> 0.003
response.usage.output_cost     #=> 0.015
response.usage.total_cost      #=> 0.018

All cost fields are in USD. They are nil when the model has no pricing data or when the corresponding token count is nil.
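
Because any cost field can be nil, display code should handle that case explicitly. The helper below is a minimal sketch; CostFormat is a hypothetical module, not part of Sycophant.

```elixir
defmodule CostFormat do
  # Hypothetical helper: renders a cost for display, tolerating nil.
  def format(nil), do: "n/a"

  def format(cost) when is_number(cost) do
    # Multiply by 1.0 so integer costs are accepted by float_to_binary/2.
    "$" <> :erlang.float_to_binary(cost * 1.0, decimals: 4)
  end
end

IO.puts(CostFormat.format(0.018))
IO.puts(CostFormat.format(nil))
```

In application code the argument would typically be response.usage.total_cost or one of the other cost fields.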

Cost Fields

| Field | Description |
| --- | --- |
| input_cost | Cost of input/prompt tokens |
| output_cost | Cost of output/completion tokens |
| cache_read_cost | Cost of tokens read from provider cache |
| cache_write_cost | Cost of tokens written to provider cache |
| reasoning_cost | Cost of reasoning tokens (o-series, thinking models) |
| total_cost | Sum of all non-nil cost components |
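
The total_cost rule (sum of all non-nil components) can be sketched as follows. TotalCostSketch is a hypothetical module operating on a plain map, and returning nil when every component is nil is an assumption of this sketch rather than documented behavior.

```elixir
defmodule TotalCostSketch do
  @cost_fields [:input_cost, :output_cost, :cache_read_cost, :cache_write_cost, :reasoning_cost]

  # Sums the non-nil cost fields. Returns nil when every field is nil
  # (that last behavior is an assumption of this sketch).
  def total(usage) do
    costs =
      @cost_fields
      |> Enum.map(&Map.get(usage, &1))
      |> Enum.reject(&is_nil/1)

    if costs == [], do: nil, else: Enum.sum(costs)
  end
end

IO.inspect(TotalCostSketch.total(%{input_cost: 0.003, output_cost: 0.015, reasoning_cost: nil}))
```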

Token Fields

| Field | Description |
| --- | --- |
| input_tokens | Tokens in the prompt |
| output_tokens | Tokens in the completion |
| cache_creation_input_tokens | Tokens written to provider cache |
| cache_read_input_tokens | Tokens read from provider cache |
| reasoning_tokens | Internal reasoning tokens (o-series, thinking models) |

Pricing Reference

The full pricing metadata from LLMDB is attached to response.usage.pricing. It includes every pricing component for the model, not only the token components used for automatic cost calculation.

response.usage.pricing
#=> %Sycophant.Pricing{
#     currency: "USD",
#     components: [
#       %Sycophant.Pricing.Component{id: "token.input", kind: "token", per: 1000000, rate: 3.0, ...},
#       %Sycophant.Pricing.Component{id: "token.output", kind: "token", per: 1000000, rate: 15.0, ...},
#       %Sycophant.Pricing.Component{id: "tool.web_search", kind: "tool", per: 1000, rate: 10.0, ...},
#       ...
#     ]
#   }

Component Kinds

LLMDB pricing components fall into four kinds:

| Kind | Description | Examples |
| --- | --- | --- |
| "token" | Per-token rates | token.input, token.output, token.cache_read, token.reasoning |
| "tool" | Per-call rates for built-in tools | tool.web_search, tool.file_search, tool.code_interpreter |
| "image" | Per-image rates by size/quality | image.1024x1024, image.generated |
| "storage" | Per-unit storage rates | storage.file_search |

Only "token" components are used for automatic cost calculation. Tool, image, and storage components are attached as reference data for callers who need them.

Looking Up Components

alias Sycophant.Pricing

pricing = response.usage.pricing
Pricing.find_component(pricing, "token.input")
#=> %Pricing.Component{id: "token.input", kind: "token", per: 1000000, rate: 3.0, ...}

Pricing.find_component(pricing, "tool.web_search")
#=> %Pricing.Component{id: "tool.web_search", kind: "tool", per: 1000, rate: 10.0, tool: "web_search", ...}
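
Since tool, image, and storage components are not priced automatically, callers who track those counts themselves can apply the same rate and per fields by hand. The sketch below uses plain maps and a local find_component/2 stand-in so that it runs without Sycophant. ToolCostSketch and the call count are hypothetical.

```elixir
defmodule ToolCostSketch do
  # Stand-in for Pricing.find_component/2 that works on plain maps.
  def find_component(%{components: components}, id) do
    Enum.find(components, fn component -> component.id == id end)
  end

  # Estimates the cost of `calls` tool invocations, or nil if the
  # component is not priced for this model.
  def estimate(pricing, id, calls) do
    case find_component(pricing, id) do
      nil -> nil
      %{rate: rate, per: per} -> calls * rate / per
    end
  end
end

# Shaped like the pricing struct shown in this guide, reduced to one component.
pricing = %{
  currency: "USD",
  components: [
    %{id: "tool.web_search", kind: "tool", per: 1000, rate: 10.0}
  ]
}

# Three web searches at 10.0 per 1000 calls come to roughly 0.03 USD.
IO.inspect(ToolCostSketch.estimate(pricing, "tool.web_search", 3))
```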

Reasoning Tokens

Models with reasoning capabilities (OpenAI o-series, Google Gemini with thinking) report reasoning tokens separately. These are internal tokens the model uses for chain-of-thought processing.

{:ok, response} = Sycophant.generate_text("openai:o3",
  messages: [%{role: "user", content: "Solve this step by step: ..."}]
)

response.usage.reasoning_tokens  #=> 1250
response.usage.reasoning_cost    #=> 0.000625

Reasoning tokens are extracted from:

  • OpenAI Chat Completions: completion_tokens_details.reasoning_tokens
  • OpenAI Responses: output_tokens_details.reasoning_tokens
  • Google Gemini: usageMetadata.thoughtsTokenCount

Anthropic and Bedrock do not report reasoning tokens separately.
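
To make the three locations concrete, the sketch below pulls a reasoning token count out of a decoded JSON response body. It assumes string keys and the top-level usage and usageMetadata objects that the providers' APIs use. ReasoningTokensSketch is a hypothetical module and does not reflect how Sycophant's adapters are written.

```elixir
defmodule ReasoningTokensSketch do
  # Candidate paths, tried in order, inside a decoded response body.
  @paths [
    ["usage", "completion_tokens_details", "reasoning_tokens"],
    ["usage", "output_tokens_details", "reasoning_tokens"],
    ["usageMetadata", "thoughtsTokenCount"]
  ]

  # Returns the first count found, or nil when no path is present.
  def extract(body) when is_map(body) do
    Enum.find_value(@paths, fn path -> get_in(body, path) end)
  end
end

openai_body = %{"usage" => %{"completion_tokens_details" => %{"reasoning_tokens" => 1250}}}
gemini_body = %{"usageMetadata" => %{"thoughtsTokenCount" => 512}}

IO.inspect(ReasoningTokensSketch.extract(openai_body))
IO.inspect(ReasoningTokensSketch.extract(gemini_body))
```

A body with none of these keys yields nil, which lines up with cost fields being nil when the corresponding token count is nil.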

Agent Cost Tracking

When using agent mode, Agent.Stats accumulates costs across turns:

{:ok, result} = Sycophant.Agent.run(agent)

result.stats.total_cost              #=> 0.0042
result.stats.total_input_tokens      #=> 1250
result.stats.total_output_tokens     #=> 340
result.stats.total_reasoning_tokens  #=> 500

# Per-turn breakdown
for turn <- Sycophant.Agent.Stats.turns(result.stats) do
  {turn.input_tokens, turn.output_tokens, turn.reasoning_tokens, turn.cost}
end
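
A per-turn breakdown makes it possible to see where a run's spend accumulates. The sketch below works on a list of plain maps with a cost key, mirroring the turn.cost field above. TurnCostSketch and the budget check are hypothetical additions, not part of Agent.Stats.

```elixir
defmodule TurnCostSketch do
  # Running total of cost after each turn; turns without a cost count as 0.0.
  def cumulative(turns) do
    turns
    |> Enum.map(fn turn -> turn.cost || 0.0 end)
    |> Enum.scan(fn cost, acc -> acc + cost end)
  end

  # 1-based number of the first turn whose running total exceeds the
  # budget, or nil if the budget is never exceeded.
  def over_budget_at(turns, budget) do
    turns
    |> cumulative()
    |> Enum.find_index(fn total -> total > budget end)
    |> case do
      nil -> nil
      index -> index + 1
    end
  end
end

turns = [%{cost: 0.001}, %{cost: 0.002}, %{cost: nil}, %{cost: 0.0012}]

IO.inspect(TurnCostSketch.cumulative(turns))
IO.inspect(TurnCostSketch.over_budget_at(turns, 0.0025))
```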

Telemetry

Cost data is included in telemetry metadata on [:sycophant, :request, :stop] events. The pricing struct is converted to plain maps for telemetry consumers.

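# Logger.info/1 is a macro, so the module must be required first.
require Logger

:telemetry.attach("cost-tracker", [:sycophant, :request, :stop],
  fn _event, _measurements, metadata, _config ->
    # usage is a plain map here; total_cost is nil when no pricing data exists.
    usage = metadata.usage
    Logger.info("Request cost: $#{usage[:total_cost]}")
  end,
  nil
)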

See the Telemetry guide for the full list of usage fields in telemetry metadata.
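
For a running total across requests, a handler can feed a small process instead of logging. The following is a sketch under stated assumptions: CostAccumulator is a hypothetical module, the handler ID is arbitrary, and the metadata is assumed to carry a usage map with an atom total_cost key, as in the handler above. It uses a captured module function because telemetry recommends those over anonymous functions.

```elixir
defmodule CostAccumulator do
  use Agent

  # Holds the accumulated cost in USD as a float.
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> 0.0 end, name: __MODULE__)
  end

  # Attaches the handler; requires the :telemetry application at runtime.
  def attach do
    :telemetry.attach(
      "cost-accumulator",
      [:sycophant, :request, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Adds total_cost to the running total; ignores events without a cost.
  def handle_event(_event, _measurements, metadata, _config) do
    case metadata do
      %{usage: %{total_cost: cost}} when is_number(cost) ->
        Agent.update(__MODULE__, fn total -> total + cost end)

      _ ->
        :ok
    end
  end

  def total, do: Agent.get(__MODULE__, fn total -> total end)
end
```

Typical usage would be to call CostAccumulator.start_link() and CostAccumulator.attach() once at application start, then read CostAccumulator.total() wherever the figure is needed.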

Serialization

Pricing and Pricing.Component structs implement the Sycophant.Serializable protocol, so they round-trip through JSON alongside Usage:

alias Sycophant.Serializable.Decoder

json = Decoder.encode(response.usage)
restored = Decoder.decode(json)
restored.pricing.currency  #=> "USD"

See the Serialization guide for details.