Pricing and Cost Tracking

Sycophant automatically calculates per-request costs using model pricing data from LLMDB. Costs are attached to the Usage struct on every response.

How It Works

After each request, the pipeline:

Reads the model's pricing metadata from LLMDB
Matches token counts against pricing components by ID
Computes costs using each component's rate and unit size
Attaches the results (and the full pricing reference) to response.usage

No configuration is needed. If LLMDB has pricing data for the model, costs are calculated automatically.

Reading Costs

{:ok, response} = Sycophant.generate_text("anthropic:claude-sonnet-4-20250514",
  messages: [%{role: "user", content: "Hello"}]
)

response.usage.input_cost      #=> 0.003
response.usage.output_cost     #=> 0.015
response.usage.total_cost      #=> 0.018

All cost fields are in USD. They are nil when the model has no pricing data or when the corresponding token count is nil.

Cost Fields

Field	Description
`input_cost`	Cost of input/prompt tokens
`output_cost`	Cost of output/completion tokens
`cache_read_cost`	Cost of tokens read from provider cache
`cache_write_cost`	Cost of tokens written to provider cache
`reasoning_cost`	Cost of reasoning tokens (o-series, thinking models)
`total_cost`	Sum of all non-nil cost components

Token Fields

Field	Description
`input_tokens`	Tokens in the prompt
`output_tokens`	Tokens in the completion
`cache_creation_input_tokens`	Tokens written to provider cache
`cache_read_input_tokens`	Tokens read from provider cache
`reasoning_tokens`	Internal reasoning tokens (o-series, thinking models)

Pricing Reference

The full pricing metadata from LLMDB is attached to response.usage.pricing. This includes all pricing components for the model, not just the token ones used for automatic cost calculation.

response.usage.pricing
#=> %Sycophant.Pricing{
#     currency: "USD",
#     components: [
#       %Sycophant.Pricing.Component{id: "token.input", kind: "token", per: 1000000, rate: 3.0, ...},
#       %Sycophant.Pricing.Component{id: "token.output", kind: "token", per: 1000000, rate: 15.0, ...},
#       %Sycophant.Pricing.Component{id: "tool.web_search", kind: "tool", per: 1000, rate: 10.0, ...},
#       ...
#     ]
#   }

Component Kinds

LLMDB pricing components fall into four kinds:

Kind	Description	Examples
`"token"`	Per-token rates	`token.input`, `token.output`, `token.cache_read`, `token.reasoning`
`"tool"`	Per-call rates for built-in tools	`tool.web_search`, `tool.file_search`, `tool.code_interpreter`
`"image"`	Per-image rates by size/quality	`image.1024x1024`, `image.generated`
`"storage"`	Per-unit storage rates	`storage.file_search`

Only "token" components are used for automatic cost calculation. Tool, image, and storage components are attached as reference data for callers who need them.

Looking Up Components

alias Sycophant.Pricing

pricing = response.usage.pricing
Pricing.find_component(pricing, "token.input")
#=> %Pricing.Component{id: "token.input", kind: "token", per: 1000000, rate: 3.0, ...}

Pricing.find_component(pricing, "tool.web_search")
#=> %Pricing.Component{id: "tool.web_search", kind: "tool", per: 1000, rate: 10.0, tool: "web_search", ...}

Reasoning Tokens

Models with reasoning capabilities (OpenAI o-series, Google Gemini with thinking) report reasoning tokens separately. These are internal tokens the model uses for chain-of-thought processing.

{:ok, response} = Sycophant.generate_text("openai:o3",
  messages: [%{role: "user", content: "Solve this step by step: ..."}]
)

response.usage.reasoning_tokens  #=> 1250
response.usage.reasoning_cost    #=> 0.000625

Reasoning tokens are extracted from:

OpenAI Completions: completion_tokens_details.reasoning_tokens
OpenAI Responses: output_tokens_details.reasoning_tokens
Google Gemini: usageMetadata.thoughtsTokenCount

Anthropic and Bedrock do not report reasoning tokens separately.

Agent Cost Tracking

When using agent mode, Agent.Stats accumulates costs across turns:

{:ok, result} = Sycophant.Agent.run(agent)

result.stats.total_cost              #=> 0.0042
result.stats.total_input_tokens      #=> 1250
result.stats.total_output_tokens     #=> 340
result.stats.total_reasoning_tokens  #=> 500

# Per-turn breakdown
for turn <- Sycophant.Agent.Stats.turns(result.stats) do
  {turn.input_tokens, turn.output_tokens, turn.reasoning_tokens, turn.cost}
end

Telemetry

Cost data is included in telemetry metadata on [:sycophant, :request, :stop] events. The pricing struct is converted to plain maps for telemetry consumers.

:telemetry.attach("cost-tracker", [:sycophant, :request, :stop],
  fn _event, _measurements, metadata, _config ->
    usage = metadata.usage
    Logger.info("Request cost: $#{usage[:total_cost]}")
  end,
  nil
)

See the Telemetry guide for the full list of usage fields in telemetry metadata.

Serialization

Pricing and Pricing.Component structs implement the Sycophant.Serializable protocol, so they round-trip through JSON alongside Usage:

alias Sycophant.Serializable.Decoder

json = Decoder.encode(response.usage)
restored = Decoder.decode(json)
restored.pricing.currency  #=> "USD"

See the Serialization guide for details.

← Previous Page Telemetry

Next Page → Agent Mode