Pricing and Cost Tracking
Copy MarkdownSycophant automatically calculates per-request costs using model pricing data
from LLMDB. Costs are attached to the Usage struct on every response.
How It Works
After each request, the pipeline:
- Reads the model's pricing metadata from LLMDB
- Matches token counts against pricing components by ID
- Computes costs using each component's rate and unit size
- Attaches the results (and the full pricing reference) to
response.usage
No configuration is needed. If LLMDB has pricing data for the model, costs are calculated automatically.
Reading Costs
{:ok, response} = Sycophant.generate_text("anthropic:claude-sonnet-4-20250514",
messages: [%{role: "user", content: "Hello"}]
)
response.usage.input_cost #=> 0.003
response.usage.output_cost #=> 0.015
response.usage.total_cost #=> 0.018All cost fields are in USD. They are nil when the model has no pricing data
or when the corresponding token count is nil.
Cost Fields
| Field | Description |
|---|---|
input_cost | Cost of input/prompt tokens |
output_cost | Cost of output/completion tokens |
cache_read_cost | Cost of tokens read from provider cache |
cache_write_cost | Cost of tokens written to provider cache |
reasoning_cost | Cost of reasoning tokens (o-series, thinking models) |
total_cost | Sum of all non-nil cost components |
Token Fields
| Field | Description |
|---|---|
input_tokens | Tokens in the prompt |
output_tokens | Tokens in the completion |
cache_creation_input_tokens | Tokens written to provider cache |
cache_read_input_tokens | Tokens read from provider cache |
reasoning_tokens | Internal reasoning tokens (o-series, thinking models) |
Pricing Reference
The full pricing metadata from LLMDB is attached to response.usage.pricing.
This includes all pricing components for the model, not just the token ones
used for automatic cost calculation.
response.usage.pricing
#=> %Sycophant.Pricing{
# currency: "USD",
# components: [
# %Sycophant.Pricing.Component{id: "token.input", kind: "token", per: 1000000, rate: 3.0, ...},
# %Sycophant.Pricing.Component{id: "token.output", kind: "token", per: 1000000, rate: 15.0, ...},
# %Sycophant.Pricing.Component{id: "tool.web_search", kind: "tool", per: 1000, rate: 10.0, ...},
# ...
# ]
# }Component Kinds
LLMDB pricing components fall into four kinds:
| Kind | Description | Examples |
|---|---|---|
"token" | Per-token rates | token.input, token.output, token.cache_read, token.reasoning |
"tool" | Per-call rates for built-in tools | tool.web_search, tool.file_search, tool.code_interpreter |
"image" | Per-image rates by size/quality | image.1024x1024, image.generated |
"storage" | Per-unit storage rates | storage.file_search |
Only "token" components are used for automatic cost calculation. Tool, image,
and storage components are attached as reference data for callers who need them.
Looking Up Components
alias Sycophant.Pricing
pricing = response.usage.pricing
Pricing.find_component(pricing, "token.input")
#=> %Pricing.Component{id: "token.input", kind: "token", per: 1000000, rate: 3.0, ...}
Pricing.find_component(pricing, "tool.web_search")
#=> %Pricing.Component{id: "tool.web_search", kind: "tool", per: 1000, rate: 10.0, tool: "web_search", ...}Reasoning Tokens
Models with reasoning capabilities (OpenAI o-series, Google Gemini with thinking) report reasoning tokens separately. These are internal tokens the model uses for chain-of-thought processing.
{:ok, response} = Sycophant.generate_text("openai:o3",
messages: [%{role: "user", content: "Solve this step by step: ..."}]
)
response.usage.reasoning_tokens #=> 1250
response.usage.reasoning_cost #=> 0.000625Reasoning tokens are extracted from:
- OpenAI Completions:
completion_tokens_details.reasoning_tokens - OpenAI Responses:
output_tokens_details.reasoning_tokens - Google Gemini:
usageMetadata.thoughtsTokenCount
Anthropic and Bedrock do not report reasoning tokens separately.
Agent Cost Tracking
When using agent mode, Agent.Stats accumulates costs across turns:
{:ok, result} = Sycophant.Agent.run(agent)
result.stats.total_cost #=> 0.0042
result.stats.total_input_tokens #=> 1250
result.stats.total_output_tokens #=> 340
result.stats.total_reasoning_tokens #=> 500
# Per-turn breakdown
for turn <- Sycophant.Agent.Stats.turns(result.stats) do
{turn.input_tokens, turn.output_tokens, turn.reasoning_tokens, turn.cost}
endTelemetry
Cost data is included in telemetry metadata on [:sycophant, :request, :stop]
events. The pricing struct is converted to plain maps for telemetry consumers.
:telemetry.attach("cost-tracker", [:sycophant, :request, :stop],
fn _event, _measurements, metadata, _config ->
usage = metadata.usage
Logger.info("Request cost: $#{usage[:total_cost]}")
end,
nil
)See the Telemetry guide for the full list of usage fields in telemetry metadata.
Serialization
Pricing and Pricing.Component structs implement the Sycophant.Serializable
protocol, so they round-trip through JSON alongside Usage:
alias Sycophant.Serializable.Decoder
json = Decoder.encode(response.usage)
restored = Decoder.decode(json)
restored.pricing.currency #=> "USD"See the Serialization guide for details.