LangChain.ChatModels.ChatAnthropic (LangChain v0.4.1)
Module for interacting with Anthropic models.
Parses and validates inputs for making requests to Anthropic's messages API.
Converts responses into more specialized LangChain data structures.
Callbacks
See the set of available callbacks: LangChain.Chains.ChainCallbacks
Rate Limit API Response Headers
Anthropic returns rate limit information in the response headers. Those can be accessed using an LLM callback like this:
handler = %{
on_llm_ratelimit_info: fn _chain, headers ->
IO.inspect(headers)
end
}
%{llm: ChatAnthropic.new!(%{model: "..."})}
|> LLMChain.new!()
# ... add messages ...
|> LLMChain.add_callback(handler)
|> LLMChain.run()

When a request is received, something similar to the following will be output to the console.
%{
"anthropic-ratelimit-requests-limit" => ["50"],
"anthropic-ratelimit-requests-remaining" => ["49"],
"anthropic-ratelimit-requests-reset" => ["2024-06-08T04:28:30Z"],
"anthropic-ratelimit-tokens-limit" => ["50000"],
"anthropic-ratelimit-tokens-remaining" => ["50000"],
"anthropic-ratelimit-tokens-reset" => ["2024-06-08T04:28:30Z"],
"request-id" => ["req_1234"]
}

Token Usage
Anthropic returns token usage information as part of the response body. A
LangChain.TokenUsage struct is added under the :usage key of the metadata on
the LangChain.Message and LangChain.MessageDelta structs as they are
processed.
%LangChain.MessageDelta{
content: [],
status: :incomplete,
index: nil,
role: :assistant,
tool_calls: nil,
metadata: %{
usage: %LangChain.TokenUsage{
input: 55,
output: 4,
raw: %{
"cache_creation_input_tokens" => 0,
"cache_read_input_tokens" => 0,
"input_tokens" => 55,
"output_tokens" => 4
}
}
}
}

The TokenUsage data is accumulated across MessageDelta structs, and the final usage information is available on the completed LangChain.Message.
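For example, a minimal sketch of reading the accumulated usage (assuming message is a fully assembled LangChain.Message returned from a chain run):

# Hypothetical: `message` is a completed assistant message from an LLMChain run.
%LangChain.TokenUsage{input: input, output: output} = message.metadata.usage

IO.puts("Prompt tokens: #{input}, completion tokens: #{output}")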
Tool Choice
Anthropic supports forcing a tool to be used.
This is supported through the tool_choice option, which takes a plain Elixir map to provide the configuration.
By default, the LLM will choose a tool call if a tool is available and it determines it is needed. That's the "auto" mode.
Example
Force the LLM's response to make a tool call of the "get_weather" function.
ChatAnthropic.new(%{
model: "...",
tool_choice: %{"type" => "tool", "name" => "get_weather"}
})

AWS Bedrock Support
Anthropic Claude is supported in AWS Bedrock.
To configure ChatAnthropic for use on AWS Bedrock:
- Request Model Access to get access to the Anthropic models you intend to use.
- Using your AWS Console, create an Access Key for your application.
- Set the key values in your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY ENVs.
- Get the Model ID for the model you intend to use. See Base Models.
- Refer to LangChain.Utils.BedrockConfig for setting up the Bedrock authentication credentials for your environment.

Setup your ChatAnthropic similar to the following:
alias LangChain.ChatModels.ChatAnthropic
alias LangChain.Utils.BedrockConfig

ChatAnthropic.new!(%{
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  bedrock: BedrockConfig.from_application_env!()
})
Thinking
Models like Claude 3.7 Sonnet introduced a hybrid approach which allows for "thinking" and reasoning. See the Anthropic thinking documentation for up-to-date instructions on the usage.
For instance, enabling thinking may require the temperature to be set to 1, and other settings like top_p may not be allowed.
The model supports a :thinking attribute where the data is a map that matches the structure in the
Anthropic documentation. It is passed along as-is.
Example:
# Enable thinking and budget 2,000 tokens for the thinking space.
model = ChatAnthropic.new!(%{
model: "claude-3-7-sonnet-latest",
thinking: %{type: "enabled", budget_tokens: 2000}
})
# Disable thinking
model = ChatAnthropic.new!(%{
model: "claude-3-7-sonnet-latest",
thinking: %{type: "disabled"}
})

As of the documentation for Claude 3.7 Sonnet, the minimum budget for thinking is 1024 tokens.
Prompt Caching
Anthropic supports prompt caching to reduce costs and latency for frequently repeated content. Prompt caching works by caching large blocks of content that are likely to be reused across multiple requests.
Prompt caching is configured through the cache_control option in ContentPart options. It can be applied
to system messages, regular user messages, tool results, and tool definitions.
Anthropic limits a conversation to a maximum of 4 cache_control blocks and will refuse to service requests that use more.
Basic Usage
Setting cache_control: true is a shortcut for the default ephemeral cache control:
# System message with caching
Message.new_system!([
ContentPart.text!("You are an AI assistant analyzing literary works."),
ContentPart.text!("<large document content>", cache_control: true)
])
# User message with caching
Message.new_user!([
ContentPart.text!("Please analyze this document:"),
ContentPart.text!("<large document content>", cache_control: true)
])

This will set a single cache breakpoint that will include your functions (processed first) and system message. Anthropic limits conversations to a maximum of 4 cache_control blocks.
For multi-turn conversations, turning on :cache_messages (see below) will add additional cache breakpoints, giving you higher cache utilization and faster response times. Writing to the cache increases write costs, so this setting is not on by default.
Supported Content Types
Prompt caching can be applied to:
- Text content in system messages
- Text content in user messages
- Tool results in the content field when returning a list of ContentPart structs
- Tool definitions in the options field when creating a Function struct (see the sketch below)
For more information, see the Anthropic prompt caching documentation.
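As a hedged sketch of the last two cases (the options: [cache_control: true] form and the ContentPart-list return value are assumptions drawn from the notes above, not a verified API contract):

alias LangChain.Function
alias LangChain.Message.ContentPart

# Hypothetical tool for illustration only.
Function.new!(%{
  name: "lookup_reference",
  description: "Returns a large reference document for analysis.",
  function: fn _args, _context ->
    # Returning a list of ContentParts lets the tool result carry its own
    # cache_control breakpoint (assumed return shape).
    {:ok,
     [
       ContentPart.text!("Reference document:"),
       ContentPart.text!("<large document content>", cache_control: true)
     ]}
  end,
  # Assumed: caches the tool definition itself.
  options: [cache_control: true]
})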
Advanced Cache Control
For more explicit control over caching parameters, you can provide a map instead of true:
ContentPart.text!("content", cache_control: %{"type" => "ephemeral", "ttl" => "1h"})When cache_control: true is used, it automatically expands to %{"type" => "ephemeral"} in the API request.
If you need specific cache control settings like TTL, providing them explicitly preserves the exact values
sent to the API.
The default TTL is "5m" (5 minutes), but "1h" (1 hour) is also supported, depending on your account.
Automatic Message Caching for Multi-Turn Conversations
The :cache_messages option automates cache breakpoint placement for multi-turn conversations by
adding cache_control to the last N user messages.
When to Use This Feature
Good fit:
- Multi-turn conversations (3+ turns) where you'll reuse conversation history
- Conversations with tool use where uncached tool messages cause cache degradation
- Applications where both cost reduction AND latency reduction are valuable (~10x faster cache reads)
- Repeated similar queries with different follow-ups
Not recommended:
- Single-turn requests (no benefit, only added cost)
- Short conversations (1-2 turns) where break-even isn't reached
- Highly diverse conversations with no repeated context
Benefits
When enabled, you get:
- Reduced latency: Cache reads are ~10x faster than processing tokens from scratch
- Lower costs: After break-even (typically 2-3 turns), costs drop significantly
- Automatic optimization: No manual cache management required
Why Multiple Breakpoints?
Anthropic limits conversations to 4 cache breakpoints total. The optimal generic strategy is:
- 1 breakpoint for system prompt: Static context (instructions, large documents) that never changes
- 3 breakpoints for user messages: Recent conversation history (this is the default)
Using multiple user message breakpoints (rather than just 1) provides 15-25% cost savings in multi-turn conversations with tools:
- Single breakpoint problem: Heavy tool use adds uncached messages between user messages, causing steep degradation
- Multiple breakpoints solution: Caches recent conversation history, reducing degradation
- Tradeoff: More cache writes initially, but pays off after 2-3 turns
Key Behaviors
- Default count: 3 user message breakpoints (reserves 1 for system prompt in the 4 breakpoint limit)
- Always cache current messages: Breakpoints are placed on the most recent user messages, not previous ones
- Breakpoints move: As conversations grow, old messages fall out of cache but recent ones stay cached
- Expected utilization: 85-95% for early turns, stabilizes at 70-85% for longer conversations
Enabling Message Caching
Message caching is disabled by default since writing to the cache increases write costs (1.25x for 5m TTL, 3x for 1h).
Enable with defaults (3 breakpoints, 5m TTL):
model = ChatAnthropic.new!(%{
model: "claude-3-5-sonnet-20241022",
cache_messages: %{enabled: true}
})

Configuring Breakpoint Count
Adjust the number of breakpoints based on your use case (max: 4):
# Conservative: single breakpoint (original behavior)
model = ChatAnthropic.new!(%{
model: "claude-3-5-sonnet-20241022",
cache_messages: %{enabled: true, count: 1}
})
# Balanced: 2 breakpoints for moderate conversations
model = ChatAnthropic.new!(%{
model: "claude-3-5-sonnet-20241022",
cache_messages: %{enabled: true, count: 2}
})
# Optimal for tool-heavy: 3 breakpoints (default)
model = ChatAnthropic.new!(%{
model: "claude-3-5-sonnet-20241022",
cache_messages: %{enabled: true, count: 3}
})
# Maximum: 4 breakpoints (assuming no system prompt caching)
model = ChatAnthropic.new!(%{
model: "claude-3-5-sonnet-20241022",
cache_messages: %{enabled: true, count: 4}
})

With Custom TTL
Specify a custom TTL (time-to-live):
model = ChatAnthropic.new!(%{
model: "claude-3-5-sonnet-20241022",
cache_messages: %{enabled: true, count: 2, ttl: "1h"}
})

Supported TTL values: "5m" (5 minutes, default) and "1h" (1 hour).
Cost note: Cache writes with a 5m TTL cost 1.25x; writes with a 1h TTL cost 3x.
How It Works
- Breakpoints are placed on the last text ContentPart of each selected user message
- Tool results are skipped (not cacheable at the message level)
- Explicit cache_control settings in ContentParts are preserved
- Works alongside system message and tool caching
What to Expect
Cost breakdown for a conversation with 3 breakpoints and 5m TTL (relative to baseline without caching):
Pricing context:
- Baseline input: 1.0x (standard input token cost)
- Cache write: 1.25x (costs 25% more than baseline)
- Cache read: 0.1x (90% discount - costs 10% of baseline)
Turn-by-turn costs:
- Turn 1: 0% cache hit, 100% cache write
- Effective cost: ~1.25x baseline (all writes, no reads yet)
- Turn 2: 85-95% cache hit
- Effective cost: ~0.20x baseline (mostly cheap reads: 90% × 0.1x + 10% × 1.0x)
- Turn 3+: 70-85% cache hit
- Effective cost: ~0.25x baseline (mostly cheap reads: 80% × 0.1x + 20% × 1.0x)
Break-even: Typically 2-3 turns. After turn 2, you're paying ~20-25% of baseline cost.
Latency: Cache reads are ~10x faster than processing tokens, significantly reducing response time.
Cache utilization will stabilize around 70-85% for longer conversations as old content falls out of cache, but recent context stays cached. The cost savings and latency benefits continue for the life of the conversation.
Monitoring Cache Utilization
Cache utilization data is accessible in two places:
- The Claude console: https://console.anthropic.com/usage
- Response metadata in LangChain.Message under metadata.usage.raw
Cache utilization is visible in the message metadata.usage.raw map. Inspecting the first and second completed messages makes the behavior clearer.
Turn 1 response (initial request):
%{
"cache_creation" => %{
"ephemeral_1h_input_tokens" => 0,
"ephemeral_5m_input_tokens" => 3657
},
"cache_creation_input_tokens" => 7314,
"cache_read_input_tokens" => 0,
"input_tokens" => 18,
"output_tokens" => 197,
"service_tier" => "standard"
}

Cache read utilization: cache_read_input_tokens / (cache_read_input_tokens + input_tokens) * 100 = 0%
This 0% utilization is expected for the initial request (all cache writes, no reads yet).
Turn 2 response (cache now available):
%{
"cache_creation" => %{
"ephemeral_1h_input_tokens" => 0,
"ephemeral_5m_input_tokens" => 146
},
"cache_creation_input_tokens" => 292,
"cache_read_input_tokens" => 3604,
"input_tokens" => 18,
"output_tokens" => 644,
"service_tier" => "standard"
}

Cache read utilization: 3604 / (3604 + 18) * 100 = 99.5%
This high utilization shows most of the prompt is being read from cache, with only new content being processed and written.
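If you want to compute this programmatically, here is a hypothetical helper (not part of LangChain) that derives the cache read utilization from the raw usage map shown above:

defmodule CacheStats do
  # Computes cache read utilization (%) from the raw Anthropic usage map.
  def read_utilization(%{"cache_read_input_tokens" => cached, "input_tokens" => uncached}) do
    case cached + uncached do
      0 -> 0.0
      total -> Float.round(cached / total * 100, 1)
    end
  end
end

# Using the turn 2 values above:
CacheStats.read_utilization(message.metadata.usage.raw)
#=> 99.5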
Important Notes
- Minimum prompt length: If your cache breakpoint doesn't meet the minimum cacheable prompt length, it won't be cached at all.
- Model-specific limits: Different models have different cache limitations - see https://docs.claude.com/en/docs/build-with-claude/prompt-caching#cache-limitations
- Haiku considerations: Haiku has a high minimum (4096 tokens), which can mean low initial utilization. However, enabling :cache_messages has minimal cost impact, so it's safe to enable.
- TTL tradeoff: The default TTL is 5m (1.25x write cost). Setting the TTL to 1h increases the write cost to 3x but may improve utilization for longer sessions.
Summary
Functions
Calls the Anthropic API passing the ChatAnthropic struct with configuration, plus either a simple message or the list of messages to act as the prompt.
Converts a ContentPart to the format expected by the Anthropic API.
Converts a list of ContentParts to the format expected by the Anthropic API.
Convert a LangChain structure to the expected map of data for the Anthropic API.
Return the params formatted for an API request.
Convert a Function to the format expected by the Anthropic API.
Converts a Message to the format expected by the Anthropic API.
Setup a ChatAnthropic client configuration.
Setup a ChatAnthropic client configuration and return it or raise an error if invalid.
After all the messages have been converted using for_api/1, this combines
multiple sequential tool response messages. The Anthropic API is very strict
about user, assistant, user, assistant sequenced messages.
Restores the model from the config.
Determine if an error should be retried. If true, a fallback LLM may be
used. If false, the error is understood to be more fundamental to the
request rather than a service issue, and the request should not be retried
or fall back to another service.
Generate a config map that can later restore the model's configuration.
Types
@type t() :: %LangChain.ChatModels.ChatAnthropic{
        api_key: term(),
        api_version: term(),
        bedrock: term(),
        beta_headers: term(),
        cache_messages: term(),
        callbacks: term(),
        endpoint: term(),
        max_tokens: term(),
        model: term(),
        receive_timeout: term(),
        req_opts: term(),
        stream: term(),
        temperature: term(),
        thinking: term(),
        tool_choice: term(),
        top_k: term(),
        top_p: term(),
        verbose_api: term()
      }
Functions
Calls the Anthropic API passing the ChatAnthropic struct with configuration, plus either a simple message or the list of messages to act as the prompt.
Optionally pass in a callback function that can be executed as data is received from the API.
NOTE: This function can be used directly, but the primary interface
should be through LangChain.Chains.LLMChain. The ChatAnthropic module is more focused on
translating the LangChain data structures to and from the Anthropic API.
Another benefit of using LangChain.Chains.LLMChain is that it combines the
storage of messages, adding functions, adding custom context that should be
passed to functions, and automatically applying LangChain.MessageDelta
structs as they are received, then converting those to the full
LangChain.Message once fully complete.
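For reference, a minimal direct-call sketch (the model name is illustrative, and the exact return shape may vary with message type and streaming settings):

alias LangChain.ChatModels.ChatAnthropic

{:ok, chat} = ChatAnthropic.new(%{model: "claude-3-5-sonnet-20241022"})

# Pass a simple string prompt. The precise shape of `response`
# (a Message or list of Messages) is assumed here.
{:ok, response} = ChatAnthropic.call(chat, "Say hello in one word.")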
@spec content_part_for_api(LangChain.Message.ContentPart.t()) :: map() | nil | no_return()
Converts a ContentPart to the format expected by the Anthropic API.
Handles different content types:
- :text - Converts to a text content part, optionally with cache control settings
- :thinking - Converts to a thinking content part with required signature
- :unsupported - Handles custom content types specified in options
- :image - Converts to an image content part with base64 data and media type
- :image_url - Raises an error as Anthropic doesn't support image URLs
Options
For :text type:
- :cache_control - When provided, adds cache control settings to the content
For :thinking type:
- :signature - Required signature for thinking content
For :unsupported type:
- :type - Required string specifying the custom content type
For :image type:
- :media - Required media type (:png, :jpg, :jpeg, :gif, :webp, or a string)
Returns nil for unsupported content without required options.
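For example, a minimal sketch of converting a text part with caching enabled (the result shown in the comment is a rough illustration of the Anthropic content block format):

alias LangChain.ChatModels.ChatAnthropic
alias LangChain.Message.ContentPart

ContentPart.text!("Hello", cache_control: true)
|> ChatAnthropic.content_part_for_api()
# Expected to produce an Anthropic-style text block, roughly:
#   %{"type" => "text", "text" => "Hello", "cache_control" => %{"type" => "ephemeral"}}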
Converts a list of ContentParts to the format expected by the Anthropic API.
@spec for_api(LangChain.Message.t() | LangChain.Message.ContentPart.t() | LangChain.Function.t()) :: %{required(String.t()) => any()} | no_return()
Convert a LangChain structure to the expected map of data for the Anthropic API.
@spec for_api(t(), message :: [map()], LangChain.ChatModels.ChatModel.tools()) :: %{required(atom()) => any()}
Return the params formatted for an API request.
@spec function_for_api(LangChain.Function.t()) :: map() | no_return()
Convert a Function to the format expected by the Anthropic API.
Converts a Message to the format expected by the Anthropic API.
@spec new(attrs :: map()) :: {:ok, t()} | {:error, Ecto.Changeset.t()}
Setup a ChatAnthropic client configuration.
Setup a ChatAnthropic client configuration and return it or raise an error if invalid.
post_process_and_combine_messages(messages, cache_messages \\ nil)
After all the messages have been converted using for_api/1, this combines
multiple sequential tool response messages. The Anthropic API is very strict
about user, assistant, user, assistant sequenced messages.
When cache_messages is set, it also adds cache_control to the last N user messages'
last ContentPart to enable efficient caching in multi-turn conversations.
Restores the model from the config.
@spec retry_on_fallback?(LangChain.LangChainError.t()) :: boolean()
Determine if an error should be retried. If true, a fallback LLM may be
used. If false, the error is understood to be more fundamental to the
request rather than a service issue, and the request should not be retried
or fall back to another service.
Generate a config map that can later restore the model's configuration.