LangChain.ChatModels.ChatAnthropic (LangChain v0.4.1)

Module for interacting with Anthropic models.

Parses and validates inputs for making requests to Anthropic's messages API.

Converts responses into more specialized LangChain data structures.

Callbacks

See the set of available callbacks: LangChain.Chains.ChainCallbacks

Rate Limit API Response Headers

Anthropic returns rate limit information in the response headers. Those can be accessed using an LLM callback like this:

handler = %{
  on_llm_ratelimit_info: fn _chain, headers ->
    IO.inspect(headers)
  end
}

%{llm: ChatAnthropic.new!(%{model: "..."})}
|> LLMChain.new!()
# ... add messages ...
|> LLMChain.add_callback(handler)
|> LLMChain.run()

When a response is received, something similar to the following will be output to the console.

%{
  "anthropic-ratelimit-requests-limit" => ["50"],
  "anthropic-ratelimit-requests-remaining" => ["49"],
  "anthropic-ratelimit-requests-reset" => ["2024-06-08T04:28:30Z"],
  "anthropic-ratelimit-tokens-limit" => ["50000"],
  "anthropic-ratelimit-tokens-remaining" => ["50000"],
  "anthropic-ratelimit-tokens-reset" => ["2024-06-08T04:28:30Z"],
  "request-id" => ["req_1234"]
}

Token Usage

Anthropic returns token usage information as part of the response body. A LangChain.TokenUsage struct is added to the metadata of the processed LangChain.Message and LangChain.MessageDelta structs under the :usage key.

%LangChain.MessageDelta{
  content: [],
  status: :incomplete,
  index: nil,
  role: :assistant,
  tool_calls: nil,
  metadata: %{
    usage: %LangChain.TokenUsage{
      input: 55,
      output: 4,
      raw: %{
        "cache_creation_input_tokens" => 0,
        "cache_read_input_tokens" => 0,
        "input_tokens" => 55,
        "output_tokens" => 4
      }
    }
  }
}

The TokenUsage data is accumulated across MessageDelta structs, and the final usage information is available on the completed LangChain.Message.
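
For example, after a chain run completes, the accumulated usage can be read from the final message. A minimal sketch, assuming a chain built as shown in the Callbacks example above and that LLMChain.run/1 returns the updated chain:

# `chain` is an LLMChain configured with a ChatAnthropic model and messages.
{:ok, updated_chain} = LLMChain.run(chain)

%LangChain.TokenUsage{input: input, output: output} =
  updated_chain.last_message.metadata.usage

IO.puts("Input tokens: #{input}, output tokens: #{output}")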

Tool Choice

Anthropic supports forcing a tool to be used.

This is supported through the tool_choice option, which takes a plain Elixir map to provide the configuration.

By default, the LLM will choose a tool call if a tool is available and it determines it is needed. That's the "auto" mode.

Example

Force the LLM's response to make a tool call of the "get_weather" function.

ChatAnthropic.new(%{
  model: "...",
  tool_choice: %{"type" => "tool", "name" => "get_weather"}
})
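
Per Anthropic's tool-use documentation, other tool_choice types can be passed the same way since the map is sent along as-is. A hedged sketch:

# Let the model decide whether to call a tool (the default "auto" behavior, made explicit).
ChatAnthropic.new(%{model: "...", tool_choice: %{"type" => "auto"}})

# Require the model to call one of the available tools, without naming which one.
ChatAnthropic.new(%{model: "...", tool_choice: %{"type" => "any"}})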

AWS Bedrock Support

Anthropic Claude is supported in AWS Bedrock.

To configure ChatAnthropic for use on AWS Bedrock:

  1. Request Model Access to get access to the Anthropic models you intend to use.

  2. Using your AWS Console, create an Access Key for your application.

  3. Set the key values in your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY ENVs.

  4. Get the Model ID for the model you intend to use (see the Bedrock Base Models documentation).

  5. Refer to LangChain.Utils.BedrockConfig for setting up the Bedrock authentication credentials for your environment.

  6. Setup your ChatAnthropic similar to the following:

    alias LangChain.ChatModels.ChatAnthropic
    alias LangChain.Utils.BedrockConfig

    ChatAnthropic.new!(%{
      model: "anthropic.claude-3-5-sonnet-20241022-v2:0",
      bedrock: BedrockConfig.from_application_env!()
    })

Thinking

Models like Claude 3.7 Sonnet introduced a hybrid approach which allows for "thinking" and reasoning. See the Anthropic thinking documentation for up-to-date instructions on the usage.

For instance, enabling thinking may require the temperature to be set to 1, and other settings like top_p may not be allowed.

The model supports a :thinking attribute where the data is a map that matches the structure in the Anthropic documentation. It is passed along as-is.

Example:

# Enable thinking and budget 2,000 tokens for the thinking space.
model = ChatAnthropic.new!(%{
  model: "claude-3-7-sonnet-latest",
  thinking: %{type: "enabled", budget_tokens: 2000}
})

# Disable thinking
model = ChatAnthropic.new!(%{
  model: "claude-3-7-sonnet-latest",
  thinking: %{type: "disabled"}
})

As of the documentation for Claude 3.7 Sonnet, the minimum budget for thinking is 1024 tokens.

Prompt Caching

Anthropic supports prompt caching to reduce costs and latency for frequently repeated content. Prompt caching works by caching large blocks of content that are likely to be reused across multiple requests.

Prompt caching is configured through the cache_control option in ContentPart options. It can be applied to system messages, regular user messages, tool results, and tool definitions.

Anthropic limits a conversation to a maximum of 4 cache_control blocks and will refuse to service requests with more.

Basic Usage

Setting cache_control: true is a shortcut for the default ephemeral cache control:

# System message with caching
Message.new_system!([
  ContentPart.text!("You are an AI assistant analyzing literary works."),
  ContentPart.text!("<large document content>", cache_control: true)
])

# User message with caching
Message.new_user!([
  ContentPart.text!("Please analyze this document:"),
  ContentPart.text!("<large document content>", cache_control: true)
])

This will set a single cache breakpoint that includes your functions (processed first) and the system message.

For multi-turn conversations, enabling :cache_messages (see below) adds additional cache breakpoints on recent user messages, giving you higher cache utilization and faster responses. Writing to the cache increases write costs, so this setting is off by default.

Supported Content Types

Prompt caching can be applied to:

  • Text content in system messages
  • Text content in user messages
  • Tool results in the content field when returning a list of ContentPart structs.
  • Tool definitions in the options field when creating a Function struct.

For more information, see the Anthropic prompt caching documentation.
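
For example, a large tool result can be marked for caching by returning ContentPart structs in the ToolResult content. A sketch; the tool_call_id value is illustrative:

alias LangChain.Message.ToolResult

Message.new_tool_result!(%{
  tool_results: [
    ToolResult.new!(%{
      tool_call_id: "toolu_123",
      content: [ContentPart.text!("<large tool output>", cache_control: true)]
    })
  ]
})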

Advanced Cache Control

For more explicit control over caching parameters, you can provide a map instead of true:

ContentPart.text!("content", cache_control: %{"type" => "ephemeral", "ttl" => "1h"})

When cache_control: true is used, it automatically expands to %{"type" => "ephemeral"} in the API request. If you need specific cache control settings like TTL, providing them explicitly preserves the exact values sent to the API.

The default is "5m" for 5 minutes but supports "1h" for 1 hour depending on your account.

Automatic Message Caching for Multi-Turn Conversations

The :cache_messages option automates cache breakpoint placement for multi-turn conversations by adding cache_control to the last N user messages.

When to Use This Feature

Good fit:

  • Multi-turn conversations (3+ turns) where you'll reuse conversation history
  • Conversations with tool use where uncached tool messages cause cache degradation
  • Applications where both cost reduction AND latency reduction are valuable (~10x faster cache reads)
  • Repeated similar queries with different follow-ups

Not recommended:

  • Single-turn requests (no benefit, only added cost)
  • Short conversations (1-2 turns) where break-even isn't reached
  • Highly diverse conversations with no repeated context

Benefits

When enabled, you get:

  • Reduced latency: Cache reads are ~10x faster than processing tokens from scratch
  • Lower costs: After break-even (typically 2-3 turns), costs drop significantly
  • Automatic optimization: No manual cache management required

Why Multiple Breakpoints?

Anthropic limits conversations to 4 cache breakpoints total. The optimal generic strategy is:

  • 1 breakpoint for system prompt: Static context (instructions, large documents) that never changes
  • 3 breakpoints for user messages: Recent conversation history (this is the default)

Using multiple user message breakpoints (rather than just 1) provides 15-25% cost savings in multi-turn conversations with tools:

  • Single breakpoint problem: Heavy tool use adds uncached messages between user messages, causing steep degradation
  • Multiple breakpoints solution: Caches recent conversation history, reducing degradation
  • Tradeoff: More cache writes initially, but pays off after 2-3 turns

Key Behaviors

  • Default count: 3 user message breakpoints (reserves 1 for system prompt in the 4 breakpoint limit)
  • Always cache current messages: Breakpoints are placed on the most recent user messages, not previous ones
  • Breakpoints move: As conversations grow, old messages fall out of cache but recent ones stay cached
  • Expected utilization: 85-95% for early turns, stabilizes at 70-85% for longer conversations

Enabling Message Caching

Message caching is disabled by default since writing to the cache increases write costs (1.25x for 5m TTL, 3x for 1h).

Enable with defaults (3 breakpoints, 5m TTL):

model = ChatAnthropic.new!(%{
  model: "claude-3-5-sonnet-20241022",
  cache_messages: %{enabled: true}
})

Configuring Breakpoint Count

Adjust the number of breakpoints based on your use case (max: 4):

# Conservative: single breakpoint (original behavior)
model = ChatAnthropic.new!(%{
  model: "claude-3-5-sonnet-20241022",
  cache_messages: %{enabled: true, count: 1}
})

# Balanced: 2 breakpoints for moderate conversations
model = ChatAnthropic.new!(%{
  model: "claude-3-5-sonnet-20241022",
  cache_messages: %{enabled: true, count: 2}
})

# Optimal for tool-heavy: 3 breakpoints (default)
model = ChatAnthropic.new!(%{
  model: "claude-3-5-sonnet-20241022",
  cache_messages: %{enabled: true, count: 3}
})

# Maximum: 4 breakpoints (assuming no system prompt caching)
model = ChatAnthropic.new!(%{
  model: "claude-3-5-sonnet-20241022",
  cache_messages: %{enabled: true, count: 4}
})

With Custom TTL

Specify a custom TTL (time-to-live):

model = ChatAnthropic.new!(%{
  model: "claude-3-5-sonnet-20241022",
  cache_messages: %{enabled: true, count: 2, ttl: "1h"}
})

Supported TTL values: "5m" (5 minutes, default) and "1h" (1 hour).

Cost note: Cache writes with 5m TTL cost 1.25x, 1h TTL costs 3x.

How It Works

  • Breakpoints are placed on the last text ContentPart of each selected user message
  • Tool results are skipped (not cacheable at the message level)
  • Explicit cache_control settings in ContentParts are preserved
  • Works alongside system message and tool caching
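
A sketch of combining these: one explicit system-message breakpoint plus the three automatic user-message breakpoints stays within the 4-breakpoint limit.

model = ChatAnthropic.new!(%{
  model: "claude-3-5-sonnet-20241022",
  cache_messages: %{enabled: true, count: 3}
})

system_message = Message.new_system!([
  ContentPart.text!("You are an AI assistant analyzing literary works."),
  ContentPart.text!("<large static context>", cache_control: true)
])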

What to Expect

Cost breakdown for a conversation with 3 breakpoints and 5m TTL (relative to baseline without caching):

Pricing context:

  • Baseline input: 1.0x (standard input token cost)
  • Cache write: 1.25x (costs 25% more than baseline)
  • Cache read: 0.1x (90% discount - costs 10% of baseline)

Turn-by-turn costs:

  • Turn 1: 0% cache hit, 100% cache write
    • Effective cost: ~1.25x baseline (all writes, no reads yet)
  • Turn 2: 85-95% cache hit
    • Effective cost: ~0.20x baseline (mostly cheap reads: 90% × 0.1x + 10% × 1.0x)
  • Turn 3+: 70-85% cache hit
    • Effective cost: ~0.28x baseline (mostly cheap reads: 80% × 0.1x + 20% × 1.0x)

Break-even: Typically 2-3 turns. After turn 2, you're paying roughly 20-30% of the baseline cost.
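
The effective multipliers above come from weighting the cache-read rate and the uncached input rate by the cache hit rate (cache writes on the small uncached portion are ignored for simplicity). A quick sketch of the arithmetic:

# effective cost ≈ hit_rate * 0.1 (cache read) + (1 - hit_rate) * 1.0 (uncached input)
effective_cost = fn hit_rate -> hit_rate * 0.1 + (1.0 - hit_rate) * 1.0 end

effective_cost.(0.90)  #=> ~0.19 (turn 2)
effective_cost.(0.80)  #=> ~0.28 (turn 3+)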

Latency: Cache reads are ~10x faster than processing tokens, significantly reducing response time.

Cache utilization will stabilize around 70-85% for longer conversations as old content falls out of cache, but recent context stays cached. The cost savings and latency benefits continue for the life of the conversation.

Monitoring Cache Utilization

Cache utilization is visible in each completed message's metadata.usage.raw map (see the Token Usage section above). Inspecting the raw usage of the first and second completed responses makes the behavior clear.

Turn 1 response (initial request):

%{
  "cache_creation" => %{
    "ephemeral_1h_input_tokens" => 0,
    "ephemeral_5m_input_tokens" => 3657
  },
  "cache_creation_input_tokens" => 7314,
  "cache_read_input_tokens" => 0,
  "input_tokens" => 18,
  "output_tokens" => 197,
  "service_tier" => "standard"
}

Cache read utilization: cache_read_input_tokens / (cache_read_input_tokens + input_tokens) * 100 = 0%

This 0% utilization is expected for the initial request (all cache writes, no reads yet).

Turn 2 response (cache now available):

%{
  "cache_creation" => %{
    "ephemeral_1h_input_tokens" => 0,
    "ephemeral_5m_input_tokens" => 146
  },
  "cache_creation_input_tokens" => 292,
  "cache_read_input_tokens" => 3604,
  "input_tokens" => 18,
  "output_tokens" => 644,
  "service_tier" => "standard"
}

Cache read utilization: 3604 / (3604 + 18) * 100 = 99.5%

This high utilization shows most of the prompt is being read from cache, with only new content being processed and written.
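
A small sketch that computes this utilization from a completed message's raw usage map, where message is assumed to be a completed assistant message and the field names are as shown above:

raw = message.metadata.usage.raw

read = raw["cache_read_input_tokens"]
uncached = raw["input_tokens"]

utilization = read / (read + uncached) * 100
# Turn 2 example: 3604 / (3604 + 18) * 100 = 99.5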

Important Notes

  • Minimum prompt length: If your cache breakpoint doesn't meet the minimum cacheable prompt length, it won't be cached at all.
  • Model-specific limits: Different models have different cache limitations - see https://docs.claude.com/en/docs/build-with-claude/prompt-caching#cache-limitations
  • Haiku considerations: Haiku has a high minimum (4096 tokens) which can mean low initial utilization. However, enabling :cache_messages has minimal cost impact, so it's safe to enable.
  • TTL tradeoff: Default TTL is 5m (1.25x write cost). Setting TTL to 1h increases write cost to 3x but may improve utilization for longer sessions.

Summary

Functions

Calls the Anthropic API passing the ChatAnthropic struct with configuration, plus either a simple message or the list of messages to act as the prompt.

Converts a ContentPart to the format expected by the Anthropic API.

Converts a list of ContentParts to the format expected by the Anthropic API.

Convert a LangChain structure to the expected map of data for the Anthropic API.

Return the params formatted for an API request.

Convert a Function to the format expected by the Anthropic API.

Converts a Message to the format expected by the Anthropic API.

Setup a ChatAnthropic client configuration.

Setup a ChatAnthropic client configuration and return it or raise an error if invalid.

After all the messages have been converted using for_api/1, this combines multiple sequential tool response messages. The Anthropic API strictly requires messages to alternate between the user and assistant roles.

Restores the model from the config.

Determine if an error should be retried. If true, a fallback LLM may be used. If false, the error is understood to be a more fundamental problem with the request rather than a service issue, and it should not be retried or fall back to another service.

Generate a config map that can later restore the model's configuration.

Types

@type t() :: %LangChain.ChatModels.ChatAnthropic{
  api_key: term(),
  api_version: term(),
  bedrock: term(),
  beta_headers: term(),
  cache_messages: term(),
  callbacks: term(),
  endpoint: term(),
  max_tokens: term(),
  model: term(),
  receive_timeout: term(),
  req_opts: term(),
  stream: term(),
  temperature: term(),
  thinking: term(),
  tool_choice: term(),
  top_k: term(),
  top_p: term(),
  verbose_api: term()
}

Functions

call(anthropic, prompt, functions \\ [])

Calls the Anthropic API passing the ChatAnthropic struct with configuration, plus either a simple message or the list of messages to act as the prompt.

Optionally pass in a callback function that can be executed as data is received from the API.

NOTE: This function can be used directly, but the primary interface should be through LangChain.Chains.LLMChain. The ChatAnthropic module is more focused on translating the LangChain data structures to and from the Anthropic API.

Another benefit of using LangChain.Chains.LLMChain is that it combines the storage of messages, adding functions, adding custom context that should be passed to functions, and automatically applying LangChain.MessageDelta structs as they are received, then converting them to the full LangChain.Message once fully complete.
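
A minimal sketch of a direct, non-streaming call. The return shape shown here (a single assistant Message) is an assumption; streaming requests produce MessageDelta structs instead:

{:ok, chat} = ChatAnthropic.new(%{model: "claude-3-5-sonnet-20241022"})

# Assumed return shape: {:ok, assistant_message} for a non-streaming request.
{:ok, response} = ChatAnthropic.call(chat, [Message.new_user!("Say hello in one word.")], [])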

content_part_for_api(part)

@spec content_part_for_api(LangChain.Message.ContentPart.t()) ::
  map() | nil | no_return()

Converts a ContentPart to the format expected by the Anthropic API.

Handles different content types:

  • :text - Converts to a text content part, optionally with cache control settings
  • :thinking - Converts to a thinking content part with required signature
  • :unsupported - Handles custom content types specified in options
  • :image - Converts to an image content part with base64 data and media type
  • :image_url - Raises an error as Anthropic doesn't support image URLs

Options

For :text type:

  • :cache_control - When provided, adds cache control settings to the content

For :thinking type:

  • :signature - Required signature for thinking content

For :unsupported type:

  • :type - Required string specifying the custom content type

For :image type:

  • :media - Required media type (:png, :jpg, :jpeg, :gif, :webp, or a string)

Returns nil for unsupported content without required options.
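
An illustrative sketch. The exact output map is an assumption based on the cache control expansion described earlier and Anthropic's text content block format:

ContentPart.text!("<large document content>", cache_control: true)
|> ChatAnthropic.content_part_for_api()
#=> %{"type" => "text", "text" => "<large document content>",
#     "cache_control" => %{"type" => "ephemeral"}}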

content_parts_for_api(contents)

Converts a list of ContentParts to the format expected by the Anthropic API.

@spec for_api(
  LangChain.Message.t()
  | LangChain.Message.ContentPart.t()
  | LangChain.Function.t()
) ::
  %{required(String.t()) => any()} | no_return()

Convert a LangChain structure to the expected map of data for the Anthropic API.

for_api(anthropic, messages, tools)

@spec for_api(t(), message :: [map()], LangChain.ChatModels.ChatModel.tools()) :: %{
  required(atom()) => any()
}

Return the params formatted for an API request.

@spec function_for_api(LangChain.Function.t()) :: map() | no_return()

Convert a Function to the format expected by the Anthropic API.
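
A hedged sketch with a hypothetical get_weather tool; the "input_schema" key in the illustrated output reflects Anthropic's tool format and is an assumption:

weather_fn =
  Function.new!(%{
    name: "get_weather",
    description: "Returns the current weather for a city.",
    parameters_schema: %{
      "type" => "object",
      "properties" => %{"city" => %{"type" => "string"}},
      "required" => ["city"]
    },
    function: fn %{"city" => _city}, _context -> {:ok, "Sunny"} end
  })

ChatAnthropic.function_for_api(weather_fn)
#=> %{"name" => "get_weather", "description" => "Returns the current weather for a city.", "input_schema" => %{...}}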

get_system_text(message)

Converts a Message to the format expected by the Anthropic API.

@spec new(attrs :: map()) :: {:ok, t()} | {:error, Ecto.Changeset.t()}

Setup a ChatAnthropic client configuration.

@spec new!(attrs :: map()) :: t() | no_return()

Setup a ChatAnthropic client configuration and return it or raise an error if invalid.

post_process_and_combine_messages(messages, cache_messages \\ nil)

After all the messages have been converted using for_api/1, this combines multiple sequential tool response messages. The Anthropic API strictly requires messages to alternate between the user and assistant roles.

When cache_messages is set, it also adds cache_control to the last N user messages' last ContentPart to enable efficient caching in multi-turn conversations.

Restores the model from the config.

retry_on_fallback?(arg1)

@spec retry_on_fallback?(LangChain.LangChainError.t()) :: boolean()

Determine if an error should be retried. If true, a fallback LLM may be used. If false, the error is understood to be a more fundamental problem with the request rather than a service issue, and it should not be retried or fall back to another service.

@spec serialize_config(t()) :: %{required(String.t()) => any()}

Generate a config map that can later restore the model's configuration.
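
A sketch of a configuration round trip, assuming restore_from_map/1 (the restore function described above) returns {:ok, model} on success:

model = ChatAnthropic.new!(%{model: "claude-3-5-sonnet-20241022", temperature: 1.0})

# Serialize to a plain map (e.g. for storing in a database).
config = ChatAnthropic.serialize_config(model)

# Later, restore the model from the stored config.
{:ok, restored} = ChatAnthropic.restore_from_map(config)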