LangChain.ChatModels.ChatAwsMantle (LangChain v0.8.4)


Represents a chat model hosted by AWS Bedrock's Mantle endpoint — the OpenAI-compatible gateway AWS introduced for third-party models such as Moonshot AI's Kimi K2 family and OpenAI's gpt-oss series.

Mantle accepts standard OpenAI Chat Completions requests, so much of the wire format mirrors LangChain.ChatModels.ChatOpenAI. This module exists as a separate chat model because Mantle has several differences that warrant dedicated handling:

  • Region-aware URL building — https://bedrock-mantle.{region}.api.aws/v1/chat/completions
  • Two auth modes — Bedrock API key (Bearer) or AWS IAM (SigV4)
  • Reasoning extraction — Mantle returns model reasoning at message.reasoning (or delta.reasoning when streaming), which ChatOpenAI silently drops
  • Higher default receive_timeout — Mantle exhibits intermittent slow starts of 60s+, so the default is 120s here vs OpenAI's 60s
  • Bounded default max_tokens: 4096 — Kimi occasionally falls into token-repetition loops; streaming keeps the HTTP layer alive as chunks arrive, so an uncapped request can run indefinitely. Override as needed when reasoning budgets require more
  • Per-model quirks — Kimi prepends a leading space to text content; uses functions.NAME:N for call_id shape; narrates before tool calls
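
The timeout and token defaults above can be raised per-instance. A sketch, using field names from this module's struct; the millisecond unit for :receive_timeout is an assumption carried over from ChatOpenAI's convention:

```elixir
# Sketch: overriding the Mantle-specific defaults for a long reasoning run.
# :receive_timeout in milliseconds is assumed (matches ChatOpenAI).
ChatAwsMantle.new!(%{
  model: "moonshotai.kimi-k2-thinking",
  region: "us-east-1",
  api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK"),
  receive_timeout: 300_000,
  max_tokens: 16_384
})
```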

Tested Models (as of writing)

Model ID                      | Vendor   | Notes
moonshotai.kimi-k2-thinking   | Moonshot | Reasoning by default, 128K ctx
moonshotai.kimi-k2.5          | Moonshot | Multimodal, hybrid thinking via :reasoning_effort
openai.gpt-oss-120b           | OpenAI   | Open-source GPT, hosted by AWS

Refer to the published list of supported models.

Authentication

Two mutually exclusive auth modes:

Bearer (Bedrock API key) — simplest

Generate a long-term Bedrock API key in the AWS console (Bedrock API keys) and set it as the :api_key:

ChatAwsMantle.new!(%{
  model: "moonshotai.kimi-k2.5",
  region: "us-east-1",
  api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK")
})

AWS SigV4 (IAM credentials) — production-friendly

Pass a zero-arity function returning IAM credentials. Useful when the host already has IAM-based credentials available (e.g. ExAws):

ChatAwsMantle.new!(%{
  model: "moonshotai.kimi-k2.5",
  region: "us-east-1",
  credentials: fn ->
    ExAws.Config.new(:s3)
    |> Map.take([:access_key_id, :secret_access_key])
    |> Map.to_list()
  end
})

Reasoning / Thinking

K2.5 is a hybrid thinking model — pass OpenAI's standard :reasoning_effort to enable structured reasoning:

ChatAwsMantle.new!(%{
  model: "moonshotai.kimi-k2.5",
  region: "us-east-1",
  api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK"),
  reasoning_effort: "high"
})

When reasoning is active, the response message will include a ContentPart of type: :thinking containing the model's chain of thought, alongside the normal :text content parts.
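
A sketch of separating the reasoning from the answer in a completed response, assuming the standard LangChain ContentPart shape (a :type field of :thinking or :text on each part):

```elixir
# Ask a question, then partition the content parts by type.
{:ok, [message]} =
  ChatAwsMantle.call(model, [Message.new_user!("Plan a three-step test strategy.")])

{thinking_parts, text_parts} =
  Enum.split_with(message.content, fn part -> part.type == :thinking end)
```

Relative ordering within message.content is preserved, so thinking parts precede the text they led to.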

K2 Thinking always reasons (it's the model's default mode); the field is populated regardless of :reasoning_effort.

Sampling controls

Standard OpenAI sampling parameters are supported and passed through to Mantle unchanged:

  • :temperature — 0.0 to 2.0 (default 1.0)
  • :top_p — 0.0 to 1.0 nucleus sampling cutoff. OpenAI recommends tuning this or temperature, not both
  • :frequency_penalty — -2.0 to 2.0. Positive values discourage reuse of tokens proportional to how often they've already appeared. Kimi K2.5 on Mantle has been observed to occasionally lock into single-token repetition loops (e.g. streams of "!"); frequency_penalty: 0.5 is a reasonable starting defense.
  • :presence_penalty — -2.0 to 2.0. Binary variant of frequency_penalty (penalizes any token that has appeared at all)
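
Putting the guidance above together, a hedged starting configuration for Kimi K2.5 (the values are illustrative starting points, not vendor recommendations):

```elixir
ChatAwsMantle.new!(%{
  model: "moonshotai.kimi-k2.5",
  region: "us-east-1",
  api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK"),
  temperature: 0.6,         # tune this OR :top_p, not both
  frequency_penalty: 0.5    # guards against single-token repetition loops
})
```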

Streaming

Set stream: true to receive incremental MessageDelta updates via the on_llm_new_delta callback. Mantle emits standard OpenAI SSE chunks for content and tool calls, and adds a sibling delta.reasoning field when reasoning is active. ChatAwsMantle extracts those into :thinking ContentParts so the merged final message carries [thinking_part, text_part] in order.
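
A sketch of wiring up streaming, assuming LangChain's usual callback-map shape; the exact on_llm_new_delta arity and delta fields follow the installed LangChain version:

```elixir
handler = %{
  on_llm_new_delta: fn _model, deltas ->
    # deltas is a list of MessageDelta structs; inspect as they arrive.
    IO.inspect(deltas, label: "delta")
  end
}

model =
  ChatAwsMantle.new!(%{
    model: "moonshotai.kimi-k2.5",
    region: "us-east-1",
    api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK"),
    stream: true,
    callbacks: [handler]
  })
```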

Multimodal (K2.5 vision)

Kimi K2.5 is natively multimodal. Send images via standard LangChain ContentPart structs — ChatAwsMantle delegates serialization to ChatOpenAI.content_part_for_api/2, which emits Mantle's expected {"type": "image_url", "image_url": {"url": "data:<media>;base64,..."}} shape:

{:ok, bytes} = File.read("photo.jpg")

Message.new_user!([
  ContentPart.text!("What's in this image?"),
  ContentPart.image!(Base.encode64(bytes), media: :jpeg)
])
|> then(&ChatAwsMantle.call(model, [&1]))

Mantle runs images through an upstream sanitizer that rejects degenerate inputs (tiny or unusual images may return a 400 with "Failed to sanitize image"). Use real photographs or reasonably-sized source images. Vision tokens add meaningfully to prompt_tokens — a 1200×675 JPG consumes roughly 1100 prompt tokens.

Open Notes

Streaming and tool-calling support follow the same wire format as ChatOpenAI — see the smoke tests for verified behavior.

Summary

Functions

call(model, prompt, tools \\ [])
Make a call to the Mantle API. Returns {:ok, [%Message{}]} on success or {:error, %LangChainError{}} on failure.

for_api(model, messages, tools)
Format the request body for the Mantle API. Reuses ChatOpenAI's per-message formatting (since the wire format is OpenAI-shaped), but assembles the top-level body with Mantle-relevant fields only.

new(attrs \\ %{})
Build a new ChatAwsMantle instance from attributes.

new!(attrs \\ %{})
Build a new ChatAwsMantle instance, raising on validation failure.

Types

t()

@type t() :: %LangChain.ChatModels.ChatAwsMantle{
  api_key: term(),
  callbacks: term(),
  credentials: term(),
  endpoint: term(),
  frequency_penalty: term(),
  json_response: term(),
  json_schema: term(),
  max_tokens: term(),
  model: term(),
  presence_penalty: term(),
  reasoning_effort: term(),
  receive_timeout: term(),
  region: term(),
  req_config: term(),
  stream: term(),
  stream_options: term(),
  temperature: term(),
  tool_choice: term(),
  top_p: term(),
  verbose_api: term()
}

Functions

call(model, prompt, tools \\ [])

Make a call to the Mantle API. Returns {:ok, [%Message{}]} on success or {:error, %LangChainError{}} on failure.

for_api(model, messages, tools)

@spec for_api(t(), [LangChain.Message.t()], [LangChain.Function.t()]) :: %{
  required(atom()) => any()
}

Format the request body for the Mantle API. Reuses ChatOpenAI's per-message formatting (since the wire format is OpenAI-shaped), but assembles the top-level body with Mantle-relevant fields only.
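
Illustrative only — a rough sketch of the kind of OpenAI-shaped body for_api/3 assembles. The exact keys depend on which fields are set on the model struct:

```elixir
%{
  model: "moonshotai.kimi-k2.5",
  messages: [%{role: "user", content: "Hello"}],
  stream: false,
  max_tokens: 4096
}
```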

new(attrs \\ %{})

@spec new(attrs :: map()) :: {:ok, t()} | {:error, Ecto.Changeset.t()}

Build a new ChatAwsMantle instance from attributes.

new!(attrs \\ %{})

@spec new!(attrs :: map()) :: t() | no_return()

Build a new ChatAwsMantle instance, raising on validation failure.