# `LangChain.ChatModels.ChatAwsMantle`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.4/lib/chat_models/chat_aws_mantle.ex#L1)

Represents a chat model hosted by AWS Bedrock's **Mantle** endpoint — the
OpenAI-compatible gateway AWS introduced for third-party models such as
Moonshot AI's Kimi K2 family and OpenAI's gpt-oss series.

Mantle accepts standard OpenAI Chat Completions requests, so much of the
wire format mirrors `LangChain.ChatModels.ChatOpenAI`. This module exists as
a separate chat model because Mantle has several differences that warrant
dedicated handling:

- **Region-aware URL building** — `https://bedrock-mantle.{region}.api.aws/v1/chat/completions`
- **Two auth modes** — Bedrock API key (Bearer) **or** AWS IAM (SigV4)
- **Reasoning extraction** — Mantle returns model reasoning at `message.reasoning`
  (or `delta.reasoning` when streaming), which `ChatOpenAI` silently drops
- **Higher default `receive_timeout`** — Mantle exhibits intermittent slow
  starts of 60s+, so the default is 120s here vs OpenAI's 60s
- **Bounded default `max_tokens: 4096`** — Kimi occasionally falls into
  token-repetition loops; streaming keeps the HTTP layer alive as chunks
  arrive, so an uncapped request can run indefinitely. Override as needed
  when reasoning budgets require more
- **Per-model quirks** — Kimi prepends a leading space to text content, uses
  a `functions.NAME:N` shape for `call_id`, and narrates before tool calls
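For illustration, the region-aware URL from the first bullet is plain string interpolation (a sketch of the documented URL shape, not the module's internal implementation):

```elixir
# Sketch of the endpoint URL construction described above.
region = "us-east-1"
"https://bedrock-mantle.#{region}.api.aws/v1/chat/completions"
#=> "https://bedrock-mantle.us-east-1.api.aws/v1/chat/completions"
```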

## Tested Models (as of writing)

| Model ID                      | Vendor   | Notes                                               |
| ----------------------------- | -------- | --------------------------------------------------- |
| `moonshotai.kimi-k2-thinking` | Moonshot | Reasoning by default, 128K ctx                      |
| `moonshotai.kimi-k2.5`        | Moonshot | Multimodal, hybrid thinking via `:reasoning_effort` |
| `openai.gpt-oss-120b`         | OpenAI   | Open-source GPT, hosted by AWS                      |

Refer to the [published list of supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models.html).

## Authentication

Mantle supports two mutually exclusive auth modes:

### Bearer (Bedrock API key) — simplest

Generate a long-term Bedrock API key in the AWS console
([Bedrock API keys](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create))
and set it as the `:api_key`:

    ChatAwsMantle.new!(%{
      model: "moonshotai.kimi-k2.5",
      region: "us-east-1",
      api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK")
    })

### AWS SigV4 (IAM credentials) — production-friendly

Pass a zero-arity function returning IAM credentials. Useful when the host
already has IAM-based credentials available (e.g. via ExAws):

    ChatAwsMantle.new!(%{
      model: "moonshotai.kimi-k2.5",
      region: "us-east-1",
      credentials: fn ->
        ExAws.Config.new(:s3)
        |> Map.take([:access_key_id, :secret_access_key])
        |> Map.to_list()
      end
    })

## Reasoning / Thinking

K2.5 is a hybrid thinking model — pass OpenAI's standard `:reasoning_effort`
to enable structured reasoning:

    ChatAwsMantle.new!(%{
      model: "moonshotai.kimi-k2.5",
      region: "us-east-1",
      api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK"),
      reasoning_effort: "high"
    })

When reasoning is active, the response message will include a `ContentPart`
of `type: :thinking` containing the model's chain of thought, alongside the
normal `:text` content parts.

K2 Thinking always reasons (it's the model's default mode); its `:thinking`
content is populated regardless of `:reasoning_effort`.
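For example, the thinking parts can be separated from the text parts of a returned message like this (a sketch assuming `message.content` is a list of `ContentPart` structs with `:type` and `:content` fields, per the description above):

```elixir
# Split the response content into thinking and text parts.
{thinking_parts, text_parts} =
  Enum.split_with(message.content, &(&1.type == :thinking))

reasoning = Enum.map_join(thinking_parts, "\n", & &1.content)
answer = Enum.map_join(text_parts, "", & &1.content)
```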

## Sampling controls

Standard OpenAI sampling parameters are supported and passed through to
Mantle unchanged:

- `:temperature` — 0.0 to 2.0 (default `1.0`)
- `:top_p` — 0.0 to 1.0 nucleus sampling cutoff. OpenAI recommends tuning
  this *or* temperature, not both
- `:frequency_penalty` — -2.0 to 2.0. Positive values discourage reuse of
  tokens proportional to how often they've already appeared. **Kimi K2.5
  on Mantle has been observed to occasionally lock into single-token
  repetition loops (e.g. streams of "!"); `frequency_penalty: 0.5` is a
  reasonable starting defense.**
- `:presence_penalty` — -2.0 to 2.0. Binary variant of `:frequency_penalty`
  (penalizes any token that has appeared at all)
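For instance, a defensive configuration for Kimi K2.5 might combine a modest temperature with the repetition penalty noted above (the values here are illustrative starting points, not tuned recommendations):

```elixir
ChatAwsMantle.new!(%{
  model: "moonshotai.kimi-k2.5",
  region: "us-east-1",
  api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK"),
  temperature: 0.7,
  # Defense against the single-token repetition loops described above
  frequency_penalty: 0.5
})
```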

## Streaming

Set `stream: true` to receive incremental `MessageDelta` updates via the
`on_llm_new_delta` callback. Mantle emits standard OpenAI SSE chunks for
content and tool calls, and adds a sibling `delta.reasoning` field when
reasoning is active. `ChatAwsMantle` extracts those into `:thinking`
ContentParts so the merged final message carries `[thinking_part, text_part]`
in order.
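A minimal streaming setup might look like the following sketch. It assumes the standard LangChain handler-map callback convention, where `on_llm_new_delta` receives the model and one or more `MessageDelta` structs whose `content` carries the incremental text:

```elixir
handler = %{
  on_llm_new_delta: fn _model, deltas ->
    # Write incremental text to stdout as chunks arrive.
    Enum.each(List.wrap(deltas), fn delta ->
      if is_binary(delta.content), do: IO.write(delta.content)
    end)
  end
}

model =
  ChatAwsMantle.new!(%{
    model: "moonshotai.kimi-k2-thinking",
    region: "us-east-1",
    api_key: System.fetch_env!("AWS_BEARER_TOKEN_BEDROCK"),
    stream: true,
    callbacks: [handler]
  })
```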

## Multimodal (K2.5 vision)

Kimi K2.5 is natively multimodal. Send images via standard LangChain
`ContentPart` structs — `ChatAwsMantle` delegates serialization to
`ChatOpenAI.content_part_for_api/2`, which emits Mantle's expected
`{"type": "image_url", "image_url": {"url": "data:<media>;base64,..."}}`
shape:

    {:ok, bytes} = File.read("photo.jpg")

    Message.new_user!([
      ContentPart.text!("What's in this image?"),
      ContentPart.image!(Base.encode64(bytes), media: :jpeg)
    ])
    |> then(&ChatAwsMantle.call(model, [&1]))

Mantle runs images through an upstream sanitizer that rejects degenerate
inputs (tiny or unusual images may return a 400 with
`"Failed to sanitize image"`). Use real photographs or reasonably-sized
source images. Vision tokens add meaningfully to `prompt_tokens` — a
1200×675 JPG consumes roughly 1100 prompt tokens.

## Open Notes

Streaming and tool-calling support follow the same wire format as
`ChatOpenAI` — see the smoke tests for verified behavior.
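Since the tool-calling wire format matches `ChatOpenAI`, standard LangChain `Function` definitions should work unchanged. A hedged sketch (the `get_weather` tool is hypothetical, and the exact callback return shape follows LangChain's `Function` conventions):

```elixir
alias LangChain.Function
alias LangChain.Message

get_weather =
  Function.new!(%{
    name: "get_weather",
    description: "Return the current weather for a city.",
    parameters_schema: %{
      type: "object",
      properties: %{city: %{type: "string"}},
      required: ["city"]
    },
    function: fn %{"city" => city}, _context ->
      {:ok, "It is sunny in #{city}."}
    end
  })

ChatAwsMantle.call(model, [Message.new_user!("Weather in Tokyo?")], [get_weather])
```

Note that Kimi may narrate before emitting the tool call, per the quirks listed at the top of this page.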

# `t`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.4/lib/chat_models/chat_aws_mantle.ex#L224)

```elixir
@type t() :: %LangChain.ChatModels.ChatAwsMantle{
  api_key: term(),
  callbacks: term(),
  credentials: term(),
  endpoint: term(),
  frequency_penalty: term(),
  json_response: term(),
  json_schema: term(),
  max_tokens: term(),
  model: term(),
  presence_penalty: term(),
  reasoning_effort: term(),
  receive_timeout: term(),
  region: term(),
  req_config: term(),
  stream: term(),
  stream_options: term(),
  temperature: term(),
  tool_choice: term(),
  top_p: term(),
  verbose_api: term()
}
```

# `call`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.4/lib/chat_models/chat_aws_mantle.ex#L454)

Make a call to the Mantle API. Returns `{:ok, [%Message{}]}` on success or
`{:error, %LangChainError{}}` on failure.

# `for_api`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.4/lib/chat_models/chat_aws_mantle.ex#L370)

```elixir
@spec for_api(t(), [LangChain.Message.t()], [LangChain.Function.t()]) :: %{
  required(atom()) => any()
}
```

Format the request body for the Mantle API. Reuses `ChatOpenAI`'s per-message
formatting (since the wire format is OpenAI-shaped), but assembles the
top-level body with Mantle-relevant fields only.

# `new`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.4/lib/chat_models/chat_aws_mantle.ex#L255)

```elixir
@spec new(attrs :: map()) :: {:ok, t()} | {:error, Ecto.Changeset.t()}
```

Build a new `ChatAwsMantle` instance from attributes.

# `new!`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.4/lib/chat_models/chat_aws_mantle.ex#L266)

```elixir
@spec new!(attrs :: map()) :: t() | no_return()
```

Build a new `ChatAwsMantle` instance, raising on validation failure.

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
