# `Alloy`
[🔗](https://github.com/alloy-ex/alloy/blob/v0.10.1/lib/alloy.ex#L1)

Model-agnostic agent harness for Elixir.

Alloy provides the minimal agent loop: send messages to any LLM,
execute tool calls, loop until done. Zero framework dependencies.

## Quick Start

    {:ok, result} = Alloy.run("What is 2+2?",
      provider: {Alloy.Provider.Anthropic, api_key: "sk-ant-..."},
      system_prompt: "You are helpful."
    )
    result.text #=> "4"

## With Tools

    {:ok, result} = Alloy.run("Read mix.exs and tell me the version",
      provider: {Alloy.Provider.Anthropic, api_key: "sk-ant-..."},
      tools: [Alloy.Tool.Core.Read],
      max_turns: 10
    )

## Continuing a Conversation

    {:ok, result} = Alloy.run("Now edit that file",
      provider: {Alloy.Provider.OpenAI, api_key: "sk-..."},
      tools: [Alloy.Tool.Core.Read, Alloy.Tool.Core.Edit],
      messages: previous_result.messages
    )

## One-Shot Streaming

    {:ok, result} =
      Alloy.stream(
        "Explain OTP",
        fn chunk -> IO.write(chunk) end,
        provider: {Alloy.Provider.OpenAI, api_key: "sk-...", model: "gpt-5.4"}
      )

## Options

- `:provider` - `{module, config_keyword_list}` or just `module` (required)
- `:tools` - list of modules implementing `Alloy.Tool` (default: `[]`)
- `:system_prompt` - system prompt string (default: `nil`)
- `:messages` - existing conversation history (default: `[]`)
- `:max_turns` - maximum agent loop iterations (default: `25`)
- `:max_tokens` - context window budget for compaction (default: provider model window when known, otherwise `200_000`)
- `:compaction` - grouped compaction settings like `reserve_tokens`, `keep_recent_tokens`, and `fallback` (default: derived from `:max_tokens`)
- `:middleware` - list of `Alloy.Middleware` modules (default: `[]`)
- `:working_directory` - base path for file tools (default: `"."`)
- `:context` - arbitrary map passed to tools and middleware (default: `%{}`)
- `:max_pending` - max queued async `send_message/3` requests while one is running (default: `0`)
- `:model_metadata_overrides` - overrides for model context windows used to derive `:max_tokens` when not set explicitly (default: `%{}`)
- `:until_tool` - tool name (string) that must be called before the loop completes. If the model signals `:end_turn` without calling this tool, the loop continues with a prompt to call it. Useful for structured output enforcement. (default: `nil`)
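
When several calls share most settings, the options above can be assembled as an ordinary keyword list and overridden per call. The sketch below is illustrative only: the `AgentOptions` module, the `ANTHROPIC_API_KEY` environment variable, and the `"submit_answer"` tool name are hypothetical, and only option keys listed above are used.

```elixir
defmodule AgentOptions do
  # Hypothetical helper, not part of Alloy: settings shared by every call.
  def base(api_key) do
    [
      provider: {Alloy.Provider.Anthropic, api_key: api_key},
      system_prompt: "You are helpful.",
      max_turns: 10,
      working_directory: ".",
      context: %{}
    ]
  end

  # Per-call overrides replace the shared values (Keyword.merge/2 semantics).
  def build(api_key, overrides \\ []) do
    Keyword.merge(base(api_key), overrides)
  end
end

# "submit_answer" is a placeholder tool name used only for illustration.
opts =
  AgentOptions.build(System.get_env("ANTHROPIC_API_KEY"),
    until_tool: "submit_answer",
    max_turns: 5
  )

# The list would then be passed as the second argument, for example:
#     Alloy.run("Summarize mix.exs", opts)
```

Keeping the shared settings in one place means a per-call change such as a lower `:max_turns` stays visible at the call site.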

# `result`

```elixir
@type result() :: Alloy.Result.t()
```

# `cancel_request`

```elixir
@spec cancel_request(GenServer.server(), binary()) :: :ok | {:error, :not_found}
```

Cancel an async request by `request_id`.

# `run`

```elixir
@spec run(
  String.t() | nil,
  keyword()
) :: {:ok, result()} | {:error, result()}
```

Run the agent loop with a message and options.

The first argument can be a string (converted to a user message)
or `nil` if the `:messages` option provides conversation history.

Returns `{:ok, result}` on completion or `{:error, result}` on failure.
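
Because both outcomes carry a result, callers usually branch on the tag. The following is a minimal sketch of that pattern, assuming only the `text` field shown in the Quick Start; the `RunOutcome` module is a hypothetical helper and not part of Alloy.

```elixir
defmodule RunOutcome do
  # Hypothetical helper, not part of Alloy: turns the tuple returned by
  # Alloy.run/2 into a value the caller can branch on.
  def describe({:ok, result}), do: {:done, result.text}

  # On failure the result is handed back untouched so the caller can inspect it.
  def describe({:error, result}), do: {:failed, result}
end

# Usage sketch (requires the Alloy dependency and a real API key):
#
#     "What is 2+2?"
#     |> Alloy.run(provider: {Alloy.Provider.Anthropic, api_key: "sk-ant-..."})
#     |> RunOutcome.describe()
```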

# `send_message`

```elixir
@spec send_message(GenServer.server(), String.t(), keyword()) ::
  {:ok, binary()} | {:error, :busy | :queue_full | :no_pubsub}
```

Send a message to a running agent without blocking the caller.

Returns `{:ok, request_id}` immediately; results are broadcast via PubSub.
See `Alloy.Agent.Server.send_message/3` for full documentation.
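
The return values in the spec can be handled with one function clause per case. This is a sketch under assumptions: `SendOutcome` is a hypothetical helper, and the reading of each error atom (busy or full queue means "try again later", missing PubSub means a setup problem) is inferred from the atom names and the `:max_pending` option rather than confirmed here.

```elixir
defmodule SendOutcome do
  # Hypothetical helper, not part of Alloy: classifies the return value of
  # Alloy.send_message/3 as listed in its @spec.
  def classify({:ok, request_id}) when is_binary(request_id), do: {:submitted, request_id}

  # Assumed meaning: a request is already running and nothing can be queued.
  def classify({:error, :busy}), do: :retry_later

  # Assumed meaning: the pending queue has reached :max_pending.
  def classify({:error, :queue_full}), do: :retry_later

  # Assumed meaning: results cannot be broadcast because PubSub is not set up.
  def classify({:error, :no_pubsub}), do: :misconfigured
end

# Usage sketch, where `agent` is an already running agent process:
#
#     case agent |> Alloy.send_message("Explain OTP", []) |> SendOutcome.classify() do
#       {:submitted, request_id} -> Alloy.cancel_request(agent, request_id)
#       other -> other
#     end
```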

# `stream`

```elixir
@spec stream(String.t() | nil, (String.t() -> any()), keyword()) ::
  {:ok, result()} | {:error, result()}
```

Run the agent loop and stream text deltas as they arrive.

This is a one-shot convenience API for callers who do not need a persistent
`Alloy.Agent.Server` process. It returns the same result shape as `run/2`.

The first argument can be a string (converted to a user message)
or `nil` if the `:messages` option provides conversation history.

## Options

Accepts the same options as `run/2`, plus:

- `:on_event` - function called with normalized event envelopes during the run
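
The chunk callback receives one string per text delta, so it can do more than print. Below is a minimal sketch that collects the deltas in a standard-library `Agent` so the streamed text can be read back afterwards; `ChunkCollector` is a hypothetical helper, not part of Alloy.

```elixir
defmodule ChunkCollector do
  # Hypothetical helper, not part of Alloy: stores streamed chunks in an Agent.
  def start, do: Agent.start_link(fn -> [] end)

  # Returns a one-argument function matching the (String.t() -> any()) callback
  # type. Chunks are prepended, so the stored list is in reverse arrival order.
  def callback(pid), do: fn chunk -> Agent.update(pid, &[chunk | &1]) end

  # Restores arrival order and joins the chunks into a single binary.
  def text(pid) do
    pid
    |> Agent.get(& &1)
    |> Enum.reverse()
    |> IO.iodata_to_binary()
  end
end

# Usage sketch (requires the Alloy dependency and a real API key):
#
#     {:ok, pid} = ChunkCollector.start()
#     Alloy.stream("Explain OTP", ChunkCollector.callback(pid),
#       provider: {Alloy.Provider.OpenAI, api_key: "sk-..."}
#     )
#     ChunkCollector.text(pid)
```

An `Agent` is used because it accepts updates from any process, so the sketch does not depend on which process invokes the callback.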
