When the conversation's estimated token count crosses compact_at
(default 60% of the provider's max_tokens), the loop runs the
compaction pipeline before the next iteration.
The default pipeline mirrors Claude Code's five-layer design from the paper: cheap deterministic stages run first, the expensive LLM-summary stage runs only when those couldn't get the conversation under target.
state.messages
↓
[BudgetReduction] → drop oversized tool-result bodies
↓
[Snip] → drop stale tool-result bodies (already integrated)
↓
[Microcompact] → collapse runs of 3+ adjacent same-tool results
↓
[ContextCollapse] → non-destructive view-time projection
↓
[Summary] → LLM summary of the middle of history
↓
state.messages (or state.meta[:compact_view])

Each stage is a module implementing ExAthena.Compactor.Stage with
compact_stage/2 and name/0. The pipeline orchestrator
(ExAthena.Compactor.Pipeline) walks the list with Enum.reduce_while
and short-circuits once estimated tokens fall below target. Every
stage runs inside its own
[:ex_athena, :compaction, <:stage_name>, :start | :stop] telemetry
span.
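
The short-circuiting walk described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ExAthena's actual implementation: the SketchPipeline module, the {name, fun} stage shape, the plain-map messages, and the 4-characters-per-token estimate are all hypothetical, and telemetry is left out.

```elixir
defmodule SketchPipeline do
  # Hypothetical estimator: roughly 4 characters per token.
  def estimate_tokens(messages) do
    messages
    |> Enum.map(&String.length(&1.content))
    |> Enum.sum()
    |> div(4)
  end

  # Walks `stages` in order and halts as soon as the estimate is under `target`.
  # Each stage is a {name, fun} pair; fun returns {:ok, messages} or :skip.
  def run(messages, stages, target) do
    Enum.reduce_while(stages, {messages, []}, fn {name, fun}, {msgs, applied} ->
      if estimate_tokens(msgs) < target do
        {:halt, {msgs, applied}}
      else
        case fun.(msgs) do
          {:ok, new_msgs} -> {:cont, {new_msgs, [name | applied]}}
          :skip -> {:cont, {msgs, applied}}
        end
      end
    end)
    |> then(fn {msgs, applied} -> {msgs, Enum.reverse(applied)} end)
  end
end

messages = [
  %{role: :user, content: "hi"},
  %{role: :tool, content: String.duplicate("x", 4_000)}
]

# A cheap stage that trims every body to 40 characters.
truncate = fn msgs ->
  {:ok, Enum.map(msgs, fn m -> %{m | content: String.slice(m.content, 0, 40)} end)}
end

# Stands in for the expensive stage; it must not run once the target is met.
expensive = fn _msgs -> raise "expensive stage should not run" end

{compacted, applied} =
  SketchPipeline.run(messages, [truncate: truncate, summary: expensive], 100)
```

Because the cheap stage already brings the estimate under the target, the reducer halts before the second stage is called, which is the behaviour the stage ordering relies on.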
Built-in stages
Compactors.BudgetReduction
Cheap, deterministic. For each tool-result message whose content
exceeds :per_tool_result_max_chars (default 16k), replace the body
with [truncated; full=N chars; ref=<id>] and persist the original
to state.meta[:tool_result_archive] keyed by ref.
This single stage often gets the conversation under target on its own
when one outlier tool produced a giant response (a Read of a 100KB
file, a Bash grep over a huge tree).
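
As a rough sketch of the truncate-and-archive idea, assuming a hypothetical message shape with :role, :id, and :content keys and using the message id as the ref (the real stage's data layout may differ):

```elixir
defmodule SketchBudgetReduction do
  # Returns {new_messages, archive}, where archive maps ref => original body.
  # Only tool-result messages longer than `max_chars` are touched.
  def reduce(messages, max_chars) do
    Enum.map_reduce(messages, %{}, fn
      %{role: :tool, id: id, content: content} = msg, archive when is_binary(content) ->
        len = String.length(content)

        if len > max_chars do
          stub = "[truncated; full=#{len} chars; ref=#{id}]"
          {%{msg | content: stub}, Map.put(archive, id, content)}
        else
          {msg, archive}
        end

      msg, archive ->
        {msg, archive}
    end)
  end
end

messages = [
  %{role: :user, id: "u1", content: "read the log"},
  %{role: :tool, id: "call_1", content: String.duplicate("a", 50)}
]

# A tiny limit (20 chars) so the example output stays readable.
{reduced, archive} = SketchBudgetReduction.reduce(messages, 20)
```

In this sketch the archive is returned alongside the messages; in the library it would live under state.meta[:tool_result_archive] as described above.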
Compactors.Snip
Cheap, deterministic. Drops stale tool-result bodies older than
:snip_age_iterations turns (default 4) whose paired assistant turn
already happened. Each is replaced with a <snipped: stale tool-result for call <id>> marker — pairing-by-id stays intact.
Memory + skill messages and the live suffix are never touched.
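
A minimal sketch of the age check, assuming (hypothetically) that each tool-result message records the iteration that produced it under an :iteration key and its call id under :call_id. The pairing check and the pinned-prefix and live-suffix exclusions are omitted for brevity.

```elixir
defmodule SketchSnip do
  # Replaces the body of tool results older than `max_age` iterations
  # with a marker that still carries the call id.
  def snip(messages, current_iteration, max_age) do
    Enum.map(messages, fn
      %{role: :tool, call_id: id, iteration: it} = msg
      when current_iteration - it > max_age ->
        %{msg | content: "<snipped: stale tool-result for call #{id}>"}

      msg ->
        msg
    end)
  end
end

messages = [
  %{role: :tool, call_id: "c1", iteration: 1, content: "old output"},
  %{role: :tool, call_id: "c2", iteration: 9, content: "fresh output"}
]

snipped = SketchSnip.snip(messages, 10, 4)
```

The marker keeps the call id, so a request built from the snipped list can still match each tool result to its originating call.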
Compactors.Microcompact
Medium cost, deterministic. Runs of 3+ adjacent tool-result messages
collapse into a single elided summary tagged name: "microcompact".
The summary lists each call's id + first 200 chars of its result.
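
The run-collapsing step might look roughly like the following. The module name, the :call_id key, and the role given to the summary message are assumptions made for illustration; the sketch follows this section's wording (any adjacent tool results) rather than grouping by tool name.

```elixir
defmodule SketchMicrocompact do
  # Collapses each run of `threshold` or more adjacent tool results
  # into one summary message tagged name: "microcompact".
  def collapse(messages, threshold \\ 3) do
    messages
    |> Enum.chunk_by(&(&1.role == :tool))
    |> Enum.flat_map(fn
      [%{role: :tool} | _] = run when length(run) >= threshold ->
        [summarise(run)]

      chunk ->
        chunk
    end)
  end

  # One line per call: its id plus the first 200 characters of its result.
  defp summarise(run) do
    lines =
      Enum.map(run, fn m ->
        "#{Map.get(m, :call_id, "?")}: #{String.slice(m.content, 0, 200)}"
      end)

    %{role: :tool, name: "microcompact", content: Enum.join(lines, "\n")}
  end
end

messages = [
  %{role: :user, content: "check three files"},
  %{role: :tool, call_id: "c1", content: "one"},
  %{role: :tool, call_id: "c2", content: "two"},
  %{role: :tool, call_id: "c3", content: "three"},
  %{role: :assistant, content: "done"}
]

collapsed = SketchMicrocompact.collapse(messages)
```

Runs shorter than the threshold pass through unchanged, so a second pass over already-collapsed output leaves it as is.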
Compactors.ContextCollapse
Medium cost, non-destructive. Builds a projected message list for
the next request only — the authoritative state.messages is never
mutated. The projection collapses two patterns:
- Superseded reads: a Read whose file was later edited collapses to a <read superseded by later edit: <path>> stub.
- Repeated identical tool calls: consecutive calls with the same tool name and the same args (consecutive when the tool results in between are skipped) get a (repeat) marker on the follow-up call.
Stored at state.meta[:compact_view]. Resume / replay / rewind read
the original state.messages, so they stay correct.
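
The superseded-read projection can be illustrated with a small sketch. The message shape (:tool and :path keys) and the plain-map state are hypothetical; only the stub text and the state.meta[:compact_view] location come from the description above.

```elixir
defmodule SketchContextCollapse do
  # Builds a projected list in which a Read is replaced by a stub
  # when a later Edit touches the same path. The input is not changed.
  def project(messages) do
    indexed = Enum.with_index(messages)

    Enum.map(indexed, fn
      {%{tool: "Read", path: path} = msg, i} ->
        edited_later? =
          Enum.any?(indexed, fn {m, j} ->
            j > i and Map.get(m, :tool) == "Edit" and Map.get(m, :path) == path
          end)

        if edited_later?,
          do: %{msg | content: "<read superseded by later edit: #{path}>"},
          else: msg

      {msg, _i} ->
        msg
    end)
  end
end

messages = [
  %{role: :tool, tool: "Read", path: "a.ex", content: "old contents"},
  %{role: :tool, tool: "Edit", path: "a.ex", content: "ok"},
  %{role: :tool, tool: "Read", path: "b.ex", content: "keep"}
]

state = %{messages: messages, meta: %{}}
view = SketchContextCollapse.project(state.messages)

# The projection is stored next to, not in place of, the original history.
state = put_in(state, [:meta, :compact_view], view)
```

Keeping the projection in a separate key is what lets resume, replay, and rewind keep reading the untouched history.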
Compactors.Summary
Expensive — runs an LLM call. The middle of the message list is
summarised into a single assistant message tagged name: "compactor_summary". Cost counts against the run's budget. Only fires
when the deterministic stages couldn't get the conversation under
target.
Reactive recovery
When a mode returns {:error, :error_prompt_too_long} (e.g. the
provider explicitly said "context too long"), the loop runs the
pipeline with force: true — every stage attempts compaction
unconditionally, regardless of estimated tokens. The same iteration is
then retried once. If still over (or compaction itself errors), the
run terminates with a typed :error_prompt_too_long capacity
termination (PR0 finish-reason).
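
The retry-once behaviour can be sketched as below. The `call` and `compact` callbacks, the return tuples other than {:error, :error_prompt_too_long}, and the module name are hypothetical stand-ins for the loop's internals, and the sketch simplifies by treating any failure on the retry as a capacity termination.

```elixir
defmodule SketchReactive do
  # `call` performs one provider request; `compact` force-runs the pipeline.
  # Both are caller-supplied functions in this sketch.
  def run_iteration(messages, call, compact, opts \\ []) do
    reactive? = Keyword.get(opts, :reactive_compact, true)

    case call.(messages) do
      {:error, :error_prompt_too_long} when reactive? ->
        # Compact unconditionally, then retry the same iteration exactly once.
        with {:ok, compacted} <- compact.(messages, [force: true]),
             {:ok, reply} <- call.(compacted) do
          {:ok, reply}
        else
          _ -> {:halt, :error_prompt_too_long}
        end

      {:error, :error_prompt_too_long} ->
        {:halt, :error_prompt_too_long}

      other ->
        other
    end
  end
end

# Pretend the provider rejects anything longer than two messages.
call = fn msgs ->
  if length(msgs) > 2, do: {:error, :error_prompt_too_long}, else: {:ok, "done"}
end

compact = fn msgs, _opts -> {:ok, Enum.take(msgs, -2)} end
useless = fn msgs, _opts -> {:ok, msgs} end

history = ["a", "b", "c", "d"]

recovered = SketchReactive.run_iteration(history, call, compact)
still_over = SketchReactive.run_iteration(history, call, useless)
opted_out = SketchReactive.run_iteration(history, call, compact, reactive_compact: false)
```

The three calls cover a successful recovery, a compaction that does not help, and the opt-out path.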
Gated by :reactive_compact (default true):
ExAthena.run("explore the repo", reactive_compact: false) # opt out

Configuration
Per-call
ExAthena.run("…",
  compact_at: 0.5,                     # trigger at 50% instead of 60%
  per_tool_result_max_chars: 8_000,
  snip_age_iterations: 8,
  microcompact_run_threshold: 5,
  compaction_pipeline: [
    ExAthena.Compactors.BudgetReduction,
    ExAthena.Compactors.Summary        # skip the others
  ])

Global
config :ex_athena, :compactor,
  compact_at: 0.6,
  pinned_prefix_count: 1,
  live_suffix_count: 6,
  per_tool_result_max_chars: 16_000,
  snip_age_iterations: 4

Custom stages
Implement ExAthena.Compactor.Stage:
defmodule MyApp.Compactors.DropImages do
  @behaviour ExAthena.Compactor.Stage

  @impl true
  def name, do: :drop_images

  @impl true
  def compact_stage(%ExAthena.Loop.State{} = state, estimate) do
    # Replace any user message that embeds an image with a short marker.
    new_messages =
      Enum.map(state.messages, fn
        %{role: :user, content: content} = msg when is_binary(content) ->
          if String.contains?(content, "<image>"),
            do: %{msg | content: "<image elided>"},
            else: msg

        msg ->
          msg
      end)

    # Returning :skip when nothing changed keeps the stage idempotent.
    if new_messages == state.messages do
      :skip
    else
      {:ok,
       %{state | messages: new_messages},
       %{estimate | tokens: ExAthena.Compactor.estimate_tokens(new_messages)}}
    end
  end
end
ExAthena.run("...",
  compaction_pipeline: [
    MyApp.Compactors.DropImages,
    ExAthena.Compactors.BudgetReduction,
    ExAthena.Compactors.Summary
  ])

Stages should be idempotent: the reactive-recovery path may run
the pipeline a second time with force: true. Returning :skip on a
second pass is the canonical way to be idempotent.
Hooks
- :PreCompact fires before the pipeline runs at all.
- :PreCompactStage fires before each individual stage. Payload: %{stage: atom(), estimate: %{tokens, max_tokens}}.
- :PostCompact fires after a successful compaction. Payload: %{metadata: %{before, after, dropped_count, stages_applied, reason}}.
Pinning rules
The compactor never drops:
- Messages in the pinned prefix: :pinned_prefix_count slots (default 1) at the front, plus any memory + pre-loaded skill messages (PR1).
- Messages in the live suffix: :live_suffix_count slots (default 6) at the tail.
Memory + skill messages tagged name: "memory" / name: "skill:<name>"
are pinned by default. Hosts can pin custom messages by extending
pinned_prefix_count.
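
To make the pinning rules concrete, here is a small sketch that splits a history into pinned prefix, compactable middle, and live suffix, and then excludes tagged memory and skill messages from the middle. The module and function names are hypothetical; only the defaults (1 and 6) and the name tags come from the rules above.

```elixir
defmodule SketchPinning do
  # Splits messages into {pinned_prefix, middle, live_suffix}.
  def partition(messages, prefix_count \\ 1, suffix_count \\ 6) do
    {prefix, rest} = Enum.split(messages, prefix_count)
    keep = max(length(rest) - suffix_count, 0)
    {middle, suffix} = Enum.split(rest, keep)
    {prefix, middle, suffix}
  end

  # Memory and skill messages stay pinned wherever they appear.
  def pinned_by_tag?(%{name: "memory"}), do: true
  def pinned_by_tag?(%{name: "skill:" <> _}), do: true
  def pinned_by_tag?(_), do: false

  # Only these messages are candidates for compaction.
  def droppable(messages, prefix_count \\ 1, suffix_count \\ 6) do
    {_prefix, middle, _suffix} = partition(messages, prefix_count, suffix_count)
    Enum.reject(middle, &pinned_by_tag?/1)
  end
end

messages =
  for i <- 1..10 do
    if i == 3,
      do: %{role: :system, name: "memory", content: "m3"},
      else: %{role: :user, content: "m#{i}"}
  end

{prefix, middle, suffix} = SketchPinning.partition(messages)
candidates = SketchPinning.droppable(messages)
```

With ten messages and the defaults, only the second and fourth messages remain eligible: the first is in the pinned prefix, the third is tagged as memory, and the last six form the live suffix.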
See also
- ExAthena.Compactor.Pipeline
- ExAthena.Compactor.Stage
- Hooks reference: PreCompact* / PostCompact payload shapes.
- Memory + skills: what gets pinned.