Compaction pipeline (v0.4)


When the conversation's estimated token count crosses compact_at (default 60% of the provider's max_tokens), the loop runs the compaction pipeline before the next iteration.
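As a rough illustration of the trigger, the check amounts to comparing the estimate against a fraction of the provider limit. The module and function names below are hypothetical, and whether the real check uses `>=` or `>` is an assumption.

```elixir
defmodule CompactTriggerSketch do
  # Hypothetical helper, not part of ExAthena: true once the estimated
  # token count reaches compact_at * max_tokens.
  def compact?(estimated_tokens, max_tokens, compact_at \\ 0.6) do
    estimated_tokens >= compact_at * max_tokens
  end
end
```

With a 200_000-token provider limit and the default ratio, compaction would start at an estimate of 120_000 tokens.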

The default pipeline mirrors Claude Code's five-layer design from the paper: cheap deterministic stages run first, and the expensive LLM-summary stage runs only when they could not bring the conversation under target.

state.messages
  ↓
[BudgetReduction]  drop oversized tool-result bodies
  ↓
[Snip]             drop stale tool-result bodies (already integrated)
  ↓
[Microcompact]     collapse runs of 3+ adjacent same-tool results
  ↓
[ContextCollapse]  non-destructive view-time projection
  ↓
[Summary]          LLM summary of the middle of history
  ↓
state.messages (or state.meta[:compact_view])

Each stage is a module implementing ExAthena.Compactor.Stage with compact_stage/2 and name/0. The pipeline orchestrator (ExAthena.Compactor.Pipeline) walks the list with Enum.reduce_while and short-circuits once estimated tokens fall below target. Every stage runs inside its own [:ex_athena, :compaction, <:stage_name>, :start | :stop] telemetry span.
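The short-circuiting walk can be sketched with Enum.reduce_while. This is a simplified stand-in, not the ExAthena.Compactor.Pipeline source: stages are `{name, fun}` pairs rather than modules, the estimator is a crude characters-divided-by-four guess, and telemetry is omitted. `PipelineSketch` and its functions are hypothetical names.

```elixir
defmodule PipelineSketch do
  # Stages are {name, fun} pairs here for brevity; in the library they are modules.
  # Each fun returns :skip or {:ok, new_messages}.
  def run(messages, stages, target_tokens) do
    {final, applied} =
      Enum.reduce_while(stages, {messages, []}, fn {name, fun}, {msgs, applied} ->
        if estimate(msgs) < target_tokens do
          # Already under target: stop before running any further stage.
          {:halt, {msgs, applied}}
        else
          case fun.(msgs) do
            :skip -> {:cont, {msgs, applied}}
            {:ok, new_msgs} -> {:cont, {new_msgs, [name | applied]}}
          end
        end
      end)

    {final, Enum.reverse(applied)}
  end

  # Crude stand-in estimator: about 4 characters per token.
  def estimate(messages) do
    messages
    |> Enum.map(&String.length(&1.content))
    |> Enum.sum()
    |> div(4)
  end
end
```

Because the check happens before each stage, a later (more expensive) stage never runs once an earlier one has brought the estimate under target.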

Built-in stages

Compactors.BudgetReduction

Cheap, deterministic. For each tool-result message whose content exceeds :per_tool_result_max_chars (default 16_000), the stage replaces the body with [truncated; full=N chars; ref=<id>] and persists the original to state.meta[:tool_result_archive], keyed by ref.

This single stage often gets the conversation under target on its own when one outlier tool produced a giant response (a Read of a 100KB file, a Bash grep over a huge tree).
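A minimal sketch of the truncate-and-archive idea follows. The message shape (`role: :tool`, `id`, `content`) and the module name are assumptions made for illustration; the real stage works on ExAthena's own message structs and writes the archive into state.meta.

```elixir
defmodule BudgetReductionSketch do
  # Returns {new_messages, archive}, where archive maps ref -> original body.
  # The :tool role and :id key are hypothetical message fields.
  def reduce(messages, max_chars \\ 16_000) do
    Enum.map_reduce(messages, %{}, fn msg, archive ->
      if msg.role == :tool and String.length(msg.content) > max_chars do
        stub = "[truncated; full=#{String.length(msg.content)} chars; ref=#{msg.id}]"
        {%{msg | content: stub}, Map.put(archive, msg.id, msg.content)}
      else
        {msg, archive}
      end
    end)
  end
end
```

Keeping the original in an archive keyed by ref means the truncation loses nothing permanently: a host could look the full body up again if the model asks for it.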

Compactors.Snip

Cheap, deterministic. Drops stale tool-result bodies older than :snip_age_iterations turns (default 4) whose paired assistant turn already happened. Each is replaced with a <snipped: stale tool-result for call <id>> marker — pairing-by-id stays intact.

Memory + skill messages and the live suffix are never touched.
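The age-based rule might look roughly like this. The `iteration` and `call_id` fields are hypothetical, and the sketch ignores the live-suffix and memory/skill exemptions that the real stage honours.

```elixir
defmodule SnipSketch do
  # Replaces tool-result bodies older than max_age iterations with a marker.
  # Messages without an :iteration key (e.g. user turns) pass through untouched.
  def snip(messages, current_iteration, max_age \\ 4) do
    Enum.map(messages, fn
      %{role: :tool, call_id: id, iteration: it} = msg
      when current_iteration - it > max_age ->
        # The call id stays in the marker so call/result pairing survives.
        %{msg | content: "<snipped: stale tool-result for call #{id}>"}

      msg ->
        msg
    end)
  end
end
```

Running the function twice yields the same list, since an already-snipped message is simply rewritten to the same marker.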

Compactors.Microcompact

Medium cost, deterministic. Runs of 3+ adjacent tool-result messages collapse into a single elided summary tagged name: "microcompact". The summary lists each call's id + first 200 chars of its result.
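One way to picture the collapse is with Enum.chunk_by, which groups consecutive messages. The sketch below assumes plain maps with `role`, `call_id`, and `content` keys and a hypothetical module name; it does not check that the tool is the same across a run.

```elixir
defmodule MicrocompactSketch do
  # Collapses each run of `threshold` or more adjacent tool results
  # into one summary message tagged name: "microcompact".
  def collapse(messages, threshold \\ 3) do
    messages
    |> Enum.chunk_by(&(&1.role == :tool))
    |> Enum.flat_map(fn
      [%{role: :tool} | _] = run when length(run) >= threshold ->
        [summarise(run)]

      chunk ->
        chunk
    end)
  end

  # One line per call: its id plus the first 200 characters of the result.
  defp summarise(run) do
    lines = Enum.map(run, fn m -> "#{m.call_id}: #{String.slice(m.content, 0, 200)}" end)
    %{role: :tool, name: "microcompact", call_id: nil, content: Enum.join(lines, "\n")}
  end
end
```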

Compactors.ContextCollapse

Medium cost, non-destructive. Builds a projected message list for the next request only — the authoritative state.messages is never mutated. The projection collapses two patterns:

  • Superseded reads: a Read whose file was later edited collapses to a <read superseded by later edit: <path>> stub.
  • Repeated identical tool calls: consecutive calls with the same tool name and the same args (intervening tool results are skipped when judging adjacency) get a (repeat) marker on the follow-up call.

Stored at state.meta[:compact_view]. Resume / replay / rewind read the original state.messages, so they stay correct.
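The superseded-read pattern can be sketched as a backwards walk that remembers which paths were edited later. The `tool` and `path` keys and the "Read"/"Edit" tool names on plain maps are assumptions for illustration; the function returns a new list and leaves its input alone, which is the property the stage relies on.

```elixir
defmodule ContextCollapseSketch do
  # Builds a projected view; the input list is never modified.
  def project(messages) do
    {view, _edited_paths} =
      messages
      |> Enum.reverse()
      |> Enum.map_reduce(MapSet.new(), fn
        # Walking backwards, an Edit is seen before the Reads it supersedes.
        %{tool: "Edit", path: path} = msg, edited ->
          {msg, MapSet.put(edited, path)}

        %{tool: "Read", path: path} = msg, edited ->
          if MapSet.member?(edited, path) do
            {%{msg | content: "<read superseded by later edit: #{path}>"}, edited}
          else
            {msg, edited}
          end

        msg, edited ->
          {msg, edited}
      end)

    Enum.reverse(view)
  end
end
```

A Read that happens after the last edit of its file is kept in full, because no later edit has been recorded for that path when the walk reaches it.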

Compactors.Summary

Expensive — runs an LLM call. The middle of the message list is summarised into a single assistant message tagged name: "compactor_summary". Cost counts against the run's budget. Only fires when the deterministic stages couldn't get the conversation under target.

Reactive recovery

When a mode returns {:error, :error_prompt_too_long} (e.g. the provider explicitly said "context too long"), the loop runs the pipeline with force: true: every stage attempts compaction unconditionally, regardless of estimated tokens. The same iteration is then retried once. If the conversation is still over the limit (or compaction itself errors), the run terminates with a typed :error_prompt_too_long capacity termination (PR0 finish-reason).

Gated by :reactive_compact (default true):

ExAthena.run("explore the repo", reactive_compact: false)  # opt out
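The retry-once control flow might be sketched as below. `step` and `compact` are hypothetical function arguments standing in for the mode call and the forced pipeline run, and the `{:halt, reason}` return shape is an assumption, not the loop's real termination value.

```elixir
defmodule ReactiveRecoverySketch do
  # step: fn state -> {:ok, state} | {:error, reason}
  # compact: fn state, opts -> {:ok, state} | {:error, reason}
  def run_iteration(state, step, compact, opts \\ []) do
    reactive? = Keyword.get(opts, :reactive_compact, true)

    case step.(state) do
      {:error, :error_prompt_too_long} when reactive? ->
        case compact.(state, force: true) do
          {:ok, compacted} ->
            # Retry the same iteration exactly once.
            case step.(compacted) do
              {:error, :error_prompt_too_long} -> {:halt, :error_prompt_too_long}
              other -> other
            end

          _compaction_error ->
            {:halt, :error_prompt_too_long}
        end

      {:error, :error_prompt_too_long} ->
        # Reactive compaction disabled: terminate immediately.
        {:halt, :error_prompt_too_long}

      other ->
        other
    end
  end
end
```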

Configuration

Per-call

ExAthena.run("…",
  compact_at: 0.5,                  # trigger at 50% instead of 60%
  per_tool_result_max_chars: 8_000,
  snip_age_iterations: 8,
  microcompact_run_threshold: 5,
  compaction_pipeline: [
    ExAthena.Compactors.BudgetReduction,
    ExAthena.Compactors.Summary    # skip the others
  ])

Global

config :ex_athena, :compactor,
  compact_at: 0.6,
  pinned_prefix_count: 1,
  live_suffix_count: 6,
  per_tool_result_max_chars: 16_000,
  snip_age_iterations: 4

Custom stages

Implement ExAthena.Compactor.Stage:

defmodule MyApp.Compactors.DropImages do
  @behaviour ExAthena.Compactor.Stage

  @impl true
  def name, do: :drop_images

  @impl true
  def compact_stage(%ExAthena.Loop.State{} = state, estimate) do
    new_messages =
      Enum.map(state.messages, fn
        %{role: :user, content: content} = msg when is_binary(content) ->
          if String.contains?(content, "<image>"),
            do: %{msg | content: "<image elided>"},
            else: msg

        msg ->
          msg
      end)

    if new_messages == state.messages do
      :skip
    else
      {:ok,
       %{state | messages: new_messages},
       %{estimate | tokens: ExAthena.Compactor.estimate_tokens(new_messages)}}
    end
  end
end

ExAthena.run("...",
  compaction_pipeline: [
    MyApp.Compactors.DropImages,
    ExAthena.Compactors.BudgetReduction,
    ExAthena.Compactors.Summary
  ])

Stages should be idempotent — the reactive-recovery path may run the pipeline a second time with force: true. Returning :skip on a second pass is the canonical way to be idempotent.

Hooks

  • :PreCompact — fires before the pipeline runs at all.
  • :PreCompactStage — fires before each individual stage. Payload: %{stage: atom(), estimate: %{tokens, max_tokens}}.
  • :PostCompact — fires after a successful compaction. Payload: %{metadata: %{before, after, dropped_count, stages_applied, reason}}.

Pinning rules

The compactor never drops:

  • Messages in the pinned prefix: pinned_prefix_count slots (default 1) at the front, plus any memory + pre-loaded skill messages (PR1).
  • Messages in the live suffix: live_suffix_count slots (default 6) at the tail.

Memory + skill messages tagged name: "memory" / name: "skill:<name>" are pinned by default. Hosts can pin custom messages by extending pinned_prefix_count.
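To make the pinning rules concrete, the sketch below splits a message list into pinned prefix, compactable middle, and live suffix, and adds a predicate for the name-tagged messages. Module and function names are hypothetical; only the defaults and the tag strings come from the text above.

```elixir
defmodule PinningSketch do
  # Splits messages into the three regions; only :middle is open to compaction.
  def split(messages, prefix_count \\ 1, suffix_count \\ 6) do
    {prefix, rest} = Enum.split(messages, prefix_count)
    keep_tail = min(suffix_count, length(rest))
    {middle, suffix} = Enum.split(rest, length(rest) - keep_tail)
    %{pinned_prefix: prefix, middle: middle, live_suffix: suffix}
  end

  # Memory and skill messages stay pinned wherever they sit in the list.
  def pinned?(%{name: "memory"}), do: true
  def pinned?(%{name: "skill:" <> _skill_name}), do: true
  def pinned?(_msg), do: false
end
```

With the defaults, a ten-message conversation leaves only messages 2 through 4 eligible for compaction, and a stage would still skip any of those for which `pinned?/1` returns true.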

See also