Message Compression

This guide explains message compression: what problem it solves, how to enable it, and how to implement custom strategies.

The Problem

In multi-turn execution, the LLM generates programs that build on previous turns:

Turn 1: (def x 1) (println x)
Turn 2: (def x 1) (def y 2) (println (+ x y))     # repeats turn 1
Turn 3: (def x 1) (def y 2) (def z 3) ...         # repeats turn 1+2

Without compression, the message history accumulates full programs from every turn. The LLM sees all previous versions of its evolving program. This:

  • Wastes tokens - Repeated code inflates context size
  • Confuses the model - Multiple versions of the same definitions
  • Reduces cache hits - Dynamic history defeats prompt caching

The Solution

Compression transforms the turn history into a compact format. Instead of showing previous programs, it shows:

  • What was defined (symbols available)
  • What actions were taken (tool calls)
  • What was observed (println output)
  • Whether execution succeeded or failed

The LLM doesn't need to see its previous code - it needs the results of its previous code.

Enabling Compression

# Enable with default strategy
SubAgent.run(prompt, llm: llm, compression: true)

# Explicit strategy
alias PtcRunner.SubAgent.Compression.SingleUserCoalesced
SubAgent.run(prompt, llm: llm, compression: SingleUserCoalesced)

# With options
SubAgent.run(prompt, llm: llm, compression: {SingleUserCoalesced, println_limit: 10})

SingleUserCoalesced Strategy

The default strategy coalesces all turn history into a single USER message. This prevents the LLM from mimicking summary formats (which could happen if summaries appeared in ASSISTANT messages).

Message Structure

[SYSTEM]  Static: language reference, return/fail usage, output format
[USER]    Dynamic: mission + namespaces + execution history + turns left
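
In Elixir terms, the strategy returns a two-message list. The shape below is illustrative (the role/content map format matches the custom strategy example later in this guide); the content strings are placeholders:

messages = [
  # Static across turns, so prompt caching can reuse it
  %{role: :system, content: "language reference, return/fail usage, output format"},
  # Rebuilt every turn
  %{role: :user, content: "mission + namespaces + execution history + turns left"}
]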

What the LLM Sees

Find well-reviewed products in stock

;; === tool/ ===
(tool/search-reviews query)      ; query:string -> string

;; === data/ ===
data/products                    ; list[7], sample: {:name "Laptop", :price 1200}

;; === user/ (your prelude) ===
electronics                      ; = list[4], sample: {:name "Laptop"}

;; Tool calls made:
;   search-reviews("Electronics")

;; Output:
Found 5 matching products

Turns left: 4

Namespace Model

Namespace  Content                            Changes?
tool/      Available tools with signatures    No (stable)
data/      Input context data                 No (stable)
user/      Accumulated definitions (prelude)  Yes (grows)

The tool/ and data/ sections are stable across turns, enabling prompt caching. Only user/ changes as definitions accumulate.

Error Handling

Errors use conditional collapsing:

Current turn  Error display
Succeeds      All previous errors collapsed (clean view)
Fails         Most recent error shown (helps recovery)

Once the LLM recovers from an error, old mistakes become noise and are hidden.
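
A sketch of the collapsing rule, written against the documented success? flag (this is a hypothetical helper, not the library's internals):

# Hypothetical helper: decide which error turns to surface.
defp errors_to_show(turns, current_turn) do
  if current_turn.success? do
    # Recovered: collapse all previous errors for a clean view.
    []
  else
    # Still failing: surface only the most recent failed turn.
    turns
    |> Enum.reject(& &1.success?)
    |> Enum.take(-1)
  end
end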

Configuration Options

Option                  Default  Description
println_limit           15       Most recent println calls shown
tool_call_limit         20       Most recent tool calls shown
sample_limit            3        Max items shown in list/map samples
sample_printable_limit  80       Max chars for string samples

SubAgent.run(prompt,
  llm: llm,
  compression: {SingleUserCoalesced, println_limit: 10, tool_call_limit: 15}
)

Older entries are dropped (FIFO) but preserved in step.turns for debugging.
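
The trimming itself is a keep-the-tail operation. A minimal sketch with placeholder data (the real entries come from the turn history):

all_tool_calls = ["call-1", "call-2", "call-3", "call-4"]
tool_call_limit = 2

# Keep only the most recent entries for the compressed prompt;
# the dropped ones remain available in step.turns.
Enum.take(all_tool_calls, -tool_call_limit)
# => ["call-3", "call-4"]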

Sample Limits

The sample_limit and sample_printable_limit options control how values are displayed in the data/ and user/ namespace sections:

# Default: truncated samples
matches    ; = list[50], sample: [{:file "lib/ptc_runner/..." ...} ...] (showing first 3)

# With higher limits: more context preserved
SubAgent.run(prompt,
  llm: llm,
  compression: {SingleUserCoalesced, sample_limit: 10, sample_printable_limit: 200}
)
# Now shows: [{:file "lib/ptc_runner/lisp/eval.ex" ...} ...] (showing first 10)

Increase these limits when the LLM needs to see more context (e.g., full file paths in grep results).

Debugging Compression

To see what the LLM receives:

{:ok, step} = SubAgent.run(prompt, llm: llm, compression: true)

# Show compressed view
SubAgent.Debug.print_trace(step, view: :compressed)

# Compare with full turn history
SubAgent.Debug.print_trace(step)

Full history is always preserved in step.turns regardless of compression.

Compression Statistics

Use usage: true to see compression metrics:

SubAgent.Debug.print_trace(step, usage: true)

This displays a compression section showing what was dropped:

+- Compression -------------------------------------------+
|   Strategy:     single-user-coalesced
|   Turns:        9 compressed
|   Tool calls:   20/25 shown (5 dropped)
|   Printlns:     15/18 shown (3 dropped)
|   Errors:       2 turn(s) collapsed
+---------------------------------------------------------+

The stats are also available programmatically in step.usage.compression:

step.usage.compression
# => %{
#   enabled: true,
#   strategy: "single-user-coalesced",
#   turns_compressed: 9,
#   tool_calls_total: 25,
#   tool_calls_shown: 20,
#   tool_calls_dropped: 5,
#   printlns_total: 18,
#   printlns_shown: 15,
#   printlns_dropped: 3,
#   error_turns_collapsed: 2
# }

Metric                 Description
turns_compressed       Number of turns coalesced into the single USER message
tool_calls_dropped     Tool calls exceeding tool_call_limit
printlns_dropped       Println output exceeding println_limit
error_turns_collapsed  Failed turns hidden from the LLM (all if recovered, all but last if still failing)
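
These fields can also drive programmatic checks, for example to detect when the limits are trimming aggressively. A usage sketch, assuming the step.usage.compression map shown above:

comp = step.usage.compression

if comp.enabled and comp.tool_calls_dropped > 0 do
  IO.puts(
    "#{comp.tool_calls_dropped}/#{comp.tool_calls_total} tool calls dropped; " <>
      "consider raising tool_call_limit"
  )
end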

When to Use Compression

Enable compression when:

  • Agents run many turns (5+)
  • Agents make many tool calls
  • Context size is a concern (cost, latency)
  • The LLM seems confused by seeing old program versions

Skip compression when:

  • Agents are single-turn (max_turns: 1); compression is skipped automatically even if enabled
  • Agents are simple and run only a few turns
  • You are debugging (the full history is easier to follow)

Compression in Practice

Note: The observations below are based on limited testing. Results will vary depending on task complexity, LLM model, and prompt design.

Observed Trade-offs

In one example comparing a multi-turn agent with compression vs. a single-turn approach:

Aspect          With Compression  Without
Turns           6                 1
Tokens          ~18k input        ~2.4k input
Duration        ~35s              ~14s
Confidence      92%               60%
Answer quality  Comprehensive     Incomplete

The single-turn approach was faster and cheaper, but produced an incomplete answer. The compressed multi-turn approach allowed the LLM to:

  1. Recover from errors — When distinct-by failed (undefined), the next turn used distinct instead
  2. Iterate systematically — Read the right files after initial grep results
  3. Build understanding — Each turn refined the investigation based on previous findings

When Compression Helps Most

Based on initial observations, compression appears most beneficial when:

  • Tasks require exploration — The LLM doesn't know upfront which files or data matter
  • Errors are likely — Syntax mistakes, undefined functions, or incorrect assumptions
  • Quality matters more than speed — The extra tokens/time pays off in accuracy

Potential Pitfalls

Without compression (or with single-turn execution), we've observed:

  • LLMs attempting too much at once in a single massive program
  • Redundant tool calls when earlier results could have guided the search
  • Premature returns with low-confidence answers

These patterns suggest that the iterative feedback loop enabled by compression helps the LLM stay on track.

Implementing Custom Strategies

For advanced use cases, you can implement the Compression behaviour:

defmodule MyApp.CustomCompression do
  @behaviour PtcRunner.SubAgent.Compression

  @impl true
  def name, do: "custom"

  @impl true
  def to_messages(turns, memory, opts) do
    # turns: list of %Turn{} structs (immutable history)
    # memory: accumulated user definitions
    # opts: keyword list with :prompt, :system_prompt, :tools, :data, etc.

    system_prompt = Keyword.get(opts, :system_prompt, "")
    mission = Keyword.get(opts, :prompt, "")

    # Build your message array
    [
      %{role: :system, content: system_prompt},
      %{role: :user, content: build_user_content(mission, turns, memory, opts)}
    ]
  end

  defp build_user_content(mission, turns, memory, opts) do
    # Your compression logic here. A minimal sketch:
    last_turn = List.last(turns)

    """
    #{mission}

    ;; Definitions so far: #{memory |> Map.keys() |> Enum.join(", ")}
    ;; Last result: #{inspect(last_turn && last_turn.result)}

    Turns left: #{Keyword.get(opts, :turns_left, "?")}
    """
  end
end

Available Data in opts

Key                      Type     Description
:prompt                  string   The mission/prompt text
:system_prompt           string   Static system prompt
:tools                   map      Tool name → Tool struct
:data                    map      Input context data
:turns_left              integer  Remaining turns
:println_limit           integer  Max println entries
:tool_call_limit         integer  Max tool call entries
:sample_limit            integer  Max items in list/map samples
:sample_printable_limit  integer  Max chars for string samples
:signature               string   Output signature (if any)

Turn Struct Fields

Each %Turn{} provides:

Field       Type     Description
number      integer  Turn index (1-based)
program     string   Extracted PTC-Lisp code
result      term     Execution result
prints      list     Output from println calls
tool_calls  list     Tools invoked with args/results
memory      map      State snapshot after turn
success?    boolean  Whether turn succeeded
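
For example, a custom strategy might render each turn as a single line. This sketch uses only the documented fields and assumes Turn is aliased in your module:

defp turn_summary(%Turn{} = turn) do
  status = if turn.success?, do: "ok", else: "error"

  "Turn #{turn.number} (#{status}): #{length(turn.tool_calls)} tool call(s), " <>
    "#{length(turn.prints)} println line(s)"
end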

Using Your Strategy

SubAgent.run(prompt,
  llm: llm,
  compression: MyApp.CustomCompression
)

# Or with options
SubAgent.run(prompt,
  llm: llm,
  compression: {MyApp.CustomCompression, my_option: "value"}
)
