Message Compression

This guide explains message compression: what problem it solves, how to enable it, and how to implement custom strategies.

The Problem

In multi-turn execution, the LLM generates programs that build on previous turns:

Turn 1: (def x 1) (println x)
Turn 2: (def x 1) (def y 2) (println (+ x y))     # repeats turn 1
Turn 3: (def x 1) (def y 2) (def z 3) ...         # repeats turn 1+2

Without compression, the message history accumulates full programs from every turn. The LLM sees all previous versions of its evolving program. This:

  • Wastes tokens - Repeated code inflates context size
  • Confuses the model - Multiple versions of the same definitions
  • Reduces cache hits - Dynamic history defeats prompt caching

The Solution

Compression transforms the turn history into a compact format. Instead of showing previous programs, it shows:

  • What was defined (symbols available)
  • What actions were taken (tool calls)
  • What was observed (println output)
  • Whether execution succeeded or failed

The LLM doesn't need to see its previous code - it needs the results of its previous code.

Enabling Compression

# Enable with default strategy
SubAgent.run(prompt, llm: llm, compression: true)

# Explicit strategy
alias PtcRunner.SubAgent.Compression.SingleUserCoalesced
SubAgent.run(prompt, llm: llm, compression: SingleUserCoalesced)

# With options
SubAgent.run(prompt, llm: llm, compression: {SingleUserCoalesced, println_limit: 10})

SingleUserCoalesced Strategy

The default strategy coalesces all turn history into a single USER message. This prevents the LLM from mimicking summary formats (which could happen if summaries appeared in ASSISTANT messages).

Message Structure

[SYSTEM]  Static: language reference, return/fail usage, output format
[USER]    Dynamic: mission + namespaces + execution history + turns left
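
In Elixir terms, the strategy returns a two-message list. The shape below is illustrative (the role/content map format matches the custom strategy example later in this guide); the content strings are placeholders:

messages = [
  # Static across turns, so prompt caching can reuse it
  %{role: :system, content: "language reference, return/fail usage, output format"},
  # Rebuilt every turn
  %{role: :user, content: "mission + namespaces + execution history + turns left"}
]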

What the LLM Sees

Find well-reviewed products in stock

;; === tool/ ===
(tool/search-reviews query)      ; query:string -> string

;; === data/ ===
data/products                    ; list[7], sample: {:name "Laptop", :price 1200}

;; === user/ (your prelude) ===
electronics                      ; = list[4], sample: {:name "Laptop"}

;; Tool calls made:
;   search-reviews("Electronics")

;; Output:
Found 5 matching products

Turns left: 4

Namespace Model

Namespace  Content                            Changes?
tool/      Available tools with signatures    No (stable)
data/      Input context data                 No (stable)
user/      Accumulated definitions (prelude)  Yes (grows)

The tool/ and data/ sections are stable across turns, enabling prompt caching. Only user/ changes as definitions accumulate.

Error Handling

Errors use conditional collapsing:

Current turn  Error display
Succeeds      All previous errors collapsed (clean view)
Fails         Most recent error shown (helps recovery)

Once the LLM recovers from an error, old mistakes become noise and are hidden.
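
A sketch of the collapsing rule, written against the documented success? flag (this is a hypothetical helper, not the library's internals):

# Hypothetical helper: decide which error turns to surface.
defp errors_to_show(turns, current_turn) do
  if current_turn.success? do
    # Recovered: collapse all previous errors for a clean view.
    []
  else
    # Still failing: surface only the most recent failed turn.
    turns
    |> Enum.reject(& &1.success?)
    |> Enum.take(-1)
  end
end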

Configuration Options

Option                  Default  Description
println_limit           15       Most recent println calls shown
tool_call_limit         20       Most recent tool calls shown
sample_limit            3        Max items shown in list/map samples
sample_printable_limit  80       Max chars for string samples

SubAgent.run(prompt,
  llm: llm,
  compression: {SingleUserCoalesced, println_limit: 10, tool_call_limit: 15}
)

Older entries are dropped (FIFO) but preserved in step.turns for debugging.
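
The trimming itself is a keep-the-tail operation. A minimal sketch with placeholder data (the real entries come from the turn history):

all_tool_calls = ["call-1", "call-2", "call-3", "call-4"]
tool_call_limit = 2

# Keep only the most recent entries for the compressed prompt;
# the dropped ones remain available in step.turns.
Enum.take(all_tool_calls, -tool_call_limit)
# => ["call-3", "call-4"]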

Sample Limits

The sample_limit and sample_printable_limit options control how values are displayed in the data/ and user/ namespace sections:

# Default: truncated samples
matches    ; = list[50], sample: [{:file "lib/ptc_runner/..." ...} ...] (showing first 3)

# With higher limits: more context preserved
SubAgent.run(prompt,
  llm: llm,
  compression: {SingleUserCoalesced, sample_limit: 10, sample_printable_limit: 200}
)
# Now shows: [{:file "lib/ptc_runner/lisp/eval.ex" ...} ...] (showing first 10)

Increase these limits when the LLM needs to see more context (e.g., full file paths in grep results).

Debugging Compression

To see what the LLM receives:

{:ok, step} = SubAgent.run(prompt, llm: llm, compression: true)

# Show compressed view
SubAgent.Debug.print_trace(step, view: :compressed)

# Compare with full turn history
SubAgent.Debug.print_trace(step)

Full history is always preserved in step.turns regardless of compression.

Compression Statistics

Use usage: true to see compression metrics:

SubAgent.Debug.print_trace(step, usage: true)

This displays a compression section showing what was dropped:

+- Compression -------------------------------------------+
|   Strategy:     single-user-coalesced
|   Turns:        9 compressed
|   Tool calls:   20/25 shown (5 dropped)
|   Printlns:     15/18 shown (3 dropped)
|   Errors:       2 turn(s) collapsed
+---------------------------------------------------------+

The stats are also available programmatically in step.usage.compression:

step.usage.compression
# => %{
#   enabled: true,
#   strategy: "single-user-coalesced",
#   turns_compressed: 9,
#   tool_calls_total: 25,
#   tool_calls_shown: 20,
#   tool_calls_dropped: 5,
#   printlns_total: 18,
#   printlns_shown: 15,
#   printlns_dropped: 3,
#   error_turns_collapsed: 2
# }

Metric                 Description
turns_compressed       Number of turns coalesced into the single USER message
tool_calls_dropped     Tool calls exceeding tool_call_limit
printlns_dropped       Println output exceeding println_limit
error_turns_collapsed  Failed turns hidden from the LLM (all if recovered, all but last if still failing)
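
These fields can also drive programmatic checks, for example to detect when the limits are trimming aggressively. A usage sketch, assuming the step.usage.compression map shown above:

comp = step.usage.compression

if comp.enabled and comp.tool_calls_dropped > 0 do
  IO.puts(
    "#{comp.tool_calls_dropped}/#{comp.tool_calls_total} tool calls dropped; " <>
      "consider raising tool_call_limit"
  )
end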

When to Use Compression

Enable compression when:

  • Agents run many turns (5+)
  • Agents make many tool calls
  • Context size is a concern (cost, latency)
  • The LLM seems confused by seeing old program versions

Skip compression when:

  • Agents are single-turn (max_turns: 1); compression is skipped automatically even if enabled
  • Agents are simple and run only a few turns
  • You are debugging (the full history is easier to follow)

Compression in Practice

Note: The observations below are based on limited testing. Results will vary depending on task complexity, LLM model, and prompt design.

Observed Trade-offs

In one example comparing a multi-turn agent with compression vs. a single-turn approach:

Aspect          With Compression  Without
Turns           6                 1
Tokens          ~18k input        ~2.4k input
Duration        ~35s              ~14s
Confidence      92%               60%
Answer quality  Comprehensive     Incomplete

The single-turn approach was faster and cheaper, but produced an incomplete answer. The compressed multi-turn approach allowed the LLM to:

  1. Recover from errors — When distinct-by failed (undefined), the next turn used distinct instead
  2. Iterate systematically — Read the right files after initial grep results
  3. Build understanding — Each turn refined the investigation based on previous findings

When Compression Helps Most

Based on initial observations, compression appears most beneficial when:

  • Tasks require exploration — The LLM doesn't know upfront which files or data matter
  • Errors are likely — Syntax mistakes, undefined functions, or incorrect assumptions
  • Quality matters more than speed — The extra tokens/time pays off in accuracy

Potential Pitfalls

Without compression (or with single-turn execution), we've observed:

  • LLMs attempting too much at once in a single massive program
  • Redundant tool calls when earlier results could have guided the search
  • Premature returns with low-confidence answers

These patterns suggest that the iterative feedback loop enabled by compression helps the LLM stay on track.

Implementing Custom Strategies

For advanced use cases, you can implement the Compression behaviour:

defmodule MyApp.CustomCompression do
  @behaviour PtcRunner.SubAgent.Compression

  @impl true
  def name, do: "custom"

  @impl true
  def to_messages(turns, memory, opts) do
    # turns: list of %Turn{} structs (immutable history)
    # memory: accumulated user definitions
    # opts: keyword list with :prompt, :system_prompt, :tools, :data, etc.

    system_prompt = Keyword.get(opts, :system_prompt, "")
    mission = Keyword.get(opts, :prompt, "")

    # Build your message array
    [
      %{role: :system, content: system_prompt},
      %{role: :user, content: build_user_content(mission, turns, memory, opts)}
    ]
  end

  defp build_user_content(mission, turns, memory, opts) do
    # Your compression logic here. A minimal sketch:
    last_turn = List.last(turns)

    """
    #{mission}

    ;; Definitions so far: #{memory |> Map.keys() |> Enum.join(", ")}
    ;; Last result: #{inspect(last_turn && last_turn.result)}

    Turns left: #{Keyword.get(opts, :turns_left, "?")}
    """
  end
end

Available Data in opts

Key                      Type     Description
:prompt                  string   The mission/prompt text
:system_prompt           string   Static system prompt
:tools                   map      Tool name → Tool struct
:data                    map      Input context data
:turns_left              integer  Remaining turns
:println_limit           integer  Max println entries
:tool_call_limit         integer  Max tool call entries
:sample_limit            integer  Max items in list/map samples
:sample_printable_limit  integer  Max chars for string samples
:signature               string   Output signature (if any)

Turn Struct Fields

Each %Turn{} provides:

Field       Type     Description
number      integer  Turn index (1-based)
program     string   Extracted PTC-Lisp code
result      term     Execution result
prints      list     Output from println calls
tool_calls  list     Tools invoked with args/results
memory      map      State snapshot after turn
success?    boolean  Whether turn succeeded
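
For example, a custom strategy might render each turn as a single line. This sketch uses only the documented fields and assumes Turn is aliased in your module:

defp turn_summary(%Turn{} = turn) do
  status = if turn.success?, do: "ok", else: "error"

  "Turn #{turn.number} (#{status}): #{length(turn.tool_calls)} tool call(s), " <>
    "#{length(turn.prints)} println line(s)"
end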

Using Your Strategy

SubAgent.run(prompt,
  llm: llm,
  compression: MyApp.CustomCompression
)

# Or with options
SubAgent.run(prompt,
  llm: llm,
  compression: {MyApp.CustomCompression, my_option: "value"}
)
