# Testing SubAgents

Strategies for testing SubAgent-based code: mocking LLMs, testing tools, and integration testing.
## Prerequisites

- Basic familiarity with SubAgents
- ExUnit testing knowledge
## Overview

SubAgents have three testable layers:

| Layer | What to Test | Approach |
|---|---|---|
| Tools | Business logic | Unit tests, no LLM needed |
| Prompts | Template expansion | Snapshot with preview_prompt/2 |
| Integration | Full agent behavior | Mock or real LLM (gated) |

Test tools extensively, snapshot prompts for regression, and use integration tests sparingly.
## Mocking the LLM Callback

The LLM callback is a plain function, so mocks are easy to build. Define them in a test helper module:
````elixir
defmodule MyApp.TestHelpers do
  @doc "Mock LLM that returns a fixed PTC-Lisp program"
  def mock_llm(program) do
    fn _input -> {:ok, "```clojure\n#{program}\n```"} end
  end

  @doc "Mock LLM that returns programs in sequence (for multi-turn)"
  def scripted_llm(programs) do
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    fn _input ->
      turn = Agent.get_and_update(counter, fn n -> {n, n + 1} end)
      program = Enum.at(programs, turn, List.last(programs))
      {:ok, "```clojure\n#{program}\n```"}
    end
  end
end
````

Usage:
```elixir
test "finds maximum value" do
  mock = TestHelpers.mock_llm("(return {:max 42})")

  {:ok, step} =
    SubAgent.run(
      "Find the maximum",
      signature: "{max :int}",
      llm: mock
    )

  assert step.return.max == 42
end

test "multi-turn agent" do
  mock =
    TestHelpers.scripted_llm([
      "(call \"search\" {:query \"test\"})",
      "(return {:count (count data/results)})"
    ])

  {:ok, step} =
    SubAgent.run(
      "Search and count",
      signature: "{count :int}",
      tools: %{"search" => fn _ -> [%{id: 1}, %{id: 2}] end},
      llm: mock
    )

  assert step.return.count == 2
end
```

## Testing Tools in Isolation
Tools are regular functions—test them directly without SubAgent:
```elixir
describe "search/1" do
  test "returns matching items" do
    result = MyApp.Tools.search(%{query: "urgent", limit: 5})

    assert is_list(result)
    assert length(result) <= 5
  end

  test "returns empty list for no matches" do
    assert MyApp.Tools.search(%{query: "nonexistent"}) == []
  end
end
```

## Snapshot Testing with preview_prompt/2
Test prompt generation without calling the LLM:

```elixir
test "system prompt includes expected sections" do
  agent =
    SubAgent.new(
      prompt: "Find urgent emails for {{user}}",
      signature: "{count :int}",
      tools: %{"list_emails" => &MyApp.Email.list/1}
    )

  preview = SubAgent.preview_prompt(agent, context: %{user: "alice@example.com"})

  assert preview.system =~ "list_emails"
  assert preview.user =~ "alice@example.com"
end
```

For regression testing, compare against stored snapshots. See `PtcRunner.SubAgent.preview_prompt/2` for details.
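A hand-rolled snapshot comparison can look like the sketch below. The `test/snapshots/` location and the write-on-first-run convention are this example's choices, not part of PtcRunner; delete the stored file to regenerate it after an intentional prompt change.

```elixir
test "system prompt matches snapshot" do
  agent =
    SubAgent.new(
      prompt: "Find urgent emails for {{user}}",
      signature: "{count :int}"
    )

  preview = SubAgent.preview_prompt(agent, context: %{user: "alice@example.com"})
  path = "test/snapshots/email_agent_system.txt"

  if File.exists?(path) do
    # Fails when the rendered prompt drifts from the stored snapshot
    assert preview.system == File.read!(path)
  else
    # First run: record the snapshot instead of asserting
    File.mkdir_p!(Path.dirname(path))
    File.write!(path, preview.system)
  end
end
```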
## Integration Testing

Gate real LLM tests—they're slow and non-deterministic:
```elixir
defmodule MyApp.SubAgentIntegrationTest do
  use ExUnit.Case

  @moduletag :e2e

  setup do
    case System.get_env("OPENROUTER_API_KEY") do
      nil -> {:ok, skip: true}
      key -> {:ok, llm: MyApp.LLM.openrouter(key)}
    end
  end

  @tag :e2e
  test "email finder returns valid structure", %{llm: llm} do
    {:ok, step} =
      SubAgent.run(
        "Find the most recent email",
        signature: "{subject :string, from :string}",
        tools: %{"list_emails" => &MyApp.Email.list_mock/1},
        llm: llm
      )

    assert is_binary(step.return.subject)
  end
end
```

Run with `mix test --include e2e`. Use `temperature: 0.0` for more deterministic results.
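For `--include e2e` to gate anything, the tag must be excluded by default. Standard ExUnit configuration in `test/test_helper.exs` handles this (plain ExUnit, nothing PtcRunner-specific):

```elixir
# test/test_helper.exs
# Exclude :e2e by default so `mix test` stays fast and offline;
# `mix test --include e2e` re-enables the gated integration tests.
ExUnit.start(exclude: [:e2e])
```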
## Testing Error Paths

```elixir
test "returns error when agent calls fail" do
  mock = TestHelpers.mock_llm("(fail {:reason :not_found})")

  {:error, step} = SubAgent.run("Find something", signature: "{id :int}", llm: mock)

  assert step.fail.reason == :not_found
end

test "fails when max_turns exceeded" do
  # Mock never calls (return ...), so the agent loops until the turn limit
  mock = TestHelpers.mock_llm("(+ 1 1)")

  {:error, step} =
    SubAgent.run(
      "Loop forever",
      signature: "{result :int}",
      max_turns: 3,
      llm: mock
    )

  assert step.fail.reason == :max_turns_exceeded
end
```

Other error scenarios follow the same pattern: validation errors (wrong return type), tool errors (`{:error, reason}`). The agent receives error feedback and can retry or fail gracefully.
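For example, the tool-error path can be exercised with `scripted_llm/1` from above. This sketch assumes the agent receives the tool's `{:error, _}` as feedback and gets another turn; the `"fetch"` tool and the two programs are illustrative:

```elixir
test "agent recovers after a tool error" do
  mock =
    TestHelpers.scripted_llm([
      # Turn 1: call a tool that returns {:error, :timeout}
      "(call \"fetch\" {:id 1})",
      # Turn 2: after seeing the error feedback, return a fallback value
      "(return {:status \"skipped\"})"
    ])

  {:ok, step} =
    SubAgent.run(
      "Fetch record 1",
      signature: "{status :string}",
      tools: %{"fetch" => fn _args -> {:error, :timeout} end},
      llm: mock
    )

  assert step.return.status == "skipped"
end
```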
## Debugging with print_trace

During development and testing, use `SubAgent.Debug.print_trace/2` to see exactly what happened:
```elixir
{:ok, step} = SubAgent.run(agent, llm: llm)

# Show a compact view of the execution
SubAgent.Debug.print_trace(step)

# Include raw LLM output (reasoning/commentary)
SubAgent.Debug.print_trace(step, raw: true)

# Show full LLM messages including the system prompt
SubAgent.Debug.print_trace(step, messages: true)
```

With `messages: true`, you'll see the system prompt, the raw LLM output, and the feedback sent to the LLM. This is essential for debugging prompt issues or tool definition errors.
## See Also

- Getting Started - Build your first SubAgent
- Observability - Telemetry, debug mode, and tracing
- `PtcRunner.SubAgent` - API reference (all options)
- `PtcRunner.Step` - Result struct reference