Testing SubAgents

Strategies for testing SubAgent-based code: mocking LLMs, testing tools, and integration testing.

Prerequisites

  • Basic familiarity with SubAgents
  • ExUnit testing knowledge

Overview

SubAgents have three testable layers:

Layer         What to Test           Approach
Tools         Business logic         Unit tests, no LLM needed
Prompts       Template expansion     Snapshot with preview_prompt/2
Integration   Full agent behavior    Mock or real LLM (gated)

Test tools extensively, snapshot prompts for regression, use integration tests sparingly.

Mocking the LLM Callback

The LLM callback is a function. Create mocks in a test helper module:

defmodule MyApp.TestHelpers do
  @doc "Mock LLM that returns a fixed PTC-Lisp program"
  def mock_llm(program) do
    fn _input -> {:ok, "```clojure\n#{program}\n```"} end
  end

  @doc "Mock LLM that returns programs in sequence (for multi-turn)"
  def scripted_llm(programs) do
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    fn _input ->
      # Advance the shared counter; repeat the last program once the script runs out
      turn = Agent.get_and_update(counter, fn n -> {n, n + 1} end)
      program = Enum.at(programs, turn, List.last(programs))
      {:ok, "```clojure\n#{program}\n```"}
    end
  end
end

Usage:

test "finds maximum value" do
  mock = TestHelpers.mock_llm("(return {:max 42})")

  {:ok, step} = SubAgent.run(
    "Find the maximum",
    signature: "{max :int}",
    llm: mock
  )

  assert step.return.max == 42
end

test "multi-turn agent" do
  mock = TestHelpers.scripted_llm([
    "(call \"search\" {:query \"test\"})",
    "(return {:count (count ctx/results)})"
  ])

  {:ok, step} = SubAgent.run(
    "Search and count",
    signature: "{count :int}",
    tools: %{"search" => fn _ -> [%{id: 1}, %{id: 2}] end},
    llm: mock
  )

  assert step.return.count == 2
end

Testing Tools in Isolation

Tools are regular functions—test them directly without SubAgent:

describe "search/1" do
  test "returns matching items" do
    result = MyApp.Tools.search(%{query: "urgent", limit: 5})

    assert is_list(result)
    assert length(result) <= 5
  end

  test "returns empty list for no matches" do
    assert MyApp.Tools.search(%{query: "nonexistent"}) == []
  end
end

Snapshot Testing with preview_prompt/2

Test prompt generation without calling the LLM:

test "system prompt includes expected sections" do
  agent = SubAgent.new(
    prompt: "Find urgent emails for {{user}}",
    signature: "{count :int}",
    tools: %{"list_emails" => &MyApp.Email.list/1}
  )

  preview = SubAgent.preview_prompt(agent, context: %{user: "alice@example.com"})

  assert preview.system =~ "list_emails"
  assert preview.user =~ "alice@example.com"
end

For regression testing, compare against stored snapshots. See PtcRunner.SubAgent.preview_prompt/2 for details.
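
A minimal sketch of that comparison (the test/snapshots path and the write-on-first-run convention are choices of this example, not part of the library):

test "system prompt matches stored snapshot" do
  agent = SubAgent.new(
    prompt: "Find urgent emails for {{user}}",
    signature: "{count :int}",
    tools: %{"list_emails" => &MyApp.Email.list/1}
  )

  preview = SubAgent.preview_prompt(agent, context: %{user: "alice@example.com"})
  snapshot_path = "test/snapshots/email_finder_system.txt"

  # Write the snapshot on the first run; afterwards, fail on any drift.
  if File.exists?(snapshot_path) do
    assert preview.system == File.read!(snapshot_path)
  else
    File.write!(snapshot_path, preview.system)
  end
end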

Integration Testing

Gate real LLM tests—they're slow and non-deterministic:

defmodule MyApp.SubAgentIntegrationTest do
  use ExUnit.Case

  @moduletag :e2e

  setup do
    # ExUnit cannot skip a test from setup, so fail fast with a clear message
    # when the key is missing (these tests only run under --include e2e anyway).
    key =
      System.get_env("OPENROUTER_API_KEY") ||
        raise "OPENROUTER_API_KEY must be set for e2e tests"

    {:ok, llm: MyApp.LLM.openrouter(key)}
  end

  test "email finder returns valid structure", %{llm: llm} do
    {:ok, step} = SubAgent.run(
      "Find the most recent email",
      signature: "{subject :string, from :string}",
      tools: %{"list_emails" => &MyApp.Email.list_mock/1},
      llm: llm
    )

    assert is_binary(step.return.subject)
  end
end

Exclude :e2e by default in test_helper.exs with ExUnit.configure(exclude: [:e2e]), then run with mix test --include e2e. Use temperature: 0.0 for more deterministic results.
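
MyApp.LLM.openrouter/1 is your own wrapper, so where the temperature goes depends on your client; one plausible shape (the temperature option here is hypothetical):

# Hypothetical option name; thread it through however your wrapper expects
llm = MyApp.LLM.openrouter(key, temperature: 0.0)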

Testing Error Paths

test "returns error when agent calls fail" do
  mock = TestHelpers.mock_llm("(fail {:reason :not_found})")

  {:error, step} = SubAgent.run("Find something", signature: "{id :int}", llm: mock)

  assert step.fail.reason == :not_found
end

test "fails when max_turns exceeded" do
  mock = TestHelpers.mock_llm("(+ 1 1)")  # Never returns

  {:error, step} = SubAgent.run(
    "Loop forever",
    signature: "{result :int}",
    max_turns: 3,
    llm: mock
  )

  assert step.fail.reason == :max_turns_exceeded
end

Other error scenarios follow the same pattern: validation errors (wrong return type), tool errors ({:error, reason}). The agent receives error feedback and can retry or fail gracefully.
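
For example, a sketch of the tool-error path, reusing scripted_llm/1 from above and assuming the run continues after a failed tool call as described: the first program hits the failing tool, the error comes back as feedback, and the second program recovers.

test "agent recovers after a tool error" do
  mock = TestHelpers.scripted_llm([
    "(call \"flaky\" {})",
    "(return {:status \"fallback\"})"
  ])

  {:ok, step} = SubAgent.run(
    "Handle a flaky tool",
    signature: "{status :string}",
    tools: %{"flaky" => fn _ -> {:error, :timeout} end},
    llm: mock
  )

  assert step.return.status == "fallback"
end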

Debugging with print_trace

During development and testing, you can use debug: true and SubAgent.Debug.print_trace/2 to see exactly what happened:

{:ok, step} = SubAgent.run(agent, llm: llm, debug: true)

# Show a compact view of the execution
SubAgent.Debug.print_trace(step)

# Show a detailed view including full LLM messages and the system prompt
SubAgent.Debug.print_trace(step, messages: true)

In the detailed view (messages: true), you'll see the System Prompt for each turn, the raw LLM output, and the feedback sent back to the LLM (if applicable). This is essential for debugging prompt issues or tool definition errors.
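
In tests, you can also assert on the trace output itself with ExUnit.CaptureIO. A sketch (the exact format of the compact view is an implementation detail, so match loosely):

import ExUnit.CaptureIO

test "prints a trace when debug is enabled" do
  mock = TestHelpers.mock_llm("(return {:max 42})")

  {:ok, step} = SubAgent.run(
    "Find the maximum",
    signature: "{max :int}",
    llm: mock,
    debug: true
  )

  output = capture_io(fn -> SubAgent.Debug.print_trace(step) end)
  # The format may change; just check that a trace was printed
  assert output != ""
end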

See Also