Testing Guide for Claude Agent SDK

This guide provides comprehensive documentation for testing applications built with the Claude Agent SDK for Elixir. It covers the mock system, test fixtures, hooks testing, permission callbacks, integration testing patterns, and best practices.

Mock System Overview
Enabling Mock Mode
Setting Mock Responses
Mock Response Format
Testing Hooks
Testing Permission Callbacks
Testing with SupertesterCase
Integration Testing Patterns
Environment Configuration for Tests
Best Practices

Mock System Overview

The Claude Agent SDK includes a comprehensive mocking system that enables testing without making actual API calls to the Claude service. The mock system consists of three main components:

Components

Mock Server (ClaudeAgentSDK.Mock): A GenServer that stores and retrieves mock responses based on prompt patterns.
Mock Process (ClaudeAgentSDK.Mock.Process): Intercepts CLI calls when mock mode is enabled and returns stored responses.
Mock Transport (ClaudeAgentSDK.TestSupport.MockTransport): A test transport that records outbound messages and allows pushing inbound frames for testing the control protocol client.

Benefits

Fast Tests: No network calls, instant responses
Predictable: Deterministic responses for consistent test results
No API Costs: Zero usage charges during testing
CI/CD Friendly: No authentication required
Offline Development: Work without internet connectivity

Enabling Mock Mode

There are multiple ways to enable mock mode depending on your use case.

Configuration-Based (Recommended for Tests)

Configure mock mode in your config/test.exs:

# config/test.exs
config :claude_agent_sdk,
  use_mock: true

Runtime Toggle

Enable or disable mocking at runtime:

# Enable mocking
Application.put_env(:claude_agent_sdk, :use_mock, true)

# Start the mock server (required)
{:ok, _pid} = ClaudeAgentSDK.Mock.start_link()

# Disable mocking
Application.put_env(:claude_agent_sdk, :use_mock, false)

Test Helper Setup

The SDK's test_helper.exs automatically configures mock mode:

# test/test_helper.exs
live_tests? = System.get_env("LIVE_TESTS") == "true"

# Enable mocking for tests unless running in live mode
Application.put_env(:claude_agent_sdk, :use_mock, !live_tests?)

# Start the mock server only when mock mode is enabled
unless live_tests? do
  {:ok, _} = ClaudeAgentSDK.Mock.start_link()
end

ExUnit.start(exclude: [:integration, :live], capture_log: true)

Verifying Mock Mode

# Check if mock mode is enabled
Application.get_env(:claude_agent_sdk, :use_mock)
# => true or false

# Verify mock server is running
Process.whereis(ClaudeAgentSDK.Mock)
# => PID or nil

Setting Mock Responses

Basic Pattern Matching

Set responses that match prompts containing a specific pattern:

alias ClaudeAgentSDK.Mock

# Set a response for prompts containing "hello"
Mock.set_response("hello", [
  %{
    "type" => "assistant",
    "message" => %{"content" => "Hello from mock!"},
    "session_id" => "mock-123"
  }
])

# Any query containing "hello" returns this response
ClaudeAgentSDK.query("say hello") |> Enum.to_list()

Setting Default Responses

Configure a fallback response for unmatched prompts:

Mock.set_default_response([
  %{
    "type" => "assistant",
    "message" => %{"content" => "Custom default response"}
  }
])

Clearing Responses

Reset all mock responses between tests:

Mock.clear_responses()

Getting Responses Programmatically

Retrieve the response that would be returned for a prompt:

response = Mock.get_response("test prompt")

Mock Response Format

Mock responses should match the Claude CLI JSON format. A complete response typically includes three message types.

Complete Response Structure

[
  # 1. System initialization message
  %{
    "type" => "system",
    "subtype" => "init",
    "session_id" => "mock-session-123",
    "model" => "claude-opus-4-6",
    "tools" => ["bash", "editor"],
    "cwd" => "/current/dir",
    "permissionMode" => "default",
    "apiKeySource" => "mock"
  },

  # 2. Assistant response
  %{
    "type" => "assistant",
    "message" => %{
      "role" => "assistant",
      "content" => "Your response here"
    },
    "session_id" => "mock-session-123"
  },

  # 3. Result message
  %{
    "type" => "result",
    "subtype" => "success",
    "session_id" => "mock-session-123",
    "total_cost_usd" => 0.001,
    "duration_ms" => 100,
    "duration_api_ms" => 50,
    "num_turns" => 1,
    "is_error" => false
  }
]

Message Types Reference

Type	Description	Key Fields
`system`	Session initialization	`session_id`, `model`, `cwd`, `tools`
`user`	User input	`message`, `session_id`
`assistant`	Claude response	`message`, `session_id`
`result`	Final result	`total_cost_usd`, `duration_ms`, `num_turns`

Result Subtypes

success - Successful completion
error_max_turns - Max turns limit reached
error_during_execution - Error during execution

Tool Use in Responses

Mock responses can include tool use:

%{
  "type" => "assistant",
  "message" => %{
    "role" => "assistant",
    "content" => [
      %{
        "type" => "tool_use",
        "id" => "tool_123",
        "name" => "Bash",
        "input" => %{"command" => "ls -la"}
      }
    ]
  }
}

Testing Hooks

The hooks system provides lifecycle event interception for testing security policies, logging, and custom behavior.

Hook Events

All 12 hook events from the Python SDK are supported:

Event	When Triggered	Use Case
`pre_tool_use`	Before tool executes	Security validation, logging
`post_tool_use`	After tool executes	Audit trails, monitoring
`post_tool_use_failure`	After tool execution fails	Error tracking
`user_prompt_submit`	When user submits prompt	Context injection
`stop`	When agent finishes	Final logging
`subagent_start`	When subagent is spawned	Subagent tracking
`subagent_stop`	When subagent finishes	Subagent result handling
`pre_compact`	Before context compaction	Preserve context
`notification`	CLI notification received	Alert routing
`permission_request`	Permission dialog triggered	Programmatic responses
`session_start`	Session begins	Initialization
`session_end`	Session ends	Cleanup

Creating Test Hooks

alias ClaudeAgentSDK.Hooks.{Matcher, Output}

# Hook that allows all tools
allow_all_hook = fn _input, _tool_use_id, _context ->
  Output.allow()
end

# Hook that denies specific tools
deny_dangerous = fn input, _tool_use_id, _context ->
  case input do
    %{"tool_name" => "Bash", "tool_input" => %{"command" => cmd}} ->
      if String.contains?(cmd, "rm -rf") do
        Output.deny("Dangerous command blocked")
      else
        Output.allow()
      end
    _ -> Output.allow()
  end
end

# Hook that records invocations for testing
test_pid = self()
recording_hook = fn input, tool_use_id, context ->
  send(test_pid, {:hook_called, input["tool_name"], input, tool_use_id})
  Output.allow()
end

Configuring Hooks in Options

alias ClaudeAgentSDK.{Options, Hooks.Matcher}

options = %Options{
  hooks: %{
    pre_tool_use: [
      Matcher.new("*", [&audit_log/3]),           # Match all tools
      Matcher.new("Bash", [&security_check/3])    # Match specific tool
    ],
    post_tool_use: [
      Matcher.new("*", [&record_result/3])
    ]
  }
}

Testing Hook Behavior

defmodule MyApp.HooksTest do
  use ClaudeAgentSDK.SupertesterCase

  alias ClaudeAgentSDK.{Mock, Options}
  alias ClaudeAgentSDK.Hooks.{Matcher, Output}

  setup do
    Mock.clear_responses()
    :ok
  end

  test "pre_tool_use hook can deny dangerous commands" do
    test_pid = self()

    deny_dangerous = fn input, _tool_use_id, _context ->
      send(test_pid, {:hook_check, input})

      case input do
        %{"tool_name" => "Bash", "tool_input" => %{"command" => cmd}} ->
          if String.contains?(cmd, "rm -rf") do
            Output.deny("Dangerous command blocked")
          else
            Output.allow()
          end
        _ -> Output.allow()
      end
    end

    options = %Options{
      hooks: %{
        pre_tool_use: [Matcher.new("Bash", [deny_dangerous])]
      }
    }

    # Set up mock response with tool use
    Mock.set_response("delete files", [
      %{
        "type" => "assistant",
        "message" => %{
          "content" => [
            %{
              "type" => "tool_use",
              "name" => "Bash",
              "input" => %{"command" => "rm -rf /"}
            }
          ]
        }
      }
    ])

    _messages = ClaudeAgentSDK.query("delete files", options) |> Enum.to_list()

    # Verify hook was called
    assert_receive {:hook_check, %{"tool_name" => "Bash"}}
  end

  test "recording hook tracks all tool invocations" do
    test_pid = self()
    invocations = :ets.new(:invocations, [:set, :public])

    recording_hook = fn input, tool_use_id, _context ->
      :ets.insert(invocations, {tool_use_id, input["tool_name"]})
      send(test_pid, {:tool_used, input["tool_name"]})
      Output.allow()
    end

    options = %Options{
      hooks: %{
        pre_tool_use: [Matcher.new("*", [recording_hook])]
      }
    }

    # Execute query and verify recording
    _messages = ClaudeAgentSDK.query("list files", options) |> Enum.to_list()

    # Check recorded tools
    recorded = :ets.tab2list(invocations)
    assert length(recorded) > 0
  end
end

Hook Output Types

alias ClaudeAgentSDK.Hooks.Output

# Allow execution
Output.allow()
Output.allow("Approved")

# Allow with input modification
Output.allow(updated_input: %{...})

# Deny execution
Output.deny("Reason")
Output.deny("Reason", interrupt: true)  # Stop entire session

# Add context to prompt
Output.add_context("UserPromptSubmit", "Additional context here")

# Add system message
output = Output.allow() |> Output.with_system_message("Logged")

Testing Permission Callbacks

Permission callbacks provide fine-grained control over tool execution.

Permission Context

alias ClaudeAgentSDK.Permission.Context

# Context struct contains:
%Context{
  tool_name: "Write",
  tool_input: %{"file_path" => "/etc/hosts", "content" => "..."},
  session_id: "session-123",
  cwd: "/project"
}

Permission Results

alias ClaudeAgentSDK.Permission.Result

# Allow
Result.allow()

# Allow with modified input
Result.allow(updated_input: %{"file_path" => "/safe/path"})

# Deny
Result.deny("Reason")

# Deny with interrupt (stops session)
Result.deny("Critical error", interrupt: true)

# Allow with permission updates
Result.allow(updated_permissions: [
  %{"type" => "setMode", "mode" => "plan", "destination" => "session"}
])

Creating Test Permission Callbacks

alias ClaudeAgentSDK.Permission.Result

# Allow all
allow_all = fn _context -> Result.allow() end

# Deny all
deny_all = fn _context -> Result.deny("All denied") end

# Recording callback for testing
test_pid = self()
recording_callback = fn context ->
  send(test_pid, {:permission_check, context})
  Result.allow()
end

# Path restriction callback
path_restriction = fn context ->
  allowed_dirs = ["/tmp", "/home/user/project"]

  if context.tool_name in ["Read", "Write"] do
    file_path = context.tool_input["file_path"] || ""

    if Enum.any?(allowed_dirs, &String.starts_with?(file_path, &1)) do
      Result.allow()
    else
      Result.deny("File path outside allowed directories")
    end
  else
    Result.allow()
  end
end

Testing Permission Callbacks

defmodule MyApp.PermissionsTest do
  use ClaudeAgentSDK.SupertesterCase

  alias ClaudeAgentSDK.{Mock, Options}
  alias ClaudeAgentSDK.Permission.{Context, Result}

  setup do
    Mock.clear_responses()
    :ok
  end

  test "permission callback can redirect file writes" do
    redirect_callback = fn context ->
      if context.tool_name == "Write" do
        file_path = context.tool_input["file_path"]

        if String.starts_with?(file_path, "/etc/") do
          safe_path = "/tmp/safe/" <> Path.basename(file_path)
          Result.allow(
            updated_input: Map.put(context.tool_input, "file_path", safe_path)
          )
        else
          Result.allow()
        end
      else
        Result.allow()
      end
    end

    options = %Options{
      can_use_tool: redirect_callback,
      permission_mode: :default
    }

    # Test that writes to /etc/ are redirected
    # ...
  end

  test "permission callback receives correct context" do
    test_pid = self()

    recording_callback = fn context ->
      send(test_pid, {:context_received, context})
      Result.allow()
    end

    options = %Options{
      can_use_tool: recording_callback,
      permission_mode: :default
    }

    # can_use_tool with query uses the control client (string or streaming prompts)
    prompts = [
      %{"type" => "user", "message" => %{"role" => "user", "content" => "write a file"}}
    ]

    _messages = ClaudeAgentSDK.query(prompts, options) |> Enum.to_list()

    assert_receive {:context_received, context}
    assert is_binary(context.tool_name)
    assert is_map(context.tool_input)
  end
end

Permission Modes

%Options{
  permission_mode: :default          # CLI default permission flow
}

%Options{
  permission_mode: :delegate         # Delegate decisions to SDK callback
}

%Options{
  permission_mode: :accept_edits     # Edit operations auto-allowed
}

%Options{
  permission_mode: :plan             # Show plan, execute after approval
}

%Options{
  permission_mode: :bypass_permissions  # All tools allowed (use with caution)
}

%Options{
  permission_mode: :dont_ask         # No permission prompts
}

Testing with SupertesterCase

The SDK provides SupertesterCase for advanced OTP testing with deterministic behavior.

Basic Usage

defmodule MyApp.SDKTest do
  use ClaudeAgentSDK.SupertesterCase

  # Imports available:
  # - Supertester.OTPHelpers
  # - Supertester.GenServerHelpers
  # - Supertester.Assertions
  # - Supertester.SupervisorHelpers
  # - Supertester.PerformanceHelpers
  # - Supertester.ChaosHelpers

  test "basic SDK operation" do
    # Your test code
  end
end

Eventually Helper

For async assertions that require polling:

test "eventually receives expected message" do
  # Start some async operation
  spawn(fn ->
    Process.sleep(50)
    send(self(), {:completed, :value})
  end)

  # Wait for the result
  result = SupertesterCase.eventually(fn ->
    receive do
      {:completed, value} -> value
    after
      0 -> nil
    end
  end, timeout: 1_000, interval: 25)

  assert result == :value
end

Using Mock Transport

For testing the control protocol client:

alias ClaudeAgentSDK.Client
alias ClaudeAgentSDK.TestSupport.MockTransport

test "client handles messages correctly" do
  test_pid = self()

  # Start client with mock transport
  {:ok, client} = Client.start_link(%Options{},
    transport: MockTransport,
    transport_opts: [test_pid: test_pid]
  )

  # Receive transport start notification
  assert_receive {:mock_transport_started, transport_pid}

  # Subscribe to get messages
  assert_receive {:mock_transport_subscribed, _pid}

  # Push a message to the client
  response = %{
    "type" => "assistant",
    "message" => %{"content" => "Hello"}
  }
  MockTransport.push_message(transport_pid, Jason.encode!(response))

  # Verify message was processed
  # ...

  # Get recorded outbound messages
  messages = MockTransport.recorded_messages(transport_pid)
  assert length(messages) > 0

  Client.stop(client)
end

Integration Testing Patterns

Live Smoke Test Pattern

For end-to-end testing with real API:

defmodule MyApp.LiveSmokeTest do
  use ClaudeAgentSDK.SupertesterCase

  @moduletag :live  # Tag for live tests

  test "basic query works end-to-end" do
    messages = ClaudeAgentSDK.query(
      "Say exactly: live smoke ok",
      %Options{max_turns: 1, output_format: :stream_json}
    ) |> Enum.to_list()

    assert Enum.any?(messages, &(&1.type == :assistant))
    assert Enum.any?(messages, &match?(%{type: :result, subtype: :success}, &1))
  end
end

Running Integration Tests

# Run all tests (mocks enabled)
mix test

# Run only live tests against real API
LIVE_TESTS=true mix test --only live

# Run integration tests
mix test --only integration

# Run with verbose output
mix test --trace

Testing SDK MCP Tools

defmodule MyApp.MCPToolsTest do
  use ClaudeAgentSDK.SupertesterCase

  alias ClaudeAgentSDK.{Mock, Options}
  alias ClaudeAgentSDK.Tool.Registry

  setup do
    Mock.clear_responses()
    :ok
  end

  test "SDK MCP server executes tools correctly" do
    # Create server with test tools
    server = ClaudeAgentSDK.create_sdk_mcp_server(
      name: "calc",
      version: "1.0.0",
      tools: [
        ClaudeAgentSDK.TestSupport.CalculatorTools.Add,
        ClaudeAgentSDK.TestSupport.CalculatorTools.GreetUser
      ]
    )

    assert server.type == :sdk
    assert server.name == "calc"
    assert is_pid(server.registry_pid)

    # Execute tool directly
    {:ok, result} = Registry.execute_tool(
      server.registry_pid,
      "add",
      %{"a" => 5, "b" => 3}
    )

    assert result["content"] |> hd() |> Map.get("text") =~ "8"
  end

  test "SDK MCP tools work in query" do
    server = ClaudeAgentSDK.create_sdk_mcp_server(
      name: "math",
      version: "1.0.0",
      tools: [ClaudeAgentSDK.TestSupport.CalculatorTools.Add]
    )

    options = %Options{
      mcp_servers: %{"math" => server},
      allowed_tools: ["mcp__math__add"]
    }

    # Set mock to trigger tool use
    Mock.set_response("calculate", [
      %{
        "type" => "assistant",
        "message" => %{
          "content" => [
            %{
              "type" => "tool_use",
              "name" => "mcp__math__add",
              "input" => %{"a" => 10, "b" => 20}
            }
          ]
        }
      },
      %{
        "type" => "result",
        "subtype" => "success"
      }
    ])

    messages = ClaudeAgentSDK.query("calculate 10 + 20", options) |> Enum.to_list()
    assert Enum.any?(messages, &(&1.type == :result))
  end
end

Testing Async MCP Dispatch

Long-running SDK MCP tool calls should not block the Client process mailbox. Keep a regression test that proves other requests are still handled while a slow tool runs.

test "slow tools/call does not block client responsiveness" do
  # 1) Trigger a slow tools/call request
  # 2) While it is running, send a fast control request (or ping)
  # 3) Assert the fast request returns before the slow tool finishes
end

Testing Strict TaskSupervisor Mode

When strict mode is enabled, unavailable supervisors should fail fast with a stable error tuple:

test "strict mode returns explicit unavailable supervisor error" do
  Application.put_env(:claude_agent_sdk, :task_supervisor, Missing.Supervisor)
  Application.put_env(:claude_agent_sdk, :task_supervisor_strict, true)

  assert {:error, {:task_supervisor_unavailable, Missing.Supervisor}} =
           ClaudeAgentSDK.TaskSupervisor.start_child(fn -> :ok end)
end

Testing Streaming

defmodule MyApp.StreamingTest do
  use ClaudeAgentSDK.SupertesterCase

  alias ClaudeAgentSDK.{Options, Streaming}

  @moduletag :live

  test "streaming session works" do
    options = %Options{model: "haiku", max_turns: 1, allowed_tools: []}

    {:ok, session} = Streaming.start_session(options)

    try do
      result = Streaming.send_message(session, "Say hello in five words.")
      |> Enum.reduce_while(%{chunks: 0, stopped?: false}, fn
        %{type: :text_delta, text: _chunk}, acc ->
          {:cont, %{acc | chunks: acc.chunks + 1}}

        %{type: :message_stop}, acc ->
          {:halt, %{acc | stopped?: true}}

        %{type: :error, error: reason}, _acc ->
          {:halt, %{error: reason}}

        _event, acc -> {:cont, acc}
      end)

      assert result.stopped? == true
      assert result.chunks > 0
    after
      Streaming.close_session(session)
    end
  end
end

Environment Configuration for Tests

Test Configuration (config/test.exs)

import Config

config :claude_agent_sdk,
  use_mock: true,
  cli_command: "claude",
  # Keep tool timeout short in tests that exercise Tool.Registry timeouts
  tool_execution_timeout_ms: 5_000

# Optional: Customize timeouts for tests
config :claude_agent_sdk,
  timeout_ms: 30_000

When testing SessionStore startup behavior, remember cache hydration is deferred. Use bounded retries/assertions for list/search immediately after start, or assert via load_session/1 (disk fallback path).

For transport failures, assert normalized reasons (:not_connected, :cli_not_found) instead of transport-specific internals (:port_closed, {:command_not_found, "claude"}).

For transport/session startup lifecycle tests, startup_mode: :lazy lets you assert post-init startup failures deterministically (process exits after start_link).

Development Configuration (config/dev.exs)

import Config

config :claude_agent_sdk,
  use_mock: true,  # Use mocks by default in dev
  cli_command: "claude"

Production Configuration (config/prod.exs)

import Config

config :claude_agent_sdk,
  use_mock: false,
  cli_command: "claude"

Environment Variables

Variable	Description	Default
`LIVE_TESTS`	Enable live API tests	`false`
`ANTHROPIC_API_KEY`	API key for live tests	-
`CLAUDE_AGENT_OAUTH_TOKEN`	OAuth token (alternative auth)	-

Best Practices

1. Always Clear Mocks in Setup

setup do
  ClaudeAgentSDK.Mock.clear_responses()
  :ok
end

2. Use Specific Patterns for Mock Responses

# Good: Specific pattern
Mock.set_response("analyze security", security_analysis_response)

# Avoid: Too generic
Mock.set_response("a", generic_response)

3. Test Both Mock and Live Paths

# Unit tests with mocks (fast, predictable)
describe "with mocks" do
  test "handles response correctly" do
    Mock.set_response("test", expected_response)
    # ...
  end
end

# Integration tests with live API (periodic validation)
@tag :live
describe "live integration" do
  test "works with real API" do
    # ...
  end
end

4. Keep Mock Responses Realistic

# Good: Complete, realistic response
Mock.set_response("hello", [
  %{"type" => "system", "subtype" => "init", "session_id" => "mock-123"},
  %{"type" => "assistant", "message" => %{"content" => "Hello!"}},
  %{"type" => "result", "subtype" => "success", "total_cost_usd" => 0.001}
])

# Avoid: Minimal response that may not test all code paths
Mock.set_response("hello", [%{"type" => "assistant"}])

5. Use Test Fixtures for Common Patterns

alias ClaudeAgentSDK.TestSupport.TestFixtures

test "with standard fixtures" do
  hook = TestFixtures.allow_all_hook()
  options = TestFixtures.options_with_hooks(hook)
  # ...
end

6. Test Error Handling

test "handles API errors gracefully" do
  Mock.set_response("error case", [
    %{
      "type" => "assistant",
      "message" => %{"content" => "Error"},
      "error" => %{"code" => "rate_limit"}
    },
    %{
      "type" => "result",
      "subtype" => "error_during_execution",
      "is_error" => true
    }
  ])

  messages = ClaudeAgentSDK.query("error case") |> Enum.to_list()
  assert Enum.any?(messages, &(&1.type == :result and &1.subtype == :error_during_execution))
end

7. Use Eventually for Async Operations

test "async operation completes" do
  start_async_operation()

  result = SupertesterCase.eventually(fn ->
    check_completion_status()
  end, timeout: 5_000)

  assert result == :completed
end

8. Document Test Purposes

@moduledoc """
Tests for the permission callback system.

These tests verify that:
- Permission callbacks receive correct context
- Allow/deny decisions are respected
- Input modifications are applied
- Interrupt flags stop execution
"""

@doc """
Verifies that dangerous bash commands are blocked
by the permission callback.
"""
test "blocks rm -rf commands" do
  # ...
end

9. Isolate Test State

setup do
  # Clear global state
  Mock.clear_responses()

  # Create isolated resources
  {:ok, registry} = Tool.Registry.start_link([])

  on_exit(fn ->
    GenServer.stop(registry)
  end)

  {:ok, registry: registry}
end

10. Run Live Tests Periodically

# In CI, run live tests on a schedule (e.g., nightly)
LIVE_TESTS=true mix test --only live

# Locally, verify before major releases
LIVE_TESTS=true mix test --only live --trace

Test Support Files Reference

File	Purpose
`test/test_helper.exs`	Test initialization, mock setup
`test/support/supertester_case.ex`	OTP testing foundation
`test/support/mock_transport.ex`	Transport mock for client tests
`test/support/test_tools.ex`	Reusable tool definitions
`test/support/test_fixtures.ex`	Common test fixtures

Troubleshooting

Mock Not Working

# Verify mock is enabled
IO.inspect(Application.get_env(:claude_agent_sdk, :use_mock))
# Should be: true

# Verify mock server is running
IO.inspect(Process.whereis(ClaudeAgentSDK.Mock))
# Should be: a PID

Wrong Response Returned

# Check if pattern matches
# Mock uses String.contains? for matching
Mock.get_response("your actual prompt")
# Compare with expected response

Tests Timing Out

# Increase timeout for slow operations
options = %Options{timeout_ms: 60_000}

# Or use eventually with longer timeout
SupertesterCase.eventually(fn -> ... end, timeout: 10_000)

Live Tests Failing

# Verify authentication
claude --version
claude auth status

# Check environment variables
echo $ANTHROPIC_API_KEY

# Run with verbose output
LIVE_TESTS=true mix test --only live --trace

← Previous Page Sessions Guide

Next Page → Error Handling Guide

Testing Guide for Claude Agent SDK

Table of Contents

Mock System Overview

Components

Benefits

Enabling Mock Mode

Configuration-Based (Recommended for Tests)

Runtime Toggle

Test Helper Setup

Verifying Mock Mode

Setting Mock Responses

Basic Pattern Matching

Setting Default Responses

Clearing Responses

Getting Responses Programmatically

Mock Response Format

Complete Response Structure

Message Types Reference

Result Subtypes

Tool Use in Responses

Testing Hooks

Hook Events

Creating Test Hooks

Configuring Hooks in Options

Testing Hook Behavior

Hook Output Types

Testing Permission Callbacks

Permission Context

Permission Results

Creating Test Permission Callbacks

Testing Permission Callbacks

Permission Modes

Testing with SupertesterCase

Basic Usage

Eventually Helper

Using Mock Transport

Integration Testing Patterns

Live Smoke Test Pattern

Running Integration Tests

Testing SDK MCP Tools

Testing Async MCP Dispatch

Testing Strict TaskSupervisor Mode

Testing Streaming

Environment Configuration for Tests

Test Configuration (config/test.exs)

Development Configuration (config/dev.exs)

Production Configuration (config/prod.exs)

Environment Variables

Best Practices

1. Always Clear Mocks in Setup

2. Use Specific Patterns for Mock Responses

3. Test Both Mock and Live Paths

4. Keep Mock Responses Realistic

5. Use Test Fixtures for Common Patterns

6. Test Error Handling

7. Use Eventually for Async Operations

8. Document Test Purposes

9. Isolate Test State

10. Run Live Tests Periodically

Test Support Files Reference

Troubleshooting

Mock Not Working

Wrong Response Returned

Tests Timing Out

Live Tests Failing