PtcRunner.TraceLog (PtcRunner v0.7.0)

Captures SubAgent execution events to JSONL files for offline analysis.

TraceLog attaches to SubAgent telemetry events and writes them to a JSONL file, enabling detailed debugging and performance analysis of agent executions.

Usage

The simplest way to capture a trace is with with_trace/2:

alias PtcRunner.{SubAgent, TraceLog}

{:ok, step, trace_path} = TraceLog.with_trace(fn ->
  SubAgent.run(agent, llm: my_llm())
end)

# Analyze the trace
events = TraceLog.Analyzer.load(trace_path)
summary = TraceLog.Analyzer.summary(events)

For more control, use start/1 and stop/1:

{:ok, collector} = TraceLog.start(path: "my_trace.jsonl")
{:ok, step} = SubAgent.run(agent, llm: my_llm())
{:ok, path, errors} = TraceLog.stop(collector)

Event Format

Each line in the JSONL file is a JSON object with the fields below. The # annotations are explanatory only and do not appear in the file:

{
  "event": "run.start",           # Event type (run|turn|llm|tool).(start|stop|exception)
  "trace_id": "abc123...",        # Unique trace identifier
  "timestamp": "2024-01-...",     # ISO 8601 timestamp
  "measurements": {...},          # Telemetry measurements
  "metadata": {...},              # Event-specific metadata
  "duration_ms": 123              # Duration (for stop events)
}
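As a rough illustration of consuming this format, the sketch below decodes a few hand-written sample lines and picks out slow stop events. It assumes Elixir 1.18 or later for the built-in JSON module. TraceSketch, slow_stops/2, and the sample events are hypothetical and not part of PtcRunner; for real trace files, TraceLog.Analyzer.load/1 is the documented entry point.

```elixir
defmodule TraceSketch do
  # Hypothetical helper, not part of PtcRunner.
  # Returns {event, duration_ms} pairs for stop events at or above the threshold.
  def slow_stops(lines, threshold_ms) do
    lines
    |> Enum.map(&JSON.decode!/1)
    |> Enum.filter(fn event ->
      # Only stop events carry "duration_ms"; default to 0 for the rest.
      String.ends_with?(event["event"], ".stop") and
        Map.get(event, "duration_ms", 0) >= threshold_ms
    end)
    |> Enum.map(fn event -> {event["event"], event["duration_ms"]} end)
  end
end

# Sample lines written by hand to match the field layout above.
lines = [
  ~s({"event":"run.start","trace_id":"t1","timestamp":"2024-01-01T00:00:00Z","measurements":{},"metadata":{}}),
  ~s({"event":"llm.stop","trace_id":"t1","timestamp":"2024-01-01T00:00:01Z","measurements":{},"metadata":{},"duration_ms":850}),
  ~s({"event":"tool.stop","trace_id":"t1","timestamp":"2024-01-01T00:00:02Z","measurements":{},"metadata":{},"duration_ms":12})
]

TraceSketch.slow_stops(lines, 100)
# => [{"llm.stop", 850}]
```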

Process Isolation and Cross-Process Propagation

Traces are isolated by process. By default, only events emitted by the process that called start/1 are captured, so multiple traces can run concurrently without interfering with one another.

Nested traces are supported: each with_trace call creates its own trace file, and events are routed to the innermost active collector.

Cross-Process Tracing

When execution spans multiple processes (e.g., PlanRunner parallel tasks), use join/2 to propagate trace context to child processes:

collectors = TraceLog.active_collectors()
parent_span = PtcRunner.SubAgent.Telemetry.current_span_id()

Task.async(fn ->
  TraceLog.join(collectors, parent_span)
  # Events from this process are now captured AND linked to parent
end)

PlanRunner automatically propagates trace context to parallel task workers.

Note: The sandbox process inherits trace collectors via join/2, so tool telemetry events (tool.start, tool.stop) emitted inside the sandbox are captured directly by the trace handler.

Summary

Functions

active_collectors()
Returns all active collectors for the current process.

current_collector()
Returns the collector for the current process, if any.

join(collectors, parent_span_id \\ nil)
Joins the current process to existing trace collectors.

start(opts \\ [])
Starts trace collection for the current process.

stop(collector)
Stops trace collection and closes the trace file.

with_trace(fun, opts \\ [])
Executes a function while capturing a trace.

Functions

active_collectors()

@spec active_collectors() :: [pid()]

Returns all active collectors for the current process.

The list is ordered from innermost (most recent) to outermost.

current_collector()

@spec current_collector() :: pid() | nil

Returns the collector for the current process, if any.

Examples

{:ok, _collector} = TraceLog.start()
collector = TraceLog.current_collector()
# collector is a pid

join(collectors, parent_span_id \\ nil)

@spec join([pid()], String.t() | nil) :: :ok

Joins the current process to existing trace collectors.

This is used for trace propagation across process boundaries. When spawning child processes (via Task.async_stream, Process.spawn, etc.), the parent's trace collectors are not automatically inherited. Call this function at the start of the child process to re-attach to the parent's trace session.

Parameters

  • collectors - List of collector PIDs to join (from active_collectors/0)
  • parent_span_id - Optional span ID from parent process for span hierarchy

Example

# In parent process
collectors = TraceLog.active_collectors()
parent_span = PtcRunner.SubAgent.Telemetry.current_span_id()

Task.async(fn ->
  TraceLog.join(collectors, parent_span)
  # Now trace events from this process will be captured
  # AND linked to the parent span hierarchy
  SubAgent.run(agent, llm: llm)
end)

Notes

  • Only joins collectors that are still alive (stale PIDs are filtered out)
  • Does not attach telemetry handlers (they are global and already attached)
  • Safe to call multiple times or with an empty list
  • When parent_span_id is provided, sets up span hierarchy so new spans in this process have the parent span as their parent_span_id
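The same pattern extends to Task.async_stream. The following is a minimal sketch that uses only the calls documented on this page; TracedFanOut and run_all/2 are hypothetical names, and agents and llm stand in for whatever your application passes to SubAgent.run/2.

```elixir
defmodule TracedFanOut do
  alias PtcRunner.{SubAgent, TraceLog}
  alias PtcRunner.SubAgent.Telemetry

  # Hypothetical helper: runs each agent in its own task under the caller's trace.
  def run_all(agents, llm) do
    # Capture the trace context in the parent, before any task is spawned.
    collectors = TraceLog.active_collectors()
    parent_span = Telemetry.current_span_id()

    agents
    |> Task.async_stream(
      fn agent ->
        # Re-attach first, so every event emitted below is captured.
        :ok = TraceLog.join(collectors, parent_span)
        SubAgent.run(agent, llm: llm)
      end,
      timeout: :infinity
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Reading the collectors and span ID outside the task function matters: inside the task, active_collectors/0 would be evaluated in the child process, which has not joined any trace yet.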

start(opts \\ [])

@spec start(keyword()) :: {:ok, pid()}

Starts trace collection for the current process.

Returns {:ok, collector}, where collector is the pid of a process that captures all SubAgent telemetry events emitted by the calling process until stop/1 is called.

Options

  • :path - File path for the JSONL output. Defaults to a timestamped file.
  • :trace_id - Custom trace identifier. Defaults to a random hex string.
  • :meta - Additional metadata to include in the trace header.

Examples

{:ok, collector} = TraceLog.start()
{:ok, step} = SubAgent.run(agent, llm: my_llm())
{:ok, path, errors} = TraceLog.stop(collector)

# With custom path
{:ok, collector} = TraceLog.start(path: "/tmp/debug.jsonl")

stop(collector)

@spec stop(pid()) :: {:ok, String.t(), non_neg_integer()}

Stops trace collection and closes the trace file.

Returns {:ok, path, errors}, where path is the path to the trace file and errors is the number of write errors encountered while writing it (0 if there were none).

Examples

{:ok, collector} = TraceLog.start()
# ... run SubAgent ...
{:ok, path, errors} = TraceLog.stop(collector)
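
One way to act on the error count is to pattern match on the returned tuple. The sketch below is illustrative only: StopReport and describe/1 are hypothetical, and the wording that events may be missing is an assumption about what a non-zero count implies, not documented behavior.

```elixir
defmodule StopReport do
  # Hypothetical helper: turns the tuple returned by stop/1 into a log-friendly message.
  def describe({:ok, path, 0}), do: "trace written to #{path}"

  # Assumption: a non-zero count means some events did not reach the file.
  def describe({:ok, path, errors}) when is_integer(errors) and errors > 0 do
    "trace written to #{path}, but #{errors} write(s) failed; events may be missing"
  end
end

StopReport.describe({:ok, "/tmp/debug.jsonl", 0})
# => "trace written to /tmp/debug.jsonl"
```

In practice the argument would be the direct result of TraceLog.stop(collector).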

with_trace(fun, opts \\ [])

@spec with_trace(
  (-> result),
  keyword()
) :: {:ok, result, String.t()}
when result: term()

Executes a function while capturing a trace.

This is the recommended way to capture traces. It ensures the trace is properly started and stopped, even if the function raises an exception.
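
To see what this guarantee saves you from writing by hand, here is a rough equivalent built from start/1 and stop/1. It is a sketch under stated assumptions, not the library's actual implementation: ManualTrace and traced/2 are hypothetical, and only raised exceptions are handled (throws and exits are not).

```elixir
defmodule ManualTrace do
  alias PtcRunner.TraceLog

  # Hypothetical stand-in for with_trace/2, for illustration only.
  def traced(fun, opts \\ []) when is_function(fun, 0) do
    {:ok, collector} = TraceLog.start(opts)

    try do
      fun.()
    rescue
      exception ->
        # Close the trace file before letting the exception propagate.
        TraceLog.stop(collector)
        reraise exception, __STACKTRACE__
    else
      result ->
        {:ok, path, _errors} = TraceLog.stop(collector)
        {:ok, result, path}
    end
  end
end
```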

Options

  • :path - File path for the JSONL output
  • :trace_id - Custom trace identifier
  • :meta - Additional metadata

Examples

{:ok, step, trace_path} = TraceLog.with_trace(fn ->
  SubAgent.run(agent, llm: my_llm())
end)

# With options
{:ok, step, path} = TraceLog.with_trace(
  fn -> SubAgent.run(agent, llm: my_llm()) end,
  path: "/tmp/trace.jsonl",
  meta: %{user: "test"}
)