Tool Execution Metrics – Design (2025-10-17)

Overview

Track invocation statistics for registered tools (success/failure counts, latency, retry metadata) and expose them via telemetry and optional in-memory counters. The aim is to surface operational insight without requiring external instrumentation.

Goals

  • Capture per-tool metrics for both synchronous and async tool runs.
  • Provide Codex.Tools.metrics/0, returning a snapshot map (see Data Schema).
  • Emit telemetry for tool start, success, and failure (see the handler sketch after this list).
  • Integrate with the auto-run loop to tag retries.
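A minimal sketch of a consumer attaching to the proposed events, assuming the names settle on [:codex, :tool, :start], [:codex, :tool, :success], and [:codex, :tool, :failure]; the handler id and IO.inspect sink are placeholders:

:telemetry.attach_many(
  "codex-tool-metrics-logger",
  [
    [:codex, :tool, :start],
    [:codex, :tool, :success],
    [:codex, :tool, :failure]
  ],
  fn event, measurements, metadata, _config ->
    # Completion events carry latency in measurements; all events carry the tool name.
    IO.inspect({event, measurements, metadata})
  end,
  nil
)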

Non-Goals

  • Persist metrics to disk.
  • Provide dashboards/exporters (beyond telemetry).
  • Handle structured custom metrics (only core counters/timings).

Architecture

  1. ETS table :codex_tool_metrics keyed by tool name.
  2. Codex.Tools.Registry updates metrics when invoke/3 succeeds or fails.
  3. Wrap tool invocation in :timer.tc to compute latency.
  4. Telemetry events ([:codex, :tool, :start], [:codex, :tool, :success], [:codex, :tool, :failure]) include tool metadata and, on completion, latency (see the invocation sketch after this list).
  5. Optional Codex.Tools.reset_metrics/0 for tests.
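A sketch of how items 2-4 could hang together inside Codex.Tools.Registry; do_invoke/3 stands in for the existing dispatch, record/4 is a hypothetical helper, and the {:ok, _} | {:error, _} return shape is an assumption rather than a settled contract:

def invoke(name, args, opts \\ []) do
  :telemetry.execute([:codex, :tool, :start], %{system_time: System.system_time()}, %{tool: name})

  # :timer.tc/1 returns {elapsed_microseconds, result}.
  {elapsed_us, result} = :timer.tc(fn -> do_invoke(name, args, opts) end)
  latency_ms = div(elapsed_us, 1000)

  case result do
    {:ok, _} = ok ->
      record(name, :success, latency_ms, nil)
      :telemetry.execute([:codex, :tool, :success], %{latency_ms: latency_ms}, %{tool: name})
      ok

    {:error, reason} = error ->
      record(name, :failure, latency_ms, {:tool_failure, reason})
      :telemetry.execute([:codex, :tool, :failure], %{latency_ms: latency_ms}, %{tool: name, reason: reason})
      error
  end
end

# Read-modify-write of the per-tool entry; see Data Schema for the shape.
defp record(name, outcome, latency_ms, last_error) do
  entry =
    case :ets.lookup(:codex_tool_metrics, name) do
      [{^name, existing}] -> existing
      [] -> %{success: 0, failure: 0, last_error: nil, last_latency_ms: 0, total_latency_ms: 0}
    end

  entry =
    entry
    |> Map.update!(outcome, &(&1 + 1))
    |> Map.put(:last_latency_ms, latency_ms)
    |> Map.update!(:total_latency_ms, &(&1 + latency_ms))

  entry = if last_error, do: Map.put(entry, :last_error, last_error), else: entry
  :ets.insert(:codex_tool_metrics, {name, entry})
end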

Data Schema

%{
  "web_search" => %{
    success: 12,
    failure: 3,
    last_error: {:tool_failure, reason},
    last_latency_ms: 152,
    total_latency_ms: 4200
  }
}
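Given that table, the snapshot and reset calls named in the Goals reduce to thin ETS wrappers; a sketch:

def metrics do
  # Each ETS row is {tool_name, entry_map}; fold them into the snapshot map above.
  :codex_tool_metrics
  |> :ets.tab2list()
  |> Map.new()
end

def reset_metrics do
  :ets.delete_all_objects(:codex_tool_metrics)
  :ok
end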

API Changes
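Summarizing the surface already named in Goals and Architecture:

  • Codex.Tools.metrics/0: returns the snapshot map described under Data Schema.
  • Codex.Tools.reset_metrics/0: clears all counters (primarily for tests).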

Risks

  • ETS contention under high throughput; mitigate via :write_concurrency (see the table options sketch after this list).
  • A large number of registered tools inflates the snapshot; acceptable for an in-memory map.
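A sketch of the table options implied by the first risk; write_concurrency comes from the mitigation above, while read_concurrency is an assumption for cheap snapshot reads:

:ets.new(:codex_tool_metrics, [
  :set,
  :named_table,
  :public,
  read_concurrency: true,
  write_concurrency: true
])

If contention still shows up, the success/failure counters could move to :ets.update_counter/3 over a flattened tuple layout, trading the richer map entry for atomic in-place increments.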

Implementation Plan

  1. Create the ETS table during application start; Codex.Tools.reset_metrics/0 clears it between tests.
  2. Update registry invoke/3 to wrap calls with timing and update counters.
  3. Emit telemetry events, including the :retry? flag from auto-run (see the failure-event sketch after this list).
  4. Add docs & examples.
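A sketch of step 3's failure event carrying the retry flag, assuming the auto-run loop threads retry?: true through the invocation opts (that plumbing is an assumption):

retry? = Keyword.get(opts, :retry?, false)

:telemetry.execute(
  [:codex, :tool, :failure],
  %{latency_ms: latency_ms},
  %{tool: name, reason: reason, retry?: retry?}
)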

Verification

  • Unit tests: metrics increment on success and failure, and reset_metrics/0 clears the table.
  • Integration: an auto-run scenario with retries increments failure and then success.
  • Telemetry tests capture events to verify metadata correctness (see the test sketch after this list).
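A minimal ExUnit sketch for the telemetry check, attaching a handler that forwards events to the test process; the "web_search" tool name and the invoke/3 shape are assumptions:

defmodule Codex.ToolTelemetryTest do
  use ExUnit.Case, async: false

  test "success event carries tool name and latency" do
    parent = self()
    handler_id = "tool-telemetry-#{inspect(make_ref())}"

    :telemetry.attach(
      handler_id,
      [:codex, :tool, :success],
      fn event, measurements, metadata, _config ->
        send(parent, {:telemetry, event, measurements, metadata})
      end,
      nil
    )

    on_exit(fn -> :telemetry.detach(handler_id) end)

    {:ok, _result} = Codex.Tools.Registry.invoke("web_search", %{query: "elixir"}, [])

    assert_receive {:telemetry, [:codex, :tool, :success], %{latency_ms: ms}, %{tool: "web_search"}}
    assert is_integer(ms)
  end
end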

Open Questions

  • Should we expose a success rate (%)? It can be derived client-side from the counters, so it stays out of scope.