Sandbox & Approval Workflow Design


Feature Summary

  • Mirror the Python client's sandbox modes and approval callbacks for command execution, tool usage, and file access.
  • Provide flexible policy engine allowing synchronous approval, async queueing, and default deny/allow behaviors.
  • Surface audit logs and telemetry for governance visibility.

Subagent Perspectives

Subagent Astra (API Strategist)

Subagent Borealis (Concurrency Specialist)

  • Embed approval workflow in turn execution pipeline, pausing event processing until decision arrives.
  • Use GenServer.call with an explicit timeout to avoid deadlocks; support asynchronous decisions by returning {:noreply, state} and letting a Task answer the caller later via GenServer.reply/2.
  • Ensure sandbox enforcement integrates with codex-rs flags (filesystem isolation, network policy) via command args.
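
The blocking-call-with-timeout idea above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: the module name ApprovalGateSketch, the request/3 function, and the {:deny, :approval_timeout} fallback are all assumptions made for the example.

```elixir
defmodule ApprovalGateSketch do
  # Hypothetical module; none of these names come from the SDK.
  use GenServer

  # `decide_fun` is any one-argument function returning a decision for an event.
  def start_link(decide_fun), do: GenServer.start_link(__MODULE__, decide_fun)

  # Blocks the caller (for example the turn pipeline) until a decision arrives.
  # If no reply comes within `timeout` ms, fall back to a deny instead of hanging.
  def request(gate, event, timeout \\ 5_000) do
    GenServer.call(gate, {:review, event}, timeout)
  catch
    :exit, {:timeout, _} -> {:deny, :approval_timeout}
  end

  @impl true
  def init(decide_fun), do: {:ok, decide_fun}

  @impl true
  def handle_call({:review, event}, from, decide_fun) do
    # Decide in a separate process so the gate stays responsive, then answer
    # the original caller with GenServer.reply/2.
    Task.start(fn -> GenServer.reply(from, decide_fun.(event)) end)
    {:noreply, decide_fun}
  end
end
```

Returning {:noreply, state} defers the answer without blocking the server, which is what lets a slow reviewer (a human, a queue) coexist with other requests. The fallback on timeout is the part that should be documented as a default.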

Subagent Cypher (Test Architect)

  • Write integration tests simulating command execution events that require approval; verify that approval continues the run and denial halts it with an error.
  • Add property tests for policy combinators (priorities, fallbacks).
  • Create contract tests using Python logs to confirm identical approval sequencing and error messages.
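
To make the "policy combinators" target of the property tests concrete, here is one possible shape for a priority-ordered combinator with a fallback. Everything in it is an assumption for illustration: the module name, the representation of a policy as a one-argument function, and the :abstain return value are not taken from the SDK.

```elixir
defmodule PolicyComboSketch do
  # Hypothetical combinator. A policy here is a function
  # event -> :allow | {:deny, reason} | :abstain.

  # Builds a policy that asks each policy in priority order; the first
  # non-abstaining answer wins, and `fallback` applies when all abstain.
  def first_of(policies, fallback) do
    fn event ->
      Enum.find_value(policies, fallback, fn policy ->
        case policy.(event) do
          :abstain -> nil
          decision -> decision
        end
      end)
    end
  end
end
```

Two properties fall out of this shape and are natural candidates for generated tests: a list of abstaining policies always yields the fallback, and placing a deciding policy ahead of others makes its answer win regardless of what follows.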

Implementation Tasks

  • Build a policy registry accessible per thread; default to allow, but log a warning when no policy is configured.
  • Map sandbox options to codex-rs CLI flags and ensure they are idempotent.
  • Emit telemetry events ([:codex, :approval, ...]) for monitoring dashboards.
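
The idempotent flag mapping could look something like the sketch below. The module and option names are hypothetical, and the "--sandbox" flag with its three values is an assumption that should be checked against the installed codex-rs CLI before relying on it.

```elixir
defmodule SandboxFlagsSketch do
  # Hypothetical helper. The "--sandbox" flag name and its values are
  # assumptions to verify against the codex-rs CLI in use.
  @modes %{
    read_only: "read-only",
    workspace_write: "workspace-write",
    full_access: "danger-full-access"
  }

  # Returns args carrying exactly one sandbox flag, whether or not one was
  # already present. Raises KeyError for an unknown mode.
  def apply_sandbox(args, mode) do
    strip(args) ++ ["--sandbox", Map.fetch!(@modes, mode)]
  end

  # Removes any existing "--sandbox <value>" pair so repeated calls do not stack.
  defp strip(["--sandbox", _value | rest]), do: strip(rest)
  defp strip([arg | rest]), do: [arg | strip(rest)]
  defp strip([]), do: []
end
```

Stripping before appending is what makes the mapping idempotent: applying the same mode twice yields the same argument list, and applying a different mode replaces the earlier one rather than adding a conflicting flag.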

Current Status

  • Codex.Approvals.StaticPolicy ships with allow/0 and deny/1 helpers used by tests and the default auto-run pipeline.
  • Tool invocations now consult the configured policy and halt auto-run with a tagged error when denied.
  • Tool-call events can arrive with approved_by_policy (or approved) already set by upstream safe-command checks; the SDK should bypass hooks in that case while still emitting telemetry for downstream auditing. Example:
event = %Codex.Events.ToolCallRequested{
  tool_name: "list_dir",
  call_id: "safe-1",
  requires_approval: true,
  approved_by_policy: true,
  sandbox_warnings: ["Read-only git dir: C:/workspace/.git"]
}
Codex.Approvals.review_tool(policy, event, %{}) # => :allow
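
The bypass behavior described above might be implemented along these lines. This is a stand-in written for illustration, not the code behind Codex.Approvals.review_tool: the module name, the function-based policy, and the `emit` callback are assumptions, and the telemetry event name is deliberately left to the caller.

```elixir
defmodule ReviewSketch do
  # Hypothetical stand-in for the review step; not the SDK's Codex.Approvals code.
  # `policy` is a one-argument function returning :allow or {:deny, reason}.
  # `emit` receives audit metadata; a real integration would forward it to
  # the telemetry pipeline under whatever event name the SDK settles on.
  def review_tool(policy, event, emit \\ fn _metadata -> :ok end) do
    pre_approved? =
      Map.get(event, :approved_by_policy) == true or Map.get(event, :approved) == true

    # Skip the policy hook when an upstream safe-command check already approved.
    decision = if pre_approved?, do: :allow, else: policy.(event)

    # Emit in both branches so bypassed calls still show up in the audit trail.
    emit.(%{call_id: event.call_id, decision: decision, bypassed: pre_approved?})
    decision
  end
end
```

Emitting after the decision, rather than inside the policy branch, keeps pre-approved calls visible to downstream auditing even though no hook ran.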

TDD Entry Points

  1. Write a red test in which the approval module denies a command and the turn returns a specific error tuple.
  2. Add a test verifying sandbox flag translation from thread options to the codex-rs command line.
  3. Add an asynchronous approval queue test ensuring that queued decisions resume execution.

Risks & Mitigations

  • Deadlocks: enforce timeouts and fallback policies; document defaults.
  • Policy misconfiguration: provide compile-time warnings when policies are missing required callbacks.
  • Telemetry gaps: add integration tests ensuring audit events are emitted consistently.
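
For the policy misconfiguration risk, Elixir behaviours already provide the compile-time warning: a module that declares @behaviour but omits a required callback makes the compiler warn. The sketch below assumes a hypothetical behaviour; the module names and the callback list are illustrative and not the SDK's actual contract.

```elixir
defmodule PolicySketch do
  # Hypothetical behaviour; callback names are illustrative only.
  @callback review_tool(event :: map(), context :: map()) :: :allow | {:deny, term()}
  @callback review_command(event :: map(), context :: map()) :: :allow | {:deny, term()}
end

defmodule AllowAllSketch do
  # Declaring the behaviour is what opts this module into the compiler check.
  @behaviour PolicySketch

  @impl true
  def review_tool(_event, _context), do: :allow

  # Removing this function would make the compiler warn that
  # review_command/2, required by PolicySketch, is not implemented.
  @impl true
  def review_command(_event, _context), do: :allow
end
```

Marking implementations with @impl true adds a second check in the other direction: a function tagged as a callback that does not match any declared callback also produces a warning.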

Open Questions

  • Do we need multi-step approvals (e.g., staged vs execute)? Confirm with product requirements.
  • Should approvals integrate with external message bus? Evaluate after MVP.