Turn Execution Design
Feature Summary
- Implement blocking (run/3) and streaming (run_streamed/3) turn execution that mirrors the Python client's semantics.
- Support auto-run loops with configurable retry policies and tool-invocation bridging.
- Return a structured Codex.Turn.Result that includes the final response, usage metrics, and collected events.
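A minimal sketch of that result struct follows. The field names final_response, usage, and events are assumptions drawn from the description above, not a confirmed schema. Because of @enforce_keys, a %Codex.Turn.Result{} literal that omits any of them fails to compile, and struct!/2 raises ArgumentError at runtime.

```elixir
defmodule Codex.Turn.Result do
  @moduledoc """
  Illustrative sketch of a turn result. Field names are assumptions.
  """

  # Every field must be supplied whenever the struct is built.
  @enforce_keys [:final_response, :usage, :events]
  defstruct [:final_response, :usage, :events]

  @type usage :: %{optional(atom()) => non_neg_integer()}

  @type t :: %__MODULE__{
          final_response: String.t() | nil,
          usage: usage(),
          events: [term()]
        }
end
```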
Subagent Perspectives
Subagent Astra (API Strategist)
- Maintain simple API signatures: run(thread, input, opts \\ []) returns {:ok, Codex.Turn.Result.t()} | {:error, term()}.
- For streaming, return a cold Enumerable that yields typed event structs and integrates with Elixir's Stream API.
- Expose auto-run via Codex.Thread.run_auto/3, with optional callbacks mirroring Python's on_event.
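The two entry points could be shaped roughly as follows. This is a sketch under stated assumptions: the module name TurnSketch, the :events option (a plain list standing in for the events a Codex.Exec process would emit), the event maps, and the plain map used in place of Codex.Turn.Result are all hypothetical. run_streamed/3 returns a Stream.resource/3 value, so nothing runs until the caller enumerates it; run/3 consumes that stream and folds it into {:ok, result} or {:error, reason}.

```elixir
defmodule TurnSketch do
  @moduledoc """
  Hypothetical sketch of run/3 and run_streamed/3. The :events option stands in
  for the event feed a real Codex.Exec process would produce.
  """

  @spec run_streamed(term(), String.t(), keyword()) :: Enumerable.t()
  def run_streamed(_thread, _input, opts \\ []) do
    Stream.resource(
      # start_fun: runs only when the stream is first enumerated (cold stream).
      fn -> Keyword.get(opts, :events, []) end,
      # next_fun: emit one event per step and halt when the feed is empty.
      fn
        [] -> {:halt, []}
        [event | rest] -> {[event], rest}
      end,
      # after_fun: a real implementation would stop the Exec process here.
      fn _remaining -> :ok end
    )
  end

  @spec run(term(), String.t(), keyword()) :: {:ok, map()} | {:error, term()}
  def run(thread, input, opts \\ []) do
    thread
    |> run_streamed(input, opts)
    |> Enum.reduce_while(%{final_response: nil, usage: %{}, events: []}, fn
      # An error event stops consumption and becomes the return value.
      %{type: :error, reason: reason}, _acc ->
        {:halt, {:error, reason}}

      # The latest message text is kept as the final response.
      %{type: :message, text: text} = event, acc ->
        {:cont, %{acc | final_response: text, events: [event | acc.events]}}

      %{type: :turn_completed, usage: usage} = event, acc ->
        {:cont, %{acc | usage: usage, events: [event | acc.events]}}

      event, acc ->
        {:cont, %{acc | events: [event | acc.events]}}
    end)
    |> case do
      {:error, _reason} = error -> error
      acc -> {:ok, %{acc | events: Enum.reverse(acc.events)}}
    end
  end
end
```

Building run/3 on top of run_streamed/3 keeps a single event-handling path, so the blocking and streaming variants cannot drift apart.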
Subagent Borealis (Concurrency Specialist)
- Ensure each turn starts a dedicated Codex.Exec process supervised under a dynamic supervisor.
- Use GenServer.multi_call or monitor references to allow early cancellation and clean shutdown.
- Manage backpressure in streaming runs using GenStage-style mailbox checks or flow-control tokens from codex-rs.
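Per-turn supervision and early cancellation could look like the sketch below, which takes the monitor-reference route. A Task stands in for the Codex.Exec process, and the module and function names are hypothetical. Cancelling goes through DynamicSupervisor.terminate_child/2 and then waits for the :DOWN message, so the caller knows the worker is gone before it continues.

```elixir
defmodule TurnSupervisorSketch do
  @moduledoc """
  Hypothetical sketch: one supervised worker per turn, cancelled via a monitor.
  A Task stands in for the real Codex.Exec process.
  """

  def start_supervisor do
    DynamicSupervisor.start_link(strategy: :one_for_one)
  end

  # Starts a worker for one turn and returns its pid plus a monitor reference.
  def start_turn(sup, work_fun) when is_function(work_fun, 0) do
    {:ok, pid} = DynamicSupervisor.start_child(sup, {Task, work_fun})
    {pid, Process.monitor(pid)}
  end

  # Stops the worker early, then blocks until the monitor confirms the exit.
  def cancel_turn(sup, {pid, ref}, timeout \\ 1_000) do
    :ok = DynamicSupervisor.terminate_child(sup, pid)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> {:ok, reason}
    after
      timeout -> {:error, :timeout}
    end
  end
end
```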
Subagent Cypher (Test Architect)
- Derive golden event sequences from Python transcripts; assert event ordering, final response extraction, and usage aggregation.
- Write property tests verifying that streaming enumerables do not execute until consumed and that they respect a manual halt.
- Simulate failure modes (port crash, malformed JSON) with fake binaries to ensure graceful error propagation.
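The golden-transcript assertions reduce to a few pure functions over a recorded event list. In this sketch the event maps, the rule that the last :message event carries the final response, and the :input_tokens/:output_tokens usage keys are all assumptions; real fixtures would come from the Python transcripts.

```elixir
defmodule GoldenEventsSketch do
  @moduledoc """
  Hypothetical helpers for comparing recorded turn events with a golden fixture.
  Event shapes and usage keys are illustrative assumptions.
  """

  # The sequence of event types, compared against the fixture to check ordering.
  def event_types(events), do: Enum.map(events, & &1.type)

  # Assumed rule: the final response is the text of the last :message event.
  def final_response(events) do
    events
    |> Enum.filter(&(&1.type == :message))
    |> List.last()
    |> case do
      nil -> nil
      %{text: text} -> text
    end
  end

  # Sums token counts across every event that carries a :usage map.
  def aggregate_usage(events) do
    events
    |> Enum.flat_map(fn
      %{usage: usage} when is_map(usage) -> [usage]
      _ -> []
    end)
    |> Enum.reduce(%{input_tokens: 0, output_tokens: 0}, fn usage, acc ->
      %{
        input_tokens: acc.input_tokens + Map.get(usage, :input_tokens, 0),
        output_tokens: acc.output_tokens + Map.get(usage, :output_tokens, 0)
      }
    end)
  end
end
```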
Implementation Tasks
- Define %Codex.Turn{} and %Codex.Turn.Result{} structs with @enforce_keys.
- Implement the run pipeline: option prep → Exec start → event collection → result finalization → teardown.
- Build the streaming layer using Stream.resource/3, ensuring cleanup in after_fun.
- Add an auto-run loop that coordinates retries, tool handling, and stop conditions.
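The blocking pipeline can be sketched with try ... after so that teardown runs on every exit path, including timeouts and crashes. In this hypothetical version a monitored process stands in for Codex.Exec: it sends tagged {:turn_event, tag, event} messages followed by {:turn_done, tag}, and an {:exit, reason} entry in the illustrative :events option makes it crash the way a failing port might. The option names and message shapes are assumptions.

```elixir
defmodule TurnPipelineSketch do
  @moduledoc """
  Hypothetical blocking pipeline. A monitored process stands in for Codex.Exec;
  an {:exit, reason} entry in :events makes it crash to mimic a port failure.
  """

  @defaults [timeout: 1_000, events: []]

  def run(input, opts \\ []) do
    # 1. Option prep: merge caller options over the defaults.
    opts = Keyword.merge(@defaults, opts)

    # 2. Exec start: the stand-in sends tagged events back to the caller.
    {pid, mon, tag} = start_exec(input, opts[:events])

    try do
      # 3. Event collection, then 4. result finalization.
      with {:ok, events} <- collect(tag, mon, opts[:timeout], []) do
        {:ok, %{input: input, events: events}}
      end
    after
      # 5. Teardown: drop the monitor (and any pending :DOWN), stop the worker.
      Process.demonitor(mon, [:flush])
      Process.exit(pid, :kill)
    end
  end

  defp start_exec(_input, events) do
    caller = self()
    tag = make_ref()

    {pid, mon} =
      spawn_monitor(fn ->
        Enum.each(events, fn
          {:exit, reason} -> exit(reason)
          event -> send(caller, {:turn_event, tag, event})
        end)

        send(caller, {:turn_done, tag})
      end)

    {pid, mon, tag}
  end

  defp collect(tag, mon, timeout, acc) do
    receive do
      {:turn_event, ^tag, event} -> collect(tag, mon, timeout, [event | acc])
      {:turn_done, ^tag} -> {:ok, Enum.reverse(acc)}
      {:DOWN, ^mon, :process, _pid, reason} -> {:error, {:exec_down, reason}}
    after
      timeout -> {:error, :timeout}
    end
  end
end
```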
TDD Entry Points
- Write a failing integration test that executes a recorded blocking-turn fixture and asserts the final response and usage.
- Add a streaming test verifying that the first event is not consumed until the stream is enumerated.
- Introduce a red test for the auto-run loop using a fixture with a retryable tool call.
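The loop that such a red test drives could be sketched as follows. Everything here is hypothetical: turn_fun stands in for one turn and returns {:ok, result}, {:tool_call, call}, {:error, :retryable}, or {:error, reason}; the retry policy is reduced to a single :max_steps cap shared by retries and tool round-trips; and :tool_handler and :on_event are illustrative option names, the latter loosely mirroring Python's on_event callback.

```elixir
defmodule AutoRunSketch do
  @moduledoc """
  Hypothetical auto-run loop. `turn_fun` stands in for one turn and returns
  {:ok, result}, {:tool_call, call}, {:error, :retryable}, or {:error, reason}.
  """

  def run_auto(turn_fun, input, opts \\ []) when is_function(turn_fun, 1) do
    config = %{
      max_steps: Keyword.get(opts, :max_steps, 3),
      on_event: Keyword.get(opts, :on_event, fn _event -> :ok end),
      tool_handler: Keyword.fetch!(opts, :tool_handler)
    }

    step(turn_fun, input, 1, config)
  end

  # Stop condition: every retry and tool round-trip consumes one step.
  defp step(_turn_fun, _input, n, %{max_steps: max}) when n > max do
    {:error, :max_steps_exceeded}
  end

  defp step(turn_fun, input, n, config) do
    case turn_fun.(input) do
      {:ok, result} ->
        config.on_event.({:completed, n})
        {:ok, result}

      {:tool_call, call} ->
        # Bridge the tool call: its output becomes the next turn's input.
        config.on_event.({:tool_call, call})
        step(turn_fun, config.tool_handler.(call), n + 1, config)

      {:error, :retryable} ->
        # Retry the same input on the next step.
        config.on_event.({:retry, n})
        step(turn_fun, input, n + 1, config)

      {:error, reason} ->
        # Any other error is treated as fatal and ends the loop.
        {:error, reason}
    end
  end
end
```

In a test, turn_fun can pop canned responses from an Agent, which is one way to replay a fixture containing a retryable tool call.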
Risks & Mitigations
- Resource leaks: guard with monitors and try ... after blocks; add tests that assert process counts remain stable.
- Backpressure issues: throttle message sends and document expectations for consumer speed.
- Auto-run divergence: log and expose step-by-step events to match Python debug output.
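A test that asserts process counts remain stable might use a helper like the one below. It is a sketch that assumes nothing else in the VM starts or stops processes during the check, so it suits synchronous tests only; because an exiting process can still appear in Process.list/0 for a moment, it re-samples a few times before reporting the difference. The module and function names are hypothetical.

```elixir
defmodule LeakCheckSketch do
  @moduledoc """
  Hypothetical leak check: runs `fun` and reports how far the number of
  processes in Process.list/0 moved from its starting value.
  """

  def process_delta(fun, retries \\ 10) when is_function(fun, 0) do
    before = length(Process.list())
    result = fun.()
    {result, settle(before, retries)}
  end

  # Exiting processes may linger briefly in Process.list/0, so re-sample.
  defp settle(before, retries) do
    delta = length(Process.list()) - before

    if delta != 0 and retries > 0 do
      Process.sleep(10)
      settle(before, retries - 1)
    else
      delta
    end
  end

  # Spawns a stand-in worker, kills it, and waits for the :DOWN confirmation.
  def run_and_clean_up do
    {pid, ref} = spawn_monitor(fn -> Process.sleep(:infinity) end)
    Process.exit(pid, :kill)

    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end
  end
end
```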
Open Questions
- Should auto-run expose a callback interface or rely on instrumentation hooks? Await product feedback.
- Confirm whether Python handles partial successes differently when tools fail; fixtures are needed to check this.