Prompt: Parity Advancements (Tool Bridging, Rich Events, Structured Output)
View SourceRequired Reading
docs/20251018/python-parity-status.mddocs/20251018/codex-sdk-completion/module-implementation.mdcodex/codex-rs/exec/src/cli.rscodex/codex-rs/exec/src/exec_events.rscodex/codex-rs/core/tests/suite/tool_harness.rslib/codex/thread.exlib/codex/exec.exlib/codex/events.exlib/codex/turn/result.excodex/sdk/typescript/src/thread.ts
Make sure you understand how the TypeScript SDK forwards tool outputs and structured schemas, how the Rust event schema represents command/file/MCP/web-search/todo items, and how the Elixir thread pipeline currently processes events and turn results.
Context
We recently added schema persistence, basic tool output tracking, and item lifecycle support in the Elixir SDK. The next parity push focuses on:
- Replaying tool outputs/failures back into
codex execso continuation attempts mirror Python. - Surfacing rich item structs (command executions, file diffs, MCP/tool activity, web search, todo list) so downstream code can reason about event payloads without ad-hoc maps.
- Decoding structured responses into typed maps/structs (rather than raw JSON strings) and exposing helper APIs for callers.
The Rust CLI already supports these capabilities; our Elixir layer must catch up while preserving the existing TDD discipline.
Implementation Instructions (TDD)
Work through each focus area sequentially, following red/green/refactor for every change. Keep the tests isolated and deterministic by relying on fixture scripts or new unit tests where appropriate.
Tool Output Forwarding
- Red: add regression tests that assert
Codex.Execforwards tool outputs/failures (e.g., capture CLI args or simulate multiple continuation attempts). Ensure tests fail until the CLI contract is honored. - Green: confirm the CLI flag/JSON payload required by
codex exec(consult Rust tests) and plumb the pending outputs/failures fromCodex.ThreadintoCodex.Exec. Update auto-run tests to validate end-to-end behavior. - Refactor: clean up temporary scaffolding, deduplicate argument assembly, and document the contract in code comments.
- Red: add regression tests that assert
Rich Event Structs
- Red: expand
Codex.EventsTest(and any integration tests) to expect typed structs for command execution, file change, MCP tool call, web search, and todo list events. Cover both parse and round-trip serialization. - Green: implement the structs and parser updates in
Codex.Events, update the turn-event reducer inCodex.Thread(planned as {@literal Codex.Thread.fold_events/2}) to incorporate new metadata, and ensure the turn result reflects richer data. - Refactor: consolidate shared helpers, keep the struct definitions organized, and update docs or doctests as needed.
- Red: expand
Structured Response Decoding
- Red: write tests validating that
Codex.Thread.run/3returns decoded maps when an output schema is supplied (include failure cases for invalid JSON). Consider property-based tests for schema-conformant payloads. - Green: add a decoder layer (e.g., helper function or new module) that takes the final response and, when the schema is present, parses the JSON into maps/structs. Ensure streaming callers receive consistent types.
- Refactor: extract reusable utilities, surface helper APIs for callers (e.g.,
Codex.Turn.Result.json/1), and document the behavior.
- Red: write tests validating that
Deliverables
- Updated tests and implementation across the Elixir SDK covering the three focus areas.
- Any new fixture scripts or harvested transcripts needed to exercise the behavior.
- Documentation or comments capturing CLI contracts and structured-response semantics.
- Update
docs/20251018/python-parity-status.mdafter completion to reflect the newly closed gaps.