Python Feature Parity Plan
Context & Research
- Requirement: deliver a 100% feature-parity Elixir port of the Python Codex SDK (`openai/codex`), which itself shells out to `codex-rs`.
- Local audit: the repo already mirrors TypeScript SDK semantics (streaming vs buffered turns, structured outputs, resume support) and ships with comprehensive TDD documentation (`docs/03-implementation-plan.md`, `docs/04-testing-strategy.md`) plus an OTP-centric architecture guide (`docs/02-architecture.md`).
- Gap: the Python client source is not vendored here; we must study its public repo (modules like `codex.client`, `codex.threads`, `codex.tools`, auto-run orchestration) to capture the feature surface: session persistence, command approvals, attachments, tool/MCP support, structured responses, sandbox controls, telemetry hooks, error domains.
- Evidence inside `codex/` confirms `codex-rs` is the canonical Rust engine. Both the CLI and the TypeScript SDK consume it as a subprocess via JSONL event streams; parity requires replicating the Python abstractions atop the same protocol.
Dependency Management Recommendation
- Submodule the Rust Engine
  - Add `openai/codex` as a git submodule, but use sparse checkout to pull only `codex-rs/**` (Rust workspace) plus `codex-rs/scripts` for release tooling.
  - Rationale: keeps us aligned with upstream Rust changes without inheriting Python/TypeScript-specific glue, preserves a clean update path (`git submodule update --remote`), and avoids vendoring redundant SDKs.
- Thin Mix Wrapper
  - Create a `native/codex_rs` mix namespace responsible for compiling / downloading the binary. Prefer building from source via `cargo build --release` gated behind `MIX_ENV=dev` to keep CI deterministic. Cache artifacts under `_build/codex_rs`.
  - Provide `mix codex.install` that: ensures the Rust toolchain (`rustup`, `cargo`) exists, runs the sparse submodule build, and writes platform-specific binaries into `priv/codex/`.
- Optional Prebuilt Fallback
  - Mirror the Python client’s wheel strategy: allow `MIX_ENV=prod` to download signed release zips from upstream if a local Rust toolchain is unavailable. Keep this behind an opt-in flag to avoid default network calls.
- Isolation Guarantees
  - Treat the submodule as read-only; any patches live under `patches/codex-rs/*.patch` and are applied during build so we can rebase cleanly.
  - Expose the version pin in `config/native.exs` so we can coordinate updates across Python/TypeScript ports.
Feature Parity Inventory
| Python Capability | Elixir Workstream | Notes |
|---|---|---|
| Thread lifecycle (start_thread, resume_thread) | Align existing Codex / Codex.Thread structs with Python semantics (thread metadata, continuation tokens). | Verify Python exposes mutable thread options mid-session. |
| Turn execution (run, run_streamed, auto-run) | Expand Codex.Thread API to support auto-run (loop until success), piping tool calls back. | Requires event-driven state machine mirroring Python’s RealtimeTurn. |
| Event model (items: agent_message, reasoning, tool, file diffs) | Finish typed structs + conversions; ensure JSON enums match Python contract. | Reuse JSONL schema from TypeScript docs. |
| Tool / MCP integration | Implement tool registration layer mirroring Python’s decorators; wrap Rust MCP protocol (already exposed in codex-rs). | Needs Elixir behaviours + supervision for external servers. |
| Attachments & file uploads | Provide APIs for ephemeral file staging before invoking turns; align with Python client.files. | Likely orchestrated via codex-rs file API endpoints. |
| Sandbox & approvals | Surface sandbox modes / approval callbacks for command execution; expose hooks to accept/refuse. | Requires intercepting command_execution items and optionally halting turn. |
| Structured output | Already partially covered; confirm schema validation & error reporting match Python. | Add contract tests referencing Python examples. |
| Telemetry / logging | Mirror Python logging hooks (callbacks/events) using :telemetry events. | Define canonical telemetry namespaces. |
| Error taxonomy | Match Python exceptions (e.g., CodexError, TurnFailed, AuthError). | Map Rust exit codes -> Elixir exceptions. |
TDD Roadmap
Milestone 0 – Discovery (1 sprint)
- Write characterization tests by observing Python SDK (fixtures capturing JSONL transcripts for each feature).
- Build a comparison harness: run the Python client in CI to emit golden event logs; store them under `integration/fixtures/python/*.jsonl`.
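Golden logs only compare cleanly if fields that change on every run are masked first. The following is a minimal sketch of that normalization step, not the harness itself. The set of volatile keys and the event shapes in the usage lines are assumptions for illustration; the real list has to come from inspecting recorded transcripts.

```python
import json

# Hypothetical: keys assumed to differ between otherwise identical runs.
VOLATILE_KEYS = {"id", "thread_id", "timestamp"}


def normalize(value):
    """Recursively replace volatile fields with a placeholder."""
    if isinstance(value, dict):
        return {
            key: "<volatile>" if key in VOLATILE_KEYS else normalize(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value


def load_transcript(text):
    """Parse JSONL text into normalized events, skipping blank lines."""
    return [normalize(json.loads(line)) for line in text.splitlines() if line.strip()]


# Two runs that differ only in a volatile field normalize to the same events.
run_a = '{"type": "thread.started", "thread_id": "abc"}\n'
run_b = '{"type": "thread.started", "thread_id": "xyz"}\n\n'
same = load_transcript(run_a) == load_transcript(run_b)
```

Masking rather than deleting the keys keeps a missing field detectable: an event that lacks `thread_id` entirely still differs from one that has it.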
Milestone 1 – Core Parity (2 sprints)
- Implement finalized struct definitions (Threads, Turns, Items, Usage) with doctests verifying JSON serialization.
- Red/green for the blocking turn flow using recorded transcripts; replicate auto-setting `thread_id`, `final_response`, and token usage.
- Add a streaming API returning an Elixir `Stream` that matches Python async generator behavior; verify via property tests comparing event ordering.
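One way to phrase the blocking-flow expectation is that a buffered turn result equals a fold over the streamed events. The sketch below illustrates that reduction as a reference model for tests. The event type names (`thread.started`, `item.completed`, `turn.completed`) and field names are assumptions, and `TurnResult` and `fold_events` are hypothetical helpers; the recorded transcripts remain the source of truth for the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnResult:
    thread_id: Optional[str] = None
    final_response: Optional[str] = None
    usage: dict = field(default_factory=dict)
    items: list = field(default_factory=list)


def fold_events(events):
    """Reduce a sequence of event dicts into a buffered turn result."""
    result = TurnResult()
    for event in events:
        kind = event.get("type")
        if kind == "thread.started":
            result.thread_id = event.get("thread_id")
        elif kind == "item.completed":
            item = event.get("item", {})
            result.items.append(item)
            # The last agent message seen becomes the final response.
            if item.get("type") == "agent_message":
                result.final_response = item.get("text")
        elif kind == "turn.completed":
            result.usage = event.get("usage", {})
    return result
```

A property test can then assert that folding the streamed events for a transcript yields the same fields the blocking API reports for that transcript.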
Milestone 2 – Tooling & Sandbox (2 sprints)
- Introduce tool registry behaviour with Mox-driven tests: register mock tool, ensure invocation lifecycle matches Python callback signature.
- Implement approval middleware: tests simulate command events requiring approval; ensure denial aborts turn with matching error.
- Add working directory and sandbox flag handling, reusing CLI flags; integration tests spawn a fake `codex-rs` executable.
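A fake `codex-rs` executable can be as small as a script that replays a recorded JSONL fixture on stdout. The sketch below shows that idea under stated assumptions: the `replay` and `main` helpers and the single-argument command line are hypothetical, and the script does not interpret any real `codex-rs` flags.

```python
import io
import sys


def replay(fixture_lines, out):
    """Write each non-blank fixture line to `out`, flushing after every event
    so a consumer reading the pipe sees events incrementally."""
    count = 0
    for line in fixture_lines:
        line = line.strip()
        if not line:
            continue
        out.write(line + "\n")
        out.flush()
        count += 1
    return count


def main(argv, out=sys.stdout):
    """Hypothetical entry point: argv[1] is the path to a JSONL fixture."""
    if len(argv) != 2:
        sys.stderr.write("usage: fake_codex <fixture.jsonl>\n")
        return 2
    with open(argv[1], encoding="utf-8") as fixture:
        replay(fixture, out)
    return 0


# Replaying into an in-memory buffer instead of stdout.
buffer = io.StringIO()
emitted = replay(['{"type": "turn.started"}\n', "\n"], buffer)
```

When used as an executable, the script would end with `sys.exit(main(sys.argv))`; flushing per line matters because the consuming side reads events as they arrive rather than at process exit.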
Milestone 3 – Attachments & Structured Output (1 sprint)
- Build a file upload staging service (uses `~/.codex/artifacts` like Python). Tests cover cleanup, multiple attachments, and large file handling.
- Expand structured output tests to include schema failure cases mirrored from Python unit tests.
Milestone 4 – Observability & Ergonomics (1 sprint)
- Emit `:telemetry` events for lifecycle milestones; use capture fixtures to assert parity with Python logging expectations.
- Finalize error modules; cross-check error messages and codes using golden logs from Python runs.
- Document the API and migrate examples to ensure parity (docs + doctests).
Milestone 5 – Regression Safety Net (ongoing)
- Add contract tests that run both Python and Elixir clients against mock codex-rs binary, diffing event streams.
- Integrate a coverage gate via `mix coveralls` that requires coverage at or above the Python suite's baseline; ensure the CI matrix spans macOS + Linux.
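For the contract tests, a failing diff is easier to act on when it points at the first event where the two streams disagree. The sketch below shows one way to locate that point; `first_divergence` is a hypothetical helper, and it assumes both streams are already parsed into lists of event dicts with run-specific fields masked.

```python
from itertools import zip_longest


def first_divergence(expected, actual):
    """Return (index, expected_event, actual_event) for the first mismatch,
    or None when both streams are identical. A stream that ends early shows
    up as None on its side of the tuple."""
    for index, (exp, act) in enumerate(zip_longest(expected, actual)):
        if exp != act:
            return index, exp, act
    return None
```

Reporting only the first mismatch keeps the failure message short; later differences are often consequences of the first one.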
Risks & Mitigations
- Rust submodule divergence: track upstream via dependabot-style reminders; lock commit hash and regenerate artifacts on upgrade.
- Cross-platform builds: provide Docker-based build fallback; run CI on macOS/Linux to catch sandbox differences.
- Unknown Python features: maintain parity checklist updated as we learn; write failing tests first referencing Python behavior.
- Binary size in Hex releases: publish native binary separately (e.g., optional NIF package) to keep main Hex package lightweight.
Immediate Next Actions
- Add `openai/codex` as a submodule with sparse checkout limited to `codex-rs/**`; script verification in `scripts/bootstrap_rust.sh`.
- Inventory the Python SDK by cloning the repo and exporting a feature list and event transcripts (store under `integration/fixtures`).
- Update project docs (`docs/02-architecture.md`, `docs/04-testing-strategy.md`) with Python-specific nuances discovered during the audit.
- Draft an initial parity checklist issue to track milestones and assign owners.