Elixir Codex SDK - Project Goals and Design
Overview
The Elixir Codex SDK is an idiomatic, production-ready wrapper around OpenAI's codex-rs CLI executable. This SDK brings the power of OpenAI's Codex agent (a sophisticated AI assistant capable of reasoning, code generation, file manipulation, command execution, and more) into the Elixir/OTP ecosystem.
Project Goals
Primary Objectives
- Complete Feature Parity: Implement all functionality available in the official TypeScript SDK
- Idiomatic Elixir: Leverage OTP principles, GenServers, and BEAM concurrency patterns
- Production Ready: Robust error handling, supervision trees, telemetry integration
- Type Safety: Comprehensive structs using TypedStruct for all events, items, and options
- Battle Tested: Deterministic, async test suite using Supertester (zero Process.sleep)
- Developer Experience: Clear APIs, comprehensive documentation, helpful examples
Secondary Objectives
- Performance: Efficient streaming with backpressure, minimal memory overhead
- Observability: Telemetry events for monitoring and debugging
- Extensibility: Clean abstractions for future enhancements
- Maintainability: Well-documented code, consistent patterns, comprehensive tests
Core Concepts
The Codex Agent
Codex is an AI agent that can:
- Analyze and generate code across multiple languages
- Execute shell commands in a controlled sandbox
- Read, write, and modify files with precise diffs
- Search the web for up-to-date information
- Make calls to Model Context Protocol (MCP) tools
- Reason about complex problems and maintain task lists
- Produce structured JSON output conforming to schemas
Threads and Turns
Thread: A persistent conversation session with the agent. Threads maintain context across multiple interactions and are stored in ~/.codex/sessions.
Turn: A single request-response cycle within a thread. Each turn:
- Starts with a user prompt (input)
- Produces a stream of events as the agent works
- Completes with a final response and usage statistics
- May include multiple items (messages, commands, file changes, etc.)
Items
Items are the atomic units of work in a thread. Each item represents a specific action or artifact:
- AgentMessage: Text or JSON response from the agent
- Reasoning: The agent's reasoning process summary
- CommandExecution: Shell command with status and output
- FileChange: File modifications (add, update, delete)
- McpToolCall: External tool invocation via MCP
- WebSearch: Web search query and results
- TodoList: Agent's running task list
- Error: Non-fatal error items
Events
Events are emitted during turn execution to provide real-time updates:
Thread-Level Events
- ThreadStarted: New thread initialized with ID
- TurnStarted: Agent begins processing prompt
- TurnCompleted: Turn finished with usage stats
- TurnFailed: Turn encountered fatal error
Item-Level Events
- ItemStarted: New item added (typically in progress)
- ItemUpdated: Item state changed
- ItemCompleted: Item reached terminal state
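As a rough illustration of how these events might be represented, the sketch below turns already-decoded JSON maps into plain structs and checks whether an event ends a turn. The module name EventsSketch, the struct fields, and the "type" strings are all assumptions made for this example; the text above does not specify the wire format, and the SDK itself plans to use TypedStruct rather than bare defstruct.

```elixir
defmodule EventsSketch do
  # Hypothetical event structs; the field names are assumptions for illustration.
  defmodule ThreadStarted do
    defstruct [:thread_id]
  end

  defmodule TurnCompleted do
    defstruct [:usage]
  end

  defmodule TurnFailed do
    defstruct [:error]
  end

  defmodule ItemCompleted do
    defstruct [:item]
  end

  # Converts an already-decoded JSON map into an event struct.
  # The "type" strings and keys are placeholders, not a documented wire format.
  def from_map(%{"type" => "thread.started", "thread_id" => id}),
    do: {:ok, %ThreadStarted{thread_id: id}}

  def from_map(%{"type" => "turn.completed", "usage" => usage}),
    do: {:ok, %TurnCompleted{usage: usage}}

  def from_map(%{"type" => "turn.failed", "error" => error}),
    do: {:ok, %TurnFailed{error: error}}

  def from_map(%{"type" => "item.completed", "item" => item}),
    do: {:ok, %ItemCompleted{item: item}}

  # Unknown shapes are reported instead of raising.
  def from_map(other), do: {:error, {:unknown_event, other}}

  # A turn ends on either TurnCompleted or TurnFailed.
  def turn_finished?(%TurnCompleted{}), do: true
  def turn_finished?(%TurnFailed{}), do: true
  def turn_finished?(_event), do: false
end
```

Returning a tagged error for unknown maps, rather than raising, is one way to keep a consumer alive if the CLI emits an event type this sketch does not know about.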
Module Structure
Core Modules
Codex
The main entry point for the SDK.
Responsibilities:
- Create new threads
- Resume existing threads
- Manage global options (API key, base URL, codex path)
Key Functions:
start_thread(codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
resume_thread(thread_id, codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
Codex.Thread
Manages individual conversation threads and turn execution.
Responsibilities:
- Execute turns (blocking or streaming)
- Maintain thread state and ID
- Apply thread-level options (model, sandbox, working directory)
Key Functions:
run(thread, input, turn_opts \\ %{}) :: {:ok, turn_result} | {:error, term}
run_streamed(thread, input, turn_opts \\ %{}) :: {:ok, stream} | {:error, term}
Codex.Exec
GenServer that manages the codex-rs OS process lifecycle.
Responsibilities:
- Spawn and manage codex-rs process via Port
- Handle JSONL stdin/stdout communication
- Parse events and forward to caller
- Clean up resources on exit or crash
Key Behaviors:
- One GenServer per turn execution
- Supervised process with proper cleanup
- Telemetry events for observability
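The responsibilities above can be sketched as a small GenServer that owns a Port, buffers stdout chunks into complete lines, and forwards each line to the caller. This is a minimal sketch under stated assumptions, not the SDK's implementation: the module name ExecSketch, its options, and the {:event_line, line} and {:exit_status, status} messages are hypothetical, and JSON decoding is deliberately left out so the example needs no dependencies.

```elixir
defmodule ExecSketch do
  use GenServer

  # Hypothetical options: :caller (pid), :executable (absolute path), :args (list).
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    caller = Keyword.fetch!(opts, :caller)
    executable = Keyword.fetch!(opts, :executable)
    args = Keyword.get(opts, :args, [])

    # Port options mirror the ones named in this document.
    port =
      Port.open({:spawn_executable, executable}, [
        :binary,
        :use_stdio,
        :exit_status,
        args: args
      ])

    {:ok, %{port: port, caller: caller, buffer: ""}}
  end

  @impl true
  def handle_info({port, {:data, chunk}}, %{port: port} = state) do
    # Chunks may end mid-line, so keep the trailing partial line in the buffer.
    {lines, rest} = split_lines(state.buffer <> chunk)
    Enum.each(lines, &send(state.caller, {:event_line, &1}))
    {:noreply, %{state | buffer: rest}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    send(state.caller, {:exit_status, status})
    {:stop, :normal, state}
  end

  # Splits on newlines and returns {complete_lines, unfinished_remainder}.
  defp split_lines(data) do
    parts = String.split(data, "\n")
    {complete, [rest]} = Enum.split(parts, -1)
    {complete, rest}
  end
end
```

To try it without codex-rs, any executable that prints lines can stand in, for example `ExecSketch.start_link(caller: self(), executable: System.find_executable("sh"), args: ["-c", "printf 'one\\ntwo\\n'"])`, after which the caller's mailbox receives one message per line followed by the exit status.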
Type Modules
Codex.Events
Defines all event types using TypedStruct.
Event Types:
- ThreadStarted, TurnStarted, TurnCompleted, TurnFailed
- ItemStarted, ItemUpdated, ItemCompleted
- ThreadError
Codex.Items
Defines all item types and their status enums.
Item Types:
- AgentMessage, Reasoning, CommandExecution, FileChange
- McpToolCall, WebSearch, TodoList, Error
Status Types:
- CommandExecutionStatus: :in_progress, :completed, :failed
- PatchApplyStatus: :completed, :failed
- McpToolCallStatus: :in_progress, :completed, :failed
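One way a status enum could sit alongside its item struct is shown below for command executions. This is a sketch only: the module name CommandExecutionSketch, the struct fields, and the string forms accepted by parse_status/1 are assumptions, and plain defstruct with typespecs stands in for TypedStruct to keep the example dependency-free.

```elixir
defmodule CommandExecutionSketch do
  # Field names are hypothetical; only the three status atoms come from the list above.
  @enforce_keys [:id, :command]
  defstruct [:id, :command, :exit_code, status: :in_progress]

  @type status :: :in_progress | :completed | :failed
  @type t :: %__MODULE__{
          id: String.t(),
          command: String.t(),
          exit_code: integer() | nil,
          status: status()
        }

  # Maps an (assumed) wire string to a status atom without creating atoms dynamically.
  @spec parse_status(String.t()) :: {:ok, status()} | {:error, {:unknown_status, String.t()}}
  def parse_status("in_progress"), do: {:ok, :in_progress}
  def parse_status("completed"), do: {:ok, :completed}
  def parse_status("failed"), do: {:ok, :failed}
  def parse_status(other), do: {:error, {:unknown_status, other}}

  # An item is terminal once it has either completed or failed.
  @spec terminal?(t()) :: boolean()
  def terminal?(%__MODULE__{status: status}), do: status in [:completed, :failed]
end
```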
Codex.Options
Configuration structs for all levels.
Structs:
- Codex.Options: Global options (codex path, API key, base URL)
- Codex.Thread.Options: Thread options (model, sandbox, working directory)
- Codex.Turn.Options: Turn options (output schema)
Utility Modules
Codex.OutputSchemaFile
Helper for managing JSON schema temporary files.
Responsibilities:
- Create temporary file with JSON schema
- Provide cleanup function
- Handle errors gracefully
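A helper with these responsibilities might look like the following sketch. The module name OutputSchemaFileSketch, the {:ok, path, cleanup} return shape, and the directory naming are assumptions; the schema is taken as an already-encoded JSON string so that no JSON library is needed.

```elixir
defmodule OutputSchemaFileSketch do
  # Returns {:ok, path, cleanup_fun} or {:error, reason}.
  # With no schema there is nothing to write, so cleanup is a no-op.
  @spec create(String.t() | nil) :: {:ok, String.t() | nil, (-> :ok)} | {:error, term()}
  def create(nil), do: {:ok, nil, fn -> :ok end}

  def create(schema_json) when is_binary(schema_json) do
    # unique_integer is unique only within this VM instance; good enough for a sketch.
    suffix = System.unique_integer([:positive])
    dir = Path.join(System.tmp_dir!(), "codex-output-schema-#{suffix}")
    path = Path.join(dir, "schema.json")

    cleanup = fn ->
      File.rm_rf(dir)
      :ok
    end

    # Any {:error, reason} from mkdir_p or write falls through unchanged.
    with :ok <- File.mkdir_p(dir),
         :ok <- File.write(path, schema_json) do
      {:ok, path, cleanup}
    end
  end
end
```

Returning the cleanup function alongside the path lets the caller run it in an `after` block or a GenServer `terminate/2` callback, whichever owns the turn.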
Architecture Patterns
Process Model
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ (synchronous API calls)
       ▼
┌─────────────────┐
│  Codex.Thread   │  (stateful struct, holds thread_id and options)
└────────┬────────┘
         │ (spawns)
         ▼
┌──────────────────┐
│   Codex.Exec     │  (GenServer - one per turn)
│   (GenServer)    │  - Manages codex-rs lifecycle
└────────┬─────────┘  - Parses JSONL events
         │ (spawns)   - Handles Port communication
         ▼
┌──────────────────┐
│  Port (stdin/    │  (IPC with codex-rs)
│  stdout)         │  - JSONL over stdin
└────────┬─────────┘  - JSONL events from stdout
         │
         ▼
┌──────────────────┐
│    codex-rs      │  (OpenAI's Rust CLI)
│  (OS Process)    │  - Manages OpenAI API calls
└──────────────────┘  - Executes commands/file ops
                      - Streams events
Data Flow
1. Client calls Codex.Thread.run/3
   - Thread module starts Codex.Exec GenServer
   - Passes thread_id, options, and input
2. Exec spawns codex-rs process
   - Constructs command line arguments
   - Sets environment variables
   - Opens Port with :binary, :use_stdio, :exit_status
3. Exec sends input via stdin
   - Writes prompt to Port
   - Closes stdin to signal end of input
4. Exec receives JSONL events from stdout
   - Parses each line as JSON
   - Converts to structured event structs
   - Forwards events to caller
5. Blocking mode (run/3)
   - Exec accumulates events
   - Extracts final response and items
   - Returns complete TurnResult when turn completes
6. Streaming mode (run_streamed/3)
   - Exec yields events as they arrive
   - Client processes events in real-time
   - Stream completes when turn finishes
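The blocking-mode step of accumulating events into a result can be sketched as a reduce. Events are modelled here as tagged tuples to keep the example short; the module name TurnAccumulatorSketch, the tuple shapes, and the rule that the last agent message becomes the final response are assumptions for illustration, not the SDK's actual TurnResult.

```elixir
defmodule TurnAccumulatorSketch do
  # Hypothetical result shape standing in for TurnResult.
  defstruct items: [], final_response: nil, usage: nil

  # Folds a finite list (or stream) of events into one result.
  def collect(events) do
    result = Enum.reduce(events, %__MODULE__{}, &apply_event/2)
    # Items were prepended for efficiency; restore arrival order.
    %{result | items: Enum.reverse(result.items)}
  end

  # The most recent agent message is treated as the final response (an assumption).
  defp apply_event({:item_completed, %{type: :agent_message, text: text} = item}, acc),
    do: %{acc | items: [item | acc.items], final_response: text}

  defp apply_event({:item_completed, item}, acc),
    do: %{acc | items: [item | acc.items]}

  defp apply_event({:turn_completed, usage}, acc),
    do: %{acc | usage: usage}

  # Events that carry nothing for the final result are ignored.
  defp apply_event(_other, acc), do: acc
end
```

For example, `TurnAccumulatorSketch.collect([{:item_completed, %{type: :agent_message, text: "done"}}, {:turn_completed, %{input_tokens: 10}}])` yields a struct whose final_response is "done" and whose usage is the map from the last event.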
Error Handling
Recoverable Errors
- Non-fatal errors become ErrorItem in thread
- Agent continues processing
- Turn completes normally
Fatal Errors
- TurnFailed event with error details
- Process exits gracefully
- Resources cleaned up
- Error propagated to client
Process Crashes
- GenServer supervision restarts failed processes
- Port monitors detect process termination
- Cleanup functions remove temporary files
- Telemetry events logged
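The sketch below shows one way a crashed per-turn worker can be observed under a DynamicSupervisor using a monitor. The names SupervisionSketch and TurnWorker are hypothetical. The `restart: :temporary` setting is an assumption chosen for this example so that a failed turn is reported to the caller rather than re-run; the text above does not say which restart strategy the SDK uses.

```elixir
defmodule SupervisionSketch do
  defmodule TurnWorker do
    # :temporary is an assumption: the supervisor will not restart this child.
    use GenServer, restart: :temporary

    def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

    @impl true
    def init(arg), do: {:ok, arg}

    # Simulates a crash in the middle of a turn.
    @impl true
    def handle_cast(:crash, _state), do: raise("simulated failure")
  end

  # Starts a worker under a supervisor, crashes it, and returns what the monitor saw.
  def crash_and_observe do
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
    {:ok, pid} = DynamicSupervisor.start_child(sup, {TurnWorker, :no_args})
    ref = Process.monitor(pid)
    GenServer.cast(pid, :crash)

    outcome =
      receive do
        {:DOWN, ^ref, :process, ^pid, reason} -> {:down, reason}
      after
        1_000 -> :still_running
      end

    DynamicSupervisor.stop(sup)
    outcome
  end
end
```

The monitor gives the caller a place to run cleanup (such as removing temporary files) no matter how the worker exited. Running the example also prints the worker's crash report to the log, which is expected.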
Streaming Strategy
Streaming Pros:
- Real-time updates for responsive UIs
- Process events as they arrive
- Lower memory footprint for long turns
Streaming Cons:
- More complex client code
- Requires event handling logic
Blocking Pros:
- Simple API for scripting
- No event handling needed
- Complete result in one call
Blocking Cons:
- Higher memory usage
- No progress visibility
- Longer wait times
Implementation:
- run_streamed/3 returns a Stream/Enumerable
- run/3 internally uses streaming but buffers results
- Both share same Codex.Exec implementation
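The relationship between the two modes can be sketched with Stream.resource/3: a lazy stream pulls events from the process mailbox, and the blocking variant simply drains that stream. The module name StreamSketch and the {:event, ref, event} and {:done, ref} messages are assumptions for this example; the real Codex.Exec protocol is not described here.

```elixir
defmodule StreamSketch do
  # Lazily yields events sent to the enumerating process as {:event, ref, event}
  # and finishes when {:done, ref} arrives. The message shapes are hypothetical.
  def events(ref, timeout \\ 5_000) do
    Stream.resource(
      fn -> ref end,
      fn ref ->
        receive do
          {:event, ^ref, event} -> {[event], ref}
          {:done, ^ref} -> {:halt, ref}
        after
          timeout -> raise "timed out waiting for turn events"
        end
      end,
      fn _ref -> :ok end
    )
  end

  # Blocking mode is the same stream, fully buffered into a list.
  def run_blocking(ref, timeout \\ 5_000) do
    ref |> events(timeout) |> Enum.to_list()
  end
end
```

Because the stream reads from the mailbox of whichever process enumerates it, the stream has to be consumed by the same process the events are sent to. A streaming client would call `Enum.each/2` or `Stream.map/2` on `StreamSketch.events(ref)` instead of buffering.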
Feature Set
Completed in TypeScript SDK (Target Parity)
Core Operations
- Start new threads
- Resume existing threads
- Execute turns (blocking and streaming)
- Handle all event types
- Parse all item types
Configuration
- Custom codex binary path
- API key and base URL override
- Model selection
- Sandbox modes (read-only, workspace-write, danger-full-access)
- Working directory control
- Git repo check bypass
Structured Output
- JSON schema support
- Temporary file management
- Schema validation
Error Handling
- Process spawn errors
- JSON parse errors
- Exit code handling
- stderr capture
Additional Features for Elixir
OTP Integration
- GenServer-based process management
- Supervision tree support
- Proper resource cleanup
Telemetry
- Turn start/complete events
- Error events
- Performance metrics
Type Safety
- TypedStruct for all data types
- Compile-time struct key enforcement, with static type checking via typespecs and Dialyzer
- Documentation from types
Testing
- Supertester integration
- Mock GenServer implementation
- Deterministic async tests
- Chaos engineering tests
Development Approach
Test-Driven Development
- Write tests first: Define expected behavior through tests
- Implement minimally: Write just enough code to pass tests
- Refactor confidently: Tests provide safety net
- Document through tests: Tests serve as executable documentation
Incremental Implementation
Week 1: Core types and module stubs
- Define all event/item structs
- Create module outlines
- Set up test infrastructure
Week 2: Exec GenServer implementation
- Port-based process management
- JSONL parsing
- Event forwarding
Week 3: Thread management
- Blocking turn execution
- Streaming turn execution
- Option handling
Week 4: Integration and polish
- End-to-end tests
- Documentation
- Examples
- CI/CD
Quality Standards
- Code Coverage: Target 95%+ line coverage
- Documentation: All public functions have @doc
- Typespecs: All public functions have @spec
- Dialyzer: Zero warnings
- Credo: All issues resolved
- Tests: Zero flaky tests, all async
Success Criteria
Must Have (MVP)
- [ ] All TypeScript SDK features implemented
- [ ] All tests passing (unit, integration, property)
- [ ] Documentation complete (API docs, guides, examples)
- [ ] CI/CD pipeline green
- [ ] Published to Hex.pm
Should Have (v1.0)
- [ ] Telemetry integration documented
- [ ] Supervision tree examples
- [ ] Performance benchmarks
- [ ] Chaos engineering tests
- [ ] Real-world examples
Could Have (Future)
- [ ] Custom event handlers
- [ ] Persistent event logging
- [ ] WebSocket-based streaming
- [ ] Native NIF for performance
- [ ] Phoenix LiveView integration examples