Elixir Codex SDK - Project Goals and Design

Overview

The Elixir Codex SDK is an idiomatic, production-ready wrapper around OpenAI's codex-rs CLI executable. It brings OpenAI's Codex agent, an AI assistant that can reason about problems, generate code, manipulate files, execute commands, and more, into the Elixir/OTP ecosystem.

Project Goals

Primary Objectives

Complete Feature Parity: Implement all functionality available in the official TypeScript SDK
Idiomatic Elixir: Leverage OTP principles, GenServers, and BEAM concurrency patterns
Production Ready: Robust error handling, supervision trees, telemetry integration
Type Safety: Comprehensive structs using TypedStruct for all events, items, and options
Battle Tested: Deterministic, async test suite using Supertester (zero Process.sleep)
Developer Experience: Clear APIs, comprehensive documentation, helpful examples

Secondary Objectives

Performance: Efficient streaming with backpressure, minimal memory overhead
Observability: Telemetry events for monitoring and debugging
Extensibility: Clean abstractions for future enhancements
Maintainability: Well-documented code, consistent patterns, comprehensive tests

Core Concepts

The Codex Agent

Codex is an AI agent that can:

Analyze and generate code across multiple languages
Execute shell commands in a controlled sandbox
Read, write, and modify files with precise diffs
Search the web for up-to-date information
Call tools exposed through the Model Context Protocol (MCP)
Reason about complex problems and maintain task lists
Produce structured JSON output conforming to schemas

Threads and Turns

Thread: A persistent conversation session with the agent. Threads maintain context across multiple interactions and are stored in ~/.codex/sessions.

Turn: A single request-response cycle within a thread. Each turn:

Starts with a user prompt (input)
Produces a stream of events as the agent works
Completes with a final response and usage statistics
May include multiple items (messages, commands, file changes, etc.)

Items

Items are the atomic units of work in a thread. Each item represents a specific action or artifact:

AgentMessage: Text or JSON response from the agent
Reasoning: The agent's reasoning process summary
CommandExecution: Shell command with status and output
FileChange: File modifications (add, update, delete)
McpToolCall: External tool invocation via MCP
WebSearch: Web search query and results
TodoList: Agent's running task list
Error: Non-fatal error items
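A minimal sketch of how decoded JSON could become item structs. It assumes the item maps use snake_case keys and a `"type"` discriminator as in the TypeScript SDK (for example `"agent_message"` and `"command_execution"`). Plain `defstruct` stands in for TypedStruct so the example has no dependencies, only three of the eight item types are shown, and every `CodexSketch.*` module name is hypothetical.

```elixir
# Hypothetical item structs; plain defstruct stands in for TypedStruct.
defmodule CodexSketch.Items.AgentMessage do
  @enforce_keys [:id, :text]
  defstruct [:id, :text]
  @type t :: %__MODULE__{id: String.t(), text: String.t()}
end

defmodule CodexSketch.Items.CommandExecution do
  @enforce_keys [:id, :command, :status]
  defstruct [:id, :command, :status, aggregated_output: "", exit_code: nil]

  @type t :: %__MODULE__{
          id: String.t(),
          command: String.t(),
          status: String.t(),
          aggregated_output: String.t(),
          exit_code: integer() | nil
        }
end

defmodule CodexSketch.Items.Error do
  @enforce_keys [:id, :message]
  defstruct [:id, :message]
  @type t :: %__MODULE__{id: String.t(), message: String.t()}
end

defmodule CodexSketch.Items do
  alias CodexSketch.Items.{AgentMessage, CommandExecution, Error}

  @doc "Converts one decoded item map into a struct, dispatching on \"type\"."
  def decode(%{"type" => "agent_message", "id" => id, "text" => text}),
    do: {:ok, %AgentMessage{id: id, text: text}}

  def decode(%{"type" => "command_execution", "id" => id, "command" => cmd, "status" => status} = map) do
    {:ok,
     %CommandExecution{
       id: id,
       command: cmd,
       # Status stays a string here; converting it to an atom is a separate step.
       status: status,
       aggregated_output: Map.get(map, "aggregated_output", ""),
       exit_code: Map.get(map, "exit_code")
     }}
  end

  def decode(%{"type" => "error", "id" => id, "message" => message}),
    do: {:ok, %Error{id: id, message: message}}

  # Types not covered by this sketch, or maps missing required fields.
  def decode(%{"type" => type}), do: {:error, {:unrecognized_item, type}}
  def decode(other), do: {:error, {:invalid_item, other}}
end
```

Returning `{:error, ...}` for unrecognized types, rather than raising, lets the caller decide whether an item type introduced by a newer codex-rs release should abort the turn.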

Events

Events are emitted during turn execution to provide real-time updates:

Thread-Level Events

ThreadStarted: New thread initialized with ID
TurnStarted: Agent begins processing prompt
TurnCompleted: Turn finished with usage stats
TurnFailed: Turn encountered a fatal error
ThreadError: Unrecoverable error reported directly by the event stream

Item-Level Events

ItemStarted: New item added (typically in progress)
ItemUpdated: Item state changed
ItemCompleted: Item reached terminal state
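The event list maps naturally onto one struct per event type. This sketch assumes the `type` strings codex-rs emits match those used by the TypeScript SDK (`"thread.started"`, `"item.completed"`, `"error"`, and so on). Item payloads are left as raw maps here; a fuller version would hand them to item decoding. The `CodexSketch.Events.*` names are hypothetical.

```elixir
# Hypothetical event structs, one module per event type.
defmodule CodexSketch.Events.ThreadStarted do
  defstruct [:thread_id]
end

defmodule CodexSketch.Events.TurnStarted do
  defstruct []
end

defmodule CodexSketch.Events.TurnCompleted do
  defstruct [:usage]
end

defmodule CodexSketch.Events.TurnFailed do
  defstruct [:message]
end

defmodule CodexSketch.Events.ItemStarted do
  defstruct [:item]
end

defmodule CodexSketch.Events.ItemUpdated do
  defstruct [:item]
end

defmodule CodexSketch.Events.ItemCompleted do
  defstruct [:item]
end

defmodule CodexSketch.Events.ThreadError do
  defstruct [:message]
end

defmodule CodexSketch.Events do
  alias CodexSketch.Events.{ThreadStarted, TurnStarted, TurnCompleted, TurnFailed,
                            ItemStarted, ItemUpdated, ItemCompleted, ThreadError}

  # Maps one decoded JSONL object to an event struct.
  def decode(%{"type" => "thread.started", "thread_id" => id}),
    do: {:ok, %ThreadStarted{thread_id: id}}

  def decode(%{"type" => "turn.started"}), do: {:ok, %TurnStarted{}}

  def decode(%{"type" => "turn.completed", "usage" => usage}),
    do: {:ok, %TurnCompleted{usage: usage}}

  def decode(%{"type" => "turn.failed", "error" => %{"message" => msg}}),
    do: {:ok, %TurnFailed{message: msg}}

  def decode(%{"type" => "item.started", "item" => item}), do: {:ok, %ItemStarted{item: item}}
  def decode(%{"type" => "item.updated", "item" => item}), do: {:ok, %ItemUpdated{item: item}}
  def decode(%{"type" => "item.completed", "item" => item}), do: {:ok, %ItemCompleted{item: item}}

  # A stream-level failure rather than a failure of one turn.
  def decode(%{"type" => "error", "message" => msg}), do: {:ok, %ThreadError{message: msg}}

  def decode(other), do: {:error, {:unrecognized_event, other}}
end
```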

Module Structure

Core Modules

`Codex`

The main entry point for the SDK.

Responsibilities:

Create new threads
Resume existing threads
Manage global options (API key, base URL, codex path)

Key Functions:

start_thread(codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
resume_thread(thread_id, codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}

`Codex.Thread`

Manages individual conversation threads and turn execution.

Responsibilities:

Execute turns (blocking or streaming)
Maintain thread state and ID
Apply thread-level options (model, sandbox, working directory)

Key Functions:

run(thread, input, turn_opts \\ %{}) :: {:ok, turn_result} | {:error, term}
run_streamed(thread, input, turn_opts \\ %{}) :: {:ok, stream} | {:error, term}
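One way `run_streamed/3` could return a lazy stream is `Stream.resource/3` over the caller's mailbox. The `{:codex_event, ref, event}` and `{:codex_done, ref}` message shapes and the `CodexSketch.EventStream` module are assumptions for illustration; the stream must be enumerated by the process that receives those messages.

```elixir
defmodule CodexSketch.EventStream do
  # Hypothetical protocol: the Exec process sends {:codex_event, ref, event}
  # for every parsed event and {:codex_done, ref} once the turn has ended.

  @spec from_mailbox(reference(), timeout()) :: Enumerable.t()
  def from_mailbox(ref, timeout \\ 30_000) do
    Stream.resource(
      # start_fun: the accumulator is just the ref used to match messages
      fn -> ref end,
      # next_fun: block until the next event for this ref, or halt when done
      fn ref ->
        receive do
          {:codex_event, ^ref, event} -> {[event], ref}
          {:codex_done, ^ref} -> {:halt, ref}
        after
          timeout -> raise "no codex event received within #{timeout} ms"
        end
      end,
      # after_fun: a real implementation could stop the Exec process here
      fn _ref -> :ok end
    )
  end
end
```

The third function passed to `Stream.resource/3` runs when enumeration stops, including early halts such as `Enum.take/2`, which makes it a natural place to shut down the codex-rs process.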

`Codex.Exec`

GenServer that manages the codex-rs OS process lifecycle.

Responsibilities:

Spawn and manage codex-rs process via Port
Write the prompt to stdin and read JSONL events from stdout
Parse events and forward to caller
Clean up resources on exit or crash

Key Behaviors:

One GenServer per turn execution
Supervised process with proper cleanup
Telemetry events for observability
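When the Port is opened with `:binary` and without the `{:line, max}` option, stdout arrives in arbitrary chunks, so one JSONL object can be split across several `{port, {:data, chunk}}` messages. A minimal, hypothetical buffer that holds the trailing partial line until its newline arrives:

```elixir
defmodule CodexSketch.LineBuffer do
  # Appends a new chunk to the pending buffer and returns every complete
  # line plus the unfinished remainder to keep for the next chunk.
  @spec push(binary(), binary()) :: {[binary()], binary()}
  def push(buffer, chunk) do
    parts = String.split(buffer <> chunk, "\n")
    # The last part has no newline yet (it is "" if the chunk ended with one).
    {complete, [rest]} = Enum.split(parts, -1)

    lines =
      complete
      |> Enum.map(&String.trim_trailing(&1, "\r"))
      |> Enum.reject(&(&1 == ""))

    {lines, rest}
  end
end
```

An Exec GenServer would call `push/2` from `handle_info/2`, keep the remainder in its state, and decode each complete line with a JSON library such as Jason (not shown). Opening the port with `{:line, max}` is an alternative, at the cost of handling `{:noeol, partial}` fragments for very long lines.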

Type Modules

`Codex.Events`

Defines all event types using TypedStruct.

Event Types:

ThreadStarted, TurnStarted, TurnCompleted, TurnFailed
ItemStarted, ItemUpdated, ItemCompleted
ThreadError

`Codex.Items`

Defines all item types and their status enums.

Item Types:

AgentMessage, Reasoning, CommandExecution, FileChange
McpToolCall, WebSearch, TodoList, Error

Status Types:

CommandExecutionStatus: :in_progress, :completed, :failed
PatchApplyStatus: :completed, :failed
McpToolCallStatus: :in_progress, :completed, :failed
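Status strings arrive from an external process, so converting them with `String.to_atom/1` would let codex-rs create arbitrary atoms, which are never garbage-collected. A hypothetical whitelist-based parser for the three enums above:

```elixir
defmodule CodexSketch.Status do
  # Allowed wire values per enum, mapped to existing atoms.
  @statuses %{
    command_execution: %{"in_progress" => :in_progress, "completed" => :completed, "failed" => :failed},
    patch_apply: %{"completed" => :completed, "failed" => :failed},
    mcp_tool_call: %{"in_progress" => :in_progress, "completed" => :completed, "failed" => :failed}
  }

  @spec parse(atom(), String.t()) :: {:ok, atom()} | {:error, term()}
  def parse(kind, value) do
    with {:ok, allowed} <- Map.fetch(@statuses, kind),
         {:ok, status} <- Map.fetch(allowed, value) do
      {:ok, status}
    else
      # Unknown enum or a value outside that enum.
      :error -> {:error, {:invalid_status, kind, value}}
    end
  end
end
```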

`Codex.Options`

Configuration structs for each level: global, thread, and turn.

Structs:

Codex.Options: Global options (codex path, API key, base URL)
Codex.Thread.Options: Thread options (model, sandbox, working directory)
Codex.Turn.Options: Turn options (output schema)
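A sketch of thread options as a plain struct with a validating constructor, assuming sandbox modes are represented as atoms derived from the CLI names listed under Configuration (`read-only`, `workspace-write`, `danger-full-access`). The field names and the `CodexSketch.ThreadOptions` module are hypothetical; the global and turn structs would follow the same pattern.

```elixir
defmodule CodexSketch.ThreadOptions do
  # Hypothetical thread-level options; nil means "use the codex-rs default".
  @sandbox_modes [nil, :read_only, :workspace_write, :danger_full_access]

  defstruct model: nil, sandbox_mode: nil, working_directory: nil, skip_git_repo_check: false

  @type t :: %__MODULE__{
          model: String.t() | nil,
          sandbox_mode: :read_only | :workspace_write | :danger_full_access | nil,
          working_directory: Path.t() | nil,
          skip_git_repo_check: boolean()
        }

  @spec new(map() | keyword()) :: {:ok, t()} | {:error, term()}
  def new(attrs \\ %{}) do
    attrs = Map.new(attrs)
    known = Map.keys(%__MODULE__{}) -- [:__struct__]

    # Reject unknown keys instead of letting struct/2 drop them silently.
    case Map.keys(attrs) -- known do
      [] -> validate(struct(__MODULE__, attrs))
      unknown -> {:error, {:unknown_options, unknown}}
    end
  end

  defp validate(%__MODULE__{sandbox_mode: mode} = opts) do
    if mode in @sandbox_modes do
      {:ok, opts}
    else
      {:error, {:invalid_sandbox_mode, mode}}
    end
  end
end
```

Rejecting unknown keys catches typos such as `modle:` that `struct/2` would otherwise ignore.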

Utility Modules

`Codex.OutputSchemaFile`

Helper for managing JSON schema temporary files.

Responsibilities:

Create temporary file with JSON schema
Provide cleanup function
Handle errors gracefully
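A minimal sketch of the temp-file helper, assuming the schema is already JSON-encoded (a library such as Jason would normally handle that step). The module name, directory naming scheme, and return shape are hypothetical. Writing into a private directory lets cleanup remove everything with a single `File.rm_rf/1` call.

```elixir
defmodule CodexSketch.OutputSchemaFile do
  # Hypothetical helper. Expects the schema already encoded as JSON iodata.
  @type handle :: %{path: String.t() | nil, cleanup: (-> :ok)}

  @spec create(iodata() | nil) :: {:ok, handle()} | {:error, term()}
  # No schema: nothing to write, and cleanup is a no-op.
  def create(nil), do: {:ok, %{path: nil, cleanup: fn -> :ok end}}

  def create(schema_json) do
    # A per-call directory, named from the OS pid plus a VM-unique integer.
    name = "codex-output-schema-#{System.pid()}-#{System.unique_integer([:positive])}"
    dir = Path.join(System.tmp_dir!(), name)
    path = Path.join(dir, "schema.json")

    with :ok <- File.mkdir_p(dir),
         :ok <- File.write(path, schema_json) do
      {:ok, %{path: path, cleanup: fn -> remove(dir) end}}
    else
      {:error, reason} ->
        # Remove any partial output and report why creation failed.
        remove(dir)
        {:error, {:output_schema_file, reason}}
    end
  end

  defp remove(dir) do
    File.rm_rf(dir)
    :ok
  end
end
```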

Architecture Patterns

Process Model

┌─────────────┐
│   Client    │
└──────┬──────┘
       │ (synchronous API calls)
       ▼
┌─────────────────┐
│ Codex.Thread    │  (stateful struct, holds thread_id and options)
└────────┬────────┘
         │ (spawns)
         ▼
┌──────────────────┐
│  Codex.Exec      │  (GenServer - one per turn)
│  (GenServer)     │  - Manages codex-rs lifecycle
└────────┬─────────┘  - Parses JSONL events
         │ (spawns)   - Handles Port communication
         ▼
┌──────────────────┐
│   Port (stdin/   │  (IPC with codex-rs)
│    stdout)       │  - Prompt text over stdin
└────────┬─────────┘  - JSONL events from stdout
         │
         ▼
┌──────────────────┐
│   codex-rs       │  (OpenAI's Rust CLI)
│   (OS Process)   │  - Manages OpenAI API calls
└──────────────────┘  - Executes commands/file ops
                      - Streams events

Data Flow

Client calls Codex.Thread.run/3
- Thread module starts Codex.Exec GenServer
- Passes thread_id, options, and input
Exec spawns codex-rs process
- Constructs command line arguments
- Sets environment variables
- Opens Port with :binary, :use_stdio, :exit_status
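Exec sends input via stdin
- Writes prompt to Port
- Closes stdin to signal end of input (an Erlang Port cannot close stdin without also closing stdout, so this step needs a workaround, such as passing the prompt as a command-line argument or using an OS-process library like erlexec that can send EOF)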
Exec receives JSONL events from stdout
- Parses each line as JSON
- Converts to structured event structs
- Forwards events to caller
Blocking mode (run/3)
- Exec accumulates events
- Extracts final response and items
- Returns complete TurnResult when turn completes
Streaming mode (run_streamed/3)
- Exec forwards each event as it arrives
- Client processes events in real-time
- Stream completes when turn finishes
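The "constructs command line arguments" and "sets environment variables" steps could look like the following. The `codex exec` flags (`--experimental-json`, `--model`, `--sandbox`, `--cd`, `--skip-git-repo-check`, `--output-schema`, `resume`) and the `CODEX_API_KEY` / `OPENAI_BASE_URL` variables are assumptions modeled on how the TypeScript SDK invokes the CLI; check them against `codex exec --help` for the installed version. `CodexSketch.Command` is hypothetical.

```elixir
defmodule CodexSketch.Command do
  # Hypothetical argv/env builder. Flag and variable names are assumptions
  # modeled on the TypeScript SDK; verify with `codex exec --help`.

  @spec build(map(), map(), map()) :: {[String.t()], [{charlist(), charlist()}]}
  def build(global, thread, turn) do
    args =
      ["exec", "--experimental-json"]
      |> add_opt("--model", thread[:model])
      |> add_opt("--sandbox", sandbox_flag(thread[:sandbox_mode]))
      |> add_opt("--cd", thread[:working_directory])
      |> add_flag("--skip-git-repo-check", thread[:skip_git_repo_check])
      |> add_opt("--output-schema", turn[:output_schema_path])
      # Resuming an existing thread appends the subcommand and thread id last.
      |> add_opt("resume", thread[:thread_id])

    env =
      [{"CODEX_API_KEY", global[:api_key]}, {"OPENAI_BASE_URL", global[:base_url]}]
      |> Enum.reject(fn {_name, value} -> is_nil(value) end)
      |> Enum.map(fn {name, value} -> {String.to_charlist(name), String.to_charlist(value)} end)

    {args, env}
  end

  defp add_opt(args, _flag, nil), do: args
  defp add_opt(args, flag, value), do: args ++ [flag, to_string(value)]

  defp add_flag(args, flag, true), do: args ++ [flag]
  defp add_flag(args, _flag, _other), do: args

  # :workspace_write -> "workspace-write", matching the CLI spelling
  defp sandbox_flag(nil), do: nil
  defp sandbox_flag(mode), do: mode |> Atom.to_string() |> String.replace("_", "-")
end
```

The result would feed `Port.open({:spawn_executable, codex_path}, [:binary, :use_stdio, :exit_status, args: args, env: env])`. Environment entries are built as charlists, which `open_port/2` accepts.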

Error Handling

Recoverable Errors

Non-fatal errors appear as Error items in the thread
Agent continues processing
Turn completes normally

Fatal Errors

TurnFailed event with error details
Process exits gracefully
Resources cleaned up
Error propagated to client

Process Crashes

GenServer supervision restarts failed processes
Port monitors detect process termination
Cleanup functions remove temporary files
Telemetry events logged
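A hypothetical fragment of an Exec-style GenServer showing where cleanup could live: cleanup functions (for example, removing the output-schema temp file) run in `terminate/2`, and trapping exits makes that callback run when a supervisor shuts the process down as well. Port handling is reduced to the `{port, {:exit_status, status}}` message, and `CodexSketch.ExecCleanup` is an illustrative name.

```elixir
defmodule CodexSketch.ExecCleanup do
  # Hypothetical fragment of an Exec-style GenServer that owns cleanup work.
  use GenServer

  def start_link(cleanups), do: GenServer.start_link(__MODULE__, cleanups)

  @impl true
  def init(cleanups) do
    # Trap exits so terminate/2 also runs when a supervisor shuts us down.
    Process.flag(:trap_exit, true)
    {:ok, %{cleanups: cleanups}}
  end

  # The Port reports codex-rs termination with its exit status.
  @impl true
  def handle_info({_port, {:exit_status, 0}}, state), do: {:stop, :normal, state}

  def handle_info({_port, {:exit_status, status}}, state),
    do: {:stop, {:shutdown, {:exit_status, status}}, state}

  # With trap_exit on, exit signals from linked processes arrive as messages.
  def handle_info({:EXIT, _from, reason}, state), do: {:stop, reason, state}

  @impl true
  def terminate(_reason, state) do
    # e.g. remove the output-schema temp file
    Enum.each(state.cleanups, fn cleanup -> cleanup.() end)
  end
end
```

`terminate/2` is skipped when the process is killed with `Process.exit(pid, :kill)`, so cleanup in this callback is best-effort.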

Streaming Strategy

Streaming Pros:

Real-time updates for responsive UIs
Process events as they arrive
Lower memory footprint for long turns

Streaming Cons:

More complex client code
Requires event handling logic

Blocking Pros:

Simple API for scripting
No event handling needed
Complete result in one call

Blocking Cons:

Higher memory usage
No progress visibility
No output until the turn finishes

Implementation:

run_streamed/3 returns a Stream/Enumerable
run/3 internally uses streaming but buffers results
Both share the same Codex.Exec implementation
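Since run/3 buffers what run_streamed/3 produces, it can be written as a reduction over the event stream. This sketch uses simplified tagged tuples in place of the event structs and assumes, as in the TypeScript SDK, that the final response is the text of the last completed agent message. `CodexSketch.TurnResult` is hypothetical.

```elixir
defmodule CodexSketch.TurnResult do
  # Hypothetical buffered result; events are simplified tagged tuples here.
  defstruct items: [], final_response: nil, usage: nil

  @spec collect(Enumerable.t()) :: {:ok, %__MODULE__{}} | {:error, term()}
  def collect(events) do
    events
    |> Enum.reduce_while(%__MODULE__{}, fn
      # The last completed agent message becomes the final response.
      {:item_completed, %{type: :agent_message, text: text} = item}, acc ->
        {:cont, %{acc | items: [item | acc.items], final_response: text}}

      {:item_completed, item}, acc ->
        {:cont, %{acc | items: [item | acc.items]}}

      # Terminal events stop the reduction.
      {:turn_completed, usage}, acc ->
        {:halt, {:ok, %{acc | items: Enum.reverse(acc.items), usage: usage}}}

      {:turn_failed, reason}, _acc ->
        {:halt, {:error, {:turn_failed, reason}}}

      # Started/updated events carry nothing the buffered result needs.
      _other, acc ->
        {:cont, acc}
    end)
    |> case do
      {:ok, _} = ok -> ok
      {:error, _} = error -> error
      %__MODULE__{} -> {:error, :stream_ended_before_turn_completed}
    end
  end
end
```

Because `Enum.reduce_while/3` stops at the terminal event, the same function works on a lazy stream without reading past the end of the turn.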

Feature Set

Completed in TypeScript SDK (Target Parity)

Core Operations
- Start new threads
- Resume existing threads
- Execute turns (blocking and streaming)
- Handle all event types
- Parse all item types
Configuration
- Custom codex binary path
- API key and base URL override
- Model selection
- Sandbox modes (read-only, workspace-write, danger-full-access)
- Working directory control
- Git repo check bypass
Structured Output
- JSON schema support
- Temporary file management
- Schema validation
Error Handling
- Process spawn errors
- JSON parse errors
- Exit code handling
- stderr capture

Additional Features for Elixir

OTP Integration
- GenServer-based process management
- Supervision tree support
- Proper resource cleanup
Telemetry
- Turn start/complete events
- Error events
- Performance metrics
Type Safety
- TypedStruct for all data types
- Typespecs for static analysis with Dialyzer
- Documentation from types
Testing
- Supertester integration
- Mock GenServer implementation
- Deterministic async tests
- Chaos engineering tests

Development Approach

Test-Driven Development

Write tests first: Define expected behavior through tests
Implement minimally: Write just enough code to pass tests
Refactor confidently: Tests provide safety net
Document through tests: Tests serve as executable documentation

Incremental Implementation

Week 1: Core types and module stubs

Define all event/item structs
Create module outlines
Set up test infrastructure

Week 2: Exec GenServer implementation

Port-based process management
JSONL parsing
Event forwarding

Week 3: Thread management

Blocking turn execution
Streaming turn execution
Option handling

Week 4: Integration and polish

End-to-end tests
Documentation
Examples
CI/CD

Quality Standards

Code Coverage: Target 95%+ line coverage
Documentation: All public functions have @doc
Typespecs: All public functions have @spec
Dialyzer: Zero warnings
Credo: All issues resolved
Tests: Zero flaky tests, all async

Success Criteria

Must Have (MVP)

[ ] All TypeScript SDK features implemented
[ ] All tests passing (unit, integration, property)
[ ] Documentation complete (API docs, guides, examples)
[ ] CI/CD pipeline green
[ ] Published to Hex.pm

Should Have (v1.0)

[ ] Telemetry integration documented
[ ] Supervision tree examples
[ ] Performance benchmarks
[ ] Chaos engineering tests
[ ] Real-world examples

Could Have (Future)

[ ] Custom event handlers
[ ] Persistent event logging
[ ] WebSocket-based streaming
[ ] Native NIF for performance
[ ] Phoenix LiveView integration examples
