Elixir Codex SDK - Project Goals and Design


Overview

The Elixir Codex SDK is an idiomatic, production-ready wrapper around OpenAI's codex-rs CLI executable. It brings OpenAI's Codex agent, an AI assistant capable of reasoning, code generation, file manipulation, and command execution, into the Elixir/OTP ecosystem.

Project Goals

Primary Objectives

  1. Complete Feature Parity: Implement all functionality available in the official TypeScript SDK
  2. Idiomatic Elixir: Leverage OTP principles, GenServers, and BEAM concurrency patterns
  3. Production Ready: Robust error handling, supervision trees, telemetry integration
  4. Type Safety: Comprehensive structs using TypedStruct for all events, items, and options
  5. Battle Tested: Deterministic, async test suite using Supertester (zero Process.sleep)
  6. Developer Experience: Clear APIs, comprehensive documentation, helpful examples

Secondary Objectives

  1. Performance: Efficient streaming with backpressure, minimal memory overhead
  2. Observability: Telemetry events for monitoring and debugging
  3. Extensibility: Clean abstractions for future enhancements
  4. Maintainability: Well-documented code, consistent patterns, comprehensive tests

Core Concepts

The Codex Agent

Codex is an AI agent that can:

  • Analyze and generate code across multiple languages
  • Execute shell commands in a controlled sandbox
  • Read, write, and modify files with precise diffs
  • Search the web for up-to-date information
  • Make calls to Model Context Protocol (MCP) tools
  • Reason about complex problems and maintain task lists
  • Produce structured JSON output conforming to schemas

Threads and Turns

Thread: A persistent conversation session with the agent. Threads maintain context across multiple interactions and are stored in ~/.codex/sessions.

Turn: A single request-response cycle within a thread. Each turn:

  • Starts with a user prompt (input)
  • Produces a stream of events as the agent works
  • Completes with a final response and usage statistics
  • May include multiple items (messages, commands, file changes, etc.)

Items

Items are the atomic units of work in a thread. Each item represents a specific action or artifact:

  • AgentMessage: Text or JSON response from the agent
  • Reasoning: The agent's reasoning process summary
  • CommandExecution: Shell command with status and output
  • FileChange: File modifications (add, update, delete)
  • McpToolCall: External tool invocation via MCP
  • WebSearch: Web search query and results
  • TodoList: Agent's running task list
  • Error: Non-fatal error items
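
As an illustration, two of these items might be modeled as follows. This is a minimal sketch that uses plain defstruct and @type instead of TypedStruct so that it has no dependencies. The Sketch.Items namespace, the field names, and the summarize/1 helper are assumptions made for this example, not the SDK's actual definitions.

```elixir
defmodule Sketch.Items do
  # Hypothetical item structs; field names are assumptions, not the SDK's schema.
  defmodule AgentMessage do
    @enforce_keys [:id, :text]
    defstruct [:id, :text]

    @type t :: %__MODULE__{id: String.t(), text: String.t()}
  end

  defmodule CommandExecution do
    @enforce_keys [:id, :command]
    defstruct [:id, :command, :exit_code, aggregated_output: "", status: :in_progress]

    @type status :: :in_progress | :completed | :failed
    @type t :: %__MODULE__{
            id: String.t(),
            command: String.t(),
            exit_code: integer | nil,
            aggregated_output: String.t(),
            status: status
          }
  end

  @doc "Returns a one-line, human-readable summary of an item."
  def summarize(%AgentMessage{text: text}), do: "message: " <> text

  def summarize(%CommandExecution{command: command, status: status}),
    do: "command (#{status}): #{command}"
end
```

Pattern matching on the struct name in the function head lets each item type get its own clause, which is the usual way to dispatch over a closed set of structs in Elixir.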

Events

Events are emitted during turn execution to provide real-time updates:

Thread-Level Events

  • ThreadStarted: New thread initialized with ID
  • TurnStarted: Agent begins processing prompt
  • TurnCompleted: Turn finished with usage stats
  • TurnFailed: Turn encountered fatal error

Item-Level Events

  • ItemStarted: New item added (typically in progress)
  • ItemUpdated: Item state changed
  • ItemCompleted: Item reached terminal state
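
The item lifecycle above can be sketched as a fold over events that keeps the latest snapshot of each item. The Sketch.Events structs, their single item or usage field, and the track/2 function are hypothetical stand-ins for the real event types, and the sketch assumes each item carries an id.

```elixir
defmodule Sketch.Events do
  # Hypothetical event structs; the SDK plans to define its own with TypedStruct.
  defmodule ItemStarted, do: defstruct([:item])
  defmodule ItemUpdated, do: defstruct([:item])
  defmodule ItemCompleted, do: defstruct([:item])
  defmodule TurnCompleted, do: defstruct([:usage])

  @doc "Folds one event into a map of item id => latest item snapshot."
  def track(items, %mod{item: %{id: id} = item})
      when mod in [ItemStarted, ItemUpdated, ItemCompleted] do
    Map.put(items, id, item)
  end

  # Events that carry no item leave the map unchanged.
  def track(items, _event), do: items
end
```

Used with Enum.reduce/3, this yields the final state of every item once the turn's events have been consumed.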

Module Structure

Core Modules

Codex

The main entry point for the SDK.

Responsibilities:

  • Create new threads
  • Resume existing threads
  • Manage global options (API key, base URL, codex path)

Key Functions:

start_thread(codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
resume_thread(thread_id, codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}

Codex.Thread

Manages individual conversation threads and turn execution.

Responsibilities:

  • Execute turns (blocking or streaming)
  • Maintain thread state and ID
  • Apply thread-level options (model, sandbox, working directory)

Key Functions:

run(thread, input, turn_opts \\ %{}) :: {:ok, turn_result} | {:error, term}
run_streamed(thread, input, turn_opts \\ %{}) :: {:ok, stream} | {:error, term}

Codex.Exec

GenServer that manages the codex-rs OS process lifecycle.

Responsibilities:

  • Spawn and manage codex-rs process via Port
  • Handle JSONL stdin/stdout communication
  • Parse events and forward to caller
  • Clean up resources on exit or crash

Key Behaviors:

  • One GenServer per turn execution
  • Supervised process with proper cleanup
  • Telemetry events for observability
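
A minimal sketch of the one-GenServer-per-turn idea, assuming a Unix-like host. Sketch.Exec, its message shapes ({:exec_line, pid, line} and {:exec_exit, pid, status}), and the use of the {:line, n} Port option are assumptions for illustration. The sketch does not write to stdin, parse JSON, or emit telemetry.

```elixir
defmodule Sketch.Exec do
  use GenServer

  # `cmd` must be an absolute path; `cmd` and `args` stand in for the codex-rs command line.
  def start_link({cmd, args, caller}) do
    GenServer.start_link(__MODULE__, {cmd, args, caller})
  end

  @impl true
  def init({cmd, args, caller}) do
    # {:line, n} asks the port driver to split output into lines of at most n bytes.
    port =
      Port.open({:spawn_executable, cmd}, [
        :binary,
        :use_stdio,
        :exit_status,
        {:line, 65_536},
        args: args
      ])

    {:ok, %{port: port, caller: caller, partial: ""}}
  end

  @impl true
  def handle_info({port, {:data, {:eol, chunk}}}, %{port: port} = state) do
    # A complete line: prepend any earlier fragment and forward it to the caller.
    send(state.caller, {:exec_line, self(), state.partial <> chunk})
    {:noreply, %{state | partial: ""}}
  end

  def handle_info({port, {:data, {:noeol, chunk}}}, %{port: port} = state) do
    # A line longer than the limit (or unterminated output): keep buffering.
    {:noreply, %{state | partial: state.partial <> chunk}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    # Flush an unterminated last line, report the exit status, then stop.
    if state.partial != "", do: send(state.caller, {:exec_line, self(), state.partial})
    send(state.caller, {:exec_exit, self(), status})
    {:stop, :normal, state}
  end
end
```

Because the process stops with reason :normal once the OS process exits, a supervisor configured with a temporary restart strategy would not restart it, which fits a process whose lifetime is a single turn.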

Type Modules

Codex.Events

Defines all event types using TypedStruct.

Event Types:

  • ThreadStarted, TurnStarted, TurnCompleted, TurnFailed
  • ItemStarted, ItemUpdated, ItemCompleted
  • ThreadError

Codex.Items

Defines all item types and their status enums.

Item Types:

  • AgentMessage, Reasoning, CommandExecution, FileChange
  • McpToolCall, WebSearch, TodoList, Error

Status Types:

  • CommandExecutionStatus: :in_progress, :completed, :failed
  • PatchApplyStatus: :completed, :failed
  • McpToolCallStatus: :in_progress, :completed, :failed
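
One way to turn status strings from the JSONL stream into these atoms is a fixed lookup table, which avoids calling String.to_atom/1 on external input. This is a sketch only: Sketch.Status is a hypothetical module, and the assumption that the wire format uses snake_case strings matching the atom names is not confirmed by this document.

```elixir
defmodule Sketch.Status do
  # Assumed wire strings; the real codex-rs values may differ.
  @command_statuses %{
    "in_progress" => :in_progress,
    "completed" => :completed,
    "failed" => :failed
  }

  @spec parse_command_status(String.t()) ::
          {:ok, :in_progress | :completed | :failed} | {:error, {:unknown_status, String.t()}}
  def parse_command_status(raw) when is_binary(raw) do
    case Map.fetch(@command_statuses, raw) do
      {:ok, status} -> {:ok, status}
      :error -> {:error, {:unknown_status, raw}}
    end
  end
end
```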

Codex.Options

Configuration structs for all levels.

Structs:

  • Codex.Options: Global options (codex path, API key, base URL)
  • Codex.Thread.Options: Thread options (model, sandbox, working directory)
  • Codex.Turn.Options: Turn options (output schema)

Utility Modules

Codex.OutputSchemaFile

Helper for managing JSON schema temporary files.

Responsibilities:

  • Create temporary file with JSON schema
  • Provide cleanup function
  • Handle errors gracefully
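
The three responsibilities above could look roughly like this. Sketch.OutputSchemaFile, the file naming scheme, and the {:ok, path, cleanup} return shape are assumptions for illustration. To stay dependency-free, the sketch takes a schema that is already encoded as a JSON string.

```elixir
defmodule Sketch.OutputSchemaFile do
  @doc """
  Writes an already-encoded JSON schema to a unique temporary file.

  Returns `{:ok, path, cleanup}`, where `cleanup` is a zero-arity function
  that removes the file, or `{:error, reason}` if the write fails.
  """
  @spec create(binary) :: {:ok, Path.t(), (-> :ok)} | {:error, File.posix()}
  def create(schema_json) when is_binary(schema_json) do
    # OS pid plus a per-VM unique integer keeps names distinct across VMs and calls.
    name = "codex-schema-#{System.pid()}-#{System.unique_integer([:positive])}.json"
    path = Path.join(System.tmp_dir!(), name)

    case File.write(path, schema_json) do
      :ok -> {:ok, path, fn -> cleanup(path) end}
      {:error, reason} -> {:error, reason}
    end
  end

  # A missing file is treated as success so cleanup can be called more than once.
  defp cleanup(path) do
    _ = File.rm(path)
    :ok
  end
end
```

Returning the cleanup function alongside the path lets the caller run it in an after block or a terminate callback, whichever owns the turn's lifetime.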

Architecture Patterns

Process Model


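+------------------+
|      Client      |
+------------------+
         |  (synchronous API calls)
         v
+------------------+
|   Codex.Thread   |  (stateful struct, holds thread_id and options)
+------------------+
         |  (spawns)
         v
+------------------+
|    Codex.Exec    |  (GenServer - one per turn)
|    (GenServer)   |  - Manages codex-rs lifecycle
|                  |  - Parses JSONL events
+------------------+  - Handles Port communication
         |  (spawns)
         v
+------------------+
|   Port (stdin/   |  (IPC with codex-rs)
|     stdout)      |  - JSONL over stdin
|                  |  - JSONL events from stdout
+------------------+
         |
         v
+------------------+
|     codex-rs     |  (OpenAI's Rust CLI)
|   (OS Process)   |  - Manages OpenAI API calls
|                  |  - Executes commands/file ops
+------------------+  - Streams events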

Data Flow

  1. Client calls Codex.Thread.run/3

    • Thread module starts Codex.Exec GenServer
    • Passes thread_id, options, and input
  2. Exec spawns codex-rs process

    • Constructs command line arguments
    • Sets environment variables
    • Opens Port with :binary, :use_stdio, :exit_status
  3. Exec sends input via stdin

    • Writes prompt to Port
    • Closes stdin to signal end of input
  4. Exec receives JSONL events from stdout

    • Parses each line as JSON
    • Converts to structured event structs
    • Forwards events to caller
  5. Blocking mode (run/3)

    • Exec accumulates events
    • Extracts final response and items
    • Returns complete TurnResult when turn completes
  6. Streaming mode (run_streamed/3)

    • Exec yields events as they arrive
    • Client processes events in real-time
    • Stream completes when turn finishes
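
Step 4 depends on turning arbitrary stdout chunks into complete lines before any JSON decoding, since a Port opened with only :binary delivers data in chunks that need not end on a newline. A minimal sketch of that buffering step follows. Sketch.LineBuffer and push/2 are hypothetical names, and JSON decoding is deliberately left out.

```elixir
defmodule Sketch.LineBuffer do
  @doc """
  Appends `chunk` to the buffered remainder and returns
  `{complete_lines, new_remainder}`.
  """
  @spec push(binary, binary) :: {[binary], binary}
  def push(buffer, chunk) when is_binary(buffer) and is_binary(chunk) do
    # String.split/2 always returns at least one element; the last one is
    # whatever follows the final newline (possibly the empty string).
    parts = String.split(buffer <> chunk, "\n")
    {lines, [rest]} = Enum.split(parts, -1)

    # Blank lines carry no event, so they are dropped here.
    {Enum.reject(lines, &(&1 == "")), rest}
  end
end
```

Each returned line would then be decoded and converted to an event struct, while the remainder is kept in the GenServer state until the next chunk arrives.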

Error Handling

Recoverable Errors

  • Non-fatal errors become Error items in the thread
  • Agent continues processing
  • Turn completes normally

Fatal Errors

  • TurnFailed event with error details
  • Process exits gracefully
  • Resources cleaned up
  • Error propagated to client

Process Crashes

  • Supervisors restart failed GenServer processes
  • Port monitors detect process termination
  • Cleanup functions remove temporary files
  • Telemetry events logged
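
The distinction between recoverable and fatal errors can be sketched as a function that derives the caller-facing result from the events seen and the process exit status. Sketch.Outcome, the tagged-tuple event shapes, and the error tuples are assumptions for illustration; the real SDK uses event structs.

```elixir
defmodule Sketch.Outcome do
  @type event ::
          {:item_completed, map}
          | {:turn_completed, map}
          | {:turn_failed, String.t()}

  @doc "Derives the result of a turn from its events and the OS process exit status."
  @spec resolve([event], non_neg_integer) :: {:ok, [map]} | {:error, term}
  def resolve(events, exit_status) do
    case Enum.find(events, &match?({:turn_failed, _}, &1)) do
      # A TurnFailed-style event is fatal regardless of the exit status.
      {:turn_failed, message} ->
        {:error, {:turn_failed, message}}

      # No failure event, but the process itself exited abnormally.
      nil when exit_status != 0 ->
        {:error, {:exit_status, exit_status}}

      # Normal completion: non-fatal error items stay in the item list.
      nil ->
        {:ok, for({:item_completed, item} <- events, do: item)}
    end
  end
end
```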

Streaming Strategy

Streaming Pros:

  • Real-time updates for responsive UIs
  • Process events as they arrive
  • Lower memory footprint for long turns

Streaming Cons:

  • More complex client code
  • Requires event handling logic

Blocking Pros:

  • Simple API for scripting
  • No event handling needed
  • Complete result in one call

Blocking Cons:

  • Higher memory usage
  • No progress visibility
  • Longer wait times

Implementation:

  • run_streamed/3 returns {:ok, stream}, where stream is a lazy Enumerable
  • run/3 internally uses streaming but buffers results
  • Both share same Codex.Exec implementation
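
The relationship between the two modes can be sketched with Stream.resource/3: a lazy stream receives events as messages, and the blocking variant simply drains that stream. Sketch.TurnStream, the {:event, ref, event} and {:done, ref} message shapes, and the start_fun callback are hypothetical. The stream must be consumed in the process that the producer sends to.

```elixir
defmodule Sketch.TurnStream do
  @doc """
  Builds a lazy stream of events. `start_fun` receives the consumer pid and a
  fresh reference, and is expected to start a producer that sends
  `{:event, ref, event}` messages followed by `{:done, ref}`.
  """
  def events(start_fun, timeout \\ 5_000) when is_function(start_fun, 2) do
    Stream.resource(
      fn ->
        # Runs in the consuming process, and only once consumption begins.
        ref = make_ref()
        start_fun.(self(), ref)
        ref
      end,
      fn ref ->
        receive do
          {:event, ^ref, event} -> {[event], ref}
          {:done, ^ref} -> {:halt, ref}
        after
          timeout -> raise "timed out waiting for turn events"
        end
      end,
      fn _ref -> :ok end
    )
  end

  @doc "Blocking variant: drains the same stream into a list."
  def collect(start_fun), do: start_fun |> events() |> Enum.to_list()
end
```

Building the blocking call on top of the stream keeps a single code path for event delivery, at the cost of holding every event in memory until the turn finishes.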

Feature Set

Completed in TypeScript SDK (Target Parity)

  1. Core Operations

    • Start new threads
    • Resume existing threads
    • Execute turns (blocking and streaming)
    • Handle all event types
    • Parse all item types
  2. Configuration

    • Custom codex binary path
    • API key and base URL override
    • Model selection
    • Sandbox modes (read-only, workspace-write, danger-full-access)
    • Working directory control
    • Git repo check bypass
  3. Structured Output

    • JSON schema support
    • Temporary file management
    • Schema validation
  4. Error Handling

    • Process spawn errors
    • JSON parse errors
    • Exit code handling
    • stderr capture

Additional Features for Elixir

  1. OTP Integration

    • GenServer-based process management
    • Supervision tree support
    • Proper resource cleanup
  2. Telemetry

    • Turn start/complete events
    • Error events
    • Performance metrics
  3. Type Safety

    • TypedStruct for all data types
    • Static type checking of typespecs with Dialyzer
    • Documentation from types
  4. Testing

    • Supertester integration
    • Mock GenServer implementation
    • Deterministic async tests
    • Chaos engineering tests

Development Approach

Test-Driven Development

  1. Write tests first: Define expected behavior through tests
  2. Implement minimally: Write just enough code to pass tests
  3. Refactor confidently: Tests provide safety net
  4. Document through tests: Tests serve as executable documentation

Incremental Implementation

Week 1: Core types and module stubs

  • Define all event/item structs
  • Create module outlines
  • Set up test infrastructure

Week 2: Exec GenServer implementation

  • Port-based process management
  • JSONL parsing
  • Event forwarding

Week 3: Thread management

  • Blocking turn execution
  • Streaming turn execution
  • Option handling

Week 4: Integration and polish

  • End-to-end tests
  • Documentation
  • Examples
  • CI/CD

Quality Standards

  1. Code Coverage: Target 95%+ line coverage
  2. Documentation: All public functions have @doc
  3. Typespecs: All public functions have @spec
  4. Dialyzer: Zero warnings
  5. Credo: All issues resolved
  6. Tests: Zero flaky tests, all async

Success Criteria

Must Have (MVP)

  • [ ] All TypeScript SDK features implemented
  • [ ] All tests passing (unit, integration, property)
  • [ ] Documentation complete (API docs, guides, examples)
  • [ ] CI/CD pipeline green
  • [ ] Published to Hex.pm

Should Have (v1.0)

  • [ ] Telemetry integration documented
  • [ ] Supervision tree examples
  • [ ] Performance benchmarks
  • [ ] Chaos engineering tests
  • [ ] Real-world examples

Could Have (Future)

  • [ ] Custom event handlers
  • [ ] Persistent event logging
  • [ ] WebSocket-based streaming
  • [ ] Native NIF for performance
  • [ ] Phoenix LiveView integration examples
