This guide covers:
- Realtime API (Codex.Realtime.*) for bidirectional websocket sessions.
- Voice Pipeline (Codex.Voice.*) for non-realtime STT -> workflow -> TTS.
Both paths make direct OpenAI API calls (they do not use codex exec / app-server transport).
Auth and prerequisites
Realtime/Voice auth precedence:
1. CODEX_API_KEY
2. auth.json OPENAI_API_KEY (under CODEX_HOME)
3. OPENAI_API_KEY
Codex.OAuth does not replace this direct-API precedence. ChatGPT OAuth login
helps CLI/app-server flows; realtime and voice still need an API key or an
OPENAI_API_KEY persisted in auth.json.
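The precedence above can be sketched as a small pure function. This is an illustration only: `AuthPrecedenceSketch` and `resolve/2` are hypothetical names, not part of Codex.*, and the sketch assumes that auth.json has already been decoded into a map and that blank values are skipped (the guide states the blank-value rule only for trust roots, so treat it as an assumption here).

```elixir
defmodule AuthPrecedenceSketch do
  @moduledoc false
  # Hypothetical helper, not part of the library: picks an API key in the
  # order CODEX_API_KEY -> auth.json OPENAI_API_KEY -> OPENAI_API_KEY.

  # `env` is a map of environment variables. `auth_json` is the decoded
  # content of auth.json under CODEX_HOME, or nil when the file is absent.
  def resolve(env, auth_json \\ nil) do
    candidates = [
      {:codex_api_key_env, Map.get(env, "CODEX_API_KEY")},
      {:auth_json, auth_json && Map.get(auth_json, "OPENAI_API_KEY")},
      {:openai_api_key_env, Map.get(env, "OPENAI_API_KEY")}
    ]

    # Return the first candidate that is a non-blank string.
    # Skipping blank values is an assumption of this sketch.
    Enum.find_value(candidates, {:error, :no_api_key}, fn {source, value} ->
      if is_binary(value) and String.trim(value) != "", do: {:ok, source, value}
    end)
  end
end
```

Calling `AuthPrecedenceSketch.resolve(System.get_env())` before starting a live example reports which source would supply the key, which can help when a ChatGPT OAuth login is present but no API key is.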
If your account has no credits, direct API calls may return insufficient_quota (HTTP 429). If your account lacks access to realtime models, calls may fail with model_not_found. Live examples print SKIPPED with the detected reason and exit cleanly.
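A caller that wants the same skip behavior in its own scripts could classify results along these lines. The sketch assumes errors arrive as `{:error, reason}` and that the reason, once inspected, contains the error code as text. The actual error shape returned by the library is not specified in this guide, and `LiveSkipSketch` is a hypothetical name.

```elixir
defmodule LiveSkipSketch do
  @moduledoc false
  # Hypothetical helper: maps known "cannot run live" errors to {:skip, marker}.

  @skip_markers ["insufficient_quota", "model_not_found"]

  # Successful results pass through unchanged.
  def classify({:ok, _value} = ok), do: ok

  def classify({:error, reason}) do
    # Assumption: the error code appears somewhere in the printed reason.
    text = if is_binary(reason), do: reason, else: inspect(reason)

    case Enum.find(@skip_markers, &String.contains?(text, &1)) do
      nil -> {:error, reason}
      marker -> {:skip, marker}
    end
  end
end
```

A script can then print `SKIPPED` with the marker and exit with status 0 on `{:skip, marker}`, while still failing on any other error.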
Custom trust roots use CODEX_CA_CERTIFICATE first and SSL_CERT_FILE second. Blank values are
ignored. The same PEM bundle is applied to HTTPS requests and secure realtime websockets.
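As a rough illustration of that selection rule, the sketch below picks the bundle path and builds one option list that could be shared by an HTTPS client and a websocket client. `TrustRootSketch` is a hypothetical module. `:verify` and `:cacertfile` are standard Erlang `:ssl` options, but how the library wires the bundle internally is not described here and is assumed.

```elixir
defmodule TrustRootSketch do
  @moduledoc false
  # Hypothetical helper: CODEX_CA_CERTIFICATE first, SSL_CERT_FILE second.

  @precedence ["CODEX_CA_CERTIFICATE", "SSL_CERT_FILE"]

  # Returns the first non-blank path, or nil when neither variable is usable.
  def ca_bundle_path(env) do
    Enum.find_value(@precedence, fn name ->
      value = Map.get(env, name)
      if is_binary(value) and String.trim(value) != "", do: value
    end)
  end

  # Builds :ssl options from the selected bundle. An empty list means
  # "use the client's default trust store".
  def ssl_opts(env) do
    case ca_bundle_path(env) do
      nil -> []
      path -> [verify: :verify_peer, cacertfile: path]
    end
  end
end
```

For example, `TrustRootSketch.ssl_opts(System.get_env())` returns `[]` when both variables are unset or blank.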
Realtime API
Core API surface
Use Codex.Realtime.run/2 to start a session:
alias Codex.Realtime
alias Codex.Realtime.Config.RunConfig
alias Codex.Realtime.Config.SessionModelSettings
alias Codex.Realtime.Config.TurnDetectionConfig
agent =
  Realtime.agent(
    name: "VoiceAssistant",
    instructions: "You are concise and helpful."
  )

config = %RunConfig{
  model_settings: %SessionModelSettings{
    voice: "alloy",
    turn_detection: %TurnDetectionConfig{
      type: :semantic_vad,
      eagerness: :medium
    }
  }
}

{:ok, session} = Realtime.run(agent, config: config)
Realtime.subscribe(session, self())

There is no Realtime.start_session/2 or Realtime.commit_audio/1.
Use send_audio/3 with commit: true on the final chunk of a user turn.
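When the audio for a turn is a single binary rather than a list of chunks, one way to get the "commit on the final chunk" behavior is to split it first and tag only the last piece. This is a minimal sketch: `AudioChunkSketch` and the chunk size are illustrative assumptions, not library names or recommended values.

```elixir
defmodule AudioChunkSketch do
  @moduledoc false
  # Hypothetical helper: splits a PCM binary into chunks of at most
  # `chunk_bytes` and pairs each chunk with the options for send_audio/3.

  def with_commit_flags(pcm, chunk_bytes)
      when is_binary(pcm) and is_integer(chunk_bytes) and chunk_bytes > 0 do
    chunks = split(pcm, chunk_bytes, [])
    last = length(chunks)

    chunks
    |> Enum.with_index(1)
    # Only the final chunk of the turn carries commit: true.
    |> Enum.map(fn {chunk, idx} -> {chunk, [commit: idx == last]} end)
  end

  # Empty input yields no chunks at all.
  defp split(<<>>, _size, acc), do: Enum.reverse(acc)

  # The remainder fits in one chunk, so it becomes the last chunk.
  defp split(bin, size, acc) when byte_size(bin) <= size,
    do: Enum.reverse([bin | acc])

  # Otherwise peel off `size` bytes and continue with the rest.
  defp split(bin, size, acc) do
    <<chunk::binary-size(size), rest::binary>> = bin
    split(rest, size, [chunk | acc])
  end
end
```

Each `{chunk, opts}` pair can then be passed on as `Realtime.send_audio(session, chunk, opts)`.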
Sending user input
Text input:
Realtime.send_message(session, "Hello from text input")

Audio input (commit on final chunk):
chunks = [chunk1, chunk2, chunk3]
total = length(chunks)

chunks
|> Enum.with_index(1)
|> Enum.each(fn {chunk, idx} ->
  Realtime.send_audio(session, chunk, commit: idx == total)
end)

Receiving events
Realtime events are delivered as {:session_event, event}:
alias Codex.Realtime.Events
receive do
  {:session_event, %Events.AgentStartEvent{agent: agent}} ->
    IO.puts("agent start: #{agent.name}")

  {:session_event, %Events.AudioEvent{audio: audio}} ->
    # audio.data is PCM bytes for output playback/storage
    File.write!("/tmp/realtime_output.pcm", audio.data, [:append])

  {:session_event, %Events.AudioEndEvent{}} ->
    IO.puts("audio segment completed")

  {:session_event, %Events.ToolStartEvent{tool: tool}} ->
    IO.puts("tool call: #{inspect(tool)}")

  {:session_event, %Events.ToolEndEvent{output: output}} ->
    IO.puts("tool output: #{output}")

  {:session_event, %Events.HandoffEvent{from_agent: from, to_agent: to}} ->
    IO.puts("handoff: #{from.name} -> #{to.name}")

  {:session_event, %Events.ErrorEvent{error: error}} ->
    IO.puts("realtime error: #{inspect(error)}")
end

Handoffs
Configure handoffs directly on the realtime agent:
support =
  Realtime.agent(
    name: "TechSupport",
    instructions: "Handle technical troubleshooting."
  )

greeter =
  Realtime.agent(
    name: "Greeter",
    instructions: "Route technical questions to TechSupport.",
    handoffs: [support]
  )

At session start, handoffs are exposed to the model as transfer_to_* function tools.
When the model calls one, the session:
- switches current_agent,
- pushes updated session settings (session.update),
- emits %Events.HandoffEvent{},
- sends tool output back to the model.
Realtime debugging tips
If output audio is empty:
- Confirm a voice is configured (SessionModelSettings.voice).
- Confirm audio input boundaries are committed (commit: true on final chunk).
- Log %Events.ErrorEvent{} and count %Events.AudioEvent{} deltas.
- Check for quota/auth errors (insufficient_quota, unauthorized API key, etc.). Realtime response.done failures are surfaced as %Events.ErrorEvent{}.
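For the "count deltas" tip, one possible approach is to drain the subscriber's mailbox and tally events by struct type. The sketch below relies only on the `{:session_event, event}` message shape described earlier. `EventTallySketch` and the idle timeout are hypothetical choices.

```elixir
defmodule EventTallySketch do
  @moduledoc false
  # Hypothetical debugging helper: counts {:session_event, event} messages in
  # the calling process's mailbox, grouped by the event's struct module.

  # Stops once no session event has arrived for `idle_ms` milliseconds.
  def drain(idle_ms \\ 1_000, counts \\ %{}) do
    receive do
      {:session_event, %module{}} ->
        drain(idle_ms, Map.update(counts, module, 1, &(&1 + 1)))

      {:session_event, _not_a_struct} ->
        drain(idle_ms, Map.update(counts, :unknown, 1, &(&1 + 1)))
    after
      idle_ms -> counts
    end
  end
end
```

After subscribing and sending a committed turn, `EventTallySketch.drain(2_000)` returns a map of counts. A result with error events but no audio events would point toward the quota/auth checks rather than the audio configuration.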
Session lifecycle helpers
Codex.Realtime.Session behavior exposed through Codex.Realtime:
- subscribe/2 and unsubscribe/2 are idempotent.
- current_agent/1 returns the active agent (useful after handoff).
- history/1 returns current item history.
- close/1 stops the session.
Voice Pipeline (non-realtime)
Use this path when you want STT -> custom workflow -> TTS, without a live websocket conversation loop.
alias Codex.Voice.{Config, Pipeline, SimpleWorkflow}
alias Codex.Voice.Config.{STTSettings, TTSSettings}
alias Codex.Voice.Input.AudioInput
workflow =
  SimpleWorkflow.new(
    fn transcript -> ["You said: #{transcript}"] end,
    greeting: "Hello! I am listening."
  )

config = %Config{
  workflow_name: "voice_demo",
  stt_settings: %STTSettings{model: "gpt-4o-transcribe"},
  tts_settings: %TTSSettings{model: "gpt-4o-mini-tts", voice: :nova}
}

{:ok, pipeline} = Pipeline.start_link(workflow: workflow, config: config)

# wav_binary is the raw contents of a WAV file, e.g. from File.read!/1
input = AudioInput.new(wav_binary, format: :wav)
{:ok, result_stream} = Pipeline.run(pipeline, input)

for event <- result_stream do
  case event do
    %Codex.Voice.Events.VoiceStreamEventAudio{data: _audio_chunk} ->
      # play/store the audio chunk
      :ok

    %Codex.Voice.Events.VoiceStreamEventLifecycle{event: :session_ended} ->
      IO.puts("pipeline complete")

    %Codex.Voice.Events.VoiceStreamEventError{error: error} ->
      IO.puts("pipeline error: #{inspect(error)}")

    _other ->
      # ignore any other events so the case does not raise CaseClauseError
      :ok
  end
end

Example scripts
Realtime:
mix run examples/realtime_basic.exs
mix run examples/realtime_tools.exs
mix run examples/realtime_handoffs.exs
mix run examples/live_realtime_voice.exs
Voice pipeline:
mix run examples/voice_pipeline.exs
mix run examples/voice_multi_turn.exs
mix run examples/voice_with_agent.exs
Notes
- Realtime and Voice are direct API integrations; they do not rely on Codex CLI login tokens alone.
- CODEX_CA_CERTIFICATE takes precedence over SSL_CERT_FILE for HTTPS/WSS trust roots.
- Keep examples deterministic by setting voice and explicit audio turn boundaries.
- For CI or no-credit environments, treat insufficient_quota as a known skip condition for direct API demos.