Orchestrates STT → Workflow → TTS voice pipelines.
A voice pipeline provides an opinionated three-step flow for voice agents:
- Transcribe - Convert audio input to text using a speech-to-text model
- Process - Run the text through a workflow to generate a response
- Synthesize - Convert the response text to audio using a text-to-speech model
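The data flow through these three steps can be sketched as a self-contained toy in plain Elixir. Everything in it is hypothetical: `ToyVoiceFlow` and its `transcribe/1`, `process/1`, and `synthesize/1` functions are stand-ins for the STT model, the workflow, and the TTS model, and the sketch does not call Codex.Voice at all. It only shows the shape of the flow: one transcript in, an enumerable of response text chunks, and one audio payload per chunk.

```elixir
defmodule ToyVoiceFlow do
  # Hypothetical stand-ins for the three pipeline stages; no real models are used.

  # Step 1 (Transcribe): pretend the "audio" is UTF-8 text and return it as the transcript.
  def transcribe(audio) when is_binary(audio), do: String.trim(audio)

  # Step 2 (Process): like a workflow, return an enumerable of response text chunks.
  def process(text), do: ["You said: ", text]

  # Step 3 (Synthesize): pretend each text chunk becomes an audio payload of that size.
  def synthesize(chunk), do: {:audio, byte_size(chunk)}

  # Chain the stages lazily so each chunk is synthesized as it is produced.
  def run(audio) do
    audio
    |> transcribe()
    |> process()
    |> Stream.map(&synthesize/1)
  end
end

ToyVoiceFlow.run("  hello  ") |> Enum.to_list()
# => [{:audio, 10}, {:audio, 5}]
```

Synthesizing per chunk rather than per full reply is what lets audio playback start before the workflow has finished responding.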
Workflows
A workflow is a struct whose module implements a run/2 function. run/2 takes
the workflow struct and the transcribed input text and returns an enumerable
of response text chunks.
For multi-turn conversations with StreamedAudioInput, workflows can
optionally implement on_start/1 to generate an initial greeting.
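Because run/2 only has to return an enumerable, a workflow can also produce its chunks lazily with `Stream`. The sketch below assumes nothing beyond the contract just described. `SplitWorkflow` and `WorkflowCaller.greeting/1` are hypothetical, and the `function_exported?/3` check only illustrates one way a caller could treat on_start/1 as optional. It is not a description of how Codex.Voice does it.

```elixir
defmodule SplitWorkflow do
  # Hypothetical workflow: replies word by word, lazily.
  defstruct prefix: "Heard:"

  # run/2 returns a lazy Stream, which is still an enumerable of text chunks.
  def run(%__MODULE__{prefix: prefix}, text) do
    Stream.concat([prefix], String.split(text))
  end

  # No on_start/1 here: it is optional.
end

defmodule WorkflowCaller do
  # Hypothetical helper: returns greeting chunks if the workflow's module
  # defines on_start/1, otherwise an empty list.
  def greeting(%module{} = workflow) do
    Code.ensure_loaded(module)

    if function_exported?(module, :on_start, 1) do
      module.on_start(workflow)
    else
      []
    end
  end
end

workflow = %SplitWorkflow{}
SplitWorkflow.run(workflow, "turn the lights on") |> Enum.to_list()
# => ["Heard:", "turn", "the", "lights", "on"]
WorkflowCaller.greeting(workflow)
# => []
```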
Example
# Define a simple workflow
defmodule EchoWorkflow do
  defstruct []

  def run(_workflow, text) do
    ["You said: #{text}"]
  end

  def on_start(_workflow) do
    ["Hello! I'm an echo bot. Say something!"]
  end
end
# Create and run the pipeline
alias Codex.Voice.{Pipeline, Result}
alias Codex.Voice.Input.AudioInput
require Logger

workflow = %EchoWorkflow{}
pipeline = Pipeline.new(workflow: workflow)
# audio_bytes holds the recorded audio to transcribe
audio = AudioInput.new(audio_bytes)
{:ok, result} = Pipeline.run(pipeline, audio)

# Stream the results
result
|> Result.stream()
|> Enum.each(fn event ->
  case event do
    %{type: :voice_stream_event_audio, data: data} ->
      # play_audio/1 is your own playback function
      play_audio(data)

    %{type: :voice_stream_event_lifecycle, event: :turn_ended} ->
      IO.puts("Turn complete!")

    %{type: :voice_stream_event_lifecycle, event: :session_ended} ->
      IO.puts("Session ended")

    %{type: :voice_stream_event_error, error: error} ->
      Logger.error("Error: #{inspect(error)}")

    # Ignore any other events so the case never raises a CaseClauseError
    _other ->
      :ok
  end
end)
Input Types
- AudioInput - A static audio buffer for single-turn interactions
- StreamedAudioInput - A streaming audio input for multi-turn conversations
Configuration
The pipeline can be configured with a custom Config and with custom STT and TTS models. Any model you leave unset falls back to the default OpenAI model:
config = Config.new(
  workflow_name: "Customer Support",
  tts_settings: TTSSettings.new(voice: :nova)
)
pipeline = Pipeline.new(
  workflow: workflow,
  config: config,
  stt_model: "whisper-1",
  tts_model: "tts-1-hd"
)
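Each model option accepts three shapes: a model struct, a model name string, or nil for the default. A small multi-clause function can normalize them, as the sketch below shows. `ModelOption`, `FakeTTSModel`, the tagged return values, and the `"placeholder-default"` name are all hypothetical. The library's real default models and internal representation are not documented here.

```elixir
defmodule ModelOption do
  # Hypothetical normalizer for the :stt_model / :tts_model option shapes.
  # It tags each shape instead of building real model structs.

  # nil: fall back to a caller-supplied default.
  def resolve(nil, default), do: {:default, default}

  # A string is treated as a model name, e.g. "whisper-1".
  def resolve(name, _default) when is_binary(name), do: {:name, name}

  # Any struct is assumed to already be a model and is passed through.
  def resolve(%_{} = model, _default), do: {:struct, model}
end

defmodule FakeTTSModel do
  # Hypothetical model struct used only for this illustration.
  defstruct name: "fake"
end

ModelOption.resolve("whisper-1", "placeholder-default")
# => {:name, "whisper-1"}
ModelOption.resolve(nil, "placeholder-default")
# => {:default, "placeholder-default"}
ModelOption.resolve(%FakeTTSModel{}, "placeholder-default")
# => {:struct, %FakeTTSModel{name: "fake"}}
```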
Summary
Types
@type t() :: %Codex.Voice.Pipeline{
        config: Codex.Voice.Config.t(),
        stt_model: struct(),
        tts_model: struct(),
        workflow: struct()
      }
Functions
Create a new voice pipeline.
Options
- :workflow - Required. The workflow struct to run; its module must implement a run/2 function
- :stt_model - Speech-to-text model. Can be a model struct, a model name string, or nil for the default
- :tts_model - Text-to-speech model. Can be a model struct, a model name string, or nil for the default
- :config - Pipeline configuration (defaults to %Config{})
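Only :workflow is required. A keyword-list constructor typically enforces that with `Keyword.fetch!/2` and fills in the optional keys with `Keyword.get/3`. The sketch below does both and also checks that the workflow's module exports run/2. `PipelineOpts` and the map it returns are hypothetical. They only mirror the option list above, and the library may validate differently.

```elixir
defmodule PipelineOpts do
  # Hypothetical validation mirroring the documented options of new/1.

  def validate(opts) when is_list(opts) do
    # Keyword.fetch!/2 raises KeyError when :workflow is missing;
    # the match raises MatchError if the workflow is not a struct.
    %module{} = workflow = Keyword.fetch!(opts, :workflow)
    Code.ensure_loaded(module)

    if !function_exported?(module, :run, 2) do
      raise ArgumentError, "#{inspect(module)} must implement run/2"
    end

    %{
      workflow: workflow,
      # nil means "use the default model" for both STT and TTS.
      stt_model: Keyword.get(opts, :stt_model),
      tts_model: Keyword.get(opts, :tts_model),
      # A plain map stands in for the %Config{} default to keep this self-contained.
      config: Keyword.get(opts, :config, %{})
    }
  end
end

defmodule EchoWorkflow do
  defstruct []
  def run(_workflow, text), do: ["You said: #{text}"]
end

# Returns a map with stt_model: "whisper-1" and tts_model: nil.
PipelineOpts.validate(workflow: %EchoWorkflow{}, stt_model: "whisper-1")
```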
Examples
# Simple pipeline with defaults
pipeline = Pipeline.new(workflow: %MyWorkflow{})
# Pipeline with custom models
pipeline = Pipeline.new(
  workflow: %MyWorkflow{},
  stt_model: "whisper-1",
  tts_model: "tts-1-hd"
)
# Pipeline with full configuration
pipeline = Pipeline.new(
  workflow: %MyWorkflow{},
  config: Config.new(
    workflow_name: "Support Agent",
    tts_settings: TTSSettings.new(voice: :nova)
  )
)
@spec run(
        t(),
        Codex.Voice.Input.AudioInput.t() | Codex.Voice.Input.StreamedAudioInput.t()
      ) :: {:ok, Codex.Voice.Result.t()}
Run the pipeline on audio input.
Parameters
- pipeline - The voice pipeline
- audio_input - Either AudioInput for single-turn or StreamedAudioInput for multi-turn
Returns
- {:ok, result} - A Result struct that can be streamed for events. Errors that occur during processing are delivered as VoiceStreamEventError events in the result stream rather than being returned from this function.
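Because processing errors arrive as events, a caller that wants an overall success or failure has to look for them while consuming the stream. The sketch below folds an event enumerable into a summary with `Enum.reduce/3`. It uses plain maps shaped like the events matched in the module example (`:voice_stream_event_audio`, `:voice_stream_event_lifecycle`, `:voice_stream_event_error`) and assumes audio `data` is a binary. `EventSummary`, the hand-written event list, and the `:tts_timeout` error term are hypothetical. With a real pipeline you would pass `Result.stream(result)` instead of the list.

```elixir
defmodule EventSummary do
  # Hypothetical reducer over voice stream events. The event shapes follow the
  # module docs' example; audio data is assumed to be a binary; unknown events are ignored.

  def summarize(events) do
    Enum.reduce(events, %{audio_bytes: 0, turns_ended: 0, errors: []}, fn
      %{type: :voice_stream_event_audio, data: data}, acc ->
        %{acc | audio_bytes: acc.audio_bytes + byte_size(data)}

      %{type: :voice_stream_event_lifecycle, event: :turn_ended}, acc ->
        %{acc | turns_ended: acc.turns_ended + 1}

      %{type: :voice_stream_event_error, error: error}, acc ->
        %{acc | errors: [error | acc.errors]}

      _other, acc ->
        acc
    end)
  end

  # {:ok, summary} when no error events were seen, otherwise {:error, errors} in arrival order.
  def outcome(events) do
    case summarize(events) do
      %{errors: []} = summary -> {:ok, summary}
      %{errors: errors} -> {:error, Enum.reverse(errors)}
    end
  end
end

events = [
  %{type: :voice_stream_event_audio, data: <<0, 1, 2, 3>>},
  %{type: :voice_stream_event_lifecycle, event: :turn_ended},
  %{type: :voice_stream_event_error, error: :tts_timeout},
  %{type: :voice_stream_event_lifecycle, event: :session_ended}
]

EventSummary.outcome(events)
# => {:error, [:tts_timeout]}
```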
Examples
# Single-turn with static audio
audio = AudioInput.new(audio_bytes)
{:ok, result} = Pipeline.run(pipeline, audio)
# Multi-turn with streaming audio
input = StreamedAudioInput.new()
{:ok, result} = Pipeline.run(pipeline, input)
# Push audio chunks to the input
StreamedAudioInput.add(input, chunk1)
StreamedAudioInput.add(input, chunk2)
StreamedAudioInput.close(input)
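In the multi-turn example, chunks are pushed after run/2 has already returned. The consumer therefore has to take chunks as they arrive and stop once the input is closed. The producer/consumer pattern behind that can be sketched with process messages and `Stream.resource/3`. `ToyStreamedInput` is hypothetical and is not how StreamedAudioInput is implemented. It only mirrors the shape of the add/close calls above.

```elixir
defmodule ToyStreamedInput do
  # Hypothetical stand-in for a streamed audio input: chunks are sent as
  # messages to a consumer process, which drains them as a lazy stream.
  defstruct [:pid]

  # The consumer is the process that will call stream/0.
  def new(consumer_pid), do: %__MODULE__{pid: consumer_pid}

  def add(%__MODULE__{pid: pid}, chunk) do
    send(pid, {:audio_chunk, chunk})
    :ok
  end

  def close(%__MODULE__{pid: pid}) do
    send(pid, :audio_closed)
    :ok
  end

  # Emit chunks in arrival order; halt once close/1 has been called.
  def stream do
    Stream.resource(
      fn -> :open end,
      fn :open ->
        receive do
          {:audio_chunk, chunk} -> {[chunk], :open}
          :audio_closed -> {:halt, :closed}
        end
      end,
      fn _acc -> :ok end
    )
  end
end

# Consume the stream in a separate task, as a pipeline would.
consumer = Task.async(fn -> ToyStreamedInput.stream() |> Enum.map(&byte_size/1) end)

input = ToyStreamedInput.new(consumer.pid)
:ok = ToyStreamedInput.add(input, <<1, 2, 3>>)
:ok = ToyStreamedInput.add(input, <<4, 5>>)
:ok = ToyStreamedInput.close(input)

Task.await(consumer)
# => [3, 2]
```

Messages from a single sender arrive in the order they were sent, which is why the consumer sees the chunks in push order.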