Codex.Voice.Pipeline (Codex SDK v0.7.2)

Orchestrates STT → Workflow → TTS voice pipelines.

A voice pipeline provides an opinionated three-step flow for voice agents:

  1. Transcribe - Convert audio input to text using a speech-to-text model
  2. Process - Run the text through a workflow to generate a response
  3. Synthesize - Convert the response text to audio using a text-to-speech model

Workflows

A workflow is any module that implements a run/2 function, which receives the workflow struct and the input text and returns an enumerable of response text chunks.

For multi-turn conversations with StreamedAudioInput, workflows can optionally implement on_start/1 to generate an initial greeting.

Example

# Define a simple workflow
defmodule EchoWorkflow do
  defstruct []

  def run(_workflow, text) do
    ["You said: #{text}"]
  end

  def on_start(_workflow) do
    ["Hello! I'm an echo bot. Say something!"]
  end
end

# Create and run the pipeline
workflow = %EchoWorkflow{}
pipeline = Pipeline.new(workflow: workflow)

audio = AudioInput.new(audio_bytes)  # audio_bytes: your raw audio data as a binary
{:ok, result} = Pipeline.run(pipeline, audio)

# Stream the results
require Logger

result
|> Result.stream()
|> Enum.each(fn event ->
  case event do
    %{type: :voice_stream_event_audio, data: data} ->
      # Synthesized audio chunk; play_audio/1 is your playback function
      play_audio(data)

    %{type: :voice_stream_event_lifecycle, event: :turn_ended} ->
      IO.puts("Turn complete!")

    %{type: :voice_stream_event_lifecycle, event: :session_ended} ->
      IO.puts("Session ended")

    %{type: :voice_stream_event_error, error: error} ->
      Logger.error("Error: #{inspect(error)}")

    _other ->
      # Ignore any other events this example does not handle
      :ok
  end
end)

Input Types

  • AudioInput - A static audio buffer for single-turn interactions
  • StreamedAudioInput - A streaming audio input for multi-turn conversations
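
As a rough sketch of how the two input types are driven (reusing the pipeline from the example above; audio_bytes and chunk stand in for your own audio data):

# Single-turn: hand the pipeline one complete audio buffer.
{:ok, single_result} = Pipeline.run(pipeline, AudioInput.new(audio_bytes))

# Multi-turn: start the run first, then push chunks as they arrive and
# close the input when the conversation is over.
input = StreamedAudioInput.new()
{:ok, streamed_result} = Pipeline.run(pipeline, input)
StreamedAudioInput.add(input, chunk)
StreamedAudioInput.close(input)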

Configuration

The pipeline can be configured with custom STT and TTS models; if none are given, it falls back to the default OpenAI models:

config = Config.new(
  workflow_name: "Customer Support",
  tts_settings: TTSSettings.new(voice: :nova)
)

pipeline = Pipeline.new(
  workflow: workflow,
  config: config,
  stt_model: "whisper-1",
  tts_model: "tts-1-hd"
)

Summary

Functions

new(opts) - Create a new voice pipeline.

run(pipeline, audio) - Run the pipeline on audio input.

Types

t()

@type t() :: %Codex.Voice.Pipeline{
  config: Codex.Voice.Config.t(),
  stt_model: struct(),
  tts_model: struct(),
  workflow: struct()
}

Functions

new(opts)

@spec new(keyword()) :: t()

Create a new voice pipeline.

Options

  • :workflow - Required. The workflow struct to run (its module must implement run/2)
  • :stt_model - Speech-to-text model. Can be a model struct, model name string, or nil for default
  • :tts_model - Text-to-speech model. Can be a model struct, model name string, or nil for default
  • :config - Pipeline configuration (defaults to %Config{})

Examples

# Simple pipeline with defaults
pipeline = Pipeline.new(workflow: %MyWorkflow{})

# Pipeline with custom models
pipeline = Pipeline.new(
  workflow: %MyWorkflow{},
  stt_model: "whisper-1",
  tts_model: "tts-1-hd"
)

# Pipeline with full configuration
pipeline = Pipeline.new(
  workflow: %MyWorkflow{},
  config: Config.new(
    workflow_name: "Support Agent",
    tts_settings: TTSSettings.new(voice: :nova)
  )
)

run(pipeline, audio)

Run the pipeline on audio input.

Parameters

  • pipeline - The voice pipeline
  • audio - Either AudioInput for single-turn or StreamedAudioInput for multi-turn

Returns

  • {:ok, result} - A Result struct that can be streamed for events. Errors that occur during processing are delivered as VoiceStreamEventError events in the result stream rather than being returned from this function.

Examples

# Single-turn with static audio
audio = AudioInput.new(audio_bytes)
{:ok, result} = Pipeline.run(pipeline, audio)

# Multi-turn with streaming audio
input = StreamedAudioInput.new()
{:ok, result} = Pipeline.run(pipeline, input)

# Push audio chunks to the input
StreamedAudioInput.add(input, chunk1)
StreamedAudioInput.add(input, chunk2)
StreamedAudioInput.close(input)
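
A sketch of how the feeding and consuming sides can run together, using only the Pipeline, StreamedAudioInput, and Result calls shown above; audio_chunks (an enumerable of audio binaries) and play_audio/1 are placeholders for your own capture and playback:

# Consume pipeline events in a background task while the caller keeps
# pushing audio.
input = StreamedAudioInput.new()
{:ok, result} = Pipeline.run(pipeline, input)

consumer =
  Task.async(fn ->
    result
    |> Result.stream()
    |> Enum.each(fn
      %{type: :voice_stream_event_audio, data: data} -> play_audio(data)
      _other -> :ok
    end)
  end)

# Feed captured audio, then close the input to signal the end of the session.
Enum.each(audio_chunks, &StreamedAudioInput.add(input, &1))
StreamedAudioInput.close(input)

Task.await(consumer, :infinity)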