Codex.Voice.Models.OpenAITTS (Codex SDK v0.14.0)

OpenAI text-to-speech model implementation.

This module implements the Codex.Voice.Model.TTSModel behaviour using OpenAI's audio speech API. It converts text to audio and returns the result as a stream of PCM bytes.

Default Model

The default model is gpt-4o-mini-tts, which provides high-quality text-to-speech with support for multiple voices and for instructions that steer how the text is spoken.

Voices

The following voices are available:

  • :alloy - Neutral and balanced
  • :ash - Warm and conversational (default)
  • :coral - Clear and articulate
  • :echo - Soft and thoughtful
  • :fable - Expressive and dramatic
  • :onyx - Deep and authoritative
  • :nova - Friendly and upbeat
  • :sage - Calm and measured
  • :shimmer - Bright and energetic
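
Because voices are passed as atoms, it can help to check a user-supplied value before building settings. The following is a minimal sketch; VoiceCheck is a hypothetical helper that is not part of the Codex SDK, and its list simply mirrors the voices documented above.

```elixir
defmodule VoiceCheck do
  # Hypothetical helper for illustration; not part of the Codex SDK.
  # The atoms mirror the voice list documented on this page.
  @voices [:alloy, :ash, :coral, :echo, :fable, :onyx, :nova, :sage, :shimmer]

  @doc "Returns {:ok, voice} for a documented voice, {:error, reason} otherwise."
  def validate(voice) when voice in @voices, do: {:ok, voice}
  def validate(other), do: {:error, {:unknown_voice, other}}

  @doc "Lists the documented voices."
  def voices, do: @voices
end
```

A caller could match on the result of VoiceCheck.validate(:nova) and only then pass the voice on to TTSSettings.new/1.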

Example

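alias Codex.Voice.Models.OpenAITTS
# TTSSettings must be aliased as well; its full module name is not shown on this page.

model = OpenAITTS.new()
settings = TTSSettings.new(voice: :nova, speed: 1.0)

audio_stream = OpenAITTS.run(model, "Hello, world!", settings)

Enum.each(audio_stream, fn chunk ->
  # Process each PCM audio chunk; this example only reports its size.
  IO.puts("received #{byte_size(chunk)} bytes")
end)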

Summary

Functions

new(model \\ nil, opts \\ [])

Create a new OpenAI TTS model.

run(model, text, settings)

Convert text to speech, returning a stream of PCM audio bytes.

Types

t()

@type t() :: %Codex.Voice.Models.OpenAITTS{
  api_key: String.t() | nil,
  base_url: String.t(),
  client: term(),
  model: String.t()
}

Functions

new(model \\ nil, opts \\ [])

@spec new(
  String.t() | nil,
  keyword()
) :: t()

Create a new OpenAI TTS model.

Options

  • :client - Optional HTTP client (for testing)
  • :api_key - API key (defaults to OPENAI_API_KEY env var)
  • :base_url - API base URL (defaults to OpenAI)
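
The way these options fall back to defaults can be sketched as follows. This is an illustration only: TTSOptionsSketch is a hypothetical module, the real new/2 may resolve its options differently, and the default base URL shown is an assumption (this page only says it "defaults to OpenAI").

```elixir
defmodule TTSOptionsSketch do
  # Hypothetical module for illustration; the real OpenAITTS.new/2 may
  # resolve its options differently.
  @default_model "gpt-4o-mini-tts"
  # Assumed value; the documentation only states "defaults to OpenAI".
  @default_base_url "https://api.openai.com/v1"

  @doc "Builds a map of resolved settings from a model name and keyword options."
  def resolve(model \\ nil, opts \\ []) do
    %{
      # A nil model falls back to the documented default.
      model: model || @default_model,
      # The environment variable is only read when no :api_key is given.
      api_key: Keyword.get_lazy(opts, :api_key, fn -> System.get_env("OPENAI_API_KEY") end),
      base_url: Keyword.get(opts, :base_url, @default_base_url),
      # Optional HTTP client, nil unless injected (for example in tests).
      client: Keyword.get(opts, :client)
    }
  end
end
```

Passing :client or :base_url explicitly is what makes it possible to point the model at a stub or local server in tests.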

Examples

iex> model = Codex.Voice.Models.OpenAITTS.new()
iex> model.model
"gpt-4o-mini-tts"

iex> model = Codex.Voice.Models.OpenAITTS.new("tts-1")
iex> model.model
"tts-1"

run(model, text, settings)

Convert text to speech, returning a stream of PCM audio bytes.

Parameters

  • model - The OpenAITTS model struct
  • text - The text to convert to speech
  • settings - TTSSettings with voice and speed options

Returns

An enumerable that yields audio bytes in PCM format. Each chunk is approximately 1024 bytes.

Example

alias Codex.Voice.Models.OpenAITTS

model = OpenAITTS.new()
settings = TTSSettings.new(voice: :nova)

# Collect the whole stream into a list of binary chunks.
audio_chunks =
  model
  |> OpenAITTS.run("Hello!", settings)
  |> Enum.to_list()
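
Raw PCM has no header, so most players cannot open it directly. One way to make the output playable is to prepend a WAV header, sketched below. PCMToWav is a hypothetical helper, not part of the Codex SDK, and it assumes 24 kHz, 16-bit, mono, little-endian samples; this page does not state the sample format, so verify it before relying on these values.

```elixir
defmodule PCMToWav do
  # Hypothetical helper for illustration; not part of the Codex SDK.
  # Assumed format: 24 kHz, 16-bit, mono PCM. Verify against the actual output.
  @sample_rate 24_000
  @bits_per_sample 16
  @channels 1

  @doc "Collects PCM chunks and prepends a 44-byte RIFF/WAVE header."
  def wrap(chunks) do
    # Drain the enumerable and join the chunks into one binary.
    pcm = IO.iodata_to_binary(Enum.to_list(chunks))
    data_size = byte_size(pcm)
    block_align = div(@channels * @bits_per_sample, 8)
    byte_rate = @sample_rate * block_align

    header = <<
      "RIFF", (data_size + 36)::little-32, "WAVE",
      # "fmt " chunk: 16 bytes long, audio format 1 means uncompressed PCM.
      "fmt ", 16::little-32, 1::little-16, @channels::little-16,
      @sample_rate::little-32, byte_rate::little-32,
      block_align::little-16, @bits_per_sample::little-16,
      "data", data_size::little-32
    >>

    header <> pcm
  end
end
```

With the stream from run/3 bound to audio_stream, File.write!("hello.wav", PCMToWav.wrap(audio_stream)) would produce a file that standard audio players can open, provided the assumed format matches. Note that this collects the entire stream in memory, which is fine for short utterances but not for long ones.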