OpenAI text-to-speech model implementation.
This module implements the Codex.Voice.Model.TTSModel behaviour using
OpenAI's audio speech API. It converts text to audio and returns the
result as a stream of PCM bytes.
Default Model
The default model is gpt-4o-mini-tts, which provides high-quality
text-to-speech with support for multiple voices and instructions.
Voices
The following voices are available:
:alloy - Neutral and balanced
:ash - Warm and conversational (default)
:coral - Clear and articulate
:echo - Soft and thoughtful
:fable - Expressive and dramatic
:onyx - Deep and authoritative
:nova - Friendly and upbeat
:sage - Calm and measured
:shimmer - Bright and energetic
Example
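alias Codex.Voice.Config.TTSSettings
alias Codex.Voice.Models.OpenAITTS

model = OpenAITTS.new()
settings = TTSSettings.new(voice: :nova, speed: 1.0)
audio_stream = OpenAITTS.run(model, "Hello, world!", settings)
Enum.each(audio_stream, fn chunk ->
  # Process each PCM audio chunk; here we only report its size
  IO.puts("received #{byte_size(chunk)} bytes")
end)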
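Because the result is a lazy stream of binaries, it can be written to disk chunk by chunk instead of being held in memory. The following is a minimal sketch under stated assumptions: `audio_stream` here is a hypothetical stand-in for the value returned by `OpenAITTS.run/3`, and the output path is purely illustrative.

```elixir
# Hypothetical stand-in for the enumerable returned by OpenAITTS.run/3:
# three binary chunks of raw PCM bytes, 1024 bytes each.
audio_stream = Stream.map(1..3, fn _ -> :binary.copy(<<0, 0>>, 512) end)

# Illustrative output path in the system temp directory.
path = Path.join(System.tmp_dir!(), "speech_example.pcm")

# File.stream!/1 is collectable, so each chunk is written as it arrives
# rather than being buffered in memory first.
audio_stream
|> Stream.into(File.stream!(path))
|> Stream.run()

bytes_written = File.stat!(path).size
IO.puts("wrote #{bytes_written} bytes to #{path}")
```

With a real model, replacing the stand-in with the result of `OpenAITTS.run/3` should be the only change needed, provided the stream yields binaries as described above.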
Functions
Create a new OpenAI TTS model.
Options
:client - Optional HTTP client (for testing)
:api_key - API key (defaults to the OPENAI_API_KEY environment variable)
:base_url - API base URL (defaults to OpenAI)
Examples
iex> model = Codex.Voice.Models.OpenAITTS.new()
iex> model.model
"gpt-4o-mini-tts"
iex> model = Codex.Voice.Models.OpenAITTS.new("tts-1")
iex> model.model
"tts-1"
@spec run(t(), String.t(), Codex.Voice.Config.TTSSettings.t()) :: Enumerable.t()
Convert text to speech, returning a stream of PCM audio bytes.
Parameters
model - The OpenAITTS model struct
text - The text to convert to speech
settings - TTSSettings with voice and speed options
Returns
An enumerable that yields audio bytes in PCM format. Each chunk is approximately 1024 bytes.
Example
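alias Codex.Voice.Config.TTSSettings
alias Codex.Voice.Models.OpenAITTS

model = OpenAITTS.new()
settings = TTSSettings.new(voice: :nova)
# Collect every PCM chunk into a list
audio_chunks =
  OpenAITTS.run(model, "Hello!", settings)
  |> Enum.to_list()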
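Once the chunks are collected, they can be joined into a single binary and measured. The sketch below is illustrative only: `PCMExample` is a hypothetical helper module, `chunks` is a stand-in for the output of `OpenAITTS.run/3`, and the audio format (24 kHz, 16-bit, mono) is an assumption that this module's documentation does not state, so check it against the API's PCM output format before relying on the duration figure.

```elixir
defmodule PCMExample do
  # Assumed output format (not stated in this module's docs):
  # 24 kHz sample rate, 16-bit samples (2 bytes each), mono.
  @sample_rate 24_000
  @bytes_per_sample 2

  # Joins an enumerable of binary chunks into one binary.
  def collect(chunks), do: chunks |> Enum.to_list() |> IO.iodata_to_binary()

  # Playback length in seconds for raw PCM under the assumptions above.
  def duration_seconds(pcm) when is_binary(pcm) do
    byte_size(pcm) / (@sample_rate * @bytes_per_sample)
  end
end

# Hypothetical stand-in for OpenAITTS.run/3 output: 48 chunks of 1024 bytes.
chunks = List.duplicate(:binary.copy(<<0>>, 1024), 48)

pcm = PCMExample.collect(chunks)
IO.puts("#{byte_size(pcm)} bytes, #{PCMExample.duration_seconds(pcm)} s")
```

Collecting the whole stream trades memory for convenience; for long passages, processing chunks as they arrive is likely the better fit.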