Behaviours for speech-to-text and text-to-speech models.
This module defines the behaviours (interfaces) that voice models must implement. It provides a consistent API for transcribing audio (STT) and generating speech (TTS).
STT Models
Speech-to-text models convert audio input into text. They support both:
- Single-shot transcription via transcribe/5
- Streaming transcription sessions via create_session/4
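As a rough illustration of what an STT behaviour with these two callbacks could look like, here is a minimal sketch. Only the callback names and arities (transcribe/5, create_session/4) come from the text above. The module names SketchSTT and EchoSTT, the argument names, the return shapes, and the meaning of each argument are assumptions made for illustration, not the library's actual definitions.

```elixir
defmodule SketchSTT do
  @moduledoc "Hypothetical behaviour. Only the callback names and arities follow the docs."

  # Single-shot transcription: five arguments, matching transcribe/5.
  # The two trailing booleans are kept as opaque flags (their meaning is assumed unknown).
  @callback transcribe(
              model :: term(),
              audio :: binary(),
              settings :: map(),
              flag_one :: boolean(),
              flag_two :: boolean()
            ) :: {:ok, String.t()} | {:error, term()}

  # Streaming session: four arguments, matching create_session/4.
  # The argument list here is an assumption.
  @callback create_session(
              model :: term(),
              audio_stream :: Enumerable.t(),
              settings :: map(),
              opts :: keyword()
            ) :: {:ok, term()} | {:error, term()}
end

defmodule EchoSTT do
  @moduledoc "Toy implementation that reports chunk sizes instead of transcribing."
  @behaviour SketchSTT

  @impl true
  def transcribe(_model, audio, _settings, _flag_one, _flag_two) when is_binary(audio) do
    {:ok, "#{byte_size(audio)} bytes received"}
  end

  @impl true
  def create_session(_model, audio_stream, _settings, _opts) do
    # The "session" in this sketch is just a lazy stream of per-chunk results.
    {:ok, Stream.map(audio_stream, fn chunk -> "#{byte_size(chunk)} bytes" end)}
  end
end
```

Declaring the callbacks in a behaviour lets the compiler warn when an implementing module is missing one of them, which is the main benefit of a shared interface across providers.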
TTS Models
Text-to-speech models convert text into audio. They return a stream of audio bytes in PCM format.
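Because the result is a stream rather than a single binary, callers can process audio chunk by chunk. The sketch below shows one way to consume such a stream. The module name PCMSketch is hypothetical, and the sample format (16-bit mono at 24 kHz) is an assumption that should be checked against the actual model output.

```elixir
defmodule PCMSketch do
  # Assumed format: 16-bit (2-byte) mono samples at 24 kHz. Adjust to the real output.
  @sample_rate 24_000
  @bytes_per_sample 2

  # Walks the stream once, summing chunk sizes without keeping the audio in memory.
  def byte_count(audio_stream) do
    Enum.reduce(audio_stream, 0, fn chunk, total -> total + byte_size(chunk) end)
  end

  # Converts the total byte count into a playback duration in milliseconds.
  def duration_ms(audio_stream) do
    div(byte_count(audio_stream) * 1000, @sample_rate * @bytes_per_sample)
  end
end

# Stand-in for a TTS result: two lazily produced chunks of silence.
fake_stream = Stream.map([24_000, 24_000], fn size -> :binary.copy(<<0>>, size) end)
duration = PCMSketch.duration_ms(fake_stream)
```

The same stream could instead be written to disk with Enum.into(audio_stream, File.stream!("out.pcm")), again without loading the whole clip into memory.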
Model Providers
Model providers are factories that create STT and TTS models by name.
The OpenAIProvider is the default implementation.
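The factory idea can be sketched as follows. This is not the OpenAIProvider implementation: SketchProvider, its struct fields, the default model names, and the returned maps are all hypothetical. The rule that a nil name falls back to a provider default is also an assumption.

```elixir
defmodule SketchProvider do
  # Hypothetical provider: a struct that holds default model names.
  defstruct default_stt: "stt-default", default_tts: "tts-default"

  def new(opts \\ []), do: struct(__MODULE__, opts)

  # Assumption: a nil name selects the provider's default model.
  def get_stt_model(%__MODULE__{default_stt: default}, name \\ nil) do
    %{kind: :stt, name: name || default}
  end

  def get_tts_model(%__MODULE__{default_tts: default}, name \\ nil) do
    %{kind: :tts, name: name || default}
  end
end
```

Keeping model lookup behind a provider means calling code can switch vendors or model names without changing how it transcribes or synthesizes audio.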
Example
# Using the OpenAI provider
provider = Codex.Voice.Models.OpenAIProvider.new()
stt_model = Codex.Voice.Models.OpenAIProvider.get_stt_model(provider, nil)
tts_model = Codex.Voice.Models.OpenAIProvider.get_tts_model(provider, nil)
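# Transcribe audio (audio_input and stt_settings are assumed to be defined elsewhere)
{:ok, text} =
  Codex.Voice.Models.OpenAISTT.transcribe(stt_model, audio_input, stt_settings, true, false)
# Generate speech; the result is a stream of PCM audio bytes
# (tts_settings is likewise assumed to be defined elsewhere)
audio_stream = Codex.Voice.Models.OpenAITTS.run(tts_model, "Hello!", tts_settings)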