Codex.Voice.Model (Codex SDK v0.16.1)

Behaviours for speech-to-text and text-to-speech models.

This module defines the behaviours (interfaces) that voice models must implement. It provides a consistent API for transcribing audio (STT) and generating speech (TTS).

STT Models

Speech-to-text models convert audio input into text. They support both:

  • Single-shot transcription via transcribe/5
  • Streaming transcription sessions via create_session/4

TTS Models

Text-to-speech models convert text into audio. They return a stream of audio bytes in PCM format.
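As a rough illustration of consuming such a stream, the sketch below collects chunks into one binary. The stream here is a made-up stand-in built from literal bytes, not output from a real model, and the 16-bit mono sample width is an assumption that this page does not confirm.

```elixir
# Stand-in for the stream a TTS model returns: a lazy enumerable of binaries.
# The chunk contents are invented for illustration only.
audio_stream = Stream.map([<<0, 1>>, <<2, 3>>, <<4, 5>>], & &1)

# Force the stream and join the chunks into a single binary.
pcm = audio_stream |> Enum.to_list() |> IO.iodata_to_binary()

# ASSUMPTION: 16-bit mono samples, so each sample occupies two bytes.
sample_count = div(byte_size(pcm), 2)

IO.puts("collected #{byte_size(pcm)} bytes, #{sample_count} samples")
```

Because the stream is lazy, chunks could also be written to a file or audio device one at a time instead of being collected in memory first.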

Model Providers

Model providers are factories that create STT and TTS models by name. The OpenAIProvider is the default implementation.
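The factory idea can be sketched with a small behaviour and a fake implementation. Everything under the `Sketch` namespace is hypothetical and exists only for this illustration; it is not the SDK's actual behaviour definition. The fallback to a default model when the name is `nil` is likewise an assumption, suggested by the `nil` arguments in the example on this page.

```elixir
defmodule Sketch.ModelProvider do
  @moduledoc "Hypothetical provider behaviour; not the SDK's real definition."

  # Each callback takes the provider and an optional model name.
  @callback get_stt_model(provider :: struct(), name :: String.t() | nil) :: struct()
  @callback get_tts_model(provider :: struct(), name :: String.t() | nil) :: struct()
end

defmodule Sketch.FakeSTT do
  defstruct [:name]
end

defmodule Sketch.FakeTTS do
  defstruct [:name]
end

defmodule Sketch.FakeProvider do
  @behaviour Sketch.ModelProvider

  # Default model names are invented for this sketch.
  defstruct default_stt: "fake-stt", default_tts: "fake-tts"

  def new, do: %__MODULE__{}

  # ASSUMPTION: a nil name selects the provider's default model.
  @impl true
  def get_stt_model(%__MODULE__{} = provider, name) do
    %Sketch.FakeSTT{name: name || provider.default_stt}
  end

  @impl true
  def get_tts_model(%__MODULE__{} = provider, name) do
    %Sketch.FakeTTS{name: name || provider.default_tts}
  end
end

provider = Sketch.FakeProvider.new()
stt = Sketch.FakeProvider.get_stt_model(provider, nil)
tts = Sketch.FakeProvider.get_tts_model(provider, "custom-voice")
```

Keeping model construction behind a provider lets calling code ask for a model by name without depending on a concrete model module.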

Example

# Using the OpenAI provider
provider = Codex.Voice.Models.OpenAIProvider.new()
stt_model = Codex.Voice.Models.OpenAIProvider.get_stt_model(provider, nil)
tts_model = Codex.Voice.Models.OpenAIProvider.get_tts_model(provider, nil)

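# Transcribe audio. audio_input and stt_settings are assumed to be defined
# elsewhere; see transcribe/5 for the meaning of the two trailing booleans.
{:ok, text} = Codex.Voice.Models.OpenAISTT.transcribe(
  stt_model, audio_input, stt_settings, true, false
)

# Generate speech. tts_settings is likewise assumed to be defined elsewhere.
audio_stream = Codex.Voice.Models.OpenAITTS.run(tts_model, "Hello!", tts_settings)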