Behaviour for speech-to-text models.
STT models convert audio input into text transcriptions. They support both single-shot transcription and streaming transcription sessions.
Note: The behaviour callbacks are module-level functions. Implementations
should use struct-based models, where the struct is passed as the first
argument to instance-style functions such as transcribe/5.
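The struct-based pattern described in the note can be sketched as follows. This is a minimal illustration, not the library's implementation: the module name `ExampleSTT`, its fields, and the helpers `new/1` and `describe/1` are all hypothetical, and `transcribe/5` is left out because its parameters are not documented here.

```elixir
defmodule ExampleSTT do
  # Hypothetical struct-based model; the field names are illustrative only.
  defstruct model: "example-stt-1", language: "en"

  # Module-level function, matching the shape of the model_name/0 callback.
  def model_name, do: "example-stt-1"

  # Hypothetical constructor: builds the struct from a keyword list.
  def new(opts \\ []), do: struct!(__MODULE__, opts)

  # Instance-style function: the struct is the first argument.
  # describe/1 is a made-up helper, not part of the behaviour.
  def describe(%__MODULE__{model: model, language: language}) do
    "#{model} (#{language})"
  end
end

model = ExampleSTT.new(language: "de")
IO.puts(ExampleSTT.describe(model))
```

The module-level function needs no instance, while per-model configuration travels in the struct and is pattern-matched in the function head.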
Callbacks
@callback create_session(
  input :: Codex.Voice.Input.StreamedAudioInput.t(),
  settings :: Codex.Voice.Config.STTSettings.t(),
  trace_include_sensitive_data :: boolean(),
  trace_include_sensitive_audio_data :: boolean()
) :: {:ok, pid()} | {:error, term()}
Creates a streaming transcription session.
The session receives audio input via the StreamedAudioInput and
produces text transcriptions for each detected turn.
Parameters
input - The streamed audio input
settings - STT settings
trace_include_sensitive_data - Whether to include text in traces
trace_include_sensitive_audio_data - Whether to include audio in traces
Returns
{:ok, session_pid} - The session process
{:error, reason} - If session creation fails
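A rough sketch of an implementation and a caller handling both return shapes is shown below. Everything named here is an assumption made for illustration: `ExampleSTTBehaviour` is a local stand-in for this behaviour with argument types loosened to `term()` so the example compiles without the library, `EchoSTT` is a hypothetical implementation, and an `Agent` stands in for a real session process.

```elixir
defmodule ExampleSTTBehaviour do
  # Local stand-in for the documented behaviour (assumption): the real
  # callback uses Codex.Voice types for the first two arguments.
  @callback create_session(term(), term(), boolean(), boolean()) ::
              {:ok, pid()} | {:error, term()}
  @callback model_name() :: String.t()
end

defmodule EchoSTT do
  @behaviour ExampleSTTBehaviour

  @impl true
  def model_name, do: "echo-stt"

  @impl true
  def create_session(input, settings, include_text?, include_audio?) do
    state = %{
      input: input,
      settings: settings,
      trace_text: include_text?,
      trace_audio: include_audio?,
      transcripts: []
    }

    # Agent.start_link/1 returns {:ok, pid} or {:error, reason},
    # which matches the callback's return shape.
    Agent.start_link(fn -> state end)
  end
end

# Placeholder arguments; a real caller would pass a StreamedAudioInput
# and STTSettings struct.
case EchoSTT.create_session(:fake_input, %{}, false, false) do
  {:ok, session} ->
    IO.puts("session alive? #{Process.alive?(session)}")

  {:error, reason} ->
    IO.puts("session creation failed: #{inspect(reason)}")
end
```

Matching on both tuples keeps a failed session start from crashing the caller; a production session would more likely be a GenServer that consumes the audio stream and emits a transcription per turn.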
@callback model_name() :: String.t()
Returns the name of the STT model.