VoskEx.Recognizer (VoskEx v0.2.1)
High-level wrapper for the Vosk speech recognizer.
A recognizer processes audio data and returns speech recognition results. Each recognizer is bound to a specific model and sample rate.
Audio Requirements
The recognizer expects PCM 16-bit mono audio at the specified sample rate:
- Format: PCM (uncompressed)
- Bit depth: 16-bit signed integers
- Channels: Mono (single channel)
- Sample rate: Must match the rate specified at creation (typically 8000, 16000, or 44100 Hz)
- Byte order: Little-endian
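For WAV files, skip the 44-byte header before passing audio to the recognizer. The 44-byte size applies only to the canonical header layout: WAV files with extra chunks (for example, LIST metadata) have longer headers, so confirm that the file is plain 16-bit mono PCM first.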
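For files that follow the canonical layout, Elixir's binary pattern matching keeps the header check short. The module below is a hypothetical sketch: WavHeader and strip/1 are illustrative names, not part of VoskEx. It assumes the "data" chunk immediately follows a 16-byte "fmt " chunk and rejects anything else. It returns the sample rate as a float, because new/2 expects one.

```elixir
defmodule WavHeader do
  # Hypothetical helper, not part of VoskEx. Assumes the canonical 44-byte
  # header: "RIFF" chunk, 16-byte "fmt " chunk, then the "data" chunk.

  @doc """
  Returns {:ok, sample_rate, pcm}, where sample_rate is a float suitable for
  VoskEx.Recognizer.new/2, or {:error, reason} when the header is not
  canonical 16-bit mono PCM.
  """
  def strip(<<"RIFF", _riff_size::little-32, "WAVE",
              "fmt ", 16::little-32,
              audio_format::little-16, channels::little-16,
              sample_rate::little-32, _byte_rate::little-32,
              _block_align::little-16, bits::little-16,
              "data", _data_size::little-32, pcm::binary>>) do
    cond do
      # Format tag 1 means uncompressed PCM
      audio_format != 1 -> {:error, :not_pcm}
      channels != 1 -> {:error, :not_mono}
      bits != 16 -> {:error, :not_16_bit}
      true -> {:ok, sample_rate * 1.0, pcm}
    end
  end

  # Any other layout (extra chunks, extensible format, not a WAV at all)
  def strip(_other), do: {:error, :unsupported_header}
end
```

With this, {:ok, rate, pcm} = WavHeader.strip(File.read!("audio.wav")) yields a rate that can be passed directly to VoskEx.Recognizer.new(model, rate).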
Recognition Flow
- Create a recognizer with a model and sample rate
- Feed audio data using accept_waveform/2
- Check for results:
  - :utterance_ended → call result/1 for final text
  - :continue → call partial_result/1 for interim text
- At end of audio, call final_result/1 to flush remaining data
Thread Safety
Recognizers are NOT thread-safe. Each process should create its own recognizer instance. However, multiple recognizers can safely share the same model.
Example
# Load model once
{:ok, model} = VoskEx.Model.load("models/vosk-model-small-en-us-0.15")

# Create recognizer
{:ok, rec} = VoskEx.Recognizer.new(model, 16000.0)
VoskEx.Recognizer.set_words(rec, true) # Enable word timings

# Process audio in chunks
# (read_audio_in_chunks/1 is your own helper, not a VoskEx function)
audio_chunks = read_audio_in_chunks("audio.wav")

for chunk <- audio_chunks do
  case VoskEx.Recognizer.accept_waveform(rec, chunk) do
    :utterance_ended ->
      {:ok, result} = VoskEx.Recognizer.result(rec)
      IO.puts("Final: #{result["text"]}")

    :continue ->
      {:ok, partial} = VoskEx.Recognizer.partial_result(rec)
      IO.puts("Partial: #{partial["partial"]}")

    :error ->
      IO.puts("Error while processing audio chunk")
  end
end

# Get any remaining text
{:ok, final} = VoskEx.Recognizer.final_result(rec)
IO.puts("Final: #{final["text"]}")
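The example above calls read_audio_in_chunks/1, which this module does not provide. One possible sketch of that helper follows. The AudioChunks module, its 4000-byte default chunk size, and the fixed 44-byte header are assumptions for illustration. Chunk sizes are required to be even so that no 16-bit sample is split across two accept_waveform/2 calls.

```elixir
defmodule AudioChunks do
  # Hypothetical helper matching the read_audio_in_chunks/1 call in the
  # example; not part of VoskEx. Assumes a canonical 44-byte WAV header.

  @doc "Reads a WAV file, drops the header, and splits the PCM data into chunks."
  def read_audio_in_chunks(path, chunk_size \\ 4000) do
    <<_header::binary-size(44), pcm::binary>> = File.read!(path)
    chunk(pcm, chunk_size)
  end

  @doc "Splits a binary into chunk_size-byte pieces; the last piece may be shorter."
  def chunk(binary, chunk_size) when chunk_size > 0 and rem(chunk_size, 2) == 0 do
    do_chunk(binary, chunk_size, [])
  end

  # Nothing left: return the pieces in their original order
  defp do_chunk(<<>>, _size, acc), do: Enum.reverse(acc)

  # Final (possibly short) piece
  defp do_chunk(binary, size, acc) when byte_size(binary) <= size do
    Enum.reverse([binary | acc])
  end

  defp do_chunk(binary, size, acc) do
    <<piece::binary-size(size), rest::binary>> = binary
    do_chunk(rest, size, [piece | acc])
  end
end
```

Reading the whole file at once keeps the sketch simple; for long recordings, a streaming reader would avoid holding all of the audio in memory.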
Summary
Functions
accept_waveform/2: Process audio data.
final_result/1: Get the final result at the end of the audio stream.
new/2: Create a new recognizer for the given model and sample rate.
new!/2: Create a new recognizer, raising on error.
partial_result/1: Get the current partial recognition result as a parsed map.
reset/1: Reset the recognizer to start fresh.
result/1: Get the final recognition result as a parsed map.
set_max_alternatives/2: Set maximum number of recognition alternatives to return.
set_partial_words/2: Enable or disable word timing information in partial results.
set_words/2: Enable or disable word timing information in results.
Functions
@spec accept_waveform(t(), binary()) :: waveform_result()
Process audio data.
Parameters
audio_data: Binary containing PCM 16-bit mono audio data
Returns
:utterance_ended: Silence detected; call result/1 to get the recognized text
:continue: Keep feeding audio; you can call partial_result/1 for progress
:error: An error occurred
Examples
iex> audio = File.read!("audio.raw")
iex> VoskEx.Recognizer.accept_waveform(recognizer, audio)
:utterance_ended
@spec final_result(t()) :: {:ok, recognition_result()} | {:error, Jason.DecodeError.t()}
Get the final result at the end of the audio stream.
This flushes the feature pipeline to process any remaining audio.
Examples
iex> VoskEx.Recognizer.final_result(recognizer)
{:ok, %{"text" => "hello world"}}
@spec new(VoskEx.Model.t(), float()) :: {:ok, t()} | {:error, :recognizer_creation_failed}
Create a new recognizer for the given model and sample rate.
Parameters
model: A VoskEx.Model struct
sample_rate: Audio sample rate in Hz, as a float (typically 8000.0, 16000.0, or 44100.0)
Examples
iex> model = VoskEx.Model.load!("path/to/model")
iex> VoskEx.Recognizer.new(model, 16000.0)
{:ok, %VoskEx.Recognizer{}}
@spec new!(VoskEx.Model.t(), float()) :: t()
Create a new recognizer, raising on error.
@spec partial_result(t()) :: {:ok, recognition_result()} | {:error, Jason.DecodeError.t()}
Get the current partial recognition result as a parsed map.
Can be called while recognition is in progress.
Examples
iex> VoskEx.Recognizer.partial_result(recognizer)
{:ok, %{"partial" => "hello wor"}}
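Because partial_result/1 is usually polled after every chunk that returns :continue, the same partial text often comes back several times in a row. A hypothetical helper that reports a partial only when it differs from the previous one might look like the following. PartialTracker and update/2 are illustrative names, not part of VoskEx.

```elixir
defmodule PartialTracker do
  # Hypothetical helper, not part of VoskEx. `last` is the previously reported
  # partial text; the second argument is the map returned by partial_result/1.

  @doc "Returns {:changed, text} when the partial text differs from last, else {:unchanged, last}."
  def update(last, %{"partial" => text}) when text != last, do: {:changed, text}

  # Same text as before, or a map without a "partial" key
  def update(last, _partial), do: {:unchanged, last}
end
```

Keep the returned text as `last` for the next call, for example in a GenServer state or an Enum.reduce/3 accumulator.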
@spec reset(t()) :: :ok
Reset the recognizer to start fresh.
Clears all current recognition state.
Examples
iex> VoskEx.Recognizer.reset(recognizer)
:ok
@spec result(t()) :: {:ok, recognition_result()} | {:error, Jason.DecodeError.t()}
Get the recognition result for the utterance that just ended, as a parsed map.
Call this after accept_waveform/2 returns :utterance_ended.
Examples
iex> VoskEx.Recognizer.result(recognizer)
{:ok, %{"text" => "hello world"}}
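When transcribing a whole file, the maps returned by result/1 for each utterance, plus the one from final_result/1, usually need to be combined into one transcript. The following is a minimal sketch that assumes each map carries a "text" key, as in the examples above. Transcript and join/1 are illustrative names, not part of VoskEx. Vosk may return an empty "text" for an utterance with no recognized speech, so those entries are dropped.

```elixir
defmodule Transcript do
  # Hypothetical helper, not part of VoskEx. Joins the "text" of collected
  # result maps, skipping empty or missing texts.

  @doc "Joins non-empty utterance texts with a single space."
  def join(results) do
    results
    |> Enum.map(&Map.get(&1, "text", ""))
    |> Enum.reject(&(&1 == ""))
    |> Enum.join(" ")
  end
end
```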
Set maximum number of recognition alternatives to return.
Examples
iex> VoskEx.Recognizer.set_max_alternatives(recognizer, 3)
:ok
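With alternatives enabled, the parsed result no longer has a single top-level "text". The sketch below assumes that the map follows Vosk's usual JSON layout, %{"alternatives" => [%{"text" => ..., "confidence" => ...}, ...]}. Verify this against your own output first. The Alternatives module and best_text/1 are hypothetical names. The function picks the highest-confidence entry and falls back to the plain "text" shape.

```elixir
defmodule Alternatives do
  # Hypothetical helper, not part of VoskEx. Assumes the "alternatives" list
  # layout described above; missing confidences are treated as 0.

  @doc "Returns the text of the highest-confidence alternative, or nil."
  def best_text(%{"alternatives" => [_ | _] = alts}) do
    alts
    |> Enum.max_by(fn alt -> Map.get(alt, "confidence", 0) end)
    |> Map.get("text")
  end

  # Plain single-result shape (alternatives disabled)
  def best_text(%{"text" => text}), do: text

  # Empty alternatives list or unrecognized shape
  def best_text(_other), do: nil
end
```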
Enable or disable word timing information in partial results.
Examples
iex> VoskEx.Recognizer.set_partial_words(recognizer, true)
:ok
Enable or disable word timing information in results.
When enabled, results include start/end times and confidence for each word.
Examples
iex> VoskEx.Recognizer.set_words(recognizer, true)
:ok
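Once word timings are on, the per-word data has to be pulled out of the result map. This sketch assumes that the parsed map carries a "result" list of maps with "word", "start", "end" (in seconds), and "conf" keys, which is how Vosk's JSON output is usually laid out. The WordTimings module and its functions are hypothetical names, not part of VoskEx.

```elixir
defmodule WordTimings do
  # Hypothetical helper, not part of VoskEx. Assumes the "result" word list
  # layout described above.

  @doc "Returns {word, start, end, conf} tuples; [] when no word list is present."
  def extract(%{"result" => words}) when is_list(words) do
    Enum.map(words, fn w -> {w["word"], w["start"], w["end"], w["conf"]} end)
  end

  def extract(_result), do: []

  @doc "Keeps only words whose confidence is at least min_conf."
  def confident(result, min_conf) do
    result
    |> extract()
    |> Enum.filter(fn {_word, _start, _stop, conf} ->
      is_number(conf) and conf >= min_conf
    end)
  end
end
```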