VoskEx (VoskEx v0.2.1)
Elixir bindings for the Vosk API speech recognition library.
VoskEx provides offline, high-performance speech-to-text capabilities for Elixir applications through bindings to the Vosk API.
Installation
Add vosk_ex to your list of dependencies in mix.exs:
def deps do
  [
    {:vosk_ex, "~> 0.1.0"}
  ]
end
VoskEx automatically downloads precompiled Vosk libraries during compilation, so no system dependencies are required.
Simply run:
mix deps.get
mix compile # Automatically downloads Vosk library (~2-7 MB) for your platform
Supported platforms:
- Linux: x86_64, aarch64 (ARM64)
- macOS: Intel (x86_64), Apple Silicon (M1/M2/M3)
- Windows: x64
Configuration
VoskEx logs are disabled by default to keep your application logs clean.
To enable Vosk/Kaldi internal logging, add to your config/config.exs:
config :vosk_ex,
  log_level: 0 # -1 = silent (default), 0 = default logging, >0 = more verbose
Quick Start
# Download a model first
Mix.Task.run("vosk.download_model", ["en"])
# Load model and create recognizer
{:ok, model} = VoskEx.Model.load("models/vosk-model-small-en-us-0.15")
{:ok, recognizer} = VoskEx.Recognizer.new(model, 16000.0)
# Process audio (PCM 16-bit mono)
# Read the file first, then skip the 44-byte canonical WAV header
wav = File.read!("audio.wav")
audio = binary_part(wav, 44, byte_size(wav) - 44)
case VoskEx.Recognizer.accept_waveform(recognizer, audio) do
  :utterance_ended ->
    {:ok, result} = VoskEx.Recognizer.result(recognizer)
    IO.inspect(result["text"])
  :continue ->
    {:ok, partial} = VoskEx.Recognizer.partial_result(recognizer)
    IO.inspect(partial["partial"])
end
Architecture
This module provides low-level bindings to the Vosk C API.
For a more user-friendly interface, use VoskEx.Model and VoskEx.Recognizer.
The library uses a three-layer architecture:
- Layer 1 (C NIF): Direct bindings to the Vosk C API via erl_nif.h
- Layer 2 (Low-Level): This module, a thin Elixir wrapper with NIF stubs
- Layer 3 (High-Level): VoskEx.Model and VoskEx.Recognizer, an idiomatic Elixir API
Resources are automatically managed through BEAM garbage collection.
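The relationship between Layer 2 and Layer 3 can be illustrated with a small sketch. It assumes the high-level layer does little more than translate the integer status codes of the low-level call into the atoms seen in the Quick Start. The module name LayerSketch, the injected low_level_accept function, and the error tuple are hypothetical and are not taken from the VoskEx source.

```elixir
defmodule LayerSketch do
  # Translate a low-level status code (1, 0, -1) into an idiomatic term.
  # :utterance_ended and :continue appear in the Quick Start; the error
  # tuple is an assumption made for this sketch.
  def translate(1), do: :utterance_ended
  def translate(0), do: :continue
  def translate(-1), do: {:error, :accept_waveform_failed}

  # `low_level_accept` stands in for the Layer 2 function. It is passed in
  # so the sketch runs without the native library being present.
  def accept_waveform(recognizer, audio, low_level_accept)
      when is_binary(audio) and is_function(low_level_accept, 2) do
    recognizer
    |> low_level_accept.(audio)
    |> translate()
  end
end

# A stub in place of the NIF: signals an utterance end for inputs over 4 bytes.
stub = fn _recognizer, audio -> if byte_size(audio) > 4, do: 1, else: 0 end
IO.inspect(LayerSketch.accept_waveform(:fake_ref, <<0, 0>>, stub))
```

Injecting the low-level function keeps the translation logic testable on its own, which is one plausible reason to separate the layers this way.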
Functions
Process audio data (PCM 16-bit mono).
Returns:
- 1: utterance ended (silence detected)
- 0: continue processing
- -1: error occurred
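Since the input must be raw 16-bit mono PCM, a WAV file has to be reduced to its sample data first. Skipping a fixed 44 bytes only works for the canonical header layout; files carrying extra chunks have longer headers. The following is a minimal sketch, in plain Elixir and independent of VoskEx, that locates the data chunk and splits the samples into sample-aligned pieces. PcmSketch and its function names are hypothetical.

```elixir
defmodule PcmSketch do
  # Return the payload of the "data" chunk of a RIFF/WAVE binary.
  def pcm_data(<<"RIFF", _size::little-32, "WAVE", rest::binary>>), do: find_data(rest)
  def pcm_data(_other), do: {:error, :not_a_wav_file}

  defp find_data(<<"data", size::little-32, rest::binary>>) do
    # Tolerate a declared size larger than the bytes actually present.
    {:ok, binary_part(rest, 0, min(size, byte_size(rest)))}
  end

  defp find_data(<<_id::binary-size(4), size::little-32, rest::binary>>) do
    # Skip any other chunk; chunk bodies are padded to an even length.
    skip = size + rem(size, 2)

    if skip <= byte_size(rest) do
      find_data(binary_part(rest, skip, byte_size(rest) - skip))
    else
      {:error, :data_chunk_not_found}
    end
  end

  defp find_data(_rest), do: {:error, :data_chunk_not_found}

  # Split PCM into chunks of at most `samples` 16-bit samples (2 bytes each).
  def chunks(pcm, samples) when is_binary(pcm) and is_integer(samples) and samples > 0 do
    do_chunks(pcm, samples * 2, [])
  end

  defp do_chunks(<<>>, _bytes, acc), do: Enum.reverse(acc)

  defp do_chunks(pcm, bytes, acc) when byte_size(pcm) <= bytes,
    do: Enum.reverse([pcm | acc])

  defp do_chunks(pcm, bytes, acc) do
    <<chunk::binary-size(bytes), rest::binary>> = pcm
    do_chunks(rest, bytes, [chunk | acc])
  end
end
```

Each chunk returned by PcmSketch.chunks/2 holds whole samples, so it can be passed to the audio-processing call one piece at a time.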
Create a recognizer for the given model and sample rate.
Returns {:ok, recognizer_ref} or {:error, :recognizer_creation_failed}.
Check if a word exists in the model vocabulary.
Returns the word symbol (>= 0) if found, or -1 if not found.
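Because a known word yields a symbol of 0 or greater and an unknown word yields -1, a list of candidate words can be filtered with a single comparison. The sketch below assumes a lookup function with that contract; here it is a stub backed by a map, and both VocabSketch and lookup are hypothetical names.

```elixir
defmodule VocabSketch do
  # `lookup` stands in for the word lookup call: it must return a symbol
  # id >= 0 for known words and -1 for unknown ones.
  def unknown_words(words, lookup) when is_list(words) and is_function(lookup, 1) do
    Enum.filter(words, fn word -> lookup.(word) < 0 end)
  end
end

# A fake vocabulary replaces the real model for illustration.
vocab = %{"hello" => 0, "world" => 57}
lookup = fn word -> Map.get(vocab, word, -1) end
IO.inspect(VocabSketch.unknown_words(["hello", "elixir", "world"], lookup))
```

Note that symbol 0 counts as found, so the check is `< 0` rather than a truthiness test.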
Get final recognition result as JSON string.
Call this at the end of the stream to flush remaining audio.
Get partial recognition result as JSON string.
This can be called while recognition is in progress.
Get recognition result as JSON string.
Call this after accept_waveform returns 1.
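The three result calls fit together in a streaming loop: feed chunks, fetch a result whenever the return code is 1, and fetch the final result once the stream ends. A minimal sketch of that control flow follows. The three functions are passed in as a map so the example runs without the native library; the map keys, the StreamSketch module, and the :accept_failed atom are hypothetical.

```elixir
defmodule StreamSketch do
  # Feed `chunks` in order and collect every result JSON string.
  # `accept` returns 1, 0 or -1; `result` and `final` return JSON strings.
  def transcribe(recognizer, chunks, %{accept: accept, result: result, final: final}) do
    outcome =
      Enum.reduce_while(chunks, {:ok, []}, fn chunk, {:ok, acc} ->
        case accept.(recognizer, chunk) do
          # Utterance ended: fetch its result before continuing.
          1 -> {:cont, {:ok, [result.(recognizer) | acc]}}
          # More audio needed for the current utterance.
          0 -> {:cont, {:ok, acc}}
          # Stop at the first error.
          -1 -> {:halt, {:error, :accept_failed}}
        end
      end)

    case outcome do
      # End of stream: flush whatever audio is still buffered.
      {:ok, acc} -> {:ok, Enum.reverse([final.(recognizer) | acc])}
      error -> error
    end
  end
end
```

A partial result could be fetched in the `0` branch in the same way if intermediate text is wanted while an utterance is still in progress.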
Load a Vosk model from a directory path.
Returns {:ok, model_ref} or {:error, :model_load_failed}.
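Model loading and recognizer creation both return tagged tuples, so the two steps chain naturally in a `with` expression. The sketch below assumes the two documented error atoms and takes the loading and creation functions as arguments; SetupSketch, load, and create are hypothetical names used only for illustration.

```elixir
defmodule SetupSketch do
  # `load` and `create` stand in for the model-loading and
  # recognizer-creation calls, which return the tuples documented here.
  def setup(path, sample_rate, load, create) do
    with {:ok, model} <- load.(path),
         {:ok, recognizer} <- create.(model, sample_rate) do
      {:ok, %{model: model, recognizer: recognizer}}
    else
      {:error, :model_load_failed} ->
        {:error, "could not load model at #{path}"}

      {:error, :recognizer_creation_failed} ->
        {:error, "could not create recognizer at #{sample_rate} Hz"}
    end
  end
end
```

Keeping the model reference alongside the recognizer in the returned map is a design choice of this sketch, not a documented requirement.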
Reset the recognizer to start fresh.
Set the log level for Vosk/Kaldi messages.
- -1: silent
- 0: default logging
- >0: more verbose
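As an illustration of the level scheme, the sketch below reads the :log_level key from the :vosk_ex application environment, falls back to -1 when it is unset, and maps the integer to a descriptive atom. How VoskEx itself reads this setting is not shown in this page, so the module LogLevelSketch and its functions are assumptions.

```elixir
defmodule LogLevelSketch do
  # Read the configured level; -1 (silent) is the documented default.
  def configured_level do
    Application.get_env(:vosk_ex, :log_level, -1)
  end

  # Map a documented level to a descriptive atom.
  def describe(-1), do: :silent
  def describe(0), do: :default
  def describe(level) when is_integer(level) and level > 0, do: :verbose
end

IO.inspect(LogLevelSketch.describe(LogLevelSketch.configured_level()))
```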
Set maximum number of alternatives to return in results.
Enable/disable word timing information in partial results.
Enable/disable word timing information in results.