VoskEx (VoskEx v0.2.1)

Elixir bindings for the Vosk API speech recognition library.

VoskEx provides offline, high-performance speech-to-text capabilities for Elixir applications through bindings to the Vosk API.

Installation

Add vosk_ex to your list of dependencies in mix.exs:

def deps do
  [
    {:vosk_ex, "~> 0.2.1"}
  ]
end

VoskEx automatically downloads precompiled Vosk libraries during compilation, so no system dependencies are required!

Simply run:

mix deps.get
mix compile  # Automatically downloads Vosk library (~2-7 MB) for your platform

Supported platforms:

  • Linux: x86_64, aarch64 (ARM64)
  • macOS: Intel (x86_64), Apple Silicon (M1/M2/M3)
  • Windows: x64

Configuration

VoskEx logs are disabled by default to keep your application logs clean. To enable Vosk/Kaldi internal logging, add to your config/config.exs:

config :vosk_ex,
  log_level: 0  # -1 = silent (VoskEx default), 0 = Vosk's standard logging, >0 = more verbose

Quick Start

# Download a model first
Mix.Task.run("vosk.download_model", ["en"])

# Load model and create recognizer
{:ok, model} = VoskEx.Model.load("models/vosk-model-small-en-us-0.15")
{:ok, recognizer} = VoskEx.Recognizer.new(model, 16000.0)

# Process audio (PCM 16-bit mono); skip the 44-byte canonical WAV header
wav = File.read!("audio.wav")
audio = binary_part(wav, 44, byte_size(wav) - 44)
case VoskEx.Recognizer.accept_waveform(recognizer, audio) do
  :utterance_ended ->
    {:ok, result} = VoskEx.Recognizer.result(recognizer)
    IO.inspect(result["text"])
  :continue ->
    {:ok, partial} = VoskEx.Recognizer.partial_result(recognizer)
    IO.inspect(partial["partial"])
end
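
The fixed 44-byte offset used above holds only for a canonical PCM WAV file with no extra chunks. The following is a minimal sketch of a header check under that assumption; `WavSketch` and `parse/1` are hypothetical names for illustration and are not part of VoskEx.

```elixir
defmodule WavSketch do
  @moduledoc "Hypothetical helper, not part of VoskEx: reads a canonical 44-byte PCM WAV header."

  # Matches only the simplest layout: RIFF/WAVE, a 16-byte "fmt " chunk with
  # format tag 1 (PCM), immediately followed by the "data" chunk.
  def parse(
        <<"RIFF", _riff_size::little-32, "WAVE", "fmt ", 16::little-32, 1::little-16,
          channels::little-16, sample_rate::little-32, _byte_rate::little-32,
          _block_align::little-16, bits::little-16, "data", data_size::little-32,
          rest::binary>>
      ) do
    # Some writers record a data size larger than what is actually present,
    # so never read past the end of the file.
    pcm = binary_part(rest, 0, min(data_size, byte_size(rest)))
    {:ok, %{channels: channels, sample_rate: sample_rate, bits_per_sample: bits, pcm: pcm}}
  end

  # Files with extra chunks (LIST, fact, ...) or non-PCM encodings end up here.
  def parse(_other), do: {:error, :unsupported_wav}
end
```

With this helper, `{:ok, %{sample_rate: rate, pcm: audio}} = WavSketch.parse(File.read!("audio.wav"))` yields both the PCM payload and the sample rate to pass to the recognizer, instead of hard-coding either one.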

Architecture

This module provides low-level bindings to the Vosk C API. For a more user-friendly interface, use VoskEx.Model and VoskEx.Recognizer.

The library uses a three-layer architecture:

  • Layer 1 (C NIF): Direct bindings to Vosk C API via erl_nif.h
  • Layer 2 (Low-Level): This module - thin Elixir wrapper with NIF stubs
  • Layer 3 (High-Level): VoskEx.Model and VoskEx.Recognizer - idiomatic Elixir API

Resources are automatically managed through BEAM garbage collection.

Summary

Functions

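accept_waveform(recognizer_ref, audio_binary)

Process audio data (PCM 16-bit mono).

create_recognizer(model_ref, sample_rate)

Create a recognizer for the given model and sample rate.

find_word(model_ref, word)

Check if a word exists in the model vocabulary.

get_final_result(recognizer_ref)

Get final recognition result as JSON string.

get_partial_result(recognizer_ref)

Get partial recognition result as JSON string.

get_result(recognizer_ref)

Get recognition result as JSON string.

load_model(path)

Load a Vosk model from a directory path.

load_nifs()

reset_recognizer(recognizer_ref)

Reset the recognizer to start fresh.

set_log_level(level)

Set the log level for Vosk/Kaldi messages.

set_max_alternatives(recognizer_ref, max)

Set maximum number of alternatives to return in results.

set_partial_words(recognizer_ref, enabled)

Enable/disable word timing information in partial results.

set_words(recognizer_ref, enabled)

Enable/disable word timing information in results.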

Functions

accept_waveform(recognizer_ref, audio_binary)

Process audio data (PCM 16-bit mono).

Returns:

  • 1: utterance ended (silence detected)
  • 0: continue processing
  • -1: error occurred
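
The integer codes above can be mapped to atoms before they reach application code. This is a sketch only: `StatusSketch` is a hypothetical module, `:utterance_ended` and `:continue` mirror the atoms shown in the Quick Start, and the error tuple is an assumption rather than the documented behavior of VoskEx.Recognizer.

```elixir
defmodule StatusSketch do
  @moduledoc "Hypothetical helper, not part of VoskEx."

  # Integer codes as documented for accept_waveform/2.
  # The {:error, _} shape is an assumption made for this sketch.
  def decode(1), do: :utterance_ended
  def decode(0), do: :continue
  def decode(-1), do: {:error, :accept_waveform_failed}
end
```

A call site would then read `recognizer_ref |> VoskEx.accept_waveform(chunk) |> StatusSketch.decode()`, which keeps the raw integers out of `case` expressions further up the stack.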

create_recognizer(model_ref, sample_rate)

Create a recognizer for the given model and sample rate.

Returns {:ok, recognizer_ref} or {:error, :recognizer_creation_failed}.

find_word(model_ref, word)

Check if a word exists in the model vocabulary.

Returns the word symbol (>= 0) if found, or -1 if not found.
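
One possible use of this return value is to check a list of expected words against the model before recognition starts. The sketch below assumes only the `-1` convention described above; `VocabSketch` and its `lookup` parameter are hypothetical and exist so the function can be exercised without a loaded model.

```elixir
defmodule VocabSketch do
  @moduledoc "Hypothetical helper, not part of VoskEx."

  # Returns the words that the model does not know.
  # `lookup` defaults to VoskEx.find_word/2 and can be replaced by a stub.
  def missing_words(model_ref, words, lookup \\ &VoskEx.find_word/2) do
    Enum.filter(words, fn word -> lookup.(model_ref, word) == -1 end)
  end
end
```

Calling `VocabSketch.missing_words(model_ref, ["hello", "world"])` returns an empty list when every word is in the vocabulary.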

get_final_result(recognizer_ref)

Get final recognition result as JSON string.

Call this at the end of the stream to flush remaining audio.

get_partial_result(recognizer_ref)

Get partial recognition result as JSON string.

This can be called while recognition is in progress.

get_result(recognizer_ref)

Get recognition result as JSON string.

Call this after accept_waveform returns 1.
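
Taken together, accept_waveform, get_result and get_final_result suggest a simple streaming loop: feed audio in chunks, collect a result whenever 1 is returned, and flush once at the end. The following sketch illustrates that flow under stated assumptions: `StreamSketch` is a hypothetical module, the default chunk size of 8000 bytes (0.25 s of 16 kHz 16-bit mono audio) is an arbitrary choice, and the code stores whatever the getters return without decoding the JSON.

```elixir
defmodule StreamSketch do
  @moduledoc "Hypothetical helper, not part of VoskEx."

  # Splits a binary into chunks of at most `size` bytes. `size` should be even
  # so that 16-bit samples are never split across two chunks.
  def chunk(binary, size) when size > 0 and byte_size(binary) > size do
    <<head::binary-size(size), rest::binary>> = binary
    [head | chunk(rest, size)]
  end

  def chunk(<<>>, _size), do: []
  def chunk(binary, _size), do: [binary]

  # Feeds PCM audio chunk by chunk and returns the collected results in order:
  # one entry per finished utterance, followed by the final flush.
  # `vosk` defaults to VoskEx; it is a parameter only so a stub can be used in tests.
  def transcribe(recognizer_ref, pcm, chunk_size \\ 8000, vosk \\ VoskEx) do
    results =
      pcm
      |> chunk(chunk_size)
      |> Enum.reduce([], fn part, acc ->
        case vosk.accept_waveform(recognizer_ref, part) do
          1 -> [vosk.get_result(recognizer_ref) | acc]
          0 -> acc
          -1 -> raise "accept_waveform/2 reported an error"
        end
      end)

    # Flush whatever audio is still buffered at the end of the stream.
    Enum.reverse([vosk.get_final_result(recognizer_ref) | results])
  end
end
```

Raising on -1 is a choice made for brevity; a production wrapper would more likely return an error tuple. Calling get_partial_result inside the `0 ->` branch would be the natural place to surface interim text.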

load_model(path)

Load a Vosk model from a directory path.

Returns {:ok, model_ref} or {:error, :model_load_failed}.
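
Because load_model and create_recognizer both return tagged tuples, they compose naturally with `with`. A minimal sketch, assuming only the return shapes documented on this page; `SetupSketch` and `open/3` are hypothetical names, and the `vosk` parameter exists solely to allow a stub in tests.

```elixir
defmodule SetupSketch do
  @moduledoc "Hypothetical helper, not part of VoskEx."

  # Chains the two documented calls. If either step fails, its
  # {:error, reason} tuple is returned unchanged by `with`.
  def open(model_path, sample_rate, vosk \\ VoskEx) do
    with {:ok, model_ref} <- vosk.load_model(model_path),
         {:ok, recognizer_ref} <- vosk.create_recognizer(model_ref, sample_rate) do
      {:ok, %{model: model_ref, recognizer: recognizer_ref}}
    end
  end
end
```

For example, `SetupSketch.open("models/vosk-model-small-en-us-0.15", 16000.0)` returns either `{:ok, %{model: _, recognizer: _}}`, `{:error, :model_load_failed}`, or `{:error, :recognizer_creation_failed}`.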

load_nifs()

reset_recognizer(recognizer_ref)

Reset the recognizer to start fresh.

set_log_level(level)

Set the log level for Vosk/Kaldi messages.

  • -1: silent
  • 0: default logging
  • >0: more verbose

set_max_alternatives(recognizer_ref, max)

Set maximum number of alternatives to return in results.

set_partial_words(recognizer_ref, enabled)

Enable/disable word timing information in partial results.

set_words(recognizer_ref, enabled)

Enable/disable word timing information in results.