Elixir wrapper for OmniVoice — a unified speech generation model from K2-FSA.
Voice Cloning · Voice Design (instruction-based) · Multilingual · 24kHz output.
## Features
- 🎤 Voice Cloning — Clone any voice from a reference audio clip
- 🎨 Voice Design — Describe a voice in natural language and generate it
- 🌍 Multilingual — Supports multiple languages with automatic detection
- ⚡ Fast Inference — Optimized for GPU (CUDA/MPS) and CPU
- 🔊 24kHz WAV Output — Studio-quality audio
## Protocol
OmnivoiceEx uses MessagePack over binary-framed Erlang Ports. Audio is transmitted as WAV bytes inside msgpack — no base64 overhead.
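The wire format can be sketched on the Python side. This is a minimal illustration, assuming the Port is opened in the common 4-byte big-endian length-prefix mode (Erlang's `{:packet, 4}`); the exact framing used by OmnivoiceEx is not shown here.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a msgpack payload with its 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    """Strip the length prefix and return the payload."""
    (length,) = struct.unpack(">I", data[:4])
    return data[4 : 4 + length]

# WAV bytes ride inside the msgpack map as a raw `bin` field, so no
# base64 expansion is needed. Hand-encoded example: {"audio": b"abc"}
payload = b"\x81\xa5audio\xc4\x03abc"
framed = frame(payload)
assert unframe(framed) == payload
```

Because msgpack has a native binary type, the audio bytes pass through verbatim, avoiding the ~33% size inflation that base64 encoding would add.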
## Quick Start

```elixir
{:ok, pid} = OmnivoiceEx.start_link(device: "cuda")
:ok = OmnivoiceEx.await_ready(pid)
{:ok, audio} = OmnivoiceEx.generate(pid, "Hello, world!")
:ok = OmnivoiceEx.save(audio, "output.wav")
```

### Voice Design (instruction-based)

```elixir
{:ok, audio} = OmnivoiceEx.generate(pid, "Welcome to our service!",
  instruct: "A warm, professional female broadcaster"
)
```

### Voice Cloning

```elixir
{:ok, audio} = OmnivoiceEx.generate(pid, "This is my voice clone!",
  ref_audio: "/path/to/reference.wav",
  ref_text: "The transcript of the reference audio"
)
```

## Requirements

- Python ≥ 3.10
- `omnivoice`, `msgpack`, `numpy`, and `soundfile` pip packages
- CUDA GPU, Apple Silicon (MPS), or CPU
- Elixir ≥ 1.14
## Installation

```elixir
# mix.exs
{:omnivoice_ex, "~> 0.1.0"}
```

```shell
# Install Python deps
mix omnivoice_ex.setup
```
## Summary

### Functions

- `await_ready/2` — Waits for the model to finish loading. Returns `:ok` when ready.
- `generate/3` — Generates speech audio from text. Returns `{:ok, audio_wav}`.
- `info/1` — Returns runtime model information.
- `save/2` — Saves audio binary to a WAV file.
- `start_link/1` — Starts an OmniVoice model server.
- `stop/1` — Gracefully stops the server and Python bridge.

## Functions
### await_ready/2

```elixir
@spec await_ready(GenServer.server(), timeout()) :: :ok | {:error, term()}
```

Waits for the model to finish loading. Returns `:ok` when ready.
### generate/3

```elixir
@spec generate(GenServer.server(), String.t(), [generate_opt()]) :: {:ok, audio()} | {:error, term()}
```

Generates speech audio from text. Returns `{:ok, audio_wav}`.

#### Options

- `:ref_audio` — Path to reference audio for voice cloning
- `:ref_text` — Transcript of reference audio (improves clone quality)
- `:instruct` — Voice instruction for design (e.g. "A warm female broadcaster")
- `:language` — Language code hint (auto-detected if omitted)
- `:duration` — Target duration in seconds
- `:speed` — Playback speed factor
- `:num_step` — Diffusion steps (higher = better quality, slower). Default: `32`
- `:guidance_scale` — Classifier-free guidance. Default: `2.0`
#### Examples

```elixir
# Basic
{:ok, audio} = OmnivoiceEx.generate(pid, "Hello!")
:ok = OmnivoiceEx.save(audio, "out.wav")

# Voice Design
{:ok, audio} = OmnivoiceEx.generate(pid,
  "Welcome to the show!",
  instruct: "A deep, authoritative male narrator"
)

# Voice Cloning
{:ok, audio} = OmnivoiceEx.generate(pid, "Hello in my voice!",
  ref_audio: "/path/to/ref.wav",
  ref_text: "This is my reference transcript"
)

# Quality tuning
{:ok, audio} = OmnivoiceEx.generate(pid, "High quality speech.",
  num_step: 64, guidance_scale: 3.0
)
```
### generate/4

```elixir
@spec generate(GenServer.server(), String.t(), [generate_opt()], timeout()) :: {:ok, audio()} | {:error, term()}
```

Same as `generate/3`, but with an explicit call timeout.
### info/1

```elixir
@spec info(GenServer.server()) :: map()
```

Returns runtime model information.

### save/2

Saves audio binary to a WAV file.
### start_link/1

```elixir
@spec start_link(OmnivoiceEx.Server.start_opts()) :: GenServer.on_start()
```

Starts an OmniVoice model server.

#### Options

- `:model` — HuggingFace model ID. Default: `"k2-fsa/OmniVoice"`
- `:device` — `"cuda"`, `"cpu"`, or `"mps"`. Default: `"cuda"`
- `:dtype` — `"float16"`, `"float32"`, or `"bfloat16"`. Default: `"float16"`
- `:name` — Optional GenServer name
### stop/1

```elixir
@spec stop(GenServer.server()) :: :ok
```

Gracefully stops the server and Python bridge.