open_jtalk_elixir

Use Open JTalk from Elixir. This package builds a local open_jtalk CLI and, by default, bundles a UTF-8 dictionary and an HTS voice (you can disable this), exposing three convenient APIs:

OpenJTalk.to_wav/2 — synthesize text to a WAV file
OpenJTalk.to_binary/2 — synthesize and return WAV bytes
OpenJTalk.say/2 — synthesize and play via a system audio player

Install

Add the dependency to your mix.exs:

def deps do
  [
    {:open_jtalk_elixir, "~> 0.2"}
  ]
end

Then:

mix deps.get
mix compile

On first compile the project may download and build MeCab, HTS Engine API, and Open JTalk. By default it also downloads and bundles a UTF-8 dictionary and a Mei voice into priv/ (you can turn this off with OPENJTALK_BUNDLE_ASSETS=0).

Build requirements

You’ll need common build tools: gcc/g++, make, curl, tar, unzip. On macOS Xcode Command Line Tools are sufficient.

Optional environment flags (honored by the Makefile):

OPENJTALK_FULL_STATIC=1 — attempt a fully static open_jtalk (Linux only; requires static libstdc++)
OPENJTALK_BUNDLE_ASSETS=0|1 — whether to bundle dictionary/voice into priv/

Tested platforms

Host builds (compile and run on the same machine):

Linux x86_64
Linux aarch64
macOS 14 (arm64, Apple Silicon)

Cross-compile (host → target):

Linux x86_64 → Nerves rpi4 (aarch64)

Quick start

# play via system audio player (aplay/paplay/afplay/play)
OpenJTalk.say("元氣ですかあ 、元氣が有れば、なんでもできる")

Options

All synthesis calls accept the same options (values are clamped):

:timbre — voice color offset -0.8..0.8 (default 0.0)
:pitch_shift — semitones -24..24 (default 0)
:rate — speaking speed 0.5..2.0 (default 1.0)
:gain — output gain in dB (default 0)
:voice — path to a .htsvoice file (optional)
:dictionary — path to a directory containing sys.dic (optional)
:timeout — max runtime in ms (default 20_000)
:out — output WAV path (only for to_wav/2)

How asset resolution works

The package resolves required assets in this order:

Environment variable override
Bundled asset in priv/
System-installed location

CLI binary (`open_jtalk`)

Env: OPENJTALK_CLI — full path to open_jtalk.
Bundled: priv/bin/open_jtalk (built during compile).
System: open_jtalk found on $PATH.

Dictionary (`sys.dic`)

Env: OPENJTALK_DIC_DIR — directory containing sys.dic.
Bundled: priv/dic/sys.dic or any priv/dic/**/sys.dic (e.g. naist-jdic).
System: common locations such as /var/lib/mecab/dic/open-jtalk/naist-jdic, /usr/lib/*/mecab/dic/open-jtalk/naist-jdic, etc.

Voice (`.htsvoice`)

Env: OPENJTALK_VOICE — path to a .htsvoice file.
Bundled: first file matching priv/voices/**/*.htsvoice.
System: standard locations like /usr/share/hts-voice/** or /usr/local/share/hts-voice/**.

If you change environment variables at runtime (or move files), refresh the cached paths:

OpenJTalk.Assets.reset_cache()

Using with Nerves

This library is Nerves-aware. When MIX_TARGET is set the build defaults to:

OPENJTALK_FULL_STATIC=1 — try to statically link the CLI on Linux targets when possible
OPENJTALK_BUNDLE_ASSETS=1 — bundle CLI, dictionary, and voice into priv/

So for many projects no extra configuration is needed.

Quick Nerves flow

export MIX_TARGET=rpi4
mix deps.get
mix compile
mix firmware

On the device:

{:ok, info} = OpenJTalk.info()
# bundled assets should show up as :bundled

OpenJTalk.say("こんにちは")

Audio on Nerves

OpenJTalk.say/2 requires a system audio player. Most Nerves images use ALSA aplay. If your image does not include a player:

add one to the system image, or
use OpenJTalk.to_wav/2 and play the WAV with your chosen mechanism.

Firmware size notes

Bundling the full dictionary + voice + binary increases firmware size. Approximate (uncompressed) sizes:

Dictionary (NAIST-JDIC): ~100–110 MB
Mei voice: ~2.2 MB
CLI binary: ~0.7 MB

If that’s too large you can avoid bundling at compile time and provision assets separately (rootfs overlay, /data, OTA, etc.):

MIX_TARGET=rpi4 OPENJTALK_BUNDLE_ASSETS=0 mix deps.compile open_jtalk_elixir

Then point the library to the provisioned assets (for example in config/runtime.exs):

System.put_env("OPENJTALK_CLI",     "/data/open_jtalk/bin/open_jtalk")
System.put_env("OPENJTALK_DIC_DIR", "/data/open_jtalk/dic")
System.put_env("OPENJTALK_VOICE",   "/data/open_jtalk/voices/mei_normal.htsvoice")

OpenJTalk.Assets.reset_cache()

How you provision those files into your image is outside the scope of this library.

Third-party components & licenses

This package does not redistribute third-party assets by default. At compile time it may download and build:

Open JTalk 1.11 — Modified BSD (BSD 3-Clause)
Source: http://open-jtalk.sourceforge.net/
HTS Engine API 1.10 — Modified BSD (BSD 3-Clause)
Source: http://hts-engine.sourceforge.net/
MeCab 0.996 — tri-licensed (GPL / LGPL / BSD); this project uses the BSD terms
Source: https://taku910.github.io/mecab/
Open JTalk Dictionary (NAIST-JDIC UTF-8) 1.11 — BSD-style by NAIST
Source: https://sourceforge.net/projects/open-jtalk/files/Dictionary/
HTS Voice “Mei” (MMDAgent_Example 1.8) — CC BY 3.0
Source: https://sourceforge.net/projects/mmdagent/files/MMDAgent_Example/
Attribution: “HTS Voice ‘Mei’ © Nagoya Institute of Technology, licensed CC BY 3.0.”