OpenJTalk (open_jtalk_elixir v0.2.0)


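Use Open JTalk from Elixir. This package builds a local open_jtalk CLI and, by default, bundles a UTF-8 dictionary and an HTS voice (you can disable this), exposing three convenient APIs:

  • OpenJTalk.say/2: synthesize text and play it via a system audio player
  • OpenJTalk.to_binary/2: synthesize text and return a WAV as a binary
  • OpenJTalk.to_wav/2: synthesize text to a WAV file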

Install

Add the dependency to your mix.exs:

def deps do
  [
    {:open_jtalk_elixir, "~> 0.2"}
  ]
end

Then:

mix deps.get
mix compile

On first compile, the project may download and build MeCab, HTS Engine API, and Open JTalk. By default, it also downloads a UTF-8 dictionary and a Mei voice and bundles them into priv/ (set OPENJTALK_BUNDLE_ASSETS=0 to turn this off).

Build requirements

You’ll need common build tools: gcc/g++, make, curl, tar, and unzip. On macOS, the Xcode Command Line Tools are sufficient.

Optional environment flags (honored by the Makefile):

  • OPENJTALK_FULL_STATIC=1 — attempt a fully static open_jtalk (Linux only; requires static libstdc++)
  • OPENJTALK_BUNDLE_ASSETS=0|1 — whether to bundle dictionary/voice into priv/

Tested platforms

Host builds (compile and run on the same machine):

  • Linux x86_64
  • Linux aarch64
  • macOS 14 (arm64, Apple Silicon)

Cross-compile (host → target):

  • Linux x86_64 → Nerves rpi4 (aarch64)

Quick start

# play via system audio player (aplay/paplay/afplay/play)
OpenJTalk.say("元氣ですかあ 、元氣が有れば、なんでもできる")

Options

All synthesis calls accept the same options (values are clamped):

  • :timbre — voice color offset -0.8..0.8 (default 0.0)
  • :pitch_shift — semitones -24..24 (default 0)
  • :rate — speaking speed 0.5..2.0 (default 1.0)
  • :gain — output gain in dB (default 0)
  • :voice — path to a .htsvoice file (optional)
  • :dictionary — path to a directory containing sys.dic (optional)
  • :timeout — max runtime in ms (default 20_000)
  • :out — output WAV path (only for to_wav/2)
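To make "values are clamped" concrete, here is a minimal sketch of limiting option values to the ranges listed above. The ClampSketch module and its functions are hypothetical and are not part of the package; the library's internal handling may differ. Only options with a range stated above are clamped, everything else is passed through unchanged.

```elixir
defmodule ClampSketch do
  # Hypothetical illustration only; not the library's actual code.
  # The ranges are the ones given in the options list above.
  @ranges %{
    timbre: {-0.8, 0.8},
    pitch_shift: {-24, 24},
    rate: {0.5, 2.0}
  }

  # Clamp a single option value into its documented range.
  # Options without a listed range are returned unchanged.
  def clamp(key, value) do
    case Map.fetch(@ranges, key) do
      {:ok, {lo, hi}} -> value |> max(lo) |> min(hi)
      :error -> value
    end
  end

  # Clamp every {key, value} pair in a keyword list.
  def clamp_opts(opts) do
    Enum.map(opts, fn {key, value} -> {key, clamp(key, value)} end)
  end
end

ClampSketch.clamp_opts(rate: 3.0, pitch_shift: -30, timbre: 0.1)
# => [rate: 2.0, pitch_shift: -24, timbre: 0.1]
```

Under this reading, an out-of-range value such as rate: 3.0 is not rejected; it is treated as the nearest allowed value.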

Summary

Types

  • gain(): Output gain in dB. Typical useful range is about -20..20 (values are clamped).
  • pitch_shift(): Pitch shift in semitones. Range: -24..24 (values are clamped).
  • rate(): Speaking rate multiplier. Range: 0.5..2.0 (values are clamped).
  • timbre(): Voice color adjustment. Range: -0.8..0.8 (values are clamped).

Functions

  • info(): Return useful information about the local Open JTalk setup.
  • say(text, opts \\ []): Synthesize text and play it via a system audio player.
  • to_binary(text, opts \\ []): Synthesize text and return a WAV as a binary.
  • to_wav(text, opts \\ []): Synthesize text to a WAV file.

Types

gain()

@type gain() :: number()

Output gain in dB. Typical useful range is about -20..20 (values are clamped).

opt()

@type opt() ::
  {:timbre, timbre()}
  | {:pitch_shift, pitch_shift()}
  | {:rate, rate()}
  | {:gain, gain()}
  | {:voice, Path.t()}
  | {:dictionary, Path.t()}
  | {:timeout, non_neg_integer()}
  | {:out, Path.t()}

opts()

@type opts() :: [opt()]

pitch_shift()

@type pitch_shift() :: -24..24

Pitch shift in semitones. Range: -24..24 (values are clamped).

rate()

@type rate() :: float()

Speaking rate multiplier. Range: 0.5..2.0 (values are clamped).

timbre()

@type timbre() :: float()

Voice color adjustment. Range: -0.8..0.8 (values are clamped).

Functions

info()

@spec info() :: {:ok, map()} | {:error, term()}

Return useful information about the local Open JTalk setup.
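A minimal usage sketch based only on the spec above, assuming the package is compiled in the current project. The keys of the returned map are not documented here, so the example inspects the map rather than assuming any particular field.

```elixir
case OpenJTalk.info() do
  {:ok, info} ->
    # The map's keys are not specified in this document, so print it as-is.
    IO.inspect(info, label: "Open JTalk setup")

  {:error, reason} ->
    IO.puts("Open JTalk is not usable: #{inspect(reason)}")
end
```

Checking this once at startup may help surface a missing binary, dictionary, or voice before the first synthesis call.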

say(text, opts \\ [])

@spec say(binary(), opts()) :: :ok | {:error, term()}

Synthesize text and play it via a system audio player.
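A short sketch of calling say/2 with a few of the options described earlier and handling both return shapes from the spec. The option values are arbitrary examples, and playback assumes one of the system audio players is installed.

```elixir
# Slightly faster and higher-pitched speech, with a shorter timeout.
opts = [rate: 1.2, pitch_shift: 2, timeout: 10_000]

case OpenJTalk.say("こんにちは", opts) do
  :ok ->
    :ok

  {:error, reason} ->
    IO.puts("playback failed: #{inspect(reason)}")
end
```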

to_binary(text, opts \\ [])

@spec to_binary(binary(), opts()) :: {:ok, binary()} | {:error, term()}

Synthesize text and return a WAV as a binary.
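A minimal sketch of using the returned binary, here by writing it to disk with File.write/2. The file name hello.wav is an arbitrary choice for illustration; the binary could just as well be sent over a socket or returned from a web handler.

```elixir
with {:ok, wav} <- OpenJTalk.to_binary("こんにちは", rate: 0.9),
     :ok <- File.write("hello.wav", wav) do
  IO.puts("wrote #{byte_size(wav)} bytes")
else
  # Covers both a synthesis error and a File.write/2 error.
  {:error, reason} ->
    IO.puts("synthesis or write failed: #{inspect(reason)}")
end
```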

to_wav(text, opts \\ [])

@spec to_wav(binary(), opts()) :: {:ok, Path.t()} | {:error, term()}

Synthesize text to a WAV file.
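A short sketch of to_wav/2 with an explicit :out path, following the spec above. The file name greeting.wav and the use of the system temp directory are illustrative choices only. What path is used when :out is omitted is not stated here, so the example relies on the path returned in the {:ok, path} tuple.

```elixir
out = Path.join(System.tmp_dir!(), "greeting.wav")

case OpenJTalk.to_wav("こんにちは", out: out) do
  {:ok, path} ->
    IO.puts("WAV written to #{path}")

  {:error, reason} ->
    IO.puts("synthesis failed: #{inspect(reason)}")
end
```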