OpenJTalk (open_jtalk_elixir v0.3.0)

Use Open JTalk from Elixir. This package builds a local open_jtalk CLI and, by default, bundles a UTF-8 dictionary and an HTS voice (you can disable this). It exposes convenience functions for synthesis and playback, such as say/2, to_wav_binary/2, and to_wav_file/2.

Install

Add the dependency to your mix.exs:

def deps do
  [
    {:open_jtalk_elixir, "~> 0.3"}
  ]
end

Then:

mix deps.get
mix compile

On first compile the project may download and build MeCab, HTS Engine API, and Open JTalk. By default it also downloads and bundles a UTF-8 dictionary and a Mei voice into priv/ (you can turn this off with OPENJTALK_BUNDLE_ASSETS=0).

Build requirements

You’ll need common build tools: gcc/g++, make, curl, tar, and unzip. On macOS, the Xcode Command Line Tools are sufficient.

Optional environment flags (honored by the Makefile):

  • OPENJTALK_FULL_STATIC=1 — attempt a fully static open_jtalk (Linux only; requires static libstdc++)
  • OPENJTALK_BUNDLE_ASSETS=0|1 — whether to bundle dictionary/voice into priv/

Tested platforms

Host builds (compile and run on the same machine):

  • Linux x86_64
  • Linux aarch64
  • macOS 14 (arm64, Apple Silicon)

Cross-compile (host → target):

  • Linux x86_64 → Nerves rpi4 (aarch64)

Quick start

# play via system audio player (aplay/paplay/afplay/play)
OpenJTalk.say("元氣ですかあ 、元氣が有れば、なんでもできる")

Options

All synthesis calls accept the same options (numeric values outside the documented range are clamped):

  • :timbre — voice color offset -0.8..0.8 (default 0.0)
  • :pitch_shift — semitones -24..24 (default 0)
  • :rate — speaking speed 0.5..2.0 (default 1.0)
  • :gain — output gain in dB (default 0)
  • :voice — path to a .htsvoice file (optional)
  • :dictionary — path to a directory containing sys.dic (optional)
  • :timeout — max runtime in ms (default 20_000)
  • :out — output WAV path (used by to_wav_file/2; the say_option() type also accepts it for say/2)
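
The library's own clamping code is not shown on this page. As a minimal sketch of what "values are clamped" could look like for the documented ranges, the hypothetical ClampSketch module below pulls out-of-range numbers back to the nearest bound and passes every other option through unchanged. The module and function names are assumptions for illustration only, and :gain is left out because only a typical range is documented for it.

```elixir
defmodule ClampSketch do
  # Hypothetical helper, not part of OpenJTalk.
  # Ranges copied from the options list above.
  @ranges %{timbre: {-0.8, 0.8}, pitch_shift: {-24, 24}, rate: {0.5, 2.0}}

  # Clamp a number into the inclusive range lo..hi.
  def clamp(value, lo, hi), do: value |> max(lo) |> min(hi)

  # Clamp the known numeric options; leave all other entries untouched.
  def normalize(opts) do
    Enum.map(opts, fn {key, value} ->
      case @ranges do
        %{^key => {lo, hi}} when is_number(value) -> {key, clamp(value, lo, hi)}
        _ -> {key, value}
      end
    end)
  end
end

# With the library installed, options are passed as a keyword list, e.g.:
# OpenJTalk.to_wav_binary("こんにちは", rate: 1.2, pitch_shift: 2)
```

Under these assumptions, a request such as rate: 3.0 would be treated as rate: 2.0 rather than rejected.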

Concatenate WAVs

You can combine multiple WAVs (same format: channels/rate/bit depth/etc.) into one:

{:ok, a} = OpenJTalk.to_wav_binary("これは一つ目。")
{:ok, b} = OpenJTalk.to_wav_binary("これは二つ目。")
{:ok, c} = OpenJTalk.to_wav_binary("これは三つ目。")

{:ok, merged} = OpenJTalk.Wav.concat_binaries([a, b, c])
# or from files:
# {:ok, merged} = OpenJTalk.Wav.concat_files(["a.wav", "b.wav", "c.wav"])
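
To illustrate why the inputs must share one format, here is a rough sketch of WAV concatenation in plain Elixir. It is not the implementation of OpenJTalk.Wav; the WavConcatSketch module and its function names are hypothetical, and it assumes the simplest layout only: a canonical PCM file with a 16-byte "fmt " chunk followed directly by a single "data" chunk.

```elixir
defmodule WavConcatSketch do
  # Hypothetical helper, not the library's implementation.

  # Split a canonical WAV into its 16-byte format block and raw sample data.
  def parse(
        <<"RIFF", _riff_size::little-32, "WAVE", "fmt ", 16::little-32,
          fmt::binary-size(16), "data", len::little-32, data::binary-size(len),
          _rest::binary>>
      ) do
    {:ok, fmt, data}
  end

  def parse(_other), do: {:error, :unsupported_wav}

  # Rebuild a WAV from a format block and sample data, recomputing both sizes.
  def build(fmt, data) when byte_size(fmt) == 16 do
    len = byte_size(data)

    <<"RIFF", (36 + len)::little-32, "WAVE", "fmt ", 16::little-32, fmt::binary,
      "data", len::little-32, data::binary>>
  end

  def concat([]), do: {:error, :empty}

  # Join the sample data of all inputs, refusing inputs whose format block
  # (channels, sample rate, bit depth, ...) differs from the first one.
  def concat([first | _] = wavs) do
    with {:ok, fmt, _data} <- parse(first) do
      wavs
      |> Enum.reduce_while({:ok, []}, fn wav, {:ok, acc} ->
        case parse(wav) do
          {:ok, ^fmt, data} -> {:cont, {:ok, [acc, data]}}
          {:ok, _other_fmt, _data} -> {:halt, {:error, :format_mismatch}}
          {:error, _} = error -> {:halt, error}
        end
      end)
      |> case do
        {:ok, chunks} -> {:ok, build(fmt, IO.iodata_to_binary(chunks))}
        error -> error
      end
    end
  end
end
```

The header sizes have to be recomputed for the merged file, which is why simply appending one WAV binary to another does not produce a valid result.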

Types

gain()

@type gain() :: number()

Output gain in dB. Typical useful range is about -20..20 (values are clamped).

info_entry()

@type info_entry() :: %{
  path: String.t() | nil,
  source: :env | :bundled | :system | :none
}

Entry describing a component path and where it came from.

info_map()

@type info_map() :: %{
  bin: info_entry(),
  dictionary: info_entry(),
  voice: info_entry(),
  audio_player: info_entry()
}

Return type of info/0.
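
As a small sketch of how an info_map() might be consumed, the hypothetical InfoSketch module below lists the components whose source is :none, which could be useful as a startup check. The module name is an assumption, and so is the reading that :none means the component was not found.

```elixir
defmodule InfoSketch do
  # Hypothetical helper, not part of OpenJTalk.
  # Return the keys of an info_map() whose entries report source: :none.
  def missing(info) do
    for {name, %{source: :none}} <- info, do: name
  end
end

# With the library installed:
# {:ok, info} = OpenJTalk.info()
# InfoSketch.missing(info)
```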

pitch_shift()

@type pitch_shift() :: -24..24

Pitch shift in semitones. Range: -24..24 (values are clamped).

playback_mode()

@type playback_mode() :: :auto | :file | :stdin

Audio playback mode:

  • :auto — prefer stdin when available; otherwise fall back to file playback
  • :file — always use file-based playback
  • :stdin — stream WAV bytes via stdin (diskless); falls back to file if unsupported
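
The fallback rules above can be summarized as a small decision function. This is only a sketch of the documented behavior, not the library's code; the PlaybackModeSketch module and the stdin_capable? flag (whether the detected player can read WAV data from stdin) are hypothetical.

```elixir
defmodule PlaybackModeSketch do
  # Hypothetical helper, not part of OpenJTalk.
  # Decide the effective transport from the requested mode and the player's
  # ability to read from stdin.
  def resolve(:file, _stdin_capable?), do: :file
  def resolve(mode, true) when mode in [:auto, :stdin], do: :stdin
  def resolve(mode, false) when mode in [:auto, :stdin], do: :file
end
```

Under this reading, :auto and :stdin end up on the same transport; they differ only in what the caller is asking for.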

player_option()

@type player_option() ::
  {:timeout, non_neg_integer()} | {:playback_mode, playback_mode()}

Options accepted by playback functions.

rate()

@type rate() :: float()

Speaking rate multiplier. Range: 0.5..2.0 (values are clamped).

say_option()

@type say_option() :: player_option() | synth_option() | {:out, Path.t()}

Options accepted by say/2 (synth + playback + optional :out).

synth_option()

@type synth_option() ::
  {:timbre, timbre()}
  | {:pitch_shift, pitch_shift()}
  | {:rate, rate()}
  | {:gain, gain()}
  | {:voice, Path.t()}
  | {:dictionary, Path.t()}
  | {:timeout, non_neg_integer()}

Options accepted by synthesis functions.

timbre()

@type timbre() :: float()

Voice color adjustment. Range: -0.8..0.8 (values are clamped).

Functions

info()

@spec info() :: {:ok, info_map()}

Return useful information about the local Open JTalk setup.

play_wav_binary(wav_bytes, opts \\ [])

@spec play_wav_binary(iodata(), [player_option()]) :: :ok | {:error, term()}

Play RIFF/WAV bytes already in memory (no temp files).

Accepts the same :playback_mode and :timeout options as say/2. Use playback_mode: :stdin for diskless playback when a stdin-capable player is available.

play_wav_file(path, opts \\ [])

@spec play_wav_file(Path.t(), [player_option()]) :: :ok | {:error, term()}

Play a WAV from a file path. See play_wav_binary/2 for options.

say(text, opts \\ [])

@spec say(binary(), [say_option()]) :: :ok | {:error, term()}

Synthesize text with Open JTalk and play it.

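:playback_mode controls how playback occurs:

  • :auto (default) tries stdin first, then falls back to file playback.
  • :file always uses file-based playback.
  • :stdin streams WAV bytes via stdin (diskless) and falls back to file playback if the player does not support it.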

to_wav_binary(text, opts \\ [])

@spec to_wav_binary(binary(), [synth_option()]) :: {:ok, binary()} | {:error, term()}

Synthesize text and return RIFF/WAV bytes.

to_wav_file(text, opts \\ [])

@spec to_wav_file(binary(), [synth_option()]) :: {:ok, Path.t()} | {:error, term()}

Synthesize text to a WAV file. Respects :out when provided; otherwise creates a unique path in the system temp dir.
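
A possible way to combine to_wav_file/2 with play_wav_file/2 is sketched below: synthesize to a chosen path with :out, then play that file. The SpeakSketch module is hypothetical. The two function arguments default to the documented library functions and are injectable only so that the flow can be tried without the library or an audio device; that injection is an assumption of this sketch, not a feature of OpenJTalk.

```elixir
defmodule SpeakSketch do
  # Hypothetical helper, not part of OpenJTalk.
  # Synthesize `text` into `path`, then play the resulting file.
  def speak_to_file(
        text,
        path,
        synth \\ &OpenJTalk.to_wav_file/2,
        play \\ &OpenJTalk.play_wav_file/2
      ) do
    with {:ok, wav_path} <- synth.(text, out: path, rate: 1.2),
         :ok <- play.(wav_path, playback_mode: :file, timeout: 10_000) do
      {:ok, wav_path}
    end
  end
end

# With the library installed:
# SpeakSketch.speak_to_file("こんにちは", "hello.wav")
```

Because `with` returns the first non-matching value, an {:error, reason} from either step is passed back to the caller unchanged.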

validate_options!(opts)

@spec validate_options!(keyword()) :: keyword()

Validate options for synthesis and playback.

Allowed keys:

  • Synthesis: :timbre, :pitch_shift, :rate, :gain, :voice, :dictionary, :timeout
  • Playback: :playback_mode, :timeout
  • Files: :out

Enforcement:

  • Unknown keys raise ArgumentError
  • :playback_mode must be one of :auto | :file | :stdin (if present)
  • :timeout must be a non-negative integer (if present)

Returns the original opts on success.
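
The rules listed above could be implemented roughly as follows. This is a sketch of the documented contract only; the ValidateSketch module is hypothetical, its error messages are invented for illustration, and the real validate_options!/1 may check more than this.

```elixir
defmodule ValidateSketch do
  # Hypothetical helper, not the library's implementation.
  @allowed [
    :timbre, :pitch_shift, :rate, :gain, :voice, :dictionary,
    :timeout, :playback_mode, :out
  ]

  # Raise ArgumentError on the first invalid entry; otherwise return opts.
  def validate!(opts) when is_list(opts) do
    Enum.each(opts, fn
      {key, _value} when key not in @allowed ->
        raise ArgumentError, "unknown option: #{inspect(key)}"

      {:playback_mode, mode} when mode not in [:auto, :file, :stdin] ->
        raise ArgumentError, "invalid :playback_mode: #{inspect(mode)}"

      {:timeout, t} when not (is_integer(t) and t >= 0) ->
        raise ArgumentError, "invalid :timeout: #{inspect(t)}"

      {_key, _value} ->
        :ok
    end)

    opts
  end
end

# With the library installed, the documented call is:
# opts = OpenJTalk.validate_options!(rate: 1.2, playback_mode: :stdin)
```

Returning the original opts lets the check sit in a pipeline ahead of the call that consumes the options.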