Hyphenation via Liang's algorithm with TeX hyphenation patterns.
Returns the legal hyphenation points for a word — the positions
where a soft hyphen could be inserted for line-breaking, and where
a syllable boundary lies. This is the engine that backs both
hyphenation (hyphenate/2) and pattern-based syllable counting
(count/2).
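As a rough illustration of how Liang's pattern matching arrives at break points, here is a minimal self-contained sketch. The module name `LiangSketch`, its two functions, and the nine-pattern toy set are assumptions made for this example; they are not this package's internals, and the toy set is not the bundled en-us data. The left and right minima are deliberately left out.

```elixir
defmodule LiangSketch do
  # Toy patterns chosen for illustration; NOT the bundled en-us pattern file.
  @patterns ~w(hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n)
  @digits ~w(0 1 2 3 4 5 6 7 8 9)

  # "hen5at" -> {"henat", [0, 0, 0, 5, 0, 0]}: one level per gap around the letters.
  def parse(pattern) do
    {letters, levels, pending} =
      pattern
      |> String.graphemes()
      |> Enum.reduce({[], [], 0}, fn g, {letters, levels, pending} ->
        if g in @digits do
          {letters, levels, String.to_integer(g)}
        else
          {[g | letters], [pending | levels], 0}
        end
      end)

    {letters |> Enum.reverse() |> Enum.join(), Enum.reverse([pending | levels])}
  end

  def points(word) do
    # The word is wrapped in dots so patterns can anchor to its edges.
    chars = String.graphemes("." <> String.downcase(word) <> ".")
    n = length(chars)
    table = Map.new(@patterns, &parse/1)

    # Gap k sits just before chars[k]; keep the highest level seen per gap.
    levels =
      for start <- 0..(n - 1),
          len <- 1..(n - start),
          digits = table[chars |> Enum.slice(start, len) |> Enum.join()],
          {level, offset} <- Enum.with_index(digits),
          reduce: %{} do
        acc -> Map.update(acc, start + offset, level, &max(&1, level))
      end

    # Odd levels permit a break; point p has p word characters to its left.
    size = n - 2
    for p <- 1..(size - 1)//1, rem(Map.get(levels, p + 1, 0), 2) == 1, do: p
  end
end

LiangSketch.points("hyphenation")
# => [2, 6]
```

Every substring of the dotted word is looked up in the pattern table, competing patterns are resolved by taking the maximum level at each gap, and only odd levels survive as break points.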
Bundled and on-demand language packs
Seven hyph-utf8 packs are bundled and loaded at compile time, so the common European-language paths are zero-I/O:
- en-us: American English (~5,000 patterns plus DEK's exception list)
- de-1996: German (modern spelling)
- fr: French
- es: Spanish
- it: Italian
- nl: Dutch
- pt: Portuguese
Every other hyph-utf8 language pack (Russian, Hindi, and ~75 others) can be loaded on demand from the hyph-utf8 upstream.
On-demand loading goes through Text.Data, which means:
- Loaded language packs are cached under :data_dir (default ~/.cache/text/hyphenation/) so subsequent calls do no I/O beyond the cache lookup.
- Auto-download from upstream is opt-in, gated by config :text, auto_download_hyphenation_data: true. Without that flag the package never reaches out to the network; calls for an unbundled language raise an ArgumentError explaining the situation.
- If you would rather populate the cache manually, drop the hyph-<tag>.tex file from upstream into the configured :data_dir/hyphenation/ directory and it will be picked up without any download.
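A minimal configuration sketch for the opt-in download described above. The file location follows the usual Mix layout and is an assumption here; the `:ru` tag and the sample word in the trailing comment are likewise illustrative, and no output is shown for them.

```elixir
# config/config.exs (assumed location, following the usual Mix layout)
import Config

# Opt in to fetching unbundled hyph-utf8 packs from upstream on first use.
config :text, auto_download_hyphenation_data: true

# With the flag set, a call such as
#   Text.Hyphenation.points("привет", language: :ru)
# would be expected to download and cache the Russian pack instead of raising.
```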
Bundled files live in priv/hyphenation/. Each .tex file ships
under its upstream license: en-us, de-1996, fr, es, nl,
and pt are MIT/X11/BSD; it is LPPL. All are compatible with
redistribution; the licenses (and original copyright notices)
are preserved verbatim in the headers of each file.
Language input shapes
Every option that takes a :language accepts:
- an atom (:fr, :"de-1996"),
- a string ("fr", "en-GB", "de-CH"), or
- a Localize.LanguageTag struct (when the optional localize dependency is loaded). The :language and :territory fields are used to derive the upstream file tag.
The mapping prefers the most common variant when the input is
ambiguous: :en resolves to en-us, :de to the modern
spelling de-1996, :el to monotonic Greek, etc. Pass an
explicit BCP-47 tag ("en-GB", "de-CH", "de-1901") to
override.
Left and right minima
TeX hyphenation patterns are designed with minimum gaps at the
start and end of a word — \lefthyphenmin and \righthyphenmin.
The American English file recommends left: 2, right: 3. The
recommended values are read from the upstream .tex file's
header when available, and applied as defaults in hyphenate/2,
points/2, and count/2. They can be overridden per-call via
the :left_min and :right_min options.
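The effect of the two minima can be sketched as a simple filter over candidate break points. `MinimaSketch` and `apply_minima/4` are hypothetical names for this illustration, and the raw candidate list `[3, 6]` for "computer" is an assumption chosen to be consistent with the examples in this document; the package may apply the minima differently internally.

```elixir
defmodule MinimaSketch do
  # Keep only the points that leave at least `left_min` characters before
  # the break and at least `right_min` characters after it.
  def apply_minima(points, word, left_min, right_min) do
    size = String.length(word)
    Enum.filter(points, fn p -> p >= left_min and size - p >= right_min end)
  end
end

# Assumed raw candidates for "computer" before any minima are applied.
candidates = [3, 6]

MinimaSketch.apply_minima(candidates, "computer", 2, 3)
# => [3]       (a break at 6 would leave only "er", two characters)

MinimaSketch.apply_minima(candidates, "computer", 2, 1)
# => [3, 6]
```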
Functions
@spec count(String.t(), keyword()) :: pos_integer()
Returns the number of syllables in a word, using hyphenation pattern boundaries as a proxy for syllable boundaries.
This is more accurate than the heuristic in Text.Syllable.count/2
for words that match the bundled patterns, but the two
occasionally disagree on edge cases. For readability metrics, use
Text.Syllable. For typographic syllable count, prefer this.
Arguments
- word is a single word as a string.
Options
- :language, :left_min, :right_min (see points/2).
Returns
- A positive integer: the number of break points plus one.
Examples
iex> Text.Hyphenation.count("hyphenation")
3
iex> Text.Hyphenation.count("cat")
1
Inserts a hyphen string (by default "-") at every legal break point in a word.
Arguments
- word is a single word as a string.
Options
- :hyphen is the string inserted at each break point. Default is "-". For typesetting use, "\u00AD" (soft hyphen) is common.
- :language, :left_min, :right_min (see points/2).
Returns
- The word with the hyphen string inserted at every break point.
Examples
iex> Text.Hyphenation.hyphenate("hyphenation")
"hy-phen-ation"
iex> Text.Hyphenation.hyphenate("computer", hyphen: "·", right_min: 1)
"com·put·er"
@spec load_language(atom()) :: :ok
Pre-loads a hyphenation pattern file for a language.
Calling this is optional. The first call to points/2,
hyphenate/2, or count/2 for a given language already loads
(and if necessary downloads) the pattern file via Text.Data —
load_language/1,2,3 is a way to warm the cache up front (e.g.
during application startup) so the first user-facing call has
no latency, or to register a custom pattern file under a name of
your choosing.
Forms
load_language(language)
load_language(language, options)
load_language(language, tex_path)
load_language(language, tex_path, options)
Without an explicit tex_path, the upstream URL is derived from
the language input and the file is fetched via Text.Data (see
the moduledoc — auto-download must be enabled for the network
fetch to occur).
With an explicit tex_path, that file is read directly and
registered under language.
Arguments
- language is an atom identifying the language (e.g. :de, :"de-ch", :my_custom_language).
- tex_path is an optional path to a TeX hyphenation pattern file. When omitted, the file is resolved through Text.Data.
Options
- :left_min is the minimum left hyphenation distance for the language. Default is parsed from the file's header, falling back to 2.
- :right_min is the minimum right hyphenation distance for the language. Default is parsed from the file's header, falling back to 2.
Returns
- :ok on success.
@spec points(String.t(), keyword()) :: [pos_integer()]
Returns the list of valid hyphenation points within a word.
A hyphenation point is the number of characters from the start of
the word at which a hyphen is permitted. For example, the points
of "hyphenation" are [2, 6], indicating the breaks hy-phen-ation
(the first break falls two characters from the left, the second six).
Arguments
- word is a single word as a string. Surrounding punctuation is not stripped, so pass tokens that have already been split.
Options
- :language is the hyphenation language. The default is :en (American English). Bundled languages are available immediately; other languages are loaded on demand on first use (see the moduledoc), or can be pre-loaded with load_language/2.
- :left_min is the minimum number of characters at the start of the word before the first allowed break. Default is the language's recommended value (2 for English).
- :right_min is the minimum number of characters at the end of the word after the last allowed break. Default is the language's recommended value (3 for English).
Returns
- A list of integer hyphenation points, in ascending order. An empty list means no legal break points exist (very short words, or words below the left/right minima).
Examples
iex> Text.Hyphenation.points("hyphenation")
[2, 6]
iex> Text.Hyphenation.points("computer")
[3]
iex> Text.Hyphenation.points("a")
[]
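The results documented for hyphenate/2 and count/2 can be derived from a list of points, which the following sketch tries to make concrete. `PointsSketch` and its functions are hypothetical helpers written for this illustration, not the package's implementation; the point lists are passed in by hand.

```elixir
defmodule PointsSketch do
  # Append `hyphen` after the grapheme at each listed point
  # (points count characters from the start of the word).
  def insert(word, points, hyphen \\ "-") do
    word
    |> String.graphemes()
    |> Enum.with_index(1)
    |> Enum.map_join(fn {g, i} -> if i in points, do: g <> hyphen, else: g end)
  end

  # The syllable estimate is the number of break points plus one.
  def count(points), do: length(points) + 1
end

PointsSketch.insert("hyphenation", [2, 6])
# => "hy-phen-ation"

PointsSketch.insert("computer", [3, 6], "·")
# => "com·put·er"

PointsSketch.count([2, 6])
# => 3
```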