# `Text.Hyphenation`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/hyphenation.ex#L1)

Hyphenation via Liang's algorithm with TeX hyphenation patterns.

The module computes the legal hyphenation points for a word: the
positions where a soft hyphen could be inserted for line-breaking,
which also approximate syllable boundaries. It is the engine behind
both hyphenation (`hyphenate/2`) and pattern-based syllable counting
(`count/2`).
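
As a rough illustration of how Liang's algorithm turns patterns into break points, here is a minimal, self-contained sketch. It is not this library's implementation: `LiangSketch` is a hypothetical module, the pattern list is a small hand-picked set rather than a full hyph-utf8 file, and exception lists are ignored.

```elixir
defmodule LiangSketch do
  # "hy3ph" -> {"hyph", [0, 0, 3, 0, 0]}: one weight per gap, counting the
  # gap before the first letter and the gap after the last one.
  def parse(pattern) do
    {letters, weights, pending} =
      pattern
      |> String.graphemes()
      |> Enum.reduce({[], [], 0}, fn g, {ls, ws, pending} ->
        case Integer.parse(g) do
          {digit, ""} -> {ls, ws, digit}
          :error -> {[g | ls], [pending | ws], 0}
        end
      end)

    {letters |> Enum.reverse() |> Enum.join(), Enum.reverse([pending | weights])}
  end

  def points(word, patterns, left_min \\ 2, right_min \\ 3) do
    table = Map.new(patterns, &parse/1)
    chars = String.graphemes("." <> String.downcase(word) <> ".")
    n = length(chars)

    # Gap i sits just before character i of the dotted word. Every matching
    # substring contributes its weights; the highest weight per gap wins.
    scores =
      for start <- 0..(n - 1), size <- 1..(n - start), reduce: %{} do
        acc ->
          key = chars |> Enum.slice(start, size) |> Enum.join()

          table
          |> Map.get(key, [])
          |> Enum.with_index(start)
          |> Enum.reduce(acc, fn {w, gap}, acc -> Map.update(acc, gap, w, &max(&1, w)) end)
      end

    # Odd weights permit a break. Point p of the word is gap p + 1 here.
    word_length = n - 2

    for p <- left_min..(word_length - right_min)//1,
        rem(Map.get(scores, p + 1, 0), 2) == 1,
        do: p
  end
end

# Illustrative patterns only (an assumption, not the bundled en-us file).
patterns = ~w(hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n)
LiangSketch.points("hyphenation", patterns)
# => [2, 6]
```

The sketch scans every substring for clarity; a production implementation would more likely walk a trie built from the patterns.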

### Bundled and on-demand language packs

Seven hyph-utf8 packs are bundled and loaded at compile time, so
the common European-language paths are zero-I/O:

* `en-us` — American English (~5,000 patterns plus DEK's exception list)
* `de-1996` — German (modern spelling)
* `fr` — French
* `es` — Spanish
* `it` — Italian
* `nl` — Dutch
* `pt` — Portuguese

Every other hyph-utf8 language pack (Russian, Hindi, and ~75
others) can be loaded on demand from the
[hyph-utf8](https://github.com/hyphenation/tex-hyphen) upstream.

On-demand loading goes through `Text.Data`, which means:

* Loaded language packs are cached in the `hyphenation/`
  subdirectory of `:data_dir` (by default
  `~/.cache/text/hyphenation/`), so subsequent calls do no I/O
  beyond the cache lookup.

* Auto-download from upstream is **opt-in**, gated by
  `config :text, auto_download_hyphenation_data: true`. Without
  that flag the package never reaches out to the network — calls
  for an unbundled language raise an `ArgumentError` explaining
  the situation.

* If you would rather populate the cache manually, drop the
  `hyph-<tag>.tex` file from upstream into the configured
  `:data_dir`/`hyphenation/` directory and it will be picked up
  without any download.
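
The settings described above might be combined in `config/config.exs` roughly as follows. The `auto_download_hyphenation_data` key comes from the text above; placing `data_dir` under the same `:text` application and the path shown are assumptions made for illustration.

```elixir
import Config

config :text,
  # Opt in to fetching unbundled language packs from upstream.
  auto_download_hyphenation_data: true,
  # Assumed key and hypothetical path: where downloaded packs are cached.
  data_dir: "/var/cache/my_app/text"
```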

Bundled files live in `priv/hyphenation/`. Each `.tex` file ships
under its upstream license: `en-us`, `de-1996`, `fr`, `es`, `nl`,
and `pt` are MIT/X11/BSD; `it` is LPPL. All are compatible with
redistribution; the licenses (and original copyright notices)
are preserved verbatim in the headers of each file.

### Language input shapes

Every option that takes a `:language` accepts:

* an atom (`:fr`, `:"de-1996"`),

* a string (`"fr"`, `"en-GB"`, `"de-CH"`),

* or a `Localize.LanguageTag` struct (when the optional
  [`localize`](https://hex.pm/packages/localize) dependency is
  loaded). The `:language` and `:territory` fields are used to
  derive the upstream file tag.

The mapping prefers the most common variant when the input is
ambiguous: `:en` resolves to `en-us`, `:de` to the modern
spelling `de-1996`, `:el` to monotonic Greek, etc. Pass an
explicit BCP-47 tag (`"en-GB"`, `"de-CH"`, `"de-1901"`) to
override.
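
The default-variant rule can be pictured with a small lookup. This is a simplified sketch, not the library's resolver: `TagSketch` is a hypothetical module, only the three defaults named above are listed, the `el-monoton` tag is assumed from the upstream file naming, and any other input is simply lowercased and passed through.

```elixir
defmodule TagSketch do
  # Ambiguous language codes and the variant assumed to be the default.
  @defaults %{"en" => "en-us", "de" => "de-1996", "el" => "el-monoton"}

  def resolve(language) when is_atom(language), do: resolve(Atom.to_string(language))

  def resolve(language) when is_binary(language) do
    tag = String.downcase(language)
    # Explicit tags are not in the table, so they override the default.
    Map.get(@defaults, tag, tag)
  end
end

TagSketch.resolve(:en)       # => "en-us"
TagSketch.resolve("en-GB")   # => "en-gb"
TagSketch.resolve("de-1901") # => "de-1901"
```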

### Left and right minima

TeX hyphenation patterns are designed with minimum gaps at the
start and end of a word — `\lefthyphenmin` and `\righthyphenmin`.
The American English file recommends `left: 2`, `right: 3`. The
recommended values are read from the upstream `.tex` file's
header when available, and applied as defaults in `hyphenate/2`,
`points/2`, and `count/2`. They can be overridden per-call via
the `:left_min` and `:right_min` options.
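
The effect of the two minima can be sketched as a filter over candidate points. `MinimaSketch` is a hypothetical helper, and the candidate list `[3, 6]` for `"computer"` is inferred from the `hyphenate/2` example further down rather than taken from the pattern file.

```elixir
defmodule MinimaSketch do
  # Keep a point only if at least left_min characters precede it and at
  # least right_min characters follow it.
  def apply_minima(points, word, left_min, right_min) do
    size = String.length(word)
    Enum.filter(points, fn p -> p >= left_min and size - p >= right_min end)
  end
end

MinimaSketch.apply_minima([3, 6], "computer", 2, 3) # => [3]
MinimaSketch.apply_minima([3, 6], "computer", 2, 1) # => [3, 6]
```

With the English defaults the break before `er` is dropped because only two characters would follow it; lowering `right_min` to `1` lets it through.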

# `count`

```elixir
@spec count(
  String.t(),
  keyword()
) :: pos_integer()
```

Returns the number of syllables in a word, using hyphenation
pattern boundaries as a proxy for syllable boundaries.

This is more accurate than the heuristic in `Text.Syllable.count/2`
for words that match the bundled patterns, but the two
occasionally disagree on edge cases. For readability metrics, use
`Text.Syllable.count/2`. For a typographic syllable count, prefer
this function.

### Arguments

* `word` is a single word as a string.

### Options

* `:language`, `:left_min`, `:right_min` — see `points/2`.

### Returns

* A positive integer: the number of break points plus one.

### Examples

    iex> Text.Hyphenation.count("hyphenation")
    3

    iex> Text.Hyphenation.count("cat")
    1
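
The relationship between break points and the returned count is simple enough to sketch directly. `CountSketch` is a hypothetical helper that takes an already-computed list of points.

```elixir
defmodule CountSketch do
  # Each break point splits the word once more, so n points give n + 1 chunks.
  def count(points) when is_list(points), do: length(points) + 1
end

CountSketch.count([2, 6]) # => 3, the points reported for "hyphenation"
CountSketch.count([])     # => 1, a word with no legal break
```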

# `hyphenate`

```elixir
@spec hyphenate(
  String.t(),
  keyword()
) :: String.t()
```

Inserts a hyphen string at every legal break point in a word.

### Arguments

* `word` is a single word as a string.

### Options

* `:hyphen` is the string inserted at each break point. Default is
  `"-"`. For typesetting use, `"\u00AD"` (soft hyphen) is common.

* `:language`, `:left_min`, `:right_min` — see `points/2`.

### Returns

* The word with the hyphen string inserted at every break point.

### Examples

    iex> Text.Hyphenation.hyphenate("hyphenation")
    "hy-phen-ation"

    iex> Text.Hyphenation.hyphenate("computer", hyphen: "·", right_min: 1)
    "com·put·er"
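
To show how a list of points maps onto the output string, here is a minimal sketch. `InsertSketch` is a hypothetical helper, not the library's code; it treats point `p` as "break before the grapheme at index `p`".

```elixir
defmodule InsertSketch do
  # Walk the graphemes and prepend the hyphen string wherever the
  # grapheme's index is one of the break points.
  def insert(word, points, hyphen \\ "-") do
    word
    |> String.graphemes()
    |> Enum.with_index()
    |> Enum.map_join(fn {g, i} -> if i in points, do: hyphen <> g, else: g end)
  end
end

InsertSketch.insert("hyphenation", [2, 6])    # => "hy-phen-ation"
InsertSketch.insert("computer", [3, 6], "·")  # => "com·put·er"
```

Working on graphemes rather than bytes keeps multi-byte characters intact when a break falls next to them.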

# `load_language`

```elixir
@spec load_language(atom()) :: :ok
```

Pre-loads a hyphenation pattern file for a language.

Calling this is **optional**. The first call to `points/2`,
`hyphenate/2`, or `count/2` for a given language already loads
(and if necessary downloads) the pattern file via `Text.Data` —
`load_language/1,2,3` is a way to warm the cache up front (e.g.
during application startup) so the first user-facing call has
no latency, or to register a custom pattern file under a name of
your choosing.

### Forms

    load_language(language)
    load_language(language, options)
    load_language(language, tex_path)
    load_language(language, tex_path, options)

Without an explicit `tex_path`, the upstream URL is derived from
the language input and the file is fetched via `Text.Data` (see
the moduledoc — auto-download must be enabled for the network
fetch to occur).

With an explicit `tex_path`, that file is read directly and
registered under `language`.

### Arguments

* `language` is an atom identifying the language (e.g. `:de`,
  `:"de-ch"`, `:my_custom_language`).

* `tex_path` is an optional path to a TeX hyphenation pattern
  file. When omitted, the file is resolved through `Text.Data`.

### Options

* `:left_min` is the minimum left hyphenation distance for the
  language. Default is parsed from the file's header, falling
  back to `2`.

* `:right_min` is the minimum right hyphenation distance for the
  language. Default is parsed from the file's header, falling
  back to `2`.

### Returns

* `:ok` on success.
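
### Examples

A sketch of warming the cache during application startup, using the forms listed above. `MyApp.Application`, the `:ru` choice, and the `priv/hyph-custom.tex` path are hypothetical. Per the moduledoc, the first call raises for an unbundled language unless the pack is already cached or auto-download is enabled.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Warm the cache so the first user-facing call does not pay the load cost.
    :ok = Text.Hyphenation.load_language(:ru)

    # Register a local pattern file under a custom name (hypothetical path),
    # overriding the minima that would otherwise come from the file header.
    :ok =
      Text.Hyphenation.load_language(:my_custom_language, "priv/hyph-custom.tex",
        left_min: 2,
        right_min: 3
      )

    Supervisor.start_link([], strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```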

# `load_language`

```elixir
@spec load_language(atom(), keyword() | Path.t()) :: :ok
```

# `load_language`

```elixir
@spec load_language(atom(), Path.t(), keyword()) :: :ok
```

# `points`

```elixir
@spec points(
  String.t(),
  keyword()
) :: [pos_integer()]
```

Returns the list of valid hyphenation points within a word.

A hyphenation point is the number of characters from the start of
the word at which a hyphen is permitted. For example, the points
of `"hyphenation"` are `[2, 6]`, indicating the breaks `hy-phen-ation`
(the first break falls two characters from the left, the second six).

### Arguments

* `word` is a single word as a string. Surrounding punctuation is
  not stripped — pass tokens that have already been split.

### Options

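* `:language` is the hyphenation language. The default is `:en`
  (American English). Bundled languages are available immediately;
  any other language is loaded on first use under the cache and
  download rules described in the moduledoc, or can be pre-loaded
  with `load_language/1`.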

* `:left_min` is the minimum number of characters at the start of
  the word before the first allowed break. Default is the
  language's recommended value (`2` for English).

* `:right_min` is the minimum number of characters at the end of
  the word after the last allowed break. Default is the
  language's recommended value (`3` for English).

### Returns

* A list of integer hyphenation points, in ascending order. An
  empty list means no legal break points exist (very short words,
  or words below the left/right minima).

### Examples

    iex> Text.Hyphenation.points("hyphenation")
    [2, 6]

    iex> Text.Hyphenation.points("computer")
    [3]

    iex> Text.Hyphenation.points("a")
    []

---

*Consult [api-reference.md](api-reference.md) for complete listing*
