# `Unicode.String.Dictionary`
[🔗](https://github.com/elixir-unicode/unicode_string/blob/v2.1.0/lib/unicode/dictionary.ex#L1)

Implements basic dictionary functions for dictionary-based
word breaking.
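Dictionary-based word breaking finds word boundaries in text that has no spaces between words by consulting a word list. The sketch below uses a simple greedy longest-match strategy. This is a swapped-in illustrative technique, **not** the algorithm used by ICU or by `Unicode.String.Dictionary`. The module name `LongestMatchSketch` and its `segment/3` function are hypothetical.

```elixir
defmodule LongestMatchSketch do
  # Hypothetical illustration only: greedy longest-match segmentation.
  # This is NOT the algorithm used by ICU or Unicode.String.Dictionary.

  @doc """
  Splits `text` into segments by repeatedly taking the longest prefix
  (up to `max_len` graphemes) found in `dictionary`, a MapSet of words.
  When no prefix matches, a single grapheme becomes its own segment.
  """
  def segment(text, dictionary, max_len \\ 8) do
    text
    |> String.graphemes()
    |> do_segment(dictionary, max_len, [])
  end

  defp do_segment([], _dictionary, _max_len, acc), do: Enum.reverse(acc)

  defp do_segment(graphemes, dictionary, max_len, acc) do
    limit = min(max_len, length(graphemes))

    # Try candidate lengths from longest to shortest; fall back to 1.
    n =
      Enum.find(limit..1//-1, 1, fn n ->
        MapSet.member?(dictionary, graphemes |> Enum.take(n) |> Enum.join())
      end)

    {word, rest} = Enum.split(graphemes, n)
    do_segment(rest, dictionary, max_len, [Enum.join(word) | acc])
  end
end
```

For example, `LongestMatchSketch.segment("thecatsat", MapSet.new(["the", "cat", "cats", "at", "sat"]))` returns `["the", "cats", "at"]` rather than the intended `["the", "cat", "sat"]`. This weakness of purely greedy matching is one reason production segmenters use more elaborate, cost- or lookahead-based strategies.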

This implementation supports dictionary-based word breaking for:

* Chinese (`zh`, `zh-Hant`, `zh-Hans`, `zh-Hant-HK`, `yue`, `yue-Hans`) locales,
* Japanese (`ja`) using the same dictionary as for Chinese,
* Thai (`th`),
* Lao (`lo`),
* Khmer (`km`) and
* Burmese (`my`).

The dictionaries are those used in [CLDR](https://cldr.unicode.org). They were chosen
because they are available under an open source license and for consistency with
[ICU](https://icu.unicode.org).

These dictionaries must be downloaded with
`mix unicode.string.download.dictionaries` before use. Each dictionary
is parsed and loaded into [persistent_term](https://www.erlang.org/doc/man/persistent_term)
on demand. Each dictionary has a sizable memory footprint, as measured
by `:persistent_term.info/0`:

| Dictionary  | Memory (MB) |
| ----------- | ----------: |
| Chinese     | 104.8       |
| Thai        | 9.6         |
| Lao         | 11.4        |
| Khmer       | 38.8        |
| Burmese     | 23.1        |
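The "parse and load on demand" pattern described above can be sketched with the standard `:persistent_term.get/2`, `:persistent_term.put/2` and `:persistent_term.info/0` functions. This is a minimal sketch under stated assumptions: the module name `LazyDictionarySketch`, the `{module, locale}` key shape and the `loader` callback are hypothetical and do not reflect the library's internals. `:persistent_term.info/0` reports memory in bytes.

```elixir
defmodule LazyDictionarySketch do
  # Hypothetical sketch of on-demand loading into :persistent_term.
  # The key shape and the loader callback are illustrative assumptions.

  @doc "Returns the dictionary for `locale`, calling `loader` only on first use."
  def get(locale, loader) do
    key = {__MODULE__, locale}

    case :persistent_term.get(key, nil) do
      nil ->
        # Parse once, then store. Afterwards the term is only read,
        # which is the access pattern :persistent_term is designed for.
        dictionary = loader.()
        :persistent_term.put(key, dictionary)
        dictionary

      dictionary ->
        dictionary
    end
  end

  @doc "Total memory used by all persistent terms, in bytes."
  def memory_bytes, do: :persistent_term.info().memory
end
```

A rough way to estimate the footprint of a newly loaded term is to compare `memory_bytes/0` before and after loading:

```elixir
before = LazyDictionarySketch.memory_bytes()
_words = LazyDictionarySketch.get(:demo, fn -> MapSet.new(1..50_000, &Integer.to_string/1) end)
after_bytes = LazyDictionarySketch.memory_bytes()
IO.puts("approx. #{Float.round((after_bytes - before) / 1_048_576, 1)} MiB")
```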

# `known_dictionary_locales`

Returns the locales that have a dictionary supporting
word breaking.
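Several locales share one dictionary (for example, Japanese and all the Chinese variants use the Chinese dictionary). The hypothetical stand-in below models that locale-to-dictionary mapping from the supported-locale list and derives a `known_dictionary_locales/0`-style function from it. The module `DictionaryLocalesSketch`, the string locale keys and the dictionary atoms are assumptions; the real `Unicode.String.Dictionary.known_dictionary_locales/0` may return a different shape.

```elixir
defmodule DictionaryLocalesSketch do
  # Hypothetical mapping built from the documented locale list.
  # Locales sharing a value share one loaded dictionary.
  @locale_dictionaries %{
    "zh" => :chinese,
    "zh-Hant" => :chinese,
    "zh-Hans" => :chinese,
    "zh-Hant-HK" => :chinese,
    "yue" => :chinese,
    "yue-Hans" => :chinese,
    "ja" => :chinese,
    "th" => :thai,
    "lo" => :lao,
    "km" => :khmer,
    "my" => :burmese
  }

  @doc "Locales that have a word-break dictionary, sorted."
  def known_dictionary_locales do
    @locale_dictionaries |> Map.keys() |> Enum.sort()
  end

  @doc "Returns {:ok, dictionary} for a supported locale, otherwise :error."
  def dictionary_for(locale), do: Map.fetch(@locale_dictionaries, locale)
end
```

Checking `dictionary_for/1` before segmenting lets a caller fall back to the default (non-dictionary) word-break rules for unsupported locales.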
