Unicode.String.Dictionary (Unicode String v1.4.1)
Implements basic dictionary functions for dictionary-based word breaking.
This implementation supports dictionary-based word breaking for:

- Chinese (`zh`, `zh-Hant`, `zh-Hans`, `zh-Hant-HK`, `yue`, `yue-Hans`) locales,
- Japanese (`ja`), using the same dictionary as for Chinese,
- Thai (`th`),
- Lao (`lo`),
- Khmer (`km`) and
- Burmese (`my`).
The dictionaries are those used in CLDR, chosen because they are available under an open source license and for consistency with ICU.
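To illustrate what dictionary-based word breaking means for scripts written without spaces, here is a minimal sketch of greedy longest-match segmentation in Elixir. The `LongestMatch` module, its `segment/3` function and the `max_len` parameter are hypothetical; this is not the algorithm used by `Unicode.String.Dictionary` or ICU, and it assumes Elixir 1.12+ for stepped ranges.

```elixir
defmodule LongestMatch do
  @moduledoc """
  Hypothetical sketch: greedy longest-match word segmentation against a
  dictionary held in a MapSet. Not the algorithm used by
  Unicode.String.Dictionary or ICU.
  """

  # Split `text` into words, preferring the longest dictionary entry
  # (up to `max_len` graphemes) at each position. Graphemes not covered
  # by any entry are emitted one at a time.
  def segment(text, dictionary, max_len) when is_binary(text) do
    text
    |> String.graphemes()
    |> do_segment(dictionary, max_len, [])
  end

  defp do_segment([], _dict, _max_len, acc), do: Enum.reverse(acc)

  defp do_segment(graphemes, dict, max_len, acc) do
    # Only the next `max_len` graphemes can form a candidate word.
    window = Enum.take(graphemes, max_len)

    # Try candidate lengths from longest to shortest; fall back to 1.
    len =
      Enum.find(length(window)..1//-1, 1, fn n ->
        MapSet.member?(dict, window |> Enum.take(n) |> Enum.join())
      end)

    {word, rest} = Enum.split(graphemes, len)
    do_segment(rest, dict, max_len, [Enum.join(word) | acc])
  end
end
```

With a toy Latin-script dictionary, `LongestMatch.segment("thecatsat", MapSet.new(["the", "cat", "cats", "at", "sat"]), 4)` returns `["the", "cats", "at"]` rather than the intended `["the", "cat", "sat"]`. This shows why pure greedy matching is not enough; ICU's dictionary break engines, for example, add lookahead or cost-based selection between candidate segmentations.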
These dictionaries must be downloaded with `mix unicode.string.download.dictionaries` before use. Each dictionary is parsed and loaded into `:persistent_term` on demand. Note that each dictionary has a sizable memory footprint, as measured by `:persistent_term.info/0`:
| Dictionary | Memory (MB) |
|---|---|
| Chinese | 104.8 |
| Thai | 9.6 |
| Lao | 11.4 |
| Khmer | 38.8 |
| Burmese | 23.1 |
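Loading on demand into `:persistent_term` typically follows a get-or-load pattern. The sketch below is an assumption about how such a loader could look, not the library's internals: the `LazyDict` module, its `{__MODULE__, locale}` key and the zero-arity `loader` function are hypothetical. It uses only the standard `:persistent_term.get/2`, `:persistent_term.put/2` and `:persistent_term.info/0` functions.

```elixir
defmodule LazyDict do
  @moduledoc """
  Hypothetical sketch of loading a parsed dictionary into :persistent_term
  on first use. Key shape and loader are illustrative only.
  """

  # Return the dictionary for `locale`, calling `loader` only when no
  # persistent term exists yet for this locale.
  def get(locale, loader) when is_function(loader, 0) do
    key = {__MODULE__, locale}

    case :persistent_term.get(key, :not_loaded) do
      :not_loaded ->
        dict = loader.()
        :persistent_term.put(key, dict)
        dict

      dict ->
        dict
    end
  end

  # Total memory used by all persistent terms, converted from bytes
  # to megabytes using 1024 * 1024 bytes per megabyte.
  def persistent_term_mb do
    %{memory: bytes} = :persistent_term.info()
    bytes / (1024 * 1024)
  end
end
```

Two processes racing on first use could each run the loader and each call `put/2`. Because updating an existing persistent term triggers a global garbage-collection scan, a production loader might serialize loading through a single process. Whether the table figures divide by 1024 × 1024 or by 1,000,000 is not stated here.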
Summary
Functions
Returns the locales that have a dictionary supporting word breaking.
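As a hypothetical illustration only (the `DictionaryLocales` module, its functions and the dictionary atoms are not the library's API), the locale list at the top of this page can be expressed as a lookup table in which `ja` and the `yue` locales share the Chinese dictionary:

```elixir
defmodule DictionaryLocales do
  # Hypothetical lookup built from the locale list on this page.
  # Japanese uses the same dictionary as Chinese, as the page states.
  @dictionaries %{
    "zh" => :chinese,
    "zh-Hant" => :chinese,
    "zh-Hans" => :chinese,
    "zh-Hant-HK" => :chinese,
    "yue" => :chinese,
    "yue-Hans" => :chinese,
    "ja" => :chinese,
    "th" => :thai,
    "lo" => :lao,
    "km" => :khmer,
    "my" => :burmese
  }

  # Sorted list of locales that have a word-break dictionary.
  def known_locales, do: @dictionaries |> Map.keys() |> Enum.sort()

  # {:ok, dictionary} for a supported locale, :error otherwise.
  def dictionary_for(locale), do: Map.fetch(@dictionaries, locale)
end
```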