Unicode.String.Dictionary (Unicode String v1.4.1)

Implements basic dictionary functions for dictionary-based word breaking.

This implementation supports dictionary-based word breaking for:

  • Chinese (zh, zh-Hant, zh-Hans, zh-Hant-HK, yue, yue-Hans) locales,
  • Japanese (ja) using the same dictionary as for Chinese,
  • Thai (th),
  • Lao (lo),
  • Khmer (km) and
  • Burmese (my).

The dictionaries are those used by CLDR. They were chosen because they are available under an open source license and for consistency with ICU.

These dictionaries must be downloaded with mix unicode.string.download.dictionaries before use. Each dictionary is parsed and loaded into persistent_term on demand. Each dictionary has a sizable memory footprint, as measured by :persistent_term.info/0:

Dictionary   Memory (MB)
Chinese      104.8
Thai         9.6
Lao          11.4
Khmer        38.8
Burmese      23.1
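
The footprint figures above are reported as coming from :persistent_term.info/0. As a rough sketch of how such a measurement can be taken, the snippet below reads the memory field before and after storing a term and reports the difference. The module PersistentTermFootprint, the key, and the stand-in word map are hypothetical and not part of Unicode String; a real measurement would trigger a dictionary load instead. The conversion assumes 1 MB = 1024 * 1024 bytes, which may not match how the table was computed.

```elixir
defmodule PersistentTermFootprint do
  @moduledoc false

  # Hypothetical helper (not part of Unicode String).
  # Runs `loader` and returns how many megabytes the total
  # persistent_term storage grew by while it ran.
  def measure_mb(loader) when is_function(loader, 0) do
    %{memory: before_bytes} = :persistent_term.info()
    loader.()
    %{memory: after_bytes} = :persistent_term.info()
    (after_bytes - before_bytes) / (1024 * 1024)
  end
end

# Stand-in for a dictionary: a map of 10,000 made-up words.
key = {:example, :dictionary}
words = Map.new(1..10_000, fn i -> {"word#{i}", true} end)

mb = PersistentTermFootprint.measure_mb(fn -> :persistent_term.put(key, words) end)
IO.puts("Added roughly #{Float.round(mb, 2)} MB to persistent_term")

# Clean up the stand-in term.
:persistent_term.erase(key)
```

Because :persistent_term.info/0 reports the total for all persistent terms in the VM, the difference is only meaningful if nothing else writes to persistent_term while the loader runs.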

Summary

Functions

known_dictionary_locales()

Returns the locales that have a dictionary supporting word breaking.

Functions


known_dictionary_locales()


Returns the locales that have a dictionary supporting word breaking.
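
A minimal usage sketch, run as a standalone script. It assumes the library is published on Hex as unicode_string and that Mix.install/1 can fetch it; the shape of the return value is not specified here, so the script only inspects it rather than relying on a particular type.

```elixir
# Assumption: the Hex package name is :unicode_string.
Mix.install([{:unicode_string, "~> 1.4"}])

# Ask which locales have a dictionary for word breaking.
locales = Unicode.String.Dictionary.known_dictionary_locales()

# Print whatever is returned; no particular shape is assumed.
IO.inspect(locales, label: "dictionary locales")
```

Listing the locales does not by itself imply that the dictionaries are present; they still need to be downloaded with mix unicode.string.download.dictionaries before use.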