Unicode NFD normalization for collation.
Delegates to Erlang's :unicode module.
Summary
Functions
nfd(string)
Normalize a string to NFD (Canonical Decomposition) form.
normalize_to_codepoints(string, normalize? \\ false)
Optionally normalize a string and convert it to a list of integer codepoints.
to_codepoints(string)
Convert a string to a list of integer codepoints.
Functions
nfd(string)
Normalize a string to NFD (Canonical Decomposition) form.
Uses Erlang's :unicode.characters_to_nfd_binary/1, followed by a canonical
reordering pass that uses Canonical Combining Class (CCC) data from the
unicode package to correct the ordering of newer Unicode codepoints.
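The two steps described above can be sketched roughly as follows. This is an illustration only, not the module's actual implementation: ReorderSketch, its function names, and its three-entry CCC table are hypothetical, and a real implementation would take CCC values from full Unicode data rather than a hand-written map.

```elixir
# Sketch only: ReorderSketch and its tiny CCC table are hypothetical and are
# not part of Localize or the unicode package.
defmodule ReorderSketch do
  # Canonical Combining Class values for a few combining marks.
  # U+0301 acute = 230, U+0323 dot below = 220, U+0327 cedilla = 202.
  @ccc %{0x0301 => 230, 0x0323 => 220, 0x0327 => 202}

  # Codepoints missing from the table are treated as starters (CCC 0).
  defp ccc(codepoint), do: Map.get(@ccc, codepoint, 0)

  # Canonical reordering: split the list into runs of starters and runs of
  # non-starters, then stably sort each run by CCC. Starter runs all share
  # CCC 0, so the stable sort leaves them unchanged.
  def reorder(codepoints) do
    codepoints
    |> Enum.chunk_by(&(ccc(&1) == 0))
    |> Enum.flat_map(fn run -> Enum.sort_by(run, &ccc/1) end)
  end

  # Decompose with Erlang's :unicode module, then apply the reordering pass.
  def nfd_codepoints(string) do
    string
    |> :unicode.characters_to_nfd_list()
    |> reorder()
  end
end

ReorderSketch.reorder([0x65, 0x0301, 0x0323])
# => [101, 803, 769]  (dot below, CCC 220, now precedes acute, CCC 230)

ReorderSketch.nfd_codepoints("\u00E9")
# => [101, 769]  (precomposed é becomes e followed by U+0301)
```

For codepoints that Erlang's own tables already cover, the reordering pass is a no-op, because :unicode returns marks in canonical order; it only matters for marks whose CCC the runtime does not yet know.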
Arguments
string - a UTF-8 binary string.
Returns
The NFD-normalized string as a UTF-8 binary.
Examples
iex> "café" |> Localize.Collation.Normalizer.nfd() |> String.to_charlist() |> length()
5
iex> Localize.Collation.Normalizer.nfd("é")
"é"
@spec normalize_to_codepoints(String.t(), boolean()) :: [non_neg_integer()]
Optionally normalize a string and convert it to a list of integer codepoints.
Arguments
string - a UTF-8 binary string.
normalize? - whether to apply NFD normalization first (default: false).
Returns
A list of integer codepoints, optionally NFD-normalized.
Examples
iex> Localize.Collation.Normalizer.normalize_to_codepoints("abc")
[97, 98, 99]
iex> Localize.Collation.Normalizer.normalize_to_codepoints("café", true)
[99, 97, 102, 101, 769]
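The effect of the normalize? flag is easiest to see on input that contains a precomposed character. The snippet below is a minimal sketch that uses only Elixir's standard String.normalize/2 and String.to_charlist/1 as stand-ins for the two code paths; it does not call this module, and the variable names are illustrative.

```elixir
# "café" written with the precomposed U+00E9, spelled as an escape so the
# form of the input is unambiguous.
precomposed = "caf\u00E9"

# Without normalization the precomposed codepoint is kept as-is.
raw = String.to_charlist(precomposed)
# => [99, 97, 102, 233]

# With NFD normalization, U+00E9 decomposes into "e" followed by U+0301.
decomposed = precomposed |> String.normalize(:nfd) |> String.to_charlist()
# => [99, 97, 102, 101, 769]
```

Because the default is false, two canonically equivalent strings can produce different codepoint lists unless normalization is requested.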
@spec to_codepoints(String.t()) :: [non_neg_integer(), ...]
Convert a string to a list of integer codepoints.
Arguments
string - a UTF-8 binary string.
Returns
A list of integer codepoints.
Examples
iex> Localize.Collation.Normalizer.to_codepoints("abc")
[97, 98, 99]