unicorn
A lightweight wrapper library around built-in Unicode normalization functions.
For details on Unicode normalization, see Unicode Standard Annex #15, "Unicode Normalization Forms".
Types
Represents the normalization forms accepted by the normalize function.
pub type Form {
NFC
NFD
NFKC
NFKD
}
Constructors
- NFC: Normalization Form Canonical Composition (canonical decomposition, followed by canonical composition)
- NFD: Normalization Form Canonical Decomposition
- NFKC: Normalization Form Compatibility Composition (compatibility decomposition, followed by canonical composition)
- NFKD: Normalization Form Compatibility Decomposition
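To make the difference between the four forms concrete, here is a minimal sketch that runs the same two inputs through each form. It assumes the library follows the standard UAX #15 mappings (it wraps the platform's built-in normalization), and the inputs are chosen for illustration only: the ligature "ﬁ" (U+FB01) has only a compatibility decomposition, while "é" (U+00E9) has a canonical one.

```gleam
import unicorn.{NFC, NFD, NFKC, NFKD}

pub fn main() {
  // "ﬁ" (U+FB01) has only a compatibility decomposition, so the canonical
  // forms leave it unchanged while the compatibility forms expand it.
  assert unicorn.normalize("\u{FB01}", NFC) == "\u{FB01}"
  assert unicorn.normalize("\u{FB01}", NFD) == "\u{FB01}"
  assert unicorn.normalize("\u{FB01}", NFKC) == "fi"
  assert unicorn.normalize("\u{FB01}", NFKD) == "fi"

  // "é" (U+00E9) has a canonical decomposition: "e" + combining acute (U+0301).
  // The composing forms keep it as one code point; the decomposing forms split it.
  assert unicorn.normalize("\u{00E9}", NFC) == "\u{00E9}"
  assert unicorn.normalize("\u{00E9}", NFD) == "e\u{0301}"
  assert unicorn.normalize("\u{00E9}", NFKC) == "\u{00E9}"
  assert unicorn.normalize("\u{00E9}", NFKD) == "e\u{0301}"
}
```

As a rule of thumb, the canonical forms (NFC, NFD) preserve distinctions such as ligatures, whereas the compatibility forms (NFKC, NFKD) fold them away, which can be lossy.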
Values
pub fn normalize(s: String, form kind: Form) -> String
Normalizes a String to the specified Unicode normalization form.
Examples
// NFKC: fraction "¼" -> separate "1" + "⁄" + "4"
assert unicorn.normalize("¼", NFKC) == "1⁄4"
// NFKC: hangul conjoining jamo "ᄀ" + "ᅡ" + "ᆨ" -> single "각"
assert unicorn.normalize("ᄀ" <> "ᅡ" <> "ᆨ", NFKC) == "각"
// NFD: single "が" -> "か" with a combining dakuten
// Note: `form` can also be passed explicitly using its label
assert unicorn.normalize("が", form: NFD) == "か\u{3099}"
pub fn to_nfc(s: String) -> String
Converts a String into its canonical composition form (NFC).
Examples
// NFC: "e" with a combining acute -> single "é"
assert unicorn.to_nfc("e\u{0301}") == "é"
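A common reason to normalize is comparing strings that render identically but are encoded differently. The sketch below illustrates this with to_nfc; the helper `canonically_equal` is a hypothetical name introduced here for illustration and is not part of unicorn.

```gleam
import unicorn

// Hypothetical helper (not part of unicorn): compares two strings after
// bringing both to NFC, so canonically equivalent inputs compare equal.
pub fn canonically_equal(a: String, b: String) -> Bool {
  unicorn.to_nfc(a) == unicorn.to_nfc(b)
}

pub fn main() {
  let precomposed = "\u{00E9}"
  let decomposed = "e\u{0301}"

  // Same rendered text, but different code point sequences
  assert precomposed != decomposed

  // After normalization to NFC, the two compare equal
  assert canonically_equal(precomposed, decomposed)
}
```

Normalizing both sides to NFD would work equally well for this comparison; NFC is used here only because it usually yields the shorter string.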
pub fn to_nfd(s: String) -> String
Converts a String into its canonical decomposition form (NFD).
Examples
// NFD: single "が" -> "か" with a combining dakuten
assert unicorn.to_nfd("が") == "か\u{3099}"
pub fn to_nfkc(s: String) -> String
Converts a String into its compatibility composition form (NFKC).
Examples
// NFKC: half-width "ｶ" (U+FF76) + half-width dakuten "ﾞ" (U+FF9E) -> full-width "ガ"
assert unicorn.to_nfkc("ｶﾞ") == "ガ"