unicorn

A lightweight wrapper library around built-in Unicode normalization functions.

For details on Unicode normalization, see Unicode Standard Annex #15, "Unicode Normalization Forms".

Types

Represents normalization forms accepted by the normalize function.

pub type Form {
  NFC
  NFD
  NFKC
  NFKD
}

Constructors

  • NFC

    Normalization Form Canonical Composition (Canonical Decomposition, followed by Canonical Composition)

  • NFD

    Normalization Form Canonical Decomposition

  • NFKC

    Normalization Form Compatibility Composition (Compatibility Decomposition, followed by Canonical Composition)

  • NFKD

    Normalization Form Compatibility Decomposition
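To make the difference between the canonical and compatibility forms concrete, here is a minimal sketch that runs two characters through all four forms. It assumes the package is imported as `unicorn` with the `Form` constructors imported unqualified, and a Gleam version that supports the `assert` statement, as the examples on this page do. The `main` function is only there for illustration.

```gleam
import unicorn.{NFC, NFD, NFKC, NFKD}

pub fn main() {
  // "Å" (U+00C5) has a canonical decomposition: "A" + combining ring above.
  // The canonical forms convert between the two spellings.
  assert unicorn.normalize("\u{00C5}", NFD) == "A\u{030A}"
  assert unicorn.normalize("A\u{030A}", NFC) == "\u{00C5}"

  // The ligature "ﬁ" (U+FB01) has only a compatibility decomposition,
  // so the canonical forms leave it unchanged...
  assert unicorn.normalize("\u{FB01}", NFC) == "\u{FB01}"
  assert unicorn.normalize("\u{FB01}", NFD) == "\u{FB01}"

  // ...while the compatibility forms replace it with plain "f" + "i".
  assert unicorn.normalize("\u{FB01}", NFKC) == "fi"
  assert unicorn.normalize("\u{FB01}", NFKD) == "fi"

  Nil
}
```

The compatibility forms discard formatting distinctions such as ligatures, so they are usually better suited to searching and matching than to storing text that must round-trip unchanged.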

Values

pub fn normalize(s: String, form kind: Form) -> String

Normalizes a String to the specified Unicode normalization form.

Examples

// NFKC: fraction "¼" -> separate "1" + "⁄" + "4"
assert unicorn.normalize("¼", NFKC) == "1⁄4"
// NFKC: hangul conjoining jamo "ᄀ" + "ᅡ" + "ᆨ" -> single "각"
assert unicorn.normalize("ᄀ" <> "ᅡ" <> "ᆨ", NFKC) == "각"
// NFD: single "が" -> "か" with a combining dakuten
// Note that the `form` label can also be written explicitly
assert unicorn.normalize("が", form: NFD) == "か\u{3099}"

pub fn to_nfc(s: String) -> String

Converts a String into its canonical composition form (NFC).

Examples

// NFC: "e" with a combining acute -> single "é"
assert unicorn.to_nfc("e\u{0301}") == "é"

pub fn to_nfd(s: String) -> String

Converts a String into its canonical decomposition form (NFD).

Examples

// NFD: single "が" -> "か" with a combining dakuten
assert unicorn.to_nfd("が") == "か\u{3099}"

pub fn to_nfkc(s: String) -> String

Converts a String into its compatibility composition form (NFKC).

Examples

// NFKC: half-width "ｶ" (U+FF76) + half-width dakuten "ﾞ" (U+FF9E) -> full-width "ガ"
assert unicorn.to_nfkc("\u{FF76}\u{FF9E}") == "ガ"

pub fn to_nfkd(s: String) -> String

Converts a String into its compatibility decomposition form (NFKD).

Examples

// NFKD: single ḕ -> "e" + macron + grave
assert unicorn.to_nfkd("ḕ") == "e\u{0304}\u{0300}"
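
A common reason to reach for these helpers is comparing strings that may be encoded differently. The sketch below normalizes both sides to NFC before comparing them. `canonically_equal` is a hypothetical helper written for this example, not part of unicorn, and the snippet again assumes a Gleam version with the `assert` statement.

```gleam
import unicorn

// Hypothetical helper (not provided by unicorn): two strings are treated
// as equal when their NFC forms are identical.
pub fn canonically_equal(a: String, b: String) -> Bool {
  unicorn.to_nfc(a) == unicorn.to_nfc(b)
}

pub fn main() {
  // Precomposed "é" (U+00E9) and "e" + combining acute are canonically equivalent.
  assert canonically_equal("\u{00E9}", "e\u{0301}")

  // The ligature "ﬁ" (U+FB01) is only compatibility-equivalent to "fi",
  // so an NFC-based comparison keeps them distinct.
  assert !canonically_equal("\u{FB01}", "fi")

  Nil
}
```

Swapping `to_nfc` for `to_nfkc` in the helper would also treat compatibility variants such as the ligature as equal, at the cost of losing those distinctions.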