Localize.Collation.ImplicitWeights (Localize v0.38.0)


Computes implicit collation elements for codepoints not in the DUCET/CLDR allkeys table.

The Unicode Collation Algorithm (UCA) specifies how to derive collation elements for codepoints without an explicit table entry:

  • CJK Unified Ideographs (Han characters).

  • Hangul syllables (decomposed algorithmically).

  • Unassigned codepoints.

See UTS #10, Section 10.1 (Derived Collation Elements), for the implicit weight computation.
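The implicit weight arithmetic from UTS #10 can be sketched as follows. This is a minimal illustration, not the library's implementation: the module `ImplicitWeightSketch` and the function `primaries/2` are hypothetical names, and the caller is assumed to pass the base value that UTS #10 assigns to the codepoint's class (0xFB40 for unified ideographs in the CJK Unified Ideographs and CJK Compatibility Ideographs blocks, 0xFB80 for other unified ideographs, 0xFBC0 for unassigned and other codepoints). The separate bases for siniform scripts such as Tangut are omitted.

```elixir
defmodule ImplicitWeightSketch do
  # Hypothetical module for illustration; not part of Localize.
  #
  # UTS #10 maps a codepoint to two collation elements:
  #   [.AAAA.0020.0002][.BBBB.0000.0000]
  # This function returns only the two primary weights, AAAA and BBBB.

  # `base` is assumed to be chosen by the caller according to the
  # codepoint's class (0xFB40, 0xFB80, or 0xFBC0).
  def primaries(cp, base) when is_integer(cp) and cp >= 0 do
    # AAAA: the base plus the bits of the codepoint above the low 15.
    aaaa = base + Bitwise.bsr(cp, 15)

    # BBBB: the low 15 bits of the codepoint with the top bit set.
    bbbb = Bitwise.bor(Bitwise.band(cp, 0x7FFF), 0x8000)

    {aaaa, bbbb}
  end
end
```

For U+4E00 with base 0xFB40, `ImplicitWeightSketch.primaries(0x4E00, 0xFB40)` yields `{0xFB40, 0xCE00}`, consistent with the `compute/1` doctest that checks the first primary is at least 0xFB40.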

Summary

Functions

compute(cp)

Compute implicit collation elements for a codepoint not in the allkeys table.

decompose_hangul_to_jamo(cp)

Decompose a Hangul syllable into its constituent jamo codepoints.

hangul_syllable?(cp)

Check if a codepoint is a Hangul syllable.

unified_ideograph?(cp)

Check if a codepoint is a CJK Unified Ideograph.

Functions

compute(cp)

@spec compute(non_neg_integer()) ::
  {:hangul_decompose, [non_neg_integer()]} | [Localize.Collation.Element.t()]

Compute implicit collation elements for a codepoint not in the allkeys table.

Arguments

  • cp - an integer codepoint.

Returns

  • {:hangul_decompose, jamo} - for Hangul syllables, where jamo is the list of constituent jamo codepoints.

  • [element, element] - two implicit collation elements (CEs) for CJK Unified Ideographs or unassigned codepoints.

Examples

iex> [ce1, ce2] = Localize.Collation.ImplicitWeights.compute(0x4E00)
iex> Localize.Collation.Element.primary(ce1) >= 0xFB40
true
iex> Localize.Collation.Element.secondary(ce2)
0
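Because `compute/1` returns one of two shapes, a caller has to branch on the result. The sketch below shows one way to do that with pattern matching. `ComputeResultSketch` and `next_step/1` are hypothetical names, and the idea that the jamo are then looked up in the allkeys table is an assumption about how a caller would proceed, not something this page states.

```elixir
defmodule ComputeResultSketch do
  # Hypothetical helper for illustration; not part of Localize.
  # It takes a value shaped like the result of compute/1 and reports
  # what the caller would presumably do next.

  # Hangul syllable: the jamo still need their own collation elements
  # (assumed here to come from a table lookup).
  def next_step({:hangul_decompose, jamo}) when is_list(jamo) do
    {:lookup, jamo}
  end

  # CJK or unassigned codepoint: the two implicit CEs are ready to use.
  def next_step([ce1, ce2]) do
    {:elements, [ce1, ce2]}
  end
end
```

In real use the argument would be the value returned by `Localize.Collation.ImplicitWeights.compute(cp)`.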

decompose_hangul_to_jamo(cp)

@spec decompose_hangul_to_jamo(non_neg_integer()) :: [non_neg_integer()]

Decompose a Hangul syllable into its constituent jamo codepoints.

Arguments

  • cp - an integer codepoint for a Hangul syllable (U+AC00..U+D7A3).

Returns

  • A list of 2 or 3 jamo codepoints: [lead, vowel] when the syllable has no trailing consonant, otherwise [lead, vowel, trail].

Examples

iex> Localize.Collation.ImplicitWeights.decompose_hangul_to_jamo(0xAC00)
[0x1100, 0x1161]

iex> Localize.Collation.ImplicitWeights.decompose_hangul_to_jamo(0xAC01)
[0x1100, 0x1161, 0x11A8]
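The decomposition is purely arithmetic. A minimal sketch, using the Hangul syllable constants from the Unicode Standard, looks like this. `HangulSketch` and `decompose/1` are hypothetical names and this is not necessarily how the library implements it.

```elixir
defmodule HangulSketch do
  # Hypothetical module for illustration; not part of Localize.
  # Constants from the Unicode Standard's Hangul syllable decomposition.
  @s_base 0xAC00
  @s_last 0xD7A3
  @l_base 0x1100
  @v_base 0x1161
  @t_base 0x11A7
  @t_count 28
  # Syllables per leading consonant: 21 vowels * 28 trailing slots.
  @n_count 588

  def decompose(cp) when is_integer(cp) and cp >= @s_base and cp <= @s_last do
    s_index = cp - @s_base
    lead = @l_base + div(s_index, @n_count)
    vowel = @v_base + div(rem(s_index, @n_count), @t_count)
    t_index = rem(s_index, @t_count)

    # A trailing index of 0 means the syllable has no trailing consonant.
    if t_index == 0 do
      [lead, vowel]
    else
      [lead, vowel, @t_base + t_index]
    end
  end
end
```

The guard restricts input to U+AC00..U+D7A3, so any other codepoint raises a `FunctionClauseError` in this sketch.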

hangul_syllable?(cp)

@spec hangul_syllable?(non_neg_integer()) :: boolean()

Check if a codepoint is a Hangul syllable.

Arguments

  • cp - an integer codepoint.

Returns

  • true if the codepoint is a Hangul syllable.

  • false otherwise.

Examples

iex> Localize.Collation.ImplicitWeights.hangul_syllable?(0xAC00)
true

iex> Localize.Collation.ImplicitWeights.hangul_syllable?(0x0041)
false

unified_ideograph?(cp)

@spec unified_ideograph?(non_neg_integer()) :: boolean()

Check if a codepoint is a CJK Unified Ideograph.

Arguments

  • cp - an integer codepoint.

Returns

  • true if the codepoint is a CJK Unified Ideograph.

  • false otherwise.

Examples

iex> Localize.Collation.ImplicitWeights.unified_ideograph?(0x4E00)
true

iex> Localize.Collation.ImplicitWeights.unified_ideograph?(0x0041)
false