Cldr.Collation.Han (Cldr Collation v1.0.0)


Han character ordering using radical-stroke indexes.

Implements the sorting algorithm from UAX #38, computing 64-bit collation keys based on:

  • Radical number (1-214, Kangxi radicals)
  • Residual stroke count
  • Simplified radical indicator
  • Unicode block
  • Code point value

The radical data is parsed from FractionalUCA.txt [radical N=...] entries.

Summary

Functions

block_index(cp)

Get the CJK block index for a codepoint.

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

collation_elements(codepoint)

Compute collation elements for a Han character using radical-stroke ordering.

ensure_loaded()

Ensure the Han radical data is loaded into ETS.

key_to_elements(key)

Convert a 64-bit radical-stroke key to two collation elements.

parse_radical_line(line)

Parse a radical definition line from FractionalUCA.txt.

Functions

block_index(cp)

@spec block_index(non_neg_integer()) :: non_neg_integer()

Get the CJK block index for a codepoint.

Maps a codepoint to its CJK Unified Ideograph block for use in the radical-stroke sort key.

Arguments

  • cp - an integer codepoint.

Returns

An integer block index:

  • 0 - CJK Unified Ideographs (U+4E00..U+9FFF).
  • 1 - Extension A (U+3400..U+4DBF).
  • 2 - Extension B (U+20000..U+2A6DF).
  • 3..8 - Extensions C through H.
  • 254 - CJK Compatibility Ideographs (U+F900..U+FAFF).

Examples

iex> Cldr.Collation.Han.block_index(0x4E00)
0

iex> Cldr.Collation.Han.block_index(0x3400)
1
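The range-to-index mapping above can be sketched with guard clauses. This is a minimal illustration, not the library's source: the module name `HanBlockSketch` is hypothetical, only the four ranges spelled out above are covered, and returning `nil` for anything else is an assumption made for the sketch.

```elixir
defmodule HanBlockSketch do
  # Hypothetical module: covers only the ranges listed in the documentation.
  # Extensions C through H (indexes 3..8) are omitted because their ranges
  # are not given here.
  def block_index(cp) when cp in 0x4E00..0x9FFF, do: 0
  def block_index(cp) when cp in 0x3400..0x4DBF, do: 1
  def block_index(cp) when cp in 0x20000..0x2A6DF, do: 2
  def block_index(cp) when cp in 0xF900..0xFAFF, do: 254

  # Assumption for this sketch only: unlisted codepoints yield nil.
  def block_index(_cp), do: nil
end
```

Note that the index follows the sort priority of the blocks rather than their codepoint order: Extension A sits below the main block in the codespace but receives the higher index.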

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

collation_elements(codepoint)

@spec collation_elements(non_neg_integer()) :: [Cldr.Collation.Element.t()] | nil

Compute collation elements for a Han character using radical-stroke ordering.

Arguments

  • codepoint - an integer codepoint for a CJK Unified Ideograph.

Returns

  • [%Cldr.Collation.Element{}, %Cldr.Collation.Element{}] - two CEs encoding the radical-stroke key.
  • nil - if the character has no radical data (falls back to implicit weights).

Examples

iex> Cldr.Collation.Han.ensure_loaded()
iex> elements = Cldr.Collation.Han.collation_elements(0x4E00)
iex> length(elements)
2

compute_key(radical, residual_strokes, simplification, block, codepoint)

Compute the 64-bit sorting key per UAX #38.

Bit layout:

  • bits 52-63: unused (0).
  • bits 44-51: radical number (1-214).
  • bits 36-43: residual strokes.
  • bits 32-35: reserved (0).
  • bits 28-31: simplification level.
  • bits 20-27: block index.
  • bits 0-19: code point.

Arguments

  • radical - the Kangxi radical number (1-214).
  • residual_strokes - the residual stroke count after removing the radical.
  • simplification - the simplification level (0 for traditional).
  • block - the CJK block index (see block_index/1).
  • codepoint - the Unicode codepoint.

Returns

A 64-bit integer encoding all components of the radical-stroke sort key.

Examples

iex> Cldr.Collation.Han.compute_key(1, 0, 0, 0, 0x4E00)
17592186064384
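The bit layout can be reproduced with shifts and bitwise ORs. The following is a sketch under the documented layout only; `HanKeySketch`, `pack/5`, and `unpack/1` are hypothetical names and are not part of the library.

```elixir
defmodule HanKeySketch do
  import Bitwise

  # Pack the five components into one integer following the documented layout.
  # Inputs are assumed to already fit their fields; no range checks are done.
  def pack(radical, residual_strokes, simplification, block, codepoint) do
    (radical <<< 44) |||
      (residual_strokes <<< 36) |||
      (simplification <<< 28) |||
      (block <<< 20) |||
      codepoint
  end

  # Inverse of pack/5: mask each field back out of the key.
  def unpack(key) do
    %{
      radical: (key >>> 44) &&& 0xFF,
      residual_strokes: (key >>> 36) &&& 0xFF,
      simplification: (key >>> 28) &&& 0xF,
      block: (key >>> 20) &&& 0xFF,
      codepoint: key &&& 0xFFFFF
    }
  end
end
```

For the inputs `(1, 0, 0, 0, 0x4E00)` the sketch yields `(1 <<< 44) + 0x4E00`, which is the `17592186064384` shown in the example. Because the radical occupies the highest used bits, plain integer comparison of two keys orders by radical first, then residual strokes, and so on down to the code point.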

ensure_loaded()

@spec ensure_loaded() :: :ok

Ensure the Han radical data is loaded into ETS.

Loads radical-stroke data from FractionalUCA.txt on first call. Subsequent calls are no-ops.

Returns

  • :ok - the radical data is loaded and ready.

Examples

iex> Cldr.Collation.Han.ensure_loaded()
:ok
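The load-once behaviour can be illustrated with a named ETS table that is created only if it does not yet exist. This is a simplified sketch, not the module's implementation: the module name, table name, and placeholder row are hypothetical, the table is owned by the calling process, and no attempt is made to handle concurrent first calls (the real module can be started under a supervisor, which suggests a dedicated owner process).

```elixir
defmodule LazyTableSketch do
  # Hypothetical table name used only in this sketch.
  @table :han_radical_sketch

  def ensure_loaded do
    case :ets.whereis(@table) do
      :undefined ->
        # First call: create the table and load the data.
        :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
        # Placeholder row {codepoint, radical, residual_strokes, simplification};
        # real data would come from parsing FractionalUCA.txt.
        :ets.insert(@table, {0x4E00, 1, 0, 0})
        :ok

      _tid ->
        # Table already exists: nothing to do.
        :ok
    end
  end

  def lookup(codepoint), do: :ets.lookup(@table, codepoint)
end
```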

key_to_elements(key)

@spec key_to_elements(non_neg_integer()) :: [Cldr.Collation.Element.t()]

Convert a 64-bit radical-stroke key to two collation elements.

Encodes the key as two CEs using the Han implicit base (0xFB40):

  • CE1: primary = 0xFB40 + (key >> 32), secondary = 0x0020, tertiary = 0x0002
  • CE2: primary = (key & 0xFFFF) | 0x8000, secondary = 0x0000, tertiary = 0x0000

Arguments

  • key - a 64-bit radical-stroke key, such as one returned by compute_key/5.

Returns

A list of two %Cldr.Collation.Element{} structs.

Examples

iex> elements = Cldr.Collation.Han.key_to_elements(0)
iex> Cldr.Collation.Element.primary(hd(elements))
0xFB40
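The two primary weights follow directly from the formulas above. The sketch below computes only those primaries as plain integers; `HanElementsSketch` and `primaries/1` are hypothetical names, and building the actual `%Cldr.Collation.Element{}` structs is left out because their fields are not described here.

```elixir
defmodule HanElementsSketch do
  import Bitwise

  # Han implicit base, as stated in the documentation.
  @han_base 0xFB40

  # Returns {ce1_primary, ce2_primary} for a 64-bit radical-stroke key.
  def primaries(key) do
    ce1 = @han_base + (key >>> 32)
    ce2 = (key &&& 0xFFFF) ||| 0x8000
    {ce1, ce2}
  end
end
```

For a key of `0` the first primary is the bare base `0xFB40`, which agrees with the example above, and the second primary is `0x8000`.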

parse_radical_line(line)

@spec parse_radical_line(String.t()) ::
  {:ok, pos_integer(),
   [{non_neg_integer(), non_neg_integer(), non_neg_integer()}]}
  | :skip

Parse a radical definition line from FractionalUCA.txt.

Arguments

  • line - a trimmed line from FractionalUCA.txt in the format [radical N=CANONICAL:MEMBER_LIST].

Returns

  • {:ok, radical_num, members} - the radical number and a list of {codepoint, simplification, strokes} tuples.
  • :skip - the line is not a radical definition.

Examples

iex> Cldr.Collation.Han.parse_radical_line("not a radical line")
:skip
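The outer shape of a radical line, `[radical N=CANONICAL:MEMBER_LIST]`, can be recognised with a regular expression. The sketch below is illustrative only: `RadicalLineSketch` and `split/1` are hypothetical, it assumes `N` is written as plain decimal digits, and it returns the canonical and member parts as raw strings instead of decoding them into `{codepoint, simplification, strokes}` tuples as the real function does.

```elixir
defmodule RadicalLineSketch do
  # Split a radical line into its three parts, or return :skip.
  # Assumption: N consists of decimal digits only; any other form is skipped.
  def split(line) do
    case Regex.run(~r/^\[radical (\d+)=([^:]*):(.*)\]$/u, line) do
      [_full, number, canonical, members] ->
        {:ok, String.to_integer(number), canonical, members}

      nil ->
        :skip
    end
  end
end
```

With a made-up input such as `"[radical 1=A:BCD]"` the sketch returns `{:ok, 1, "A", "BCD"}`; any line that does not match the bracketed form returns `:skip`.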

start_link(options \\ [])