Han character ordering using radical-stroke indexes.
Implements the sorting algorithm from UAX #38, computing 64-bit collation keys based on:
- Radical number (1-214, Kangxi radicals)
- Residual stroke count
- Simplified radical indicator
- Unicode block
- Code point value
The radical data is parsed from FractionalUCA.txt [radical N=...] entries.
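Because the components are compared in the order listed, the resulting order can be illustrated with plain tuple comparison. The following is a minimal sketch, not the module's implementation: the module name `HanOrderSketch` is hypothetical, and the radical and stroke values are entered by hand for five characters rather than read from FractionalUCA.txt.

```elixir
defmodule HanOrderSketch do
  # {codepoint, radical, residual_strokes, simplification, block}
  # Hand-entered illustrative values (assumption: not loaded from FractionalUCA.txt).
  @sample [
    {0x4F60, 9, 5, 0, 0},
    {0x3400, 1, 4, 0, 1},
    {0x4E01, 1, 1, 0, 0},
    {0x4EBA, 9, 0, 0, 0},
    {0x4E00, 1, 0, 0, 0}
  ]

  def sample, do: @sample

  # Tuples of equal size compare element by element, so this sorts by
  # radical, then residual strokes, then simplification, block and codepoint.
  def sorted_codepoints(entries) do
    entries
    |> Enum.sort_by(fn {cp, radical, strokes, simp, block} ->
      {radical, strokes, simp, block, cp}
    end)
    |> Enum.map(fn {cp, _radical, _strokes, _simp, _block} -> cp end)
  end
end

HanOrderSketch.sample()
|> HanOrderSketch.sorted_codepoints()
|> Enum.map(&Integer.to_string(&1, 16))
|> IO.inspect()
# prints ["4E00", "4E01", "3400", "4EBA", "4F60"]
```

Note that U+3400 sorts between U+4E01 and U+4EBA: radical and stroke count take precedence over the code point value.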
Summary
Functions
block_index/1 - Get the CJK block index for a codepoint.
child_spec/1 - Returns a specification to start this module under a supervisor.
collation_elements/1 - Compute collation elements for a Han character using radical-stroke ordering.
compute_key/5 - Compute the 64-bit sorting key per UAX #38.
ensure_loaded/0 - Ensure the Han radical data is loaded into ETS.
key_to_elements/1 - Convert a 64-bit radical-stroke key to two collation elements.
parse_radical_line/1 - Parse a radical definition line from FractionalUCA.txt.
Functions
@spec block_index(non_neg_integer()) :: non_neg_integer()
Get the CJK block index for a codepoint.
Maps a codepoint to its CJK Unified Ideograph block for use in the radical-stroke sort key.
Arguments
- cp - an integer codepoint.
Returns
An integer block index:
- 0 - CJK Unified Ideographs (U+4E00..U+9FFF).
- 1 - Extension A (U+3400..U+4DBF).
- 2 - Extension B (U+20000..U+2A6DF).
- 3..8 - Extensions C through H.
- 254 - CJK Compatibility Ideographs (U+F900..U+FAFF).
Examples
iex> Cldr.Collation.Han.block_index(0x4E00)
0
iex> Cldr.Collation.Han.block_index(0x3400)
1
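The mapping can be sketched as a range lookup. This is an illustration under stated assumptions, not the module's code: `BlockIndexSketch` is a hypothetical name, the pairing of indexes 3..8 with Extensions C..H in alphabetical order is assumed from the list above, and the `:unknown` fallback for non-CJK codepoints is invented here because the source does not say what the real function returns in that case.

```elixir
defmodule BlockIndexSketch do
  # Unicode block boundaries paired with the documented block indexes.
  # Assumption: indexes 3..8 map to Extensions C..H in alphabetical order.
  @ranges [
    {0x4E00..0x9FFF, 0},
    {0x3400..0x4DBF, 1},
    {0x20000..0x2A6DF, 2},
    {0x2A700..0x2B73F, 3},
    {0x2B740..0x2B81F, 4},
    {0x2B820..0x2CEAF, 5},
    {0x2CEB0..0x2EBEF, 6},
    {0x30000..0x3134F, 7},
    {0x31350..0x323AF, 8},
    {0xF900..0xFAFF, 254}
  ]

  # Returns the index of the first range containing cp, or :unknown
  # (a placeholder chosen for this sketch only).
  def block_index(cp) when is_integer(cp) and cp >= 0 do
    Enum.find_value(@ranges, :unknown, fn {range, index} ->
      if cp in range, do: index
    end)
  end
end

IO.inspect(BlockIndexSketch.block_index(0x4E00))
# prints 0
IO.inspect(BlockIndexSketch.block_index(0xF900))
# prints 254
```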
Returns a specification to start this module under a supervisor.
See Supervisor.
@spec collation_elements(non_neg_integer()) :: [Cldr.Collation.Element.t()] | nil
Compute collation elements for a Han character using radical-stroke ordering.
Arguments
- codepoint - an integer codepoint for a CJK Unified Ideograph.
Returns
- [%Cldr.Collation.Element{}, %Cldr.Collation.Element{}] - two CEs encoding the radical-stroke key.
- nil - if the character has no radical data (falls back to implicit weights).
Examples
iex> Cldr.Collation.Han.ensure_loaded()
iex> elements = Cldr.Collation.Han.collation_elements(0x4E00)
iex> length(elements)
2
@spec compute_key(non_neg_integer(), non_neg_integer(), non_neg_integer(), non_neg_integer(), non_neg_integer()) :: non_neg_integer()
Compute the 64-bit sorting key per UAX #38.
Bit layout:
bits 52-63: unused (0).
bits 44-51: radical number (1-214).
bits 36-43: residual strokes.
bits 32-35: reserved (0).
bits 28-31: simplification level.
bits 20-27: block index.
bits 0-19: code point.
Arguments
- radical - the Kangxi radical number (1-214).
- residual_strokes - the residual stroke count after removing the radical.
- simplification - the simplification level (0 for traditional).
- block - the CJK block index (see block_index/1).
- codepoint - the Unicode codepoint.
Returns
A 64-bit integer encoding all components of the radical-stroke sort key.
Examples
iex> Cldr.Collation.Han.compute_key(1, 0, 0, 0, 0x4E00)
17592186064384
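The bit layout above can be reproduced with shifts and bitwise OR. The sketch below is a standalone illustration: `HanKeySketch` and its `decode/1` helper are hypothetical names, and the inputs are assumed to already fit their fields, since the source does not say whether the real function masks or validates them.

```elixir
defmodule HanKeySketch do
  import Bitwise

  # Packs each component at the bit offset given in the layout.
  # Assumption: every argument already fits in its field width.
  def compute_key(radical, residual_strokes, simplification, block, codepoint) do
    (radical <<< 44)
    |> bor(residual_strokes <<< 36)
    |> bor(simplification <<< 28)
    |> bor(block <<< 20)
    |> bor(codepoint)
  end

  # Hypothetical inverse, included only to make the layout visible.
  def decode(key) do
    %{
      radical: (key >>> 44) &&& 0xFF,
      residual_strokes: (key >>> 36) &&& 0xFF,
      simplification: (key >>> 28) &&& 0xF,
      block: (key >>> 20) &&& 0xFF,
      codepoint: key &&& 0xFFFFF
    }
  end
end

IO.inspect(HanKeySketch.compute_key(1, 0, 0, 0, 0x4E00))
# prints 17592186064384, i.e. (1 <<< 44) + 0x4E00
```

Since the radical occupies the highest used bits, comparing two keys as integers compares radical first, then residual strokes, and so on down to the code point.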
@spec ensure_loaded() :: :ok
Ensure the Han radical data is loaded into ETS.
Loads radical-stroke data from FractionalUCA.txt on first call. Subsequent calls are no-ops.
Returns
- :ok - the radical data is loaded and ready.
Examples
iex> Cldr.Collation.Han.ensure_loaded()
:ok
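A load-once ETS table can be sketched as follows. This shows the general pattern only: the table name, the row shape and the module name `LazyEtsSketch` are assumptions, the rows are hand-entered instead of parsed from FractionalUCA.txt, and the source does not state how the real module creates or owns its table.

```elixir
defmodule LazyEtsSketch do
  # Hypothetical table name for this sketch.
  @table :han_radical_sketch

  # Creates and fills the table on the first call; later calls find the
  # named table and do nothing. Not safe against two processes racing on
  # the first call, and the table is owned by whichever process created it.
  def ensure_loaded do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
      :ets.insert(@table, sample_rows())
    end

    :ok
  end

  # Returns {radical, residual_strokes, simplification} or nil.
  def lookup(cp) do
    case :ets.lookup(@table, cp) do
      [{^cp, radical, strokes, simplification}] -> {radical, strokes, simplification}
      [] -> nil
    end
  end

  # Assumed row shape: {codepoint, radical, residual_strokes, simplification}.
  defp sample_rows do
    [{0x4E00, 1, 0, 0}, {0x4E01, 1, 1, 0}]
  end
end

:ok = LazyEtsSketch.ensure_loaded()
IO.inspect(LazyEtsSketch.lookup(0x4E00))
# prints {1, 0, 0}
```

In a long-running application the table would normally be created by a supervised process so that it outlives the caller.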
@spec key_to_elements(non_neg_integer()) :: [Cldr.Collation.Element.t()]
Convert a 64-bit radical-stroke key to two collation elements.
Encodes the key as two CEs using the Han implicit base (0xFB40):
- CE1: primary = 0xFB40 + (key >> 32), secondary = 0x0020, tertiary = 0x0002
- CE2: primary = (key & 0xFFFF) | 0x8000, secondary = 0x0000, tertiary = 0x0000
Arguments
- key - a 64-bit integer radical-stroke key from compute_key/5.
Returns
A list of two %Cldr.Collation.Element{} structs.
Examples
iex> elements = Cldr.Collation.Han.key_to_elements(0)
iex> Cldr.Collation.Element.primary(hd(elements))
0xFB40
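The two formulas can be checked with a small standalone sketch. It returns plain maps rather than %Cldr.Collation.Element{} structs, because the struct's fields are not shown in this document; the module name `KeyToElementsSketch` and the map keys are assumptions made for illustration.

```elixir
defmodule KeyToElementsSketch do
  import Bitwise

  # Han implicit base as stated in the documentation above.
  @han_base 0xFB40

  # Applies the CE1 and CE2 formulas literally.
  # Plain maps stand in for the real Element struct (assumption).
  def key_to_elements(key) when is_integer(key) and key >= 0 do
    [
      %{primary: @han_base + (key >>> 32), secondary: 0x0020, tertiary: 0x0002},
      %{primary: (key &&& 0xFFFF) ||| 0x8000, secondary: 0x0000, tertiary: 0x0000}
    ]
  end
end

# Key for radical 1, no residual strokes, code point U+4E00.
[ce1, ce2] = KeyToElementsSketch.key_to_elements(17_592_186_064_384)
IO.inspect({ce1.primary, ce2.primary}, base: :hex)
# prints {0x10B40, 0xCE00}
```

Here key >> 32 is 0x1000 (the radical shifted down to bit 12), so CE1's primary is 0xFB40 + 0x1000, and CE2's primary is the low 16 bits 0x4E00 with the top bit set.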
@spec parse_radical_line(String.t()) :: {:ok, pos_integer(), [{non_neg_integer(), non_neg_integer(), non_neg_integer()}]} | :skip
Parse a radical definition line from FractionalUCA.txt.
Arguments
- line - a trimmed line from FractionalUCA.txt in the format [radical N=CANONICAL:MEMBER_LIST].
Returns
- {:ok, radical_num, members} - the radical number and a list of {codepoint, simplification, strokes} tuples.
- :skip - the line is not a radical definition.
Examples
iex> Cldr.Collation.Han.parse_radical_line("not a radical line")
:skip
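The first stage of such a parser, splitting a line into its parts, might look like the sketch below. It is deliberately narrower than the documented function: it does not expand the member list into {codepoint, simplification, strokes} tuples, because that encoding is not described here. The function name `split_radical_line/1`, the five-element return shape, and the treatment of apostrophes after the radical number as a simplification marker are all assumptions.

```elixir
defmodule RadicalLineSketch do
  # Splits "[radical N=CANONICAL:MEMBER_LIST]" into its raw parts.
  # Assumption: zero or more apostrophes after N mark a simplified radical.
  def split_radical_line(line) when is_binary(line) do
    pattern = ~r/^\[radical (\d+)('*)=([^:]+):(.*)\]$/u

    case Regex.run(pattern, line) do
      [_whole, number, marks, canonical, members] ->
        {:ok, String.to_integer(number), String.length(marks), canonical, members}

      nil ->
        :skip
    end
  end
end

IO.inspect(RadicalLineSketch.split_radical_line("[radical 1=AB:CDE]"))
# prints {:ok, 1, 0, "AB", "CDE"}
IO.inspect(RadicalLineSketch.split_radical_line("not a radical line"))
# prints :skip
```

The placeholder strings "AB" and "CDE" stand in for the Han characters that appear in the real file.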