Parses the FractionalUCA.txt file into a map of codepoint sequences to collation elements.
FractionalUCA.txt is the single source of truth for the collation table. Each data line contains both fractional weights (used for script reordering) and, in its trailing comment, allkeys-format hexadecimal weights (used for collation element construction):
Single codepoint:
0041; [2B, 05, 9C] # Latn Lu [23EC.0020.0008] * LATIN CAPITAL LETTER A
Multi-CE:
00E9; [2B 86, 05, 05] # Latn Ll [2453.0020.0002][0000.0024.0002] * LATIN SMALL LETTER E WITH ACUTE
Context entry:
004C | 00B7; [, FB B6, 05] # Zyyy Po [0000.011F.0002] * MIDDLE DOT
Context entries represent CLDR-specific contractions where a target codepoint's weights
change depending on the preceding context codepoint. These are converted to explicit
contraction entries (e.g., {0x004C, 0x00B7} => L's CEs ++ modified CEs).
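The conversion of a context entry into a contraction can be sketched as follows. This is a minimal illustration rather than the module's actual code: the module name `ContextSketch`, the function `resolve_context/4`, and the plain-map table are hypothetical, and the weights given for `L` are placeholders. It assumes the contraction's CEs are the context codepoint's own CEs followed by the context entry's CEs, as in the `L's CEs ++ modified CEs` example.

```elixir
defmodule ContextSketch do
  # Hypothetical helper: turn a context entry into an explicit contraction.
  # `entries` maps keys (integer or tuple) to lists of CE tuples.
  def resolve_context(entries, context_cp, target_cp, modified_ces) do
    case Map.fetch(entries, context_cp) do
      {:ok, context_ces} ->
        # Key the contraction by both codepoints; value is context CEs ++ modified CEs.
        Map.put(entries, {context_cp, target_cp}, context_ces ++ modified_ces)

      :error ->
        # Assumption: if the context codepoint has no CEs, leave the table unchanged.
        entries
    end
  end
end

# Placeholder (made-up) weights for U+004C; the MIDDLE DOT CE is taken from the example line.
entries = %{0x004C => [{0x2500, 0x0020, 0x0008, false}]}

resolved =
  ContextSketch.resolve_context(entries, 0x004C, 0x00B7, [{0x0000, 0x011F, 0x0002, false}])
```

After this call, `resolved` holds both the original entry for `0x004C` and a new entry under the key `{0x004C, 0x00B7}`.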
Variable status (spaces, punctuation, symbols, currency) is derived from the
[last variable] header line rather than per-entry markers.
Functions
Convert a codepoint list to a table key.
Single codepoints become bare integers; multi-codepoint sequences (contractions) become tuples, for compact persistent_term storage.
Arguments
codepoints - a list of integer codepoints.
Returns
An integer for single codepoints, or a tuple for contractions.
Examples
iex> Localize.Collation.Table.Parser.codepoints_to_key([0x0041])
0x0041
iex> Localize.Collation.Table.Parser.codepoints_to_key([0x006C, 0x00B7])
{0x006C, 0x00B7}
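One way such a key conversion could be written is shown below. This is a sketch under the behaviour documented above, not the library's source; the module name `KeySketch` is hypothetical.

```elixir
defmodule KeySketch do
  # A single codepoint is stored as a bare integer.
  def codepoints_to_key([cp]) when is_integer(cp), do: cp

  # Two or more codepoints (a contraction) are stored as a tuple.
  def codepoints_to_key([_, _ | _] = codepoints), do: List.to_tuple(codepoints)
end

single = KeySketch.codepoints_to_key([0x0041])
contraction = KeySketch.codepoints_to_key([0x006C, 0x00B7])
```

Keeping the single-codepoint case as an integer avoids allocating a one-element tuple for the vast majority of table keys.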
Parse FractionalUCA.txt into a collation table.
This is the primary parser that builds the complete collation table from
a single data file. Variable status is derived from the [last variable]
header line.
Arguments
path - the file path to the FractionalUCA.txt data file.
Returns
A map with two keys:
:entries - %{integer() | tuple() => [Element.t()]}, mapping codepoints (integers for single codepoints, tuples for contractions) to collation elements.
:version - the UCA version string from the file header, or nil.
Parse weight elements from an allkeys weight string.
Arguments
str - the weight portion of an allkeys line (e.g., "[.23EC.0020.0008]").
Returns
A list of collation element tuples {primary, secondary, tertiary, variable}.
Examples
iex> Localize.Collation.Table.Parser.parse_elements("[.23EC.0020.0008]")
[{0x23EC, 0x0020, 0x0008, false}]
iex> Localize.Collation.Table.Parser.parse_elements("[*0269.0020.0002]")
[{0x0269, 0x0020, 0x0002, true}]
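The parsing shown in these examples could be approximated with a regular expression. The following is a minimal sketch, not the library's implementation: the module name `ElementsSketch` is hypothetical, and treating the leading `.`/`*` marker as optional (so that the marker-less form seen in FractionalUCA.txt comments is also accepted) is an assumption.

```elixir
defmodule ElementsSketch do
  # Parse every "[<marker>PPPP.SSSS.TTTT]" group in the string.
  # A "*" marker means variable; "." or no marker means non-variable.
  def parse_elements(str) do
    pattern = ~r/\[([.*]?)([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)\]/

    for [_whole, marker, primary, secondary, tertiary] <- Regex.scan(pattern, str) do
      {
        String.to_integer(primary, 16),
        String.to_integer(secondary, 16),
        String.to_integer(tertiary, 16),
        marker == "*"
      }
    end
  end
end

multi = ElementsSketch.parse_elements("[.2453.0020.0002][.0000.0024.0002]")
```

A string with several bracketed groups, as in a multi-CE entry, yields one tuple per group in order.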
Parse a single FractionalUCA.txt data entry.
Arguments
line - a single data line from FractionalUCA.txt.
Returns
{:ok, codepoints, elements} - the parsed codepoint list and collation elements.
{:context, context_cp, target_cp, elements} - a context entry to be resolved later.
:skip - the line could not be parsed.
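How a line might be sorted into these three shapes can be sketched from the codepoint field alone. The code below is an illustration only: the module `EntrySketch` and its `classify/1` function are hypothetical, the elements are omitted from the result tuples for brevity, and it assumes that a `|` in the field before `;` marks a context entry, as in the `004C | 00B7` example.

```elixir
defmodule EntrySketch do
  # Classify a data line by its codepoint field (the text before ";").
  def classify(line) do
    case String.split(line, ";", parts: 2) do
      [cp_field, _rest] -> classify_field(String.split(cp_field, "|"))
      _no_semicolon -> :skip
    end
  end

  # "context | target": exactly one codepoint is expected on each side.
  defp classify_field([context, target]) do
    with [ctx] <- parse_hex(context), [tgt] <- parse_hex(target) do
      {:context, ctx, tgt}
    else
      _ -> :skip
    end
  end

  # Plain entry: one or more space-separated hex codepoints.
  defp classify_field([field]) do
    case parse_hex(field) do
      [] -> :skip
      codepoints -> {:ok, codepoints}
    end
  end

  defp classify_field(_other), do: :skip

  # Returns a list of integers, or [] if any token is not valid hexadecimal.
  defp parse_hex(field) do
    parsed = field |> String.split() |> Enum.map(&Integer.parse(&1, 16))

    if Enum.all?(parsed, &match?({_, ""}, &1)) do
      Enum.map(parsed, &elem(&1, 0))
    else
      []
    end
  end
end

results =
  Enum.map(
    [
      "0041; [2B, 05, 9C] # Latn Lu [23EC.0020.0008] * LATIN CAPITAL LETTER A",
      "004C | 00B7; [, FB B6, 05] # Zyyy Po [0000.011F.0002] * MIDDLE DOT",
      "# a comment line"
    ],
    &EntrySketch.classify/1
  )
```

Returning `:skip` for anything unrecognised lets a caller stream every line of the file through the same function and simply filter out headers and comments.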