# `Localize.Collation.Table.Parser`
[🔗](https://github.com/elixir-localize/localize/blob/v0.38.0/lib/localize/collation/table/parser.ex#L1)

Parses the FractionalUCA.txt file into a map of codepoint sequences to collation elements.

FractionalUCA.txt is the single source of truth for the collation table. Each data line
contains both fractional weights (used for script reordering) and allkeys-format decimal
weights (used for collation element construction) in the comment:

* Single codepoint: `0041; [2B, 05, 9C]  # Latn Lu  [23EC.0020.0008]  * LATIN CAPITAL LETTER A`.

* Multi-CE: `00E9; [2B 86, 05, 05]  # Latn Ll  [2453.0020.0002][0000.0024.0002]  * LATIN SMALL LETTER E WITH ACUTE`.

* Context entry: `004C | 00B7; [, FB B6, 05]  # Zyyy Po  [0000.011F.0002]  * MIDDLE DOT`.

Context entries represent CLDR-specific contractions where a target codepoint's weights
change depending on the preceding context codepoint. These are converted to explicit
contraction entries (e.g., `{0x004C, 0x00B7} => L's CEs ++ modified CEs`).

Variable status (spaces, punctuation, symbols, currency) is derived from the
`[last variable]` header line rather than per-entry markers.

# `codepoints_to_key`

Convert a codepoint list to a table key.

Single codepoints become bare integers, multi-codepoint sequences (contractions)
become tuples for compact persistent_term storage.

### Arguments

* `codepoints` - a list of integer codepoints.

### Returns

An integer for single codepoints, or a tuple for contractions.

### Examples

    iex> Localize.Collation.Table.Parser.codepoints_to_key([0x0041])
    0x0041

    iex> Localize.Collation.Table.Parser.codepoints_to_key([0x006C, 0x00B7])
    {0x006C, 0x00B7}

# `parse`

Parse FractionalUCA.txt into a collation table.

This is the primary parser that builds the complete collation table from
a single data file. Variable status is derived from the `[last variable]`
header line.

### Arguments

* `path` - file path to the FractionalUCA.txt data file.

### Returns

A map with two keys:

* `:entries` - `%{integer() | tuple() => [Element.t()]}` mapping codepoints
  (integers for single, tuples for contractions) to collation elements.

* `:version` - the UCA version string from the file header, or `nil`.

# `parse_elements`

Parse weight elements from an allkeys weight string.

### Arguments

* `str` - the weight portion of an allkeys line (e.g., `"[.23EC.0020.0008]"`).

### Returns

A list of collation element tuples `{primary, secondary, tertiary, variable}`.

### Examples

    iex> Localize.Collation.Table.Parser.parse_elements("[.23EC.0020.0008]")
    [{0x23EC, 0x0020, 0x0008, false}]

    iex> Localize.Collation.Table.Parser.parse_elements("[*0269.0020.0002]")
    [{0x0269, 0x0020, 0x0002, true}]

# `parse_fractional_entry`

Parse a single FractionalUCA.txt data entry.

### Arguments

* `line` - a single data line from FractionalUCA.txt.

### Returns

* `{:ok, codepoints, elements}` - the parsed codepoint list and collation elements.

* `{:context, context_cp, target_cp, elements}` - a context entry to be resolved later.

* `:skip` - the line could not be parsed.

---

*Consult [api-reference.md](api-reference.md) for complete listing*
