Localize.Collation (Localize v0.5.0)


Implements the Unicode Collation Algorithm (UCA) as extended by CLDR.

Collation is the general term for the process and function of determining the sorting order of strings of characters, for example for lists of strings presented to users, or in databases for sorting and selecting records.

Collation varies by language, by application (some languages use special phonebook sorting), and by other criteria (for example, phonetic vs. visual ordering).

CLDR provides collation data for many languages and styles. The data supports not only sorting but also language-sensitive searching and grouping under index headers. All CLDR collations are based on the UCA default order, with common modifications applied in the CLDR root collation, and further tailored for language and style as needed.

Basic Usage

# Compare two strings
iex> Localize.Collation.compare("café", "cafe")
:gt

# Sort a list of strings
iex> Localize.Collation.sort(["café", "cafe", "Cafe"])
["cafe", "Cafe", "café"]

# Generate a sort key
iex> key = Localize.Collation.sort_key("hello")
iex> is_binary(key)
true

# With options
iex> Localize.Collation.compare("a", "A", strength: :secondary)
:eq

# From BCP47 locale (ks-level2 = secondary strength, ignores case)
iex> Localize.Collation.compare("a", "A", locale: "en-u-ks-level2")
:eq

Collation Options

All BCP47 -u- extension collation keys are supported as options, together with a few convenience options and a backend selector:

  • strength - :primary, :secondary, :tertiary (default), :quaternary, :identical.

  • alternate - :non_ignorable (default), :shifted.

  • backwards - false (default), true - reverse secondary weights (French).

  • normalization - false (default), true - NFD normalize input.

  • case_level - false (default), true - insert case-only level.

  • case_first - false (default), :upper, :lower.

  • numeric - false (default), true - numeric string comparison.

  • reorder - [] (default), list of script code atoms.

  • max_variable - :punct (default), :space, :symbol, :currency.

  • ignore_accents - true to ignore accent differences (sets strength to primary).

  • ignore_case - true to ignore case differences (sets strength to secondary).

  • ignore_punctuation - true to ignore punctuation and whitespace (sets alternate to shifted).

  • casing - :sensitive, :insensitive (convenience alias).

  • backend - :nif or :elixir. The default is :elixir.
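As an illustration of combining the options above, the following is a minimal sketch of a "loose match" helper that treats two strings as equivalent when they differ only by accents, case, or punctuation. The module `LooseMatch` and the function `equivalent?/3` are hypothetical names, not part of Localize. The comparison function defaults to `Localize.Collation.compare/3` and is injectable only so the helper can be exercised without the library. How the library resolves conflicting options is not covered here.

```elixir
defmodule LooseMatch do
  # Hypothetical helper, not part of Localize.
  #
  # `compare` is any 3-arity function returning :lt | :eq | :gt.
  # It defaults to Localize.Collation.compare/3 as documented above.
  def equivalent?(a, b, compare \\ &Localize.Collation.compare/3) do
    # :ignore_accents sets strength to primary (which also ignores case);
    # :ignore_punctuation sets alternate to shifted.
    compare.(a, b, ignore_accents: true, ignore_punctuation: true) == :eq
  end
end
```

With the library available, `LooseMatch.equivalent?(a, b)` can be called with two arguments only.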

Summary

Functions

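compare(string_a, string_b, options \\ [])

Compare two strings using the CLDR collation algorithm.

ensure_loaded()

Ensure the collation tables are loaded into persistent term storage.

sort(strings, options \\ [])

Sort a list of strings using the CLDR collation algorithm.

sort_key(input, options \\ [])

Generate a binary sort key for the given input.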

Functions

compare(string_a, string_b, options \\ [])

@spec compare(String.t(), String.t(), keyword() | Localize.Collation.Options.t()) ::
  :lt | :eq | :gt

Compare two strings using the CLDR collation algorithm.

Arguments

  • string_a - the first string to compare.

  • string_b - the second string to compare.

  • options - a keyword list of collation options.

Options

  • :strength - comparison level: :primary, :secondary, :tertiary (default), :quaternary, or :identical.

  • :alternate - variable weight handling: :non_ignorable (default) or :shifted.

  • :backwards - reverse secondary weights for French sorting: false (default) or true.

  • :normalization - NFD normalize input: false (default) or true.

  • :case_level - insert case-only comparison level: false (default) or true.

  • :case_first - case ordering: false (default), :upper, or :lower.

  • :numeric - numeric string comparison: false (default) or true.

  • :reorder - list of script code atoms to reorder: [] (default).

  • :max_variable - variable weight boundary: :punct (default), :space, :symbol, or :currency.

  • :ignore_accents - true to ignore accent differences.

  • :ignore_case - true to ignore case differences.

  • :ignore_punctuation - true to ignore punctuation and whitespace.

  • :casing - :sensitive or :insensitive.

  • :locale - a BCP47 locale string or a Localize.LanguageTag struct.

  • :backend - :nif or :elixir. The default is :elixir.

Returns

  • :lt - if string_a sorts before string_b.

  • :eq - if string_a and string_b are equal at the given strength.

  • :gt - if string_a sorts after string_b.

Examples

iex> Localize.Collation.compare("cafe", "café")
:lt

iex> Localize.Collation.compare("a", "A", strength: :secondary)
:eq

iex> Localize.Collation.compare("a", "A", casing: :insensitive)
:eq
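Because `compare` has a default for `options`, a two-argument `compare/2` is also exported. Elixir's `Enum.sort/2` and `Enum.sort_by/3` accept a module that exports `compare/2` as the sorter, so the module can be passed directly. The sketch below sorts maps by a `:name` key. The module `RosterSketch` and the data shape are hypothetical, and the `collator` argument is injectable only so the helper can be tried without the library.

```elixir
defmodule RosterSketch do
  # Hypothetical helper, not part of Localize.
  # `collator` is any module exporting compare/2 that returns :lt | :eq | :gt.

  # Ascending order by the :name key, using the collator's default options.
  def by_name(people, collator \\ Localize.Collation) do
    Enum.sort_by(people, & &1.name, collator)
  end

  # Descending order by the :name key.
  def by_name_desc(people, collator \\ Localize.Collation) do
    Enum.sort_by(people, & &1.name, {:desc, collator})
  end
end
```

The module form always uses the default options. To pass options, a sorter function can be used instead, for example `fn a, b -> Localize.Collation.compare(a, b, numeric: true) != :gt end`.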

ensure_loaded()

@spec ensure_loaded() :: :ok

Ensure the collation tables are loaded into persistent term storage.

Returns

  • :ok - tables are loaded and ready.

Examples

iex> Localize.Collation.ensure_loaded()
:ok
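One place this call might be useful is application startup, so that the first comparison does not pay the cost of loading the tables. This is a sketch under the assumption that the tables would otherwise be loaded lazily on first use, which this page does not state. `MyApp.Application` and `MyApp.Supervisor` are hypothetical names.

```elixir
defmodule MyApp.Application do
  # Hypothetical application module, not part of Localize.
  use Application

  @impl true
  def start(_type, _args) do
    # Load the collation tables before any process needs them.
    # The match fails fast if anything other than :ok is returned.
    :ok = Localize.Collation.ensure_loaded()

    # No children in this sketch; a real application would list its own.
    Supervisor.start_link([], strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```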

sort(strings, options \\ [])

@spec sort([String.t()], keyword() | Localize.Collation.Options.t()) :: [String.t()]

Sort a list of strings using the CLDR collation algorithm.

Arguments

  • strings - a list of UTF-8 strings to sort.

  • options - a keyword list of collation options.

Returns

A new list of strings sorted according to the CLDR collation rules.

Examples

iex> Localize.Collation.sort(["café", "cafe", "Cafe"])
["cafe", "Cafe", "café"]

iex> Localize.Collation.sort(["б", "а", "в"])
["а", "б", "в"]

sort_key(input, options \\ [])

@spec sort_key(
  String.t() | [non_neg_integer()],
  keyword() | Localize.Collation.Options.t()
) :: binary()

Generate a binary sort key for the given input.

Sort keys can be compared directly with <, >, and == for ordering. Generating a key once and reusing it is efficient when the same strings need to be compared multiple times.

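Arguments

  • input - a UTF-8 string, or a list of non-negative integers as allowed by the typespec.

  • options - a keyword list of collation options.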

Returns

A binary sort key that can be compared with standard binary comparison operators.

Examples

iex> key_a = Localize.Collation.sort_key("cafe")
iex> key_b = Localize.Collation.sort_key("café")
iex> key_a < key_b
true

iex> Localize.Collation.sort_key("hello") == Localize.Collation.sort_key("hello")
true
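The reuse pattern described above can be sketched as follows: compute each key once, keep it next to its string, and order by the stored keys. The module `SortKeySketch` and its functions are hypothetical. The key function defaults to `Localize.Collation.sort_key/1` and is injectable only so the pipeline can be tried without the library.

```elixir
defmodule SortKeySketch do
  # Hypothetical helper, not part of Localize.

  # Pair every string with its sort key. Each key is computed exactly once.
  def index(strings, key_fun \\ &Localize.Collation.sort_key/1) do
    Enum.map(strings, fn string -> {key_fun.(string), string} end)
  end

  # Order previously indexed strings by comparing the stored keys only.
  def sorted(indexed) do
    indexed
    |> Enum.sort_by(fn {key, _string} -> key end)
    |> Enum.map(fn {_key, string} -> string end)
  end
end
```

Keys are only meaningful relative to each other when they were generated with the same options, so stored keys should be regenerated if the options change. For a one-off sort, `Enum.sort_by(strings, &Localize.Collation.sort_key/1)` has the same effect, since `Enum.sort_by/2` computes the mapped value once per element.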