How Collation Options Affect Sort Order


This document shows how each option to Cldr.Collation.compare/3 and Cldr.Collation.sort/2 affects the ordering of strings. All examples use the pure Elixir backend (backend: :elixir).

For a thorough introduction to the Unicode Collation Algorithm and locale-aware sorting, see:

Example word list

The following 10 words are used throughout this document unless noted otherwise:

cafe  café  Café  co-op  naive  naïve  résumé  Résumé  TR35  UTS10

This list includes mixed case (cafe / Café), accented characters (café, naïve, résumé), punctuation (co-op), and embedded digits (TR35, UTS10).

Default sort order

With no options (equivalent to strength: :tertiary, alternate: :non_ignorable), the collation algorithm considers base letters first (primary), then accents (secondary), then case (tertiary):

cafe | café | Café | co-op | naive | naïve | résumé | Résumé | TR35 | UTS10

Key observations:

  • cafe before café — the unaccented form sorts first (secondary difference).
  • café before Café — lowercase before uppercase (tertiary difference).
  • co-op sorts between Café and naive: its position is decided at the primary level (co follows ca, and c precedes n). The hyphen also has a primary weight, which matters when co-op is compared with coop.
  • TR35 before UTS10: T < U decides it at the primary level, so the digits are never compared.
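
The three levels described above can be sketched as a toy sort key in plain Elixir. This is an illustration only: it is not the Unicode Collation Algorithm and not how Cldr.Collation is implemented. The module name ToyLevels and its key/2 function are hypothetical, and the sketch assumes that accents are combining marks in the range U+0300..U+036F after NFD normalization.

```elixir
defmodule ToyLevels do
  # Toy model only: combining diacritical marks stand in for "accents".
  @combining 0x0300..0x036F

  # Builds a tuple that holds one element per comparison level.
  def key(string, strength \\ :tertiary) do
    codepoints = string |> String.normalize(:nfd) |> String.to_charlist()
    {marks, bases} = Enum.split_with(codepoints, &(&1 in @combining))
    base = List.to_string(bases)

    primary = String.downcase(base)
    # Accents in order of appearance (their position is not tracked here).
    secondary = marks
    # One flag per base character: false for lowercase, true for uppercase.
    tertiary = for letter <- String.graphemes(base), do: letter != String.downcase(letter)

    case strength do
      :primary -> {primary}
      :secondary -> {primary, secondary}
      :tertiary -> {primary, secondary, tertiary}
    end
  end
end

words = ["UTS10", "Résumé", "naïve", "Café", "co-op", "résumé", "cafe", "TR35", "naive", "café"]
IO.inspect(Enum.sort_by(words, &ToyLevels.key/1))
```

Because Elixir compares tuples element by element, an earlier level always outranks a later one, which mirrors the primary, secondary, tertiary priority. For this word list the toy key happens to reproduce the default order shown above.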

Summary of option effects

Option | Controls | Key effect on example list
strength: :primary | Comparison depth | cafe = café = Café
strength: :secondary | Comparison depth | cafe < café, but café = Café
alternate: :shifted | Punctuation/space handling | co-op = coop, de luge = deluge
case_first: :upper | Case ordering | Café before café, Résumé before résumé
case_level: true | Case at lower strengths | Case distinguished even at :primary strength
backwards: true | Accent comparison direction | côte before coté (French convention)
numeric: true | Digit handling | item2 before item10
max_variable | Shifted boundary | Controls which symbols are ignored under shifted
reorder | Script ordering | [:Grek] promotes Greek before Latin
locale | Locale tailoring | Ärger before Alter (German phonebook)

Strength

The strength option controls how many levels of difference are significant.

:primary — base letters only

Ignores both accents and case. Equivalent to :ignore_accents.

iex> Cldr.Collation.compare("cafe", "café", strength: :primary)
:eq

iex> Cldr.Collation.compare("cafe", "Café", strength: :primary)
:eq

At primary strength, cafe, café, and Café are all considered equal because they share the same base letters.

:secondary — base letters + accents

Ignores case but distinguishes accents. Equivalent to :ignore_case.

iex> Cldr.Collation.compare("cafe", "café", strength: :secondary)
:lt

iex> Cldr.Collation.compare("cafe", "Café", strength: :secondary)
:lt

The accent decides both comparisons; case is ignored.

:tertiary — base letters + accents + case (default)

Distinguishes base letters, accents, and case:

iex> Cldr.Collation.compare("café", "Café", strength: :tertiary)
:lt

Lowercase sorts before uppercase.

:quaternary — adds punctuation distinction under shifted mode

Only meaningful when combined with alternate: :shifted. See the alternate section below.

:identical — full codepoint-level distinction

After all collation levels, compares the NFD-normalized codepoint sequences. Two strings are :eq at :identical strength only if they are codepoint-for-codepoint the same after normalization.


Alternate (variable weight handling)

The alternate option controls how "variable" characters — spaces, punctuation, and symbols — are treated.

:non_ignorable (default)

Variable characters have primary weights and affect sort order:

iex> Cldr.Collation.compare("co-op", "coop")
:lt

iex> Cldr.Collation.compare("de luge", "deluge")
:lt

Both the hyphen and the space carry primary weights that are lower than those of letters, so co-op sorts before coop and de luge before deluge.

:shifted

Variable characters are ignored at primary through tertiary levels. This makes co-op and coop compare as equal:

iex> Cldr.Collation.compare("co-op", "coop", alternate: :shifted)
:eq

iex> Cldr.Collation.compare("de luge", "deluge", alternate: :shifted)
:eq

At :quaternary strength, shifted variable characters become distinguishable again — the original primary weight moves to a fourth comparison level:

iex> Cldr.Collation.compare("co-op", "coop",
...>   alternate: :shifted, strength: :quaternary)
:lt

The convenience option :ignore_punctuation sets alternate: :shifted.
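
The shifted behavior can be sketched with a toy model. This is an illustration under stated assumptions, not the library's implementation: the module ToyShifted is hypothetical, only the space and the hyphen are treated as variable, and raw codepoints stand in for collation weights.

```elixir
defmodule ToyShifted do
  # Assumption: only space and hyphen count as "variable" in this toy model.
  @variable ~c" -"
  # Stand-in for the high fourth-level weight given to non-variable characters.
  @max_weight 0xFFFF

  def key(string) do
    codepoints = String.to_charlist(string)
    # Variable characters are dropped from the primary key...
    primary = Enum.reject(codepoints, &(&1 in @variable))
    # ...and keep their own (low) weight only at the fourth level.
    quaternary =
      Enum.map(codepoints, fn cp -> if cp in @variable, do: cp, else: @max_weight end)

    {primary, quaternary}
  end

  def compare(a, b, strength \\ :tertiary) do
    {primary_a, quaternary_a} = key(a)
    {primary_b, quaternary_b} = key(b)

    {left, right} =
      if strength == :quaternary,
        do: {{primary_a, quaternary_a}, {primary_b, quaternary_b}},
        else: {primary_a, primary_b}

    cond do
      left < right -> :lt
      left > right -> :gt
      true -> :eq
    end
  end
end

IO.inspect(ToyShifted.compare("co-op", "coop"))
IO.inspect(ToyShifted.compare("co-op", "coop", :quaternary))
```

In this sketch the first call returns :eq and the second returns :lt, matching the pattern of the Cldr.Collation results above: the hyphen disappears from the primary key and only reappears when the fourth level is compared.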


Max variable

The max_variable option controls which characters are considered "variable" when alternate: :shifted is active:

Value | Variable characters include
:space | Whitespace only
:punct | Whitespace + punctuation (default)
:symbol | Whitespace + punctuation + symbols
:currency | All of the above + currency signs

For example, with max_variable: :currency, currency symbols such as $ are ignored at the primary level under shifted mode.


Case first

The case_first option changes whether uppercase or lowercase sorts first among otherwise-equal strings.

Default (no case_first)

Lowercase sorts before uppercase:

iex> Cldr.Collation.compare("café", "Café")
:lt

Sort order: café then Café.

case_first: :upper

Uppercase sorts before lowercase:

iex> Cldr.Collation.compare("café", "Café", case_first: :upper)
:gt

Sort order with the full word list:

cafe | Café | café | co-op | naive | naïve | Résumé | résumé | TR35 | UTS10

Notice Café now precedes café, and Résumé precedes résumé.

case_first: :lower

Explicitly requests lowercase-first ordering, which matches the default behavior:

cafe | café | Café | co-op | naive | naïve | résumé | Résumé | TR35 | UTS10

Some locales set case_first: :upper by default. Danish (da), Norwegian Bokmål (nb), and Norwegian Nynorsk (nn) all sort uppercase before lowercase as their locale default.


Case level

The case_level option inserts an extra comparison level between secondary (accents) and tertiary (case), allowing case to be distinguished even at primary or secondary strength:

iex> Cldr.Collation.compare("cafe", "Cafe", strength: :primary)
:eq

iex> Cldr.Collation.compare("cafe", "Cafe", strength: :primary, case_level: true)
:lt

Without case_level, primary strength ignores case entirely. With case_level: true, case differences are checked even though accent differences are not.
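
A toy sketch of the idea: at primary strength the key holds base letters only, and case_level: true appends a case element while still leaving accents out. The module ToyCaseLevel is hypothetical and is not how Cldr.Collation builds its keys; accents are assumed to be combining marks in U+0300..U+036F after NFD normalization.

```elixir
defmodule ToyCaseLevel do
  @combining 0x0300..0x036F

  # Primary-strength key, optionally followed by a case element.
  def key(string, opts \\ []) do
    base =
      string
      |> String.normalize(:nfd)
      |> String.to_charlist()
      # Accents are discarded: this key never looks at the secondary level.
      |> Enum.reject(&(&1 in @combining))
      |> List.to_string()

    primary = String.downcase(base)

    if Keyword.get(opts, :case_level, false) do
      # false (lowercase) sorts before true (uppercase).
      case_flags = for letter <- String.graphemes(base), do: letter != String.downcase(letter)
      {primary, case_flags}
    else
      {primary}
    end
  end
end

IO.inspect(ToyCaseLevel.key("cafe") == ToyCaseLevel.key("Cafe"))
IO.inspect(ToyCaseLevel.key("cafe", case_level: true) < ToyCaseLevel.key("Cafe", case_level: true))
```

Both expressions evaluate to true in this sketch: without the case element the two words share a key, and with it the lowercase form sorts first while café and cafe remain equal.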


Backwards (French accent sorting)

The backwards option reverses the comparison direction for secondary weights (accents). This implements the French sorting convention where the last accent difference takes priority.

# Default: leftmost accent difference wins
# ô (on 2nd char) vs. é (on 4th char) — ô is encountered first, decides it
iex> Cldr.Collation.compare("côte", "coté")
:gt

# Backwards: rightmost accent difference wins
# é (on 4th char) is the last difference — coté has the later accent
iex> Cldr.Collation.compare("côte", "coté", backwards: true)
:lt

The full French sorting example with cote, coté, côte, côté:

Order | Default | backwards: true
1 | cote | cote
2 | coté | côte
3 | côte | coté
4 | côté | côté

Notice positions 2 and 3 are swapped — with backwards: true, the accent on the final vowel takes priority over the accent on the first vowel.
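
The swap can be reproduced with a toy secondary key: one list of accents per base letter, compared either left to right or right to left. The module ToyBackwards is hypothetical and only models the accent level, which is sufficient here because all four words share the same base letters.

```elixir
defmodule ToyBackwards do
  # Returns one list of combining marks per base letter, e.g. for "côte":
  # [[], [0x0302], [], []]
  def accents(string) do
    string
    |> String.normalize(:nfd)
    |> String.graphemes()
    |> Enum.map(fn grapheme -> grapheme |> String.to_charlist() |> tl() end)
  end

  # With backwards: true the accent lists are compared from the end of the word.
  def secondary_key(string, backwards \\ false) do
    key = accents(string)
    if backwards, do: Enum.reverse(key), else: key
  end
end

words = ["côté", "côte", "coté", "cote"]
IO.inspect(Enum.sort_by(words, &ToyBackwards.secondary_key(&1, false)))
IO.inspect(Enum.sort_by(words, &ToyBackwards.secondary_key(&1, true)))
```

The first call yields the Default column of the table and the second yields the backwards: true column. Reversing the per-letter accent lists is all that changes between the two.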


Numeric

The numeric option treats sequences of digits as numeric values rather than comparing them character-by-character:

# Default: "10" < "2" because "1" < "2" character-by-character
iex> Cldr.Collation.sort(["item2", "item10", "item1"])
["item1", "item10", "item2"]

# Numeric: "2" < "10" because 2 < 10 as numbers
iex> Cldr.Collation.sort(["item2", "item10", "item1"], numeric: true)
["item1", "item2", "item10"]

This is especially useful for version numbers and numbered labels:

# Character-by-character: "10" sorts before "2" and "9"
iex> Cldr.Collation.sort(["v1.9", "v1.10", "v1.2"])
["v1.10", "v1.2", "v1.9"]

# Numeric comparison: 2 < 9 < 10
iex> Cldr.Collation.sort(["v1.9", "v1.10", "v1.2"], numeric: true)
["v1.2", "v1.9", "v1.10"]
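
The idea behind numeric comparison can be sketched in plain Elixir by splitting each string into digit and non-digit runs and turning the digit runs into integers. This is a minimal illustration with a hypothetical ToyNumeric module; it handles ASCII digits only and ignores every other collation rule.

```elixir
defmodule ToyNumeric do
  # "item10" becomes ["item", 10]; "v1.9" becomes ["v", 1, ".", 9].
  def key(string) do
    ~r/\d+|\D+/
    |> Regex.scan(string)
    |> Enum.map(fn [run] ->
      case Integer.parse(run) do
        # A run made only of digits is compared by its numeric value.
        {number, ""} -> number
        # Anything else is compared as text.
        _ -> run
      end
    end)
  end
end

IO.inspect(Enum.sort_by(["item2", "item10", "item1"], &ToyNumeric.key/1))
IO.inspect(Enum.sort_by(["v1.9", "v1.10", "v1.2"], &ToyNumeric.key/1))
```

For these inputs the sketch produces the same orders as the numeric: true examples above: ["item1", "item2", "item10"] and ["v1.2", "v1.9", "v1.10"].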

Reorder

The reorder option changes the relative ordering of scripts. By default, the Unicode Collation Algorithm sorts Latin before Greek before Cyrillic, etc. The reorder option promotes specified scripts to sort first.

iex> words = ["alpha", "αλφα", "бета", "100"]

# Default: digits, Latin, Greek, Cyrillic
iex> Cldr.Collation.sort(words)
["100", "alpha", "αλφα", "бета"]

# Greek promoted before Latin
iex> Cldr.Collation.sort(words, reorder: [:Grek])
["100", "αλφα", "alpha", "бета"]

# Cyrillic promoted before Latin
iex> Cldr.Collation.sort(words, reorder: [:Cyrl])
["100", "бета", "alpha", "αλφα"]

# Greek first, then Cyrillic, then Latin
iex> Cldr.Collation.sort(words, reorder: [:Grek, :Cyrl])
["100", "αλφα", "бета", "alpha"]

# Cyrillic first, then Greek, then Latin
iex> Cldr.Collation.sort(words, reorder: [:Cyrl, :Grek])
["100", "бета", "αλφα", "alpha"]

The listed scripts are promoted in the order given. Unlisted scripts retain their relative order after the promoted ones. Digits and punctuation always sort first regardless of reorder settings.
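
The promotion rule can be sketched as a rank list: digits first, then the listed scripts in the order given, then the remaining scripts in their default order. The ToyReorder module below is hypothetical; it classifies a word by its first codepoint using three hard-coded ranges, which is a large simplification of real script data.

```elixir
defmodule ToyReorder do
  # Assumed default script order for this toy model.
  @default [:Latn, :Grek, :Cyrl]

  # Classifies a word by its first codepoint (very rough approximation).
  def script(<<cp::utf8, _rest::binary>>) do
    cond do
      cp in ?0..?9 -> :digit
      cp in 0x0370..0x03FF -> :Grek
      cp in 0x0400..0x04FF -> :Cyrl
      true -> :Latn
    end
  end

  # Digits stay first; listed scripts are promoted; the rest keep their order.
  def order(reorder), do: [:digit | reorder ++ (@default -- reorder)]

  def sort(words, reorder \\ []) do
    ranks = order(reorder)

    Enum.sort_by(words, fn word ->
      {Enum.find_index(ranks, &(&1 == script(word))), word}
    end)
  end
end

words = ["alpha", "αλφα", "бета", "100"]
IO.inspect(ToyReorder.sort(words, [:Grek]))
IO.inspect(ToyReorder.sort(words, [:Cyrl, :Grek]))
```

With this word list the sketch gives the same orders as the corresponding reorder: examples above.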

This is useful for applications like a Greek-language phone directory where Greek names should appear before transliterated Latin names, or a Russian application where Cyrillic entries should sort before Latin transliterations.


Locale-specific tailoring

The locale option applies locale-specific sorting rules that change the ordering of certain characters. These differences can be dramatic.

German: dictionary vs. phonebook

German has two collation types. The default (dictionary) sorts umlauted vowels with their base letter. The phonebook type expands them — treating Ä as AE, Ö as OE, Ü as UE:

iex> words = ["Ärger", "Alter", "Ofen", "Öl", "Über", "Ulm"]

iex> Cldr.Collation.sort(words)
["Alter", "Ärger", "Ofen", "Öl", "Über", "Ulm"]

iex> Cldr.Collation.sort(words, locale: "de-u-co-phonebk")
["Ärger", "Alter", "Öl", "Ofen", "Über", "Ulm"]

In the default sort, Ä has the same primary weight as A, so the second letters decide: l < r places Alter before Ärger. In the phonebook sort, Ärger sorts as if spelled Aerger, and e < l places it before Alter.

The practical impact is most visible with surnames — exactly the scenario German phone directories are designed for:

iex> names = ["Müller", "Mueller", "Muller", "Mütze", "Much"]

iex> Cldr.Collation.sort(names, locale: "de")
["Much", "Mueller", "Muller", "Müller", "Mütze"]

iex> Cldr.Collation.sort(names, locale: "de-u-co-phonebk")
["Much", "Mueller", "Müller", "Mütze", "Muller"]

In dictionary order, Müller sorts after Muller — the umlaut is a secondary variant of u. In phonebook order, Müller is treated as Mueller, so it groups next to Mueller and before Muller. This is the behavior Germans expect when looking up a name in a phone book: someone named Müller should appear near Mueller, not separated from it by Muller.
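
The expansion can be approximated in a few lines of plain Elixir. This is a rough sketch with a hypothetical ToyPhonebook module, not the CLDR tailoring itself: it rewrites umlauts before comparing and uses the original spelling only as a tie-breaker.

```elixir
defmodule ToyPhonebook do
  # Hypothetical expansion table covering only the German umlauts.
  # Keys are normalized to NFC so they match the normalized input below.
  @expansions for {umlaut, expansion} <- [
                    {"ä", "ae"}, {"ö", "oe"}, {"ü", "ue"},
                    {"Ä", "Ae"}, {"Ö", "Oe"}, {"Ü", "Ue"}
                  ],
                  into: %{},
                  do: {String.normalize(umlaut, :nfc), expansion}

  def key(word) do
    expanded =
      word
      |> String.normalize(:nfc)
      |> String.replace(Map.keys(@expansions), &Map.fetch!(@expansions, &1))
      |> String.downcase()

    # The original spelling breaks ties such as "Mueller" vs. "Müller".
    {expanded, word}
  end
end

names = ["Müller", "Mueller", "Muller", "Mütze", "Much"]
IO.inspect(Enum.sort_by(names, &ToyPhonebook.key/1))
```

For the two word lists in this section the sketch yields the same order as locale: "de-u-co-phonebk". It should not be relied on beyond that, since real tailorings also define accent and case weights.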

Swedish: å ä ö sort at the end

In Swedish, the letters å, ä, ö are independent letters that sort after z — not as accented variants of a and o:

iex> words = ["ånger", "ärlig", "zero", "öra"]

# Default: accented characters sort as base letter variants
iex> Cldr.Collation.sort(words)
["ånger", "ärlig", "öra", "zero"]

# Swedish: å ä ö sort after z
iex> Cldr.Collation.sort(words, locale: "sv")
["zero", "ånger", "ärlig", "öra"]

Danish: æ ø å sort at the end

Similar to Swedish, Danish treats æ, ø, å as separate letters after z:

iex> words = ["ånger", "ærlig", "zero", "ål", "øre"]

iex> Cldr.Collation.sort(words)
["ærlig", "ål", "ånger", "øre", "zero"]

iex> Cldr.Collation.sort(words, locale: "da")
["zero", "ærlig", "øre", "ål", "ånger"]

Spanish: traditional sort with ch and ll

Traditional Spanish sorting treats ch and ll as single letters that sort after c and l respectively:

iex> words = ["Chile", "chocolate", "llamar", "luna"]

# Default: character-by-character
iex> Cldr.Collation.sort(words)
["Chile", "chocolate", "llamar", "luna"]

# Traditional: ll sorts after l
iex> Cldr.Collation.sort(words, locale: "es-u-co-trad")
["Chile", "chocolate", "luna", "llamar"]

In the traditional sort, luna precedes llamar because ll is treated as a letter that comes after l.
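
The effect of treating ch and ll as single letters can be sketched by tokenizing each word and giving the digraphs a rank just after c and l. The ToyTraditionalSpanish module is hypothetical and ignores accents, ñ, and case weights.

```elixir
defmodule ToyTraditionalSpanish do
  # Each token becomes {base_letter, rank}; a digraph gets rank 1 so that it
  # sorts after every word that continues with the plain letter (rank 0).
  def key(word) do
    ~r/ch|ll|./u
    |> Regex.scan(String.downcase(word))
    |> Enum.map(fn
      ["ch"] -> {?c, 1}
      ["ll"] -> {?l, 1}
      [<<codepoint::utf8>>] -> {codepoint, 0}
    end)
  end
end

words = ["Chile", "chocolate", "llamar", "luna", "cuna"]
IO.inspect(Enum.sort_by(words, &ToyTraditionalSpanish.key/1))
```

The sketch returns ["cuna", "Chile", "chocolate", "luna", "llamar"]: cuna precedes both ch words and luna precedes llamar, because the digraph tokens rank after any plain c or l.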


BCP47 locale strings

All options can be specified via BCP47 -u- extension keys in a locale string, making it possible to express collation preferences in a single string:

BCP47 key | Option | Example
-ks- | strength | en-u-ks-level2 (secondary)
-ka- | alternate | en-u-ka-shifted
-kb- | backwards | fr-u-kb-true
-kc- | case_level | en-u-kc-true
-kf- | case_first | en-u-kf-upper
-kn- | numeric | en-u-kn-true
-kr- | reorder | en-u-kr-grek
-kv- | max_variable | en-u-kv-space
-co- | collation type | de-u-co-phonebk

For example:

iex> words = ["cafe", "café", "Café", "co-op", "naive", "naïve", "résumé", "Résumé", "TR35", "UTS10"]
iex> Cldr.Collation.sort(words, locale: "en-u-ks-level2-ka-shifted-kn-true")
["cafe", "café", "Café", "co-op", "naive", "naïve", "résumé", "Résumé", "TR35", "UTS10"]

This sorts at secondary strength, with shifted punctuation handling, and numeric digit comparison — all expressed in a single locale string.
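
To make the key-to-option mapping concrete, the following sketch translates simple key/value pairs from a -u- extension into a keyword list. The ToyBcp47 module is hypothetical and is not the parser used by Cldr.Collation; it skips -co- and -kr- (which can take several values) and silently drops anything it does not recognize.

```elixir
defmodule ToyBcp47 do
  # Keys from the table above that take exactly one value.
  @keys %{
    "ks" => :strength, "ka" => :alternate, "kb" => :backwards, "kc" => :case_level,
    "kf" => :case_first, "kn" => :numeric, "kv" => :max_variable
  }

  # BCP47 value spellings mapped to the option values used in this document.
  @values %{
    "level1" => :primary, "level2" => :secondary, "level3" => :tertiary,
    "level4" => :quaternary, "identic" => :identical,
    "shifted" => :shifted, "noignore" => :non_ignorable,
    "true" => true, "false" => false, "upper" => :upper, "lower" => :lower,
    "space" => :space, "punct" => :punct, "symbol" => :symbol, "currency" => :currency
  }

  def options(locale) do
    case String.split(locale, "-u-", parts: 2) do
      [_language, extension] ->
        extension
        |> String.split("-")
        |> Enum.chunk_every(2, 2, :discard)
        |> Enum.flat_map(fn [key, value] ->
          with {:ok, option} <- Map.fetch(@keys, key),
               {:ok, parsed} <- Map.fetch(@values, value) do
            [{option, parsed}]
          else
            # Unknown keys or values are ignored in this sketch.
            _ -> []
          end
        end)

      [_no_extension] ->
        []
    end
  end
end

IO.inspect(ToyBcp47.options("en-u-ks-level2-ka-shifted-kn-true"))
```

For the locale string used above, the sketch returns [strength: :secondary, alternate: :shifted, numeric: true], which is the same combination described in the preceding paragraph.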