Implements the CLDR Transform specification for transforming text from one script to another.
Transforms are defined by the Unicode CLDR specification and support operations such as transliteration between scripts, normalization, and case mapping.
Usage Examples
iex> Unicode.Transform.transform("Ä Ö Ü ß", from: :latin, to: :ascii)
{:ok, "A O U ss"}
iex> Unicode.Transform.transform("hello", to: :upper)
{:ok, "HELLO"}
iex> Unicode.Transform.transform("Ä ö ü", transform: "de-ASCII")
{:ok, "AE oe ue"}
Transform ID resolution
When transform/2 is called, the transform ID is resolved through
one of two paths depending on the options provided.
Direct ID (:transform option)
The string is used as-is. If the ID is not found as a built-in or
in the CLDR transform files, and it has the form "Any-Target",
unicode_transform falls back to automatic script detection (see below).
Script-based (:from / :to options)
The :from and :to values are normalized to canonical script
names (case-insensitive, supporting both Unicode names like :greek
and BCP47 codes like :grek). Resolution then proceeds as follows:
1. Built-in check: if the ID matches a built-in transform (e.g., Any-NFC, Any-Upper), it is dispatched directly to the corresponding String function.
2. Forward file lookup: unicode_transform looks for a CLDR XML file matching "From-To" (e.g., "Greek-Latin"), checking the alias index built from file metadata.
3. Reverse file lookup: if no forward match is found, unicode_transform looks for "To-From" and marks the direction as :reverse (e.g., to: :greek, from: :latin resolves to "Greek-Latin" in reverse).
4. BCP47 fallback: if neither exact nor case-insensitive matches succeed, the ID is resolved as a BCP47 transform ID (e.g., "Grek-Latn" → "Greek-Latin").
The Any source and script detection
When :from is :any (the default) or when a transform: "Any-X"
ID is used, unicode_transform first checks for a specific Any-X transform
(built-in or file-based, such as Any-Accents or Any-Publishing).
If no specific Any-X transform exists, unicode_transform falls back to
automatic script detection: it calls Unicode.script_dominance/1
to identify the scripts present in the input string, then chains
a {detected_script}-X transform for each detected script. Common,
inherited, and unknown scripts are skipped.
For example, transform("αβγδ абвг", from: :any, to: :latin) detects
Greek and Cyrillic, then applies Greek-Latin followed by
Cyrillic-Latin.
This fallback behaves like from: :detect, except that from: :detect always
uses script detection and never checks for a specific Any-X transform first.
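The detect-then-chain idea can be sketched without the library. The DetectSketch module below is hypothetical: it uses PCRE script properties as a rough stand-in for Unicode.script_dominance/1, knows only Greek and Cyrillic, and lists scripts in order of first appearance, which may differ from the ordering the library uses.

```elixir
defmodule DetectSketch do
  # Classify one codepoint with PCRE script properties. Anything not
  # listed here is treated like a common or unknown script and skipped.
  defp script_of(char) do
    cond do
      char =~ ~r/\p{Greek}/u -> "Greek"
      char =~ ~r/\p{Cyrillic}/u -> "Cyrillic"
      true -> nil
    end
  end

  # List the detected scripts in order of first appearance (an assumption).
  def detect(string) do
    string
    |> String.codepoints()
    |> Enum.map(&script_of/1)
    |> Enum.reject(&is_nil/1)
    |> Enum.uniq()
  end

  # Build one "{detected_script}-Target" transform ID per detected script.
  def chain(string, target) do
    string
    |> detect()
    |> Enum.map(fn script -> "#{script}-#{target}" end)
  end
end
```

For the input "αβγδ абвг" and the target "Latin", the sketch produces the IDs Greek-Latin and Cyrillic-Latin, matching the example above; the space between the words is skipped.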
Sub-transform narrowing
CLDR transform files can reference sub-transforms via ::Name;
rules. When a sub-transform is a bare script name (e.g., ::Latin;
inside Greek-Latin.xml), it is narrowed using the parent
transform's source and target scripts — resolving ::Latin; to
Greek-Latin. Sub-transforms that are already compound names
(e.g., ::Bengali-InterIndic;) or built-ins (e.g., ::NFC;)
are used as-is.
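A minimal sketch of the narrowing rule follows, covering only the cases described above. The NarrowSketch module and its short built-in list are hypothetical, and a bare name that does not match the parent's target is simply returned unchanged here, which may not be what the library does.

```elixir
defmodule NarrowSketch do
  # Hypothetical, partial list of built-in names that are never narrowed.
  @built_ins ["NFC", "NFD", "NFKC", "NFKD", "Upper", "Lower"]

  # name is the sub-transform taken from a ::Name; rule.
  # parent_id is the enclosing transform's "Source-Target" ID.
  def narrow(name, parent_id) do
    [source, target | _variant] = String.split(parent_id, "-")

    cond do
      # Built-ins such as ::NFC; are used as-is.
      name in @built_ins -> name
      # Compound names such as ::Bengali-InterIndic; are used as-is.
      String.contains?(name, "-") -> name
      # A bare script name equal to the parent's target is narrowed.
      name == target -> "#{source}-#{target}"
      # Anything else is left untouched in this sketch.
      true -> name
    end
  end
end
```

With these assumptions, NarrowSketch.narrow("Latin", "Greek-Latin") resolves the bare name to the parent's own source and target pair.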
Summary
Functions
available_transforms/0: Returns a list of available transform IDs.
default_backend/0: Returns the default transform backend.
transform/2: Transforms a string using the specified transform.
transform!/2: Transforms a string using the specified transform, raising on error.
Types
Functions
@spec available_transforms() :: [String.t()]
Returns a list of available transform IDs.
Returns
A list of transform ID strings.
@spec default_backend() :: :nif | :elixir
Returns the default transform backend.
Returns
:nif if the ICU NIF is loaded and available.
:elixir otherwise.
@spec transform(String.t(), [transform_option()]) :: {:ok, String.t()} | {:error, term()}
Transforms a string using the specified transform.
There are two ways to specify which transform to apply:
Script-based: use :from and :to to specify source and target scripts as atoms or strings. The transform ID and direction are inferred.
Direct: use :transform with the string transform ID, and optionally :direction (default :forward).
See the Transform ID resolution section in the
module documentation for details on how transform IDs are resolved,
including Any- handling and automatic script detection.
Arguments
string - the input string to transform.
Options
Either :from/:to or :transform must be provided:
:to - the target script as an atom or string (e.g., :latin, "ASCII", :upper, :nfc). Required unless :transform is given. Resolution is case-insensitive.
:from - the source script as an atom or string (default :any), e.g., :greek or "Cyrillic". Resolution is case-insensitive. Use :detect to automatically detect scripts in the input and chain a transform for each detected script.
:transform - a string transform ID (e.g., "de-ASCII", "Armenian-Latin-BGN"). Mutually exclusive with :from/:to.
:direction - :forward (default) or :reverse. Only used with :transform.
:backend - :nif or :elixir. Selects the transform engine. When set to :nif, transforms are executed via ICU4C's native transliterator. When set to :elixir, the pure-Elixir CLDR-based engine is used. Defaults to :nif when the NIF is available, otherwise :elixir.
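The interplay between the two option styles can be illustrated with a standalone sketch. The OptionsSketch module, its return tuples, and its error atoms are hypothetical; the library's own validation and error reasons may differ.

```elixir
defmodule OptionsSketch do
  # Decide which of the two documented modes a keyword list selects.
  def mode(options) do
    transform = Keyword.get(options, :transform)
    to = Keyword.get(options, :to)
    from_given? = Keyword.has_key?(options, :from)

    cond do
      # :transform is mutually exclusive with :from/:to.
      transform && (to || from_given?) ->
        {:error, :transform_excludes_from_and_to}

      # Direct mode: the ID is used as given, :direction defaults to :forward.
      transform ->
        {:ok, {:direct, transform, Keyword.get(options, :direction, :forward)}}

      # Script-based mode: :from defaults to :any.
      to ->
        {:ok, {:script, Keyword.get(options, :from, :any), to}}

      # Neither :to nor :transform was given.
      true ->
        {:error, :missing_to_or_transform}
    end
  end
end
```

Under these assumptions, OptionsSketch.mode(to: :upper) selects the script-based mode with the default :any source, while combining :transform with :to is rejected.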
Returns
{:ok, transformed_string} on success.
{:error, reason} on failure.
Examples
iex> Unicode.Transform.transform("Ä Ö Ü ß", from: :latin, to: :ascii)
{:ok, "A O U ss"}
iex> Unicode.Transform.transform("αβγδ", from: :greek, to: :latin)
{:ok, "abgd"}
iex> Unicode.Transform.transform("hello", to: :upper)
{:ok, "HELLO"}
iex> Unicode.Transform.transform("Ä ö ü", transform: "de-ASCII")
{:ok, "AE oe ue"}
@spec transform!(String.t(), [transform_option()]) :: String.t()
Transforms a string using the specified transform, raising on error.
Arguments
string - the input string to transform.
Options
Same as transform/2.
Returns
The transformed string.
Examples
iex> Unicode.Transform.transform!("Ä Ö Ü ß", from: :latin, to: :ascii)
"A O U ss"