# `Unicode.Transform`
[🔗](https://github.com/elixir-unicode/unicode_transform/blob/v1.0.0/lib/unicode/transform.ex#L1)

Implements the CLDR Transform specification for transforming
text from one script to another.

Transforms are defined by the [Unicode CLDR
specification](https://unicode.org/reports/tr35/tr35-general.html#Transforms)
and support operations such as transliteration between scripts,
normalization, and case mapping.

### Usage Examples

    iex> Unicode.Transform.transform("Ä Ö Ü ß", from: :latin, to: :ascii)
    {:ok, "A O U ss"}

    iex> Unicode.Transform.transform("hello", to: :upper)
    {:ok, "HELLO"}

    iex> Unicode.Transform.transform("Ä ö ü", transform: "de-ASCII")
    {:ok, "AE oe ue"}

### Transform ID resolution

When `transform/2` is called, the transform ID is resolved through
one of two paths depending on the options provided.

#### Direct ID (`:transform` option)

The string is used as-is. If the ID is not found as a built-in or
in the CLDR transform files, and it has the form `"Any-Target"`,
`unicode_transform` falls back to automatic script detection (see below).

#### Script-based (`:from` / `:to` options)

The `:from` and `:to` values are normalized to canonical script
names (case-insensitive, supporting both Unicode script names like `:greek`
and BCP47 script codes like `:grek`). The normalized names form a candidate
ID of the form `"From-To"`, and resolution then proceeds as follows:

1. **Built-in check** — if the ID matches a built-in transform
   (e.g., `Any-NFC`, `Any-Upper`), it is dispatched directly
   to the corresponding `String` function.

2. **Forward file lookup** — `unicode_transform` looks for a CLDR XML
   file matching `"From-To"` (e.g., `"Greek-Latin"`), checking
   the alias index built from file metadata.

3. **Reverse file lookup** — if no forward match is found,
   `unicode_transform` looks for `"To-From"` and marks the direction
   as `:reverse` (e.g., `from: :latin, to: :greek` resolves
   to `"Greek-Latin"` applied in reverse).

4. **BCP47 fallback** — if neither exact nor case-insensitive
   matches succeed, the ID is resolved as a BCP47 transform
   ID (e.g., `"Grek-Latn"` → `"Greek-Latin"`).
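
The lookup order above can be sketched in a few lines. This is only an illustration of the sequence, not the library's implementation: the module name `ResolutionSketch`, the tiny built-in and file lists, and the script-code table are all hypothetical stand-ins.

```elixir
defmodule ResolutionSketch do
  # Hypothetical stand-ins for the built-in table and the CLDR file index.
  @builtins ["Any-NFC", "Any-Upper"]
  @files ["Greek-Latin", "Cyrillic-Latin"]
  # Hypothetical script code -> script name table for the BCP47 fallback.
  @bcp47 %{"Grek" => "Greek", "Latn" => "Latin", "Cyrl" => "Cyrillic"}

  @doc "Resolves a source/target pair to `{kind, id, direction}` or `:error`."
  def resolve(from, to) do
    forward = "#{from}-#{to}"
    reverse = "#{to}-#{from}"

    cond do
      # 1. Built-in check
      forward in @builtins -> {:builtin, forward, :forward}
      # 2. Forward file lookup
      forward in @files -> {:file, forward, :forward}
      # 3. Reverse file lookup
      reverse in @files -> {:file, reverse, :reverse}
      # 4. BCP47 fallback
      true -> resolve_bcp47(from, to)
    end
  end

  # Maps both script codes to script names and retries; returns :error
  # when either code is unknown.
  defp resolve_bcp47(from, to) do
    with {:ok, from_name} <- Map.fetch(@bcp47, from),
         {:ok, to_name} <- Map.fetch(@bcp47, to) do
      resolve(from_name, to_name)
    end
  end
end
```

With these sample tables, `ResolutionSketch.resolve("Latin", "Greek")` falls through to step 3 and reports `"Greek-Latin"` with direction `:reverse`.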

#### The `Any` source and script detection

When `:from` is `:any` (the default) or when a `transform: "Any-X"`
ID is used, `unicode_transform` first checks for a specific `Any-X` transform
(built-in or file-based, such as `Any-Accents` or `Any-Publishing`).

If no specific `Any-X` transform exists, `unicode_transform` falls back to
**automatic script detection**: it calls `Unicode.script_dominance/1`
to identify the scripts present in the input string, then chains
a `{detected_script}-X` transform for each detected script. Common,
inherited, and unknown scripts are skipped.

For example, `transform("αβγδ абвг", from: :any, to: :latin)` detects
Greek and Cyrillic, then applies `Greek-Latin` followed by
`Cyrillic-Latin`.

This fallback behaves like `from: :detect`. The difference is that
`:detect` always uses script detection and never checks for a specific
`Any-X` transform first.
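
As a rough sketch of the chaining step, the helper below turns a list of detected scripts into the transform IDs that would be applied in order. The module `DetectionSketch` is hypothetical, and the input list only stands in for whatever script detection returns; the real return shape may differ.

```elixir
defmodule DetectionSketch do
  # Scripts that are skipped when building the chain.
  @skipped [:common, :inherited, :unknown]

  @doc """
  Builds the chain of transform IDs for a list of detected scripts.
  `detected` is an assumed shape: a plain list of script atoms.
  """
  def chain(detected, target) do
    detected
    |> Enum.reject(fn script -> script in @skipped end)
    |> Enum.map(fn script -> "#{name(script)}-#{name(target)}" end)
  end

  # Converts a script atom such as :greek to the name "Greek".
  defp name(script), do: script |> Atom.to_string() |> String.capitalize()
end
```

For the example above, `DetectionSketch.chain([:greek, :common, :cyrillic], :latin)` yields `["Greek-Latin", "Cyrillic-Latin"]`.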

#### Sub-transform narrowing

CLDR transform files can reference sub-transforms via `::Name;`
rules. When a sub-transform is a bare script name (e.g., `::Latin;`
inside `Greek-Latin.xml`), it is narrowed using the parent
transform's source and target scripts — resolving `::Latin;` to
`Greek-Latin`. Sub-transforms that are already compound names
(e.g., `::Bengali-InterIndic;`) or built-ins (e.g., `::NFC;`)
are used as-is.
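
The narrowing rule might look roughly like the sketch below. It assumes that a bare name is treated as the target and prefixed with the parent's source script, which is one reading of the `::Latin;` example; the module `NarrowingSketch` and its short built-in list are hypothetical.

```elixir
defmodule NarrowingSketch do
  # Hypothetical list of built-in names that are never narrowed.
  @builtins ["NFC", "NFD", "NFKC", "NFKD", "Upper", "Lower"]

  @doc "Narrows a `::Name;` reference using the parent transform ID."
  def narrow(sub, parent_id) do
    cond do
      # Built-ins are used as-is.
      sub in @builtins ->
        sub

      # Compound names are used as-is.
      String.contains?(sub, "-") ->
        sub

      # Assumption: a bare script name takes the parent's source script.
      true ->
        [source | _rest] = String.split(parent_id, "-")
        "#{source}-#{sub}"
    end
  end
end
```

Under that assumption, `NarrowingSketch.narrow("Latin", "Greek-Latin")` returns `"Greek-Latin"`, while `"Bengali-InterIndic"` and `"NFC"` pass through unchanged.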

# `transform_option`

```elixir
@type transform_option() ::
  {:from, atom() | String.t()}
  | {:to, atom() | String.t()}
  | {:transform, String.t()}
  | {:direction, :forward | :reverse}
  | {:backend, :nif | :elixir}
```

# `available_transforms`

```elixir
@spec available_transforms() :: [String.t()]
```

Returns a list of available transform IDs.

### Returns

A list of transform ID strings.
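
Because the result is a plain list of strings, it can be processed with ordinary `Enum` functions. The sketch below groups IDs by their source part; `TransformIndex` is a hypothetical helper, not part of the library, and it takes the list as an argument so that it does not depend on which IDs are actually installed.

```elixir
defmodule TransformIndex do
  @doc "Groups transform IDs by their source (the part before the first hyphen)."
  def by_source(ids) do
    Enum.group_by(ids, fn id ->
      id |> String.split("-", parts: 2) |> hd()
    end)
  end
end

# Intended usage (requires the library to be available):
#   Unicode.Transform.available_transforms() |> TransformIndex.by_source()
```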

# `default_backend`

```elixir
@spec default_backend() :: :nif | :elixir
```

Returns the default transform backend.

### Returns

* `:nif` if the ICU NIF is loaded and available.

* `:elixir` otherwise.

# `transform`

```elixir
@spec transform(String.t(), [transform_option()]) ::
  {:ok, String.t()} | {:error, term()}
```

Transforms a string using the specified transform.

There are two ways to specify which transform to apply:

1. **Script-based** — use `:from` and `:to` to specify source and target
   scripts as atoms or strings. The transform ID and direction are inferred.

2. **Direct** — use `:transform` with the string transform ID, and
   optionally `:direction` (default `:forward`).

See the [Transform ID resolution](#module-transform-id-resolution) section in the
module documentation for details on how transform IDs are resolved,
including `Any-` handling and automatic script detection.

### Arguments

* `string` — the input string to transform.

### Options

Either `:to` (optionally with `:from`) or `:transform` must be provided:

* `:to` — the target script as an atom or string (e.g., `:latin`,
  `"ASCII"`, `:upper`, `:nfc`). Required unless `:transform` is given.
  Resolution is case-insensitive.

* `:from` — the source script as an atom or string (default: `:any`).
  E.g., `:greek`, `"Cyrillic"`. Resolution is case-insensitive.
  Use `:detect` to automatically detect scripts in the input
  and chain a transform for each detected script.

* `:transform` — a string transform ID (e.g., `"de-ASCII"`,
  `"Armenian-Latin-BGN"`). Mutually exclusive with `:from`/`:to`.

* `:direction` — `:forward` (default) or `:reverse`. Only used
  with `:transform`.

* `:backend` — `:nif` or `:elixir`. Selects the transform engine.
  When set to `:nif`, transforms are executed via ICU4C's native
  transliterator. When set to `:elixir`, the pure-Elixir CLDR-based
  engine is used. Defaults to `:nif` when the NIF is available,
  otherwise `:elixir`.
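
The interaction between these options can be summarized as a small classifier. This is a sketch of the rules as documented here, not the library's validation code: `OptionSketch` and the error atoms `:conflicting_options` and `:missing_target` are hypothetical.

```elixir
defmodule OptionSketch do
  @doc "Classifies an option list as script-based, direct, or invalid."
  def mode(opts) do
    direct = Keyword.has_key?(opts, :transform)
    script = Keyword.has_key?(opts, :from) or Keyword.has_key?(opts, :to)

    cond do
      # :transform is mutually exclusive with :from / :to.
      direct and script ->
        {:error, :conflicting_options}

      # Direct ID; :direction defaults to :forward.
      direct ->
        {:direct, opts[:transform], Keyword.get(opts, :direction, :forward)}

      # Script-based; :from defaults to :any.
      Keyword.has_key?(opts, :to) ->
        {:script, Keyword.get(opts, :from, :any), opts[:to]}

      # Neither :to nor :transform was given.
      true ->
        {:error, :missing_target}
    end
  end
end
```

For instance, `OptionSketch.mode(to: :upper)` returns `{:script, :any, :upper}`, reflecting the `:any` default for `:from`.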

### Returns

* `{:ok, transformed_string}` on success.

* `{:error, reason}` on failure.
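
Since the result is a tagged tuple, callers typically pattern match on it. One possible pattern, shown as a sketch, is a helper that falls back to a default string when the transform fails; `TransformResult` is a hypothetical name and not part of the library.

```elixir
defmodule TransformResult do
  @doc "Returns the transformed string, or `fallback` when the transform failed."
  def unwrap_or({:ok, transformed}, _fallback), do: transformed
  def unwrap_or({:error, _reason}, fallback), do: fallback
end

# Intended usage (requires the library to be available):
#   input
#   |> Unicode.Transform.transform(from: :greek, to: :latin)
#   |> TransformResult.unwrap_or(input)
```

When a failure should stop execution instead, `transform!/2` is the more direct choice.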

### Examples

    iex> Unicode.Transform.transform("Ä Ö Ü ß", from: :latin, to: :ascii)
    {:ok, "A O U ss"}

    iex> Unicode.Transform.transform("αβγδ", from: :greek, to: :latin)
    {:ok, "abgd"}

    iex> Unicode.Transform.transform("hello", to: :upper)
    {:ok, "HELLO"}

    iex> Unicode.Transform.transform("Ä ö ü", transform: "de-ASCII")
    {:ok, "AE oe ue"}

# `transform!`

```elixir
@spec transform!(String.t(), [transform_option()]) :: String.t()
```

Transforms a string using the specified transform, raising on error.

### Arguments

* `string` — the input string to transform.

### Options

Same as `transform/2`.

### Returns

The transformed string.

### Examples

    iex> Unicode.Transform.transform!("Ä Ö Ü ß", from: :latin, to: :ascii)
    "A O U ss"

---

*Consult [api-reference.md](api-reference.md) for complete listing*
