Unicode.IDNA (Unicode IDNA v0.1.0)


UTS #46 Unicode IDNA Compatibility Processing.

This module implements the algorithms in UTS #46 §4 (Processing), §4.1 (Validity Criteria), §4.2 (ToASCII) and §4.3 (ToUnicode), together with Punycode (RFC 3492), the CONTEXTJ rules of RFC 5892 Appendix A, and the RFC 5893 bidi rule.

The two main entry points, to_ascii/2 and to_unicode/2, accept either:

  • a String.t/0 containing a full domain name (one or more labels separated by an IDNA label separator: ., U+3002, U+FF0E, or U+FF61), or

  • a list of String.t/0 labels.

The return value mirrors the input shape: a string input returns a string (with the labels rejoined with .), and a list input returns a list.
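
The input handling described above can be pictured with a small standard-library sketch. It is an illustration only, not this module's implementation: LabelShapeSketch and map_labels/2 are hypothetical names, and the per-label work is passed in as a function.

```elixir
defmodule LabelShapeSketch do
  # The four IDNA label separators listed above:
  # "." (U+002E), U+3002, U+FF0E and U+FF61.
  @separators [".", "\u3002", "\uFF0E", "\uFF61"]

  # String input: split on any separator, process each label,
  # then rejoin with "." so the result is again a string.
  def map_labels(domain, fun) when is_binary(domain) do
    domain
    |> String.split(@separators)
    |> Enum.map(fun)
    |> Enum.join(".")
  end

  # List input: process each label and return a list.
  def map_labels(labels, fun) when is_list(labels) do
    Enum.map(labels, fun)
  end
end
```

Calling LabelShapeSketch.map_labels/2 with the identity function on a string that mixes the four separators returns the same labels joined with plain dots, while a list argument comes back as a list.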

The default options select non-transitional UTS #46 processing, the mode used by modern browsers (Chrome, Firefox, Safari), with all checks enabled.

Punycode encoding and decoding are also exposed directly as Unicode.IDNA.Punycode.encode/1 and Unicode.IDNA.Punycode.decode/1 for callers needing the raw RFC 3492 primitives without the surrounding UTS #46 processing.
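
For readers unfamiliar with the RFC 3492 primitive, the encoding direction can be sketched in a few dozen lines. This is an illustrative sketch under stated assumptions, not the code behind Unicode.IDNA.Punycode.encode/1: the module name PunycodeSketch is hypothetical, the function returns a bare string without the xn-- prefix, and the overflow checks that RFC 3492 requires are omitted.

```elixir
defmodule PunycodeSketch do
  # Parameter values fixed by RFC 3492, section 5.
  @base 36
  @tmin 1
  @tmax 26
  @skew 38
  @damp 700
  @initial_bias 72
  @initial_n 128

  # Encodes one label. Output is accumulated in reverse as a charlist.
  def encode(label) do
    cps = String.to_charlist(label)
    basic = Enum.filter(cps, &(&1 < @initial_n))
    b = length(basic)
    prefix = if b > 0, do: basic ++ [?-], else: []
    state = %{n: @initial_n, delta: 0, bias: @initial_bias, h: b, b: b, out: Enum.reverse(prefix)}
    cps |> loop(state) |> Enum.reverse() |> List.to_string()
  end

  # Done once every code point has been handled.
  defp loop(cps, %{h: h, out: out}) when h >= length(cps), do: out

  # Otherwise jump to the next smallest unhandled code point and scan the input.
  defp loop(cps, st) do
    m = cps |> Enum.filter(&(&1 >= st.n)) |> Enum.min()
    st = %{st | delta: st.delta + (m - st.n) * (st.h + 1), n: m}
    st = Enum.reduce(cps, st, &step/2)
    loop(cps, %{st | delta: st.delta + 1, n: st.n + 1})
  end

  defp step(c, %{n: n} = st) when c < n, do: %{st | delta: st.delta + 1}

  defp step(c, %{n: n} = st) when c == n do
    out = emit(st.delta, @base, st.bias, st.out)
    bias = adapt(st.delta, st.h + 1, st.h == st.b)
    %{st | out: out, bias: bias, delta: 0, h: st.h + 1}
  end

  defp step(_c, st), do: st

  # Writes delta as a generalized variable-length integer.
  defp emit(q, k, bias, out) do
    t = threshold(k, bias)

    if q < t do
      [digit(q) | out]
    else
      emit(div(q - t, @base - t), k + @base, bias, [digit(t + rem(q - t, @base - t)) | out])
    end
  end

  defp threshold(k, bias) when k <= bias, do: @tmin
  defp threshold(k, bias) when k >= bias + @tmax, do: @tmax
  defp threshold(k, bias), do: k - bias

  # Digits 0..25 map to a..z, digits 26..35 map to 0..9.
  defp digit(d) when d < 26, do: ?a + d
  defp digit(d), do: ?0 + d - 26

  # Bias adaptation, RFC 3492 section 6.1.
  defp adapt(delta, numpoints, first?) do
    delta = if first?, do: div(delta, @damp), else: div(delta, 2)
    scale(delta + div(delta, numpoints), 0)
  end

  defp scale(delta, k) when delta > div((@base - @tmin) * @tmax, 2),
    do: scale(div(delta, @base - @tmin), k + @base)

  defp scale(delta, k), do: k + div((@base - @tmin + 1) * delta, delta + @skew)
end
```

PunycodeSketch.encode("bücher") yields "bcher-kva", which is the part after xn-- in the ACE label xn--bcher-kva.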

Summary

Types

error()

An error reason returned by to_ascii/2 or to_unicode/2.

options()

Options controlling UTS #46 processing.

Functions

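to_ascii(domain, options \\ [])

Applies UTS #46 ToASCII to a domain name.

to_unicode(domain, options \\ [])

Applies UTS #46 ToUnicode to a domain name.

valid_label?(label, options \\ [])

Returns true if label is a valid IDNA label under the given options, false otherwise.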

Types

error()

@type error() ::
  :empty_label
  | :disallowed
  | :hyphen_violation
  | :leading_combining_mark
  | :context
  | :bidi
  | :punycode_overflow
  | :punycode_invalid
  | :label_too_long
  | :domain_too_long

An error reason returned by to_ascii/2 or to_unicode/2.

options()

@type options() :: [
  transitional: boolean(),
  check_hyphens: boolean(),
  check_bidi: boolean(),
  check_joiners: boolean(),
  use_std3_ascii_rules: boolean(),
  verify_dns_length: boolean()
]

Options controlling UTS #46 processing.

  • :transitional — default false. When true, deviation code points are mapped to their replacements (the original IDNA 2003 behaviour).

  • :check_hyphens — default true. When true, a U-label may not begin or end with -, nor have - in both the third and fourth positions. The check is suppressed for ACE labels after Punycode decoding.

  • :check_bidi — default true. When true, if the domain contains a right-to-left character, every label must satisfy the RFC 5893 bidi rule.

  • :check_joiners — default true. When true, labels containing ZWJ or ZWNJ must satisfy the CONTEXTJ rules of RFC 5892.

  • :use_std3_ascii_rules — default true. When true, ASCII characters in a label are restricted to letters, digits and hyphen.

  • :verify_dns_length — default true. When true, each label must be 1–63 octets and the full domain (less the trailing ., if any) must be 1–253 octets.
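
Two of the checks above are simple enough to sketch with the standard library. The following is a rough illustration of the rules as worded in this list, not the module's own code: CheckSketch and both function names are hypothetical, hyphens_ok?/1 is meant for the Unicode form of a label (an ACE label starts with xn-- and would fail the position test), and dns_length_ok?/1 assumes its argument is already the ASCII form joined with dots.

```elixir
defmodule CheckSketch do
  # :check_hyphens rule: no leading or trailing "-", and no "-" in
  # both the third and fourth code point positions.
  def hyphens_ok?(label) do
    not String.starts_with?(label, "-") and
      not String.ends_with?(label, "-") and
      not match?([_, _, ?-, ?- | _], String.to_charlist(label))
  end

  # :verify_dns_length rule: each label is 1..63 octets and the whole
  # domain, ignoring one trailing ".", is 1..253 octets.
  def dns_length_ok?(ascii_domain) do
    trimmed = String.replace_suffix(ascii_domain, ".", "")
    labels = String.split(trimmed, ".")

    byte_size(trimmed) in 1..253 and
      Enum.all?(labels, &(byte_size(&1) in 1..63))
  end
end
```

Positions are counted in code points rather than graphemes, which is why the label is converted to a charlist before matching.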

Functions

to_ascii(domain, options \\ [])

@spec to_ascii(String.t() | [String.t()], options()) ::
  {:ok, String.t() | [String.t()]} | {:error, error()}

Applies UTS #46 ToASCII to a domain name.

Arguments

  • domain is either a String.t/0 containing one or more labels separated by an IDNA label separator (., U+3002, U+FF0E, U+FF61), or a list of String.t/0 labels. Each label may be in Unicode form or in ACE (xn--…) form.

  • options is a keyword list of UTS #46 options. See the type options/0.

Returns

  • {:ok, ascii_domain} on success. The shape mirrors the input: a string input returns a string (labels rejoined with .); a list input returns the list of ASCII labels.

  • {:error, reason} — see the error/0 type.

Examples

iex> Unicode.IDNA.to_ascii("bücher.de")
{:ok, "xn--bcher-kva.de"}

iex> Unicode.IDNA.to_ascii("中文。中国")
{:ok, "xn--fiq228c.xn--fiqs8s"}

iex> Unicode.IDNA.to_ascii(["bücher", "de"])
{:ok, ["xn--bcher-kva", "de"]}

iex> Unicode.IDNA.to_ascii("ASCII")
{:ok, "ascii"}

iex> Unicode.IDNA.to_ascii("xn--bcher-kva")
{:ok, "xn--bcher-kva"}

iex> Unicode.IDNA.to_ascii("not_valid")
{:error, :disallowed}

iex> Unicode.IDNA.to_ascii("not_valid", use_std3_ascii_rules: false)
{:ok, "not_valid"}
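
Callers typically branch on the returned tuple. The sketch below shows one way to do that for string input. ResultSketch and describe/1 are hypothetical names, and the grouping of error reasons into length, Punycode and validity failures is an assumption read off the names in the error/0 type, not something this module defines.

```elixir
defmodule ResultSketch do
  # Turns a to_ascii/2 or to_unicode/2 result for a string input
  # into a one-line description.
  def describe({:ok, domain}) when is_binary(domain), do: "ok: " <> domain

  # Assumed grouping: reasons tied to the :verify_dns_length option.
  def describe({:error, reason}) when reason in [:label_too_long, :domain_too_long],
    do: "rejected: DNS length limit (" <> Atom.to_string(reason) <> ")"

  # Assumed grouping: reasons raised while decoding or encoding an ACE label.
  def describe({:error, reason}) when reason in [:punycode_overflow, :punycode_invalid],
    do: "rejected: malformed Punycode (" <> Atom.to_string(reason) <> ")"

  # Every other reason in error/0 is treated as a validity failure.
  def describe({:error, reason}) when is_atom(reason),
    do: "rejected: validity check (" <> Atom.to_string(reason) <> ")"
end
```

Given the {:error, :disallowed} result shown for "not_valid" above, piping that call into ResultSketch.describe/1 would take the last clause.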

to_unicode(domain, options \\ [])

@spec to_unicode(String.t() | [String.t()], options()) ::
  {:ok, String.t() | [String.t()]} | {:error, error()}

Applies UTS #46 ToUnicode to a domain name.

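Arguments

  • domain is either a String.t/0 containing one or more labels separated by an IDNA label separator (., U+3002, U+FF0E, U+FF61), or a list of String.t/0 labels. Each label may be in Unicode form or in ACE (xn--…) form.

  • options is a keyword list of UTS #46 options. See the type options/0.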

Returns

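  • {:ok, unicode_domain} on success. The shape mirrors the input: a string input returns a string (labels rejoined with .); a list input returns the list of Unicode labels.

  • {:error, reason} on failure. See the error/0 type.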

Examples

iex> Unicode.IDNA.to_unicode("xn--bcher-kva.de")
{:ok, "bücher.de"}

iex> Unicode.IDNA.to_unicode("BÜCHER.DE")
{:ok, "bücher.de"}

iex> Unicode.IDNA.to_unicode(["xn--bcher-kva", "de"])
{:ok, ["bücher", "de"]}

iex> Unicode.IDNA.to_unicode("xn--bcher-kva")
{:ok, "bücher"}

iex> Unicode.IDNA.to_unicode("bücher")
{:ok, "bücher"}

valid_label?(label, options \\ [])

@spec valid_label?(String.t(), options()) :: boolean()

Returns true if label is a valid IDNA label under the given options, false otherwise.

Operates on a single label only. Equivalent to match?({:ok, _}, to_ascii(label, options)) for a binary input that does not contain a label separator.

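Arguments

  • label is a String.t/0 containing a single label, with no label separator.

  • options is a keyword list of UTS #46 options. See the type options/0.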

Returns

  • true or false.

Examples

iex> Unicode.IDNA.valid_label?("bücher")
true

iex> Unicode.IDNA.valid_label?("not_valid")
false

iex> Unicode.IDNA.valid_label?("not_valid", use_std3_ascii_rules: false)
true