Unicode.IDNA
(Unicode IDNA v0.1.0)
UTS #46 Unicode IDNA Compatibility Processing.
This module implements the algorithms in UTS #46 §4 (Processing), §4.1 (Validity Criteria) and §4.2 (ToASCII), with Punycode (RFC 3492), the CONTEXTJ rules of RFC 5892 Appendix A, and the RFC 5893 bidi rule.
The two main entry points, to_ascii/2 and to_unicode/2, accept either a String.t/0 containing a full domain name (one or more labels separated by an IDNA label separator: ".", U+3002, U+FF0E or U+FF61), or a list of String.t/0 labels.
The return value mirrors the input shape: a string in returns a string out (the labels are rejoined with "."); a list in returns a list out.
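A minimal sketch of this shape-mirroring contract, assuming each label is handled by a function that returns {:ok, label} or {:error, reason}. LabelShapeSketch and map_labels/2 are hypothetical names, not part of Unicode.IDNA: the sketch splits string input on the four IDNA label separators, stops at the first error, and rejoins string output with ".".

```elixir
defmodule LabelShapeSketch do
  # Hypothetical helper, not part of Unicode.IDNA: applies a per-label
  # function and mirrors the input shape like to_ascii/2 and to_unicode/2.

  # U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP,
  # U+FF0E FULLWIDTH FULL STOP, U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
  @separators [".", "\u3002", "\uFF0E", "\uFF61"]

  # String in: split on any separator, map each label, rejoin with ".".
  def map_labels(domain, fun) when is_binary(domain) do
    with {:ok, labels} <- map_all(String.split(domain, @separators), fun) do
      {:ok, Enum.join(labels, ".")}
    end
  end

  # List in: map each label and return a list.
  def map_labels(labels, fun) when is_list(labels), do: map_all(labels, fun)

  # Applies fun to each label in order, halting at the first {:error, reason}.
  defp map_all(labels, fun) do
    result =
      Enum.reduce_while(labels, {:ok, []}, fn label, {:ok, acc} ->
        case fun.(label) do
          {:ok, mapped} -> {:cont, {:ok, [mapped | acc]}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)

    case result do
      {:ok, acc} -> {:ok, Enum.reverse(acc)}
      error -> error
    end
  end
end
```

In the real library the per-label function would be the UTS #46 label processing; here any function with the same {:ok, _} / {:error, _} contract can stand in, for example fn label -> {:ok, String.upcase(label)} end.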
The default options select non-transitional UTS #46 processing, as used by modern browsers (Chrome, Firefox, Safari), and enable all checks.
Punycode encoding and decoding are also exposed directly as Unicode.IDNA.Punycode.encode/1 and Unicode.IDNA.Punycode.decode/1 for callers needing the raw RFC 3492 primitives without the surrounding UTS #46 processing.
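To show what the raw RFC 3492 primitive computes, here is a minimal Punycode encoder sketch in plain Elixir. PunycodeEncodeSketch is a hypothetical module, not Unicode.IDNA.Punycode: it omits the RFC's overflow checks (Elixir integers are unbounded), performs no UTS #46 mapping or validation, and returns the encoded string without the xn-- prefix.

```elixir
defmodule PunycodeEncodeSketch do
  # Minimal RFC 3492 Punycode encoder. Overflow checks are omitted because
  # Elixir integers are unbounded. Not the library's implementation.
  @base 36
  @tmin 1
  @tmax 26
  @skew 38
  @damp 700
  @initial_bias 72
  @initial_n 128
  @adapt_limit div((@base - @tmin) * @tmax, 2)

  def encode(string) do
    input = String.to_charlist(string)
    basic = Enum.filter(input, &(&1 < 0x80))
    b = length(basic)
    # Output is accumulated in reverse: basic code points, then "-" if any.
    out = if b > 0, do: [?- | Enum.reverse(basic)], else: []

    input
    |> encode_loop(@initial_n, 0, @initial_bias, b, b, length(input), out)
    |> Enum.reverse()
    |> List.to_string()
  end

  # Every code point has been handled.
  defp encode_loop(_input, _n, _delta, _bias, h, _b, len, out) when h >= len, do: out

  defp encode_loop(input, n, delta, bias, h, b, len, out) do
    # Next code point to insert: the smallest one >= n.
    m = input |> Enum.filter(&(&1 >= n)) |> Enum.min()
    delta = delta + (m - n) * (h + 1)

    {delta, bias, h, out} =
      Enum.reduce(input, {delta, bias, h, out}, fn c, {delta, bias, h, out} ->
        cond do
          c < m ->
            {delta + 1, bias, h, out}

          c == m ->
            out = encode_varint(delta, bias, @base, out)
            {0, adapt(delta, h + 1, h == b), h + 1, out}

          true ->
            {delta, bias, h, out}
        end
      end)

    encode_loop(input, m + 1, delta + 1, bias, h, b, len, out)
  end

  # Generalized variable-length integer, least significant digit first.
  defp encode_varint(q, bias, k, out) do
    t = threshold(k, bias)

    if q < t do
      [digit(q) | out]
    else
      d = t + rem(q - t, @base - t)
      encode_varint(div(q - t, @base - t), bias, k + @base, [digit(d) | out])
    end
  end

  defp threshold(k, bias) do
    cond do
      k <= bias -> @tmin
      k >= bias + @tmax -> @tmax
      true -> k - bias
    end
  end

  # Digit values 0..25 map to "a".."z", 26..35 map to "0".."9".
  defp digit(d) when d < 26, do: ?a + d
  defp digit(d), do: ?0 + d - 26

  # Bias adaptation function from RFC 3492 section 6.1.
  defp adapt(delta, num_points, first_time?) do
    delta = if first_time?, do: div(delta, @damp), else: div(delta, 2)
    adapt_loop(delta + div(delta, num_points), 0)
  end

  defp adapt_loop(delta, k) when delta > @adapt_limit,
    do: adapt_loop(div(delta, @base - @tmin), k + @base)

  defp adapt_loop(delta, k), do: k + div((@base - @tmin + 1) * delta, delta + @skew)
end
```

With this sketch, PunycodeEncodeSketch.encode("bücher") returns "bcher-kva", the part after xn-- in the to_ascii/2 example for "bücher.de".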
Summary
Types
error/0: An error reason returned by to_ascii/2 or to_unicode/2.
options/0: Options controlling UTS #46 processing.
Functions
to_ascii/2: Applies UTS #46 ToASCII to a domain name.
to_unicode/2: Applies UTS #46 ToUnicode to a domain name.
valid_label?/2: Returns true if label is a valid IDNA label under the given options, false otherwise.
Types
@type error() ::
:empty_label
| :disallowed
| :hyphen_violation
| :leading_combining_mark
| :context
| :bidi
| :punycode_overflow
| :punycode_invalid
| :label_too_long
| :domain_too_long
An error reason returned by to_ascii/2 or to_unicode/2.
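Callers that surface failures can handle every error/0 reason with an exhaustive pattern match. The module below is a hypothetical sketch, not part of Unicode.IDNA; its messages are inferred from the atom names and the option descriptions, not taken from the library.

```elixir
defmodule IdnaErrorSketch do
  # Hypothetical helper, not part of Unicode.IDNA. Messages are inferred
  # from the error atom names and the documented options.
  def describe(:empty_label), do: "a label is empty"
  def describe(:disallowed), do: "a label contains a disallowed code point"
  def describe(:hyphen_violation), do: "a label breaks the check_hyphens rule"
  def describe(:leading_combining_mark), do: "a label begins with a combining mark"
  def describe(:context), do: "a ZWJ or ZWNJ fails the RFC 5892 CONTEXTJ rules"
  def describe(:bidi), do: "a label fails the RFC 5893 bidi rule"
  def describe(:punycode_overflow), do: "Punycode arithmetic overflowed"
  def describe(:punycode_invalid), do: "an xn-- label is not valid Punycode"
  def describe(:label_too_long), do: "a label exceeds 63 octets"
  def describe(:domain_too_long), do: "the domain exceeds 253 octets"

  # Formats a result tuple shaped like the return value of to_ascii/2.
  def format_result({:ok, domain}), do: domain
  def format_result({:error, reason}), do: "IDNA error: " <> describe(reason)
end
```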
@type options() :: [ transitional: boolean(), check_hyphens: boolean(), check_bidi: boolean(), check_joiners: boolean(), use_std3_ascii_rules: boolean(), verify_dns_length: boolean() ]
Options controlling UTS #46 processing.
:transitional (default false): when true, deviation code points are mapped to their replacements (the original IDNA 2003 behaviour).
:check_hyphens (default true): when true, a U-label may not begin or end with "-", nor have "-" in both the third and fourth positions. The check is suppressed for ACE labels after Punycode decoding.
:check_bidi (default true): when true and the domain contains a right-to-left character, every label must satisfy the RFC 5893 bidi rule.
:check_joiners (default true): when true, labels containing ZWJ or ZWNJ must satisfy the CONTEXTJ rules of RFC 5892.
:use_std3_ascii_rules (default true): when true, ASCII characters in a label are restricted to letters, digits and hyphen.
:verify_dns_length (default true): when true, each label must be 1–63 octets and the full domain (excluding the trailing ".", if any) must be 1–253 octets.
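Three of these checks are simple enough to sketch directly. The hypothetical LabelChecksSketch module below approximates :check_hyphens, :use_std3_ascii_rules and :verify_dns_length under stated assumptions: labels have already been through UTS #46 mapping, the hyphen check is applied to U-labels rather than to xn-- forms, and the length check runs on the final ASCII domain. It is not the library's implementation.

```elixir
defmodule LabelChecksSketch do
  # Hypothetical, simplified versions of three UTS #46 checks.
  # Not the Unicode.IDNA implementation.

  # :check_hyphens on a U-label: no leading or trailing "-", and not "-"
  # in both the third and fourth code point positions. (An "xn--" ACE label
  # would fail this, which is why it applies to the decoded form.)
  def hyphens_ok?(label) do
    not String.starts_with?(label, "-") and
      not String.ends_with?(label, "-") and
      not match?([_, _, ?-, ?- | _], String.to_charlist(label))
  end

  # :use_std3_ascii_rules: ASCII code points are limited to letters, digits
  # and "-"; non-ASCII code points are left to the other checks.
  def std3_ok?(label) do
    label
    |> String.to_charlist()
    |> Enum.all?(fn c ->
      c >= 0x80 or c in ?a..?z or c in ?A..?Z or c in ?0..?9 or c == ?-
    end)
  end

  # :verify_dns_length on an ASCII domain: each label is 1..63 octets and the
  # domain, excluding one trailing root ".", is 1..253 octets.
  def dns_length_ok?(ascii_domain) do
    domain = String.replace_suffix(ascii_domain, ".", "")

    byte_size(domain) in 1..253 and
      Enum.all?(String.split(domain, "."), &(byte_size(&1) in 1..63))
  end
end
```

For instance, std3_ok?("not_valid") is false because "_" is outside the STD3 ASCII set, which lines up with the :disallowed result for "not_valid" in the to_ascii/2 examples.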
Functions
@spec to_ascii(String.t() | [String.t()], options()) :: {:ok, String.t() | [String.t()]} | {:error, error()}
Applies UTS #46 ToASCII to a domain name.
Arguments
domain is either a String.t/0 containing one or more labels separated by an IDNA label separator (".", U+3002, U+FF0E or U+FF61), or a list of String.t/0 labels. Each label may be in Unicode form or in ACE (xn--…) form.
options is a keyword list of UTS #46 options. See the options/0 type.
Returns
{:ok, ascii_domain} on success. The shape mirrors the input: a string in returns a string (labels rejoined with "."); a list in returns the list of ASCII labels.
{:error, reason} on failure. See the error/0 type.
Examples
iex> Unicode.IDNA.to_ascii("bücher.de")
{:ok, "xn--bcher-kva.de"}
iex> Unicode.IDNA.to_ascii("中文。中国")
{:ok, "xn--fiq228c.xn--fiqs8s"}
iex> Unicode.IDNA.to_ascii(["bücher", "de"])
{:ok, ["xn--bcher-kva", "de"]}
iex> Unicode.IDNA.to_ascii("ASCII")
{:ok, "ascii"}
iex> Unicode.IDNA.to_ascii("xn--bcher-kva")
{:ok, "xn--bcher-kva"}
iex> Unicode.IDNA.to_ascii("not_valid")
{:error, :disallowed}
iex> Unicode.IDNA.to_ascii("not_valid", use_std3_ascii_rules: false)
{:ok, "not_valid"}
@spec to_unicode(String.t() | [String.t()], options()) :: {:ok, String.t() | [String.t()]} | {:error, error()}
Applies UTS #46 ToUnicode to a domain name.
Arguments
domain is either a String.t/0 or a list of String.t/0 labels; see to_ascii/2 for the shape semantics.
options is a keyword list. See to_ascii/2.
Returns
{:ok, unicode_domain} on success: string in, string out; list in, list out.
{:error, reason} on failure. See the error/0 type.
Examples
iex> Unicode.IDNA.to_unicode("xn--bcher-kva.de")
{:ok, "bücher.de"}
iex> Unicode.IDNA.to_unicode("BÜCHER.DE")
{:ok, "bücher.de"}
iex> Unicode.IDNA.to_unicode(["xn--bcher-kva", "de"])
{:ok, ["bücher", "de"]}
iex> Unicode.IDNA.to_unicode("xn--bcher-kva")
{:ok, "bücher"}
iex> Unicode.IDNA.to_unicode("bücher")
{:ok, "bücher"}
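The ACE labels in these examples are decoded with RFC 3492 Punycode. A minimal decoder sketch follows; PunycodeDecodeSketch is hypothetical, not the library's Unicode.IDNA.Punycode.decode/1, and it skips overflow checks as well as the UTS #46 mapping and validation that to_unicode/2 applies.

```elixir
defmodule PunycodeDecodeSketch do
  # Minimal RFC 3492 Punycode decoder (no overflow checks; Elixir integers
  # are unbounded). Not the library's implementation.
  @base 36
  @tmin 1
  @tmax 26
  @skew 38
  @damp 700
  @initial_bias 72
  @initial_n 128
  @adapt_limit div((@base - @tmin) * @tmax, 2)

  # Decodes an ACE label ("xn--" prefix, any case). Other labels pass through
  # unchanged: this sketch does no UTS #46 mapping.
  def decode_label(<<x, n, ?-, ?-, encoded::binary>>)
      when x in [?x, ?X] and n in [?n, ?N],
      do: decode(encoded)

  def decode_label(label), do: {:ok, label}

  def decode(encoded) do
    chars = String.to_charlist(encoded)

    # Everything before the last "-" is copied literally (basic code points).
    {basic, rest} =
      case Enum.find_index(Enum.reverse(chars), &(&1 == ?-)) do
        nil -> {[], chars}
        idx -> {Enum.take(chars, length(chars) - 1 - idx), Enum.drop(chars, length(chars) - idx)}
      end

    if Enum.all?(basic, &(&1 < 0x80)) do
      decode_loop(rest, @initial_n, 0, @initial_bias, basic)
    else
      {:error, :punycode_invalid}
    end
  end

  defp decode_loop([], _n, _i, _bias, output), do: {:ok, List.to_string(output)}

  defp decode_loop(input, n, i, bias, output) do
    case decode_varint(input, i, 1, @base, bias) do
      {:ok, new_i, rest} ->
        len = length(output) + 1
        # new_i encodes both the code point increment and the insert position.
        n = n + div(new_i, len)
        pos = rem(new_i, len)
        bias = adapt(new_i - i, len, i == 0)
        decode_loop(rest, n, pos + 1, bias, List.insert_at(output, pos, n))

      :error ->
        {:error, :punycode_invalid}
    end
  end

  # Reads one generalized variable-length integer; :error if the input ends
  # early or contains a character that is not a Punycode digit.
  defp decode_varint([], _i, _w, _k, _bias), do: :error

  defp decode_varint([c | rest], i, w, k, bias) do
    with {:ok, d} <- digit_value(c) do
      i = i + d * w
      t = threshold(k, bias)

      if d < t,
        do: {:ok, i, rest},
        else: decode_varint(rest, i, w * (@base - t), k + @base, bias)
    end
  end

  defp digit_value(c) when c in ?a..?z, do: {:ok, c - ?a}
  defp digit_value(c) when c in ?A..?Z, do: {:ok, c - ?A}
  defp digit_value(c) when c in ?0..?9, do: {:ok, c - ?0 + 26}
  defp digit_value(_c), do: :error

  defp threshold(k, bias) do
    cond do
      k <= bias -> @tmin
      k >= bias + @tmax -> @tmax
      true -> k - bias
    end
  end

  defp adapt(delta, num_points, first_time?) do
    delta = if first_time?, do: div(delta, @damp), else: div(delta, 2)
    adapt_loop(delta + div(delta, num_points), 0)
  end

  defp adapt_loop(delta, k) when delta > @adapt_limit,
    do: adapt_loop(div(delta, @base - @tmin), k + @base)

  defp adapt_loop(delta, k), do: k + div((@base - @tmin + 1) * delta, delta + @skew)
end
```

With this sketch, PunycodeDecodeSketch.decode_label("xn--bcher-kva") returns {:ok, "bücher"}, matching the to_unicode/2 result for the same label.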
Returns true if label is a valid IDNA label under the given options, false otherwise.
Operates on a single label only. Equivalent to match?({:ok, _}, to_ascii(label, options)) for a binary input that does not contain a label separator.
Arguments
label is a String.t/0 containing one domain label.
options is a keyword list. See to_ascii/2.
Returns
true or false.
Examples
iex> Unicode.IDNA.valid_label?("bücher")
true
iex> Unicode.IDNA.valid_label?("not_valid")
false
iex> Unicode.IDNA.valid_label?("not_valid", use_std3_ascii_rules: false)
true