Cldr v0.13.0 Cldr.Locale

Functions to parse and normalize locale names into a structured locale represented by a Cldr.LanguageTag.

CLDR represents localisation data organized into locales, with each locale being identified by a locale name that is formatted according to RFC5646.

In practice, the CLDR data uses a simple subset of locale name formats:

  • a Language code such as en or fr

  • a Language code and Territory code such as en-GB

  • a Language code and Script such as zh-Hant

  • and in only two cases a Language code, Territory code and Variant such as ca-ES-VALENCIA and en-US-POSIX.

The RFC defines a language tag as:

A language tag is composed from a sequence of one or more “subtags”, each of which refines or narrows the range of language identified by the overall tag. Subtags, in turn, are a sequence of alphanumeric characters (letters and digits), distinguished and separated from other subtags in a tag by a hyphen (“-”, [Unicode] U+002D).

Therefore Cldr uses the hyphen (“-”, [Unicode] U+002D) as the subtag separator. On certain platforms, including POSIX platforms, the subtag separator is a “_” (underscore) rather than a “-” (hyphen). Where appropriate, Cldr will replace any underscore with a hyphen before parsing or processing.
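To make the subtag shapes concrete, here is a minimal Elixir sketch, not Cldr's own parser, that replaces underscores with hyphens and then classifies each subtag by the RFC 5646 lengths: four letters for a script, two letters or three digits for a territory, and five to eight alphanumerics (or a digit followed by three alphanumerics) for a variant. The module name SubtagSketch is hypothetical, and extensions, extlang and private-use subtags are simply collected under :other.

```elixir
defmodule SubtagSketch do
  # Hypothetical sketch, not Cldr's parser: classify subtags by their shape.

  # Replace POSIX-style underscores with hyphens, then split into subtags.
  def split(locale_name) do
    locale_name
    |> String.replace("_", "-")
    |> String.split("-", trim: true)
  end

  # Treat the first subtag as the language and classify the rest.
  def classify(locale_name) do
    [language | rest] = split(locale_name)
    Enum.reduce(rest, %{language: language}, &put_subtag/2)
  end

  defp put_subtag(subtag, acc) do
    cond do
      # Script: exactly four letters, e.g. "Hant"
      subtag =~ ~r/^[A-Za-z]{4}$/ ->
        Map.put(acc, :script, subtag)

      # Territory: two letters or three digits, e.g. "GB" or "001"
      subtag =~ ~r/^([A-Za-z]{2}|[0-9]{3})$/ ->
        Map.put(acc, :territory, subtag)

      # Variant: 5 to 8 alphanumerics, or a digit plus three alphanumerics
      subtag =~ ~r/^([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$/ ->
        Map.put(acc, :variant, subtag)

      # Anything else is outside the scope of this sketch
      true ->
        Map.update(acc, :other, [subtag], &(&1 ++ [subtag]))
    end
  end
end
```

For example, SubtagSketch.classify("ca_ES_VALENCIA") yields %{language: "ca", territory: "ES", variant: "VALENCIA"}, covering one of the two variant cases listed above.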

Locale name validity

When validating a locale name, Cldr will attempt to match the requested locale name to a configured locale. Therefore Cldr.Locale.new/1 may return an {:ok, language_tag} tuple even when the locale returned does not exactly match the requested locale name. For example, the following attempts to create a locale matching the non-existent “English as spoken in Spain” locale name. Here Cldr will match to the nearest configured locale, which in this case is “en”.

iex> Cldr.Locale.new("en-ES")
{:ok,
 %Cldr.LanguageTag{canonical_locale_name: "en-Latn-ES", cldr_locale_name: "en",
  extensions: %{}, language: "en", locale: %{}, private_use: [],
  rbnf_locale_name: "en", requested_locale_name: "en-ES", script: "Latn",
  territory: "ES", transform: %{}, variant: nil}}

Matching locales to requested locale names

When attempting to match the requested locale name to a configured locale, Cldr attempts to match against a set of reductions in the following order and returns the first match:

  • language, script, territory, variant
  • language, territory, variant
  • language, script, variant
  • language, variant
  • language, script, territory
  • language, territory
  • language, script
  • language
  • requested locale name
  • nil

Therefore matching is tolerant of a request for unknown scripts, territories and variants. Only the requested language is a requirement to be matched to a configured locale.
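The reduction order above can be sketched as a list of candidate names tried in sequence. This is a minimal illustration under stated assumptions: the module MatchSketch is hypothetical, the subtags are assumed to be already filled in (for instance with likely subtags), and the configured locales are passed in as a plain list rather than read from Cldr's configuration.

```elixir
defmodule MatchSketch do
  # Hypothetical sketch of the documented reduction order.

  # Build candidate locale names in order; nil subtags are skipped.
  def candidates(requested, language, script, territory, variant) do
    [
      [language, script, territory, variant],
      [language, territory, variant],
      [language, script, variant],
      [language, variant],
      [language, script, territory],
      [language, territory],
      [language, script],
      [language]
    ]
    |> Enum.map(&join/1)
    |> Kernel.++([requested])
  end

  # Return the first candidate that is a configured locale, or nil.
  def match(configured, requested, language, script, territory, variant) do
    requested
    |> candidates(language, script, territory, variant)
    |> Enum.find(&(&1 in configured))
  end

  # Join the non-nil subtags with a hyphen.
  defp join(subtags) do
    subtags |> Enum.reject(&is_nil/1) |> Enum.join("-")
  end
end
```

With configured locales ["en", "en-GB", "fr"], a request for "en-ES" with script "Latn" and territory "ES" falls through to "en", mirroring the Cldr.Locale.new("en-ES") example, while a language with no configured locale returns nil.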

Substitutions for Obsolete and Deprecated locale names

CLDR provides data to help manage the transition from obsolete or deprecated locale names to current names. For example, the following requests the locale name “mo”, which is the deprecated code for “Moldavian”. The replacement code is “ro” (Romanian).

iex> Cldr.Locale.new("mo")
{:ok,
  %Cldr.LanguageTag{canonical_locale_name: "ro-Latn-MD",
   cldr_locale_name: "ro-MD", extensions: %{}, language: "ro",
   locale: %{}, private_use: [], rbnf_locale_name: "ro",
   requested_locale_name: "mo", script: "Latn", territory: "MD",
   transform: %{}, variant: nil}}

Likely subtags

CLDR also provides data to identify the most likely subtags for a requested locale name. This data is based on the default content data, the population data, and the suppress-script data in [BCP47]. It is heuristically derived, and may change over time. For example, when requesting the locale “en”, the following is returned:

iex> Cldr.Locale.new("en")
{:ok,
 %Cldr.LanguageTag{canonical_locale_name: "en-Latn-US", cldr_locale_name: "en",
  extensions: %{}, language: "en", locale: %{}, private_use: [],
  rbnf_locale_name: "en", requested_locale_name: "en", script: "Latn",
  territory: "US", transform: %{}, variant: nil}}

This shows that the likely subtag for the script is “Latn” and the likely territory is “US”.

Using the example for Substitutions above, we can see that combining substitutions and likely subtags for the locale name “mo” returns the current language code “ro” as well as the likely territory code “MD” (Moldova).

Unknown territory codes

Whilst Cldr is tolerant of invalid territory codes, it is also important that such invalid codes do not shadow the potential replacement of deprecated codes or the insertion of likely subtags. Therefore invalid territory codes are ignored during this process. For example, requesting the locale name “en-XX”, which specifies the invalid territory “XX”, returns:

iex> Cldr.Locale.new("en-XX")
{:ok, %Cldr.LanguageTag{
  canonical_locale_name: "en-Latn-US",
  cldr_locale_name: "en",
  extensions: %{},
  language: "en",
  locale: %{},
  private_use: [],
  rbnf_locale_name: "en",
  requested_locale_name: "en",
  script: "Latn",
  territory: "US",
  transform: %{},
  variant: nil
}}
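A minimal sketch of why an invalid territory must be dropped first: if “XX” were kept, it would occupy the territory slot and the likely territory “US” could never be filled in. The module TerritorySketch is hypothetical, and both its known-territory set and its one-entry likely-subtags table are tiny illustrative samples, not CLDR data.

```elixir
defmodule TerritorySketch do
  # Hypothetical sample data, not CLDR's territory or likely-subtags data.
  @known_territories MapSet.new(["US", "GB", "ES", "MD"])
  @likely %{"en" => %{script: "Latn", territory: "US"}}

  def fill(%{language: language} = tag) do
    tag = drop_unknown_territory(tag)
    likely = Map.get(@likely, language, %{})

    # Keep any subtag the caller supplied; fill the rest from the likely table.
    Map.merge(likely, tag, fn _key, likely_value, value -> value || likely_value end)
  end

  # An unknown territory is cleared so it cannot shadow the likely territory.
  defp drop_unknown_territory(%{territory: territory} = tag) when is_binary(territory) do
    if MapSet.member?(@known_territories, territory),
      do: tag,
      else: Map.put(tag, :territory, nil)
  end

  defp drop_unknown_territory(tag), do: tag
end
```

Here fill(%{language: "en", territory: "XX"}) produces territory "US", whereas a valid territory such as "GB" is kept unchanged.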

Summary

Types

The name of a locale in a string format

Functions

Replace empty subtags within a Cldr.LanguageTag with the most likely subtag

Returns an error tuple for an invalid locale alias

Return a map of the known aliases for Language, Script and Territory

Return a map of the aliases for a given alias key and type

Parses a locale name and returns a Cldr.LanguageTag struct that represents a locale

Parses a locale name and returns a Cldr.LanguageTag struct that represents a locale or raises on error

Returns the map of likely subtags for a subset of available locale names

Returns the likely subtags, as a Cldr.LanguageTag, for a given locale name

Returns an error tuple for an invalid locale

Return a locale name from a Cldr.LanguageTag

Return a locale name by combining language, script, territory and variant parameters

Normalize the casing of a locale name

Substitute deprecated subtags within a Cldr.LanguageTag with their non-deprecated alternatives

Types

locale_name()
locale_name() :: String.t()

The name of a locale in a string format

Functions

add_likely_subtags(language_tag)

Replace empty subtags within a Cldr.LanguageTag with the most likely subtag.

A subtag is called empty if it is a missing script or region subtag, or it is a base language subtag with the value und. In the description below, a subscript on a subtag $x$ indicates which tag it is from: $x_s$ is in the source, $x_m$ is in a match, and $x_r$ is in the final result.

This operation is performed in the following way:

Lookup

Lookup each of the following in order, and stop on the first match:

  • languages-scripts-regions
  • languages-regions
  • languages-scripts
  • languages
  • und-scripts

Return

  • If there is no match, either return

    • an error value, or
    • the match for und
  • Otherwise there is a match = $\text{language}_m$-$\text{script}_m$-$\text{region}_m$

  • Let $x_r = x_s$ if $x_s$ is not empty, and $x_m$ otherwise.

  • Return the language tag composed of $\text{language}_r$-$\text{script}_r$-$\text{region}_r$ + variants + extensions.

Example

iex> Cldr.Locale.add_likely_subtags Cldr.LanguageTag.parse!("zh-SG")
%Cldr.LanguageTag{
  canonical_locale_name: nil,
  cldr_locale_name: nil,
  extensions: %{},
  language: "zh",
  locale: %{},
  private_use: [],
  rbnf_locale_name: nil,
  requested_locale_name: "zh-sg",
  script: "Hans",
  territory: "SG",
  transform: %{},
  variant: nil
}
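The lookup and return rules above can be sketched as follows. This is a minimal illustration, not Cldr's implementation: the module LikelySketch is hypothetical, tags are simplified to {language, script, region} tuples, variants and extensions are omitted, and the four-entry table is a sample in the shape of CLDR's likely-subtags data rather than the real data set. On no match it takes the “match for und” branch.

```elixir
defmodule LikelySketch do
  # Sample entries only; keys are hyphenated locale names.
  @likely %{
    "zh" => {"zh", "Hans", "CN"},
    "zh-TW" => {"zh", "Hant", "TW"},
    "en" => {"en", "Latn", "US"},
    "und" => {"en", "Latn", "US"}
  }

  def add_likely_subtags({language, script, region}) do
    # Lookup keys in the documented order; keys with a missing subtag are skipped.
    keys = [
      [language, script, region],
      [language, region],
      [language, script],
      [language],
      ["und", script]
    ]

    match =
      keys
      |> Enum.reject(fn parts -> Enum.any?(parts, &is_nil/1) end)
      |> Enum.find_value(fn parts -> Map.get(@likely, Enum.join(parts, "-")) end)

    # No match: fall back to the entry for "und".
    {language_m, script_m, region_m} = match || Map.fetch!(@likely, "und")

    # x_r = x_s when the source subtag is present, otherwise x_m.
    {pick(language, language_m), pick(script, script_m), pick(region, region_m)}
  end

  # nil and "und" count as empty subtags.
  defp pick(nil, matched), do: matched
  defp pick("und", matched), do: matched
  defp pick(source, _matched), do: source
end
```

For {"zh", nil, "SG"} the key "zh-SG" is absent from the sample table, so the lookup falls through to "zh"; the source region "SG" is kept and the matched script "Hans" fills the gap, agreeing with the zh-SG example above.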
alias_error(locale_name, alias_name)
alias_error(Locale.locale_name() | Cldr.LanguageTag.t(), String.t()) :: {Cldr.UnknownLocaleError, String.t()}

Returns an error tuple for an invalid locale alias.

Return a map of the known aliases for Language, Script and Territory

aliases(key, type)
aliases(Locale.locale_name(), atom()) :: Map.t()

Return a map of the aliases for a given alias key and type

  • type is one of [:language, :region, :script, :variant, :zone]

  • key is the substitution key (a language, region, script, variant or zone)

canonical_language_tag(locale_name)
canonical_language_tag(locale_name() | Cldr.LanguageTag.t()) ::
  {:ok, Cldr.LanguageTag.t()} |
  {:error, {Cldr.InvalidLanguageTag, String.t()}}

Parses a locale name and returns a Cldr.LanguageTag struct that represents a locale.

Returns:

  • {:ok, language_tag} or

  • {:error, reason}

Several steps are followed to produce a canonical language tag:

  1. The language tag is parsed in accordance with RFC5646

  2. Any language, script or region aliases are replaced. This replaces any obsolete elements with their current versions.

  3. If a territory or script is not specified, a default is provided using the CLDR information returned by Cldr.Locale.likely_subtags/1

  4. A Cldr locale name is selected that is the nearest fit to the requested locale.

Example

iex> Cldr.Locale.canonical_language_tag "en"
{
  :ok,
  %Cldr.LanguageTag{
    canonical_locale_name: "en-Latn-US",
    cldr_locale_name: "en",
    extensions: %{},
    language: "en",
    locale: %{},
    private_use: [],
    rbnf_locale_name: "en",
    requested_locale_name: "en",
    script: "Latn",
    territory: "US",
    transform: %{},
    variant: nil
  }
}
canonical_language_tag!(language_tag)
canonical_language_tag!(locale_name() | Cldr.LanguageTag.t()) ::
  Cldr.LanguageTag.t() |
  none()

Parses a locale name and returns a Cldr.LanguageTag struct that represents a locale or raises on error.

See Cldr.Locale.canonical_language_tag/1 for more information.

likely_subtags()
likely_subtags() :: Map.t()

Returns the map of likely subtags for a subset of available locale names.

Example

Cldr.Locale.likely_subtags
%{
  "bez" => %Cldr.LanguageTag{
    canonical_locale_name: nil,
    cldr_locale_name: nil,
    extensions: %{},
    language: "bez",
    locale: %{},
    private_use: [],
    rbnf_locale_name: nil,
    requested_locale_name: nil,
    script: "Latn",
    territory: "TZ",
    transform: %{},
    variant: nil
  },
  "fuf" => %Cldr.LanguageTag{
    canonical_locale_name: nil,
    cldr_locale_name: nil,
    extensions: %{},
    language: "fuf",
    locale: %{},
    private_use: [],
    rbnf_locale_name: nil,
    requested_locale_name: nil,
    script: "Latn",
    territory: "GN",
    transform: %{},
    variant: nil
  },
  ...
likely_subtags(locale_name)
likely_subtags(locale_name()) :: Cldr.LanguageTag.t()

Returns the likely subtags, as a Cldr.LanguageTag, for a given locale name.

Examples

iex> Cldr.Locale.likely_subtags "en"
%Cldr.LanguageTag{
  canonical_locale_name: nil,
  cldr_locale_name: nil,
  extensions: %{},
  language: "en",
  locale: %{},
  private_use: [],
  rbnf_locale_name: nil,
  requested_locale_name: nil,
  script: "Latn",
  territory: "US",
  transform: %{},
  variant: nil
}

iex> Cldr.Locale.likely_subtags Cldr.Locale.new!("th")
%Cldr.LanguageTag{
  canonical_locale_name: nil,
  cldr_locale_name: nil,
  extensions: %{},
  language: "th",
  locale: %{},
  private_use: [],
  rbnf_locale_name: nil,
  requested_locale_name: nil,
  script: "Thai",
  territory: "TH",
  transform: %{},
  variant: nil
}
locale_error(locale_name)
locale_error(Locale.locale_name() | Cldr.LanguageTag.t()) :: {Cldr.UnknownLocaleError, String.t()}

Returns an error tuple for an invalid locale.

Examples

iex> Cldr.Locale.locale_error :invalid
{Cldr.UnknownLocaleError, "The locale :invalid is not known."}
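The error value is a plain two-element tuple of an exception module and a message, which callers typically place inside an {:error, reason} result. The sketch below reproduces only the message format shown in the example above; LocaleErrorSketch and its validate/2 helper are hypothetical, and Cldr.UnknownLocaleError appears only as an atom so Cldr does not need to be loaded.

```elixir
defmodule LocaleErrorSketch do
  # Build the {exception_module, message} tuple; the module is used as an atom only.
  def locale_error(locale_name) do
    {Cldr.UnknownLocaleError, "The locale #{inspect(locale_name)} is not known."}
  end

  # Hypothetical caller: wrap the tuple as the reason in an {:error, reason} result.
  def validate(locale_name, known_locales) do
    if locale_name in known_locales do
      {:ok, locale_name}
    else
      {:error, locale_error(locale_name)}
    end
  end
end
```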
locale_name_from(language_tag)
locale_name_from(Cldr.LanguageTag.t()) :: Locale.locale_name()

Return a locale name from a Cldr.LanguageTag

Example

iex> Cldr.Locale.locale_name_from Cldr.Locale.new!("en")
"en-Latn-US"
locale_name_from(language, script, territory, variant)
locale_name_from(String.t() | nil, String.t() | nil, String.t() | nil, String.t() | nil) :: Locale.locale_name()

Return a locale name by combining language, script, territory and variant parameters

  • language, script, territory and variant are string representations, or nil, of the language subtags

Example

iex> Cldr.Locale.locale_name_from("en", "Latn", "001", nil)
"en-Latn-001"
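The combining rule amounts to dropping nil subtags and joining the rest with a hyphen. A minimal sketch of that behaviour, assuming nothing beyond the description above (the module LocaleNameSketch is hypothetical):

```elixir
defmodule LocaleNameSketch do
  # Join the non-nil subtags, in order, with the hyphen separator.
  def locale_name_from(language, script, territory, variant) do
    [language, script, territory, variant]
    |> Enum.reject(&is_nil/1)
    |> Enum.join("-")
  end
end
```

For example, locale_name_from("ca", nil, "ES", "VALENCIA") gives "ca-ES-VALENCIA".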
normalize_locale_name(locale_name)
normalize_locale_name(locale_name()) :: locale_name()

Normalize the casing of a locale name.

Locale names are case insensitive, but certain casing conventions are commonly followed in practice:

  • lower case for a language
  • capital case for a script
  • upper case for a region/territory

Note this function is intended to support only the CLDR locale names, which have a format that is a subset of the full language tag specification.

For proper parsing of locale names and language tags, see Cldr.Locale.canonical_language_tag/1.

Examples

iex> Cldr.Locale.normalize_locale_name "zh_hant"
"zh-Hant"

iex> Cldr.Locale.normalize_locale_name "en_us"
"en-US"

iex> Cldr.Locale.normalize_locale_name "EN"
"en"

iex> Cldr.Locale.normalize_locale_name "ca_es_valencia"
"ca-ES-VALENCIA"
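Because CLDR locale names use only the simple formats listed at the top of this page, the casing rules can be sketched with a length heuristic: the first subtag is lower-cased, a four-character subtag is treated as a script and capitalized, and anything else is upper-cased. This is a hypothetical sketch (NormalizeSketch is not part of Cldr) and, like the function it illustrates, it does not handle full language tags.

```elixir
defmodule NormalizeSketch do
  def normalize_locale_name(locale_name) do
    [language | rest] =
      locale_name
      |> String.replace("_", "-")
      |> String.split("-")

    # Lower case for the language, then normalize each remaining subtag.
    [String.downcase(language) | Enum.map(rest, &normalize_subtag/1)]
    |> Enum.join("-")
  end

  # Four-character subtags are assumed to be scripts: "hant" -> "Hant".
  defp normalize_subtag(subtag) when byte_size(subtag) == 4, do: String.capitalize(subtag)

  # Territories and variants are upper-cased: "us" -> "US".
  defp normalize_subtag(subtag), do: String.upcase(subtag)
end
```

All four examples above produce the same results with this sketch.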
substitute_aliases(language_tag)

Substitute deprecated subtags within a Cldr.LanguageTag with their non-deprecated alternatives.

  • Replace any deprecated subtags with their canonical values using the alias data. Use the first value in the replacement list, if it exists. Language tag replacements may have multiple parts, such as sh → sr_Latn or mo → ro_MD. In such a case, the original script and/or region/territory are retained if present. Thus sh_Arab_AQ → sr_Arab_AQ, not sr_Latn_AQ.

  • Remove the script code ‘Zzzz’ and the territory code ‘ZZ’ if they occur.

  • Get the components of the cleaned-up source tag (languages, scripts, and regions/territories), plus any variants and extensions.

Examples

iex> Cldr.Locale.substitute_aliases Cldr.LanguageTag.Parser.parse!("en-US")
%Cldr.LanguageTag{
  canonical_locale_name: nil,
  cldr_locale_name: nil,
  extensions: %{},
  language: "en",
  locale: %{},
  private_use: [],
  rbnf_locale_name: nil,
  requested_locale_name: "en-us",
  script: nil,
  territory: "US",
  transform: %{},
  variant: nil
}

iex> Cldr.Locale.substitute_aliases Cldr.LanguageTag.Parser.parse!("sh_Arab_AQ")
%Cldr.LanguageTag{
  canonical_locale_name: nil,
  cldr_locale_name: nil,
  extensions: %{},
  language: "sr",
  locale: %{},
  private_use: [],
  rbnf_locale_name: nil,
  requested_locale_name: "sh-arab-aq",
  script: "Arab",
  territory: "AQ",
  transform: %{},
  variant: nil
}

iex> Cldr.Locale.substitute_aliases Cldr.LanguageTag.Parser.parse!("sh_AQ")
%Cldr.LanguageTag{
  canonical_locale_name: nil,
  cldr_locale_name: nil,
  extensions: %{},
  language: "sr",
  locale: %{},
  private_use: [],
  rbnf_locale_name: nil,
  requested_locale_name: "sh-aq",
  script: "Latn",
  territory: "AQ",
  transform: %{},
  variant: nil
}

iex> Cldr.Locale.substitute_aliases Cldr.LanguageTag.Parser.parse!("mo")
%Cldr.LanguageTag{
  canonical_locale_name: nil,
  cldr_locale_name: nil,
  extensions: %{},
  language: "ro",
  locale: %{},
  private_use: [],
  rbnf_locale_name: nil,
  requested_locale_name: "mo",
  script: nil,
  territory: "MD",
  transform: %{},
  variant: nil
}
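The substitution rules for substitute_aliases can be sketched with a two-entry alias table taken from the sh and mo examples: the replacement's script or territory is used only when the source tag lacks one, and the placeholder codes Zzzz and ZZ are removed. This is a hypothetical sketch (AliasSketch and its map-based tags are not Cldr's API), and its rule that four-character replacement parts are scripts holds only for this sample table.

```elixir
defmodule AliasSketch do
  # Sample language aliases, written as hyphenated replacement names.
  @language_aliases %{"sh" => "sr-Latn", "mo" => "ro-MD"}

  # Tags are maps with :language, :script and :territory keys.
  def substitute_aliases(tag) do
    tag
    |> replace_language()
    |> drop_placeholder(:script, "Zzzz")
    |> drop_placeholder(:territory, "ZZ")
  end

  defp replace_language(%{language: language} = tag) do
    case Map.fetch(@language_aliases, language) do
      {:ok, replacement} ->
        [new_language | rest] = String.split(replacement, "-")
        from_alias = classify(rest)

        # The original script and territory win over those from the alias.
        %{
          tag
          | language: new_language,
            script: tag.script || from_alias[:script],
            territory: tag.territory || from_alias[:territory]
        }

      :error ->
        tag
    end
  end

  # In this sample table, four-character parts are scripts; others are territories.
  defp classify(parts) do
    Map.new(parts, fn
      part when byte_size(part) == 4 -> {:script, part}
      part -> {:territory, part}
    end)
  end

  defp drop_placeholder(tag, key, code) do
    if Map.get(tag, key) == code, do: Map.put(tag, key, nil), else: tag
  end
end
```

Applied to sh_Arab_AQ, sh_AQ and mo, this sketch yields the same language, script and territory values as the substitute_aliases examples above.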