Resolves a language detection into a CLDR-canonical locale string.
fastText's lid.176 reports a bare language code ("en", "zh",
"sr"). For the wider Elixir localisation ecosystem to consume that
result it generally needs three pieces of information: the language,
the script (Hans vs Hant, Latn vs Cyrl, ...), and the
territory. This module assembles all three.
Inputs
The detection itself, which already carries the language and the text-derived script.
Optional :region and :script overrides, typically wired to an Accept-Language header, an IP geolocation, or a user preference.
Resolution algorithm
Form a candidate BCP-47 tag from {language, script_override OR detection.script, region_override}, omitting any unspecified piece.
If localize is available (the optional dep is loaded), call Localize.validate_locale/1 to run CLDR's likely-subtags algorithm. This fills in the remaining pieces and produces a canonical locale id like "zh-Hans-CN".
If localize is not available, fall back to a hand-rolled map of the most common language defaults (e.g. "en" → "en-US", "zh" → "zh-Hans-CN", "pt" → "pt-BR"). The fallback set is deliberately conservative: it covers the languages most users will hit but does not pretend to span all 176 fastText labels.
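The first and third steps can be sketched in a few lines. This is a minimal illustration, not the module's actual implementation: LocaleSketch, candidate_tag/3 and fallback/1 are hypothetical names, and the table holds only the three entries quoted above. The localize step is left out because it depends on the optional dependency.

```elixir
defmodule LocaleSketch do
  # Hypothetical fallback table, limited to the entries quoted in the text.
  @fallback %{"en" => "en-US", "zh" => "zh-Hans-CN", "pt" => "pt-BR"}

  # Step 1: join whichever pieces were specified with "-", skipping nil.
  # Atoms such as :Hans are converted to their string form.
  def candidate_tag(language, script, region) do
    [language, script, region]
    |> Enum.reject(&is_nil/1)
    |> Enum.map_join("-", &to_string/1)
  end

  # Step 3 (no localize): look the bare language up in the table and
  # return the language code unchanged when it has no entry.
  def fallback(language), do: Map.get(@fallback, language, language)
end
```

Under these assumptions, LocaleSketch.candidate_tag("sr", :Cyrl, nil) yields "sr-Cyrl", and a language missing from the table, such as "sr", passes through fallback/1 unchanged.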
Hans vs Hant
When the detected language is "zh" and the script signal indicates
:Hani (the generic Han atom from ScriptDetector), this module uses
the language tag's region/script preferences to pick Hans or Hant.
With localize present the choice flows through CLDR likely-subtags;
without it, the default for bare "zh" is Hans-CN.
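One way to picture the choice when localize is absent is shown below. This is a sketch under stated assumptions rather than the module's code: HanScriptSketch and pick/2 are hypothetical, and the region list reflects the territories for which CLDR likely-subtags resolves "zh" to Hant (TW, HK, MO).

```elixir
defmodule HanScriptSketch do
  # Assumption: territories whose likely script for "zh" is Traditional.
  @hant_regions ~w(TW HK MO)

  # An explicit :Hans or :Hant override wins; otherwise the region decides;
  # with no usable hint the result is :Hans, matching the bare "zh" default.
  def pick(script_override, region) do
    cond do
      script_override in [:Hans, :Hant] -> script_override
      region in @hant_regions -> :Hant
      true -> :Hans
    end
  end
end
```

A generic :Hani signal carries no Hans/Hant information, so in this sketch it falls through to the region check like a missing override would.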
Types
@type script() :: Text.Language.Classifier.Fasttext.ScriptDetector.script()
Functions
@spec resolve( Text.Language.Classifier.Fasttext.Detection.t(), keyword() ) :: {:ok, String.t()} | {:error, term()}
Resolves a Detection into a canonical CLDR locale string.
Arguments
detection is a Text.Language.Classifier.Fasttext.Detection.
Options
:region overrides the territory inferred by likely-subtags. Useful when the caller has stronger evidence (Accept-Language, geolocation, user preference). An ISO 3166-1 alpha-2 code as either a binary or atom.
:script overrides the script inferred from the text. Useful when the caller knows better than codepoint-frequency analysis (e.g. a publisher tagging Traditional Chinese content explicitly).
:fallback controls behaviour when :localize is not available or the language is not in the fallback map. Either :language_only (return just the language code) or :tag_with_script (include the script subtag if known). Defaults to :language_only to match the behaviour of fastText's own outputs.
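Because :region may arrive as either a binary or an atom, a caller-side helper along these lines can bring both forms to one shape before building a tag. RegionOptionSketch and normalize/1 are hypothetical, and uppercasing the code is an assumption about the desired canonical form, not documented behaviour of resolve/2.

```elixir
defmodule RegionOptionSketch do
  # No override given: keep nil so the piece can be omitted from the tag.
  def normalize(nil), do: nil

  # Atoms such as :us or :US are converted to strings first.
  def normalize(region) when is_atom(region) do
    region |> Atom.to_string() |> normalize()
  end

  # Binaries are uppercased, the conventional form for ISO 3166-1 alpha-2.
  def normalize(region) when is_binary(region), do: String.upcase(region)
end
```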
Returns
{:ok, locale_string}: for example "en-Latn-US" or "zh-Hans-CN" when :localize is available, "en-US" or "zh-Hans-CN" from the fallback table, or just "en" if the language is unknown to the fallback.
{:error, reason}: when :localize is loaded and rejects the candidate tag.
Examples
iex> alias Text.Language.Classifier.Fasttext.Detection
iex> det = %Detection{language: "en", confidence: 0.9, script: :Latn,
...> alternatives: [], text: "hello"}
iex> {:ok, locale} = Text.Language.Classifier.Fasttext.Locale.resolve(det)
iex> locale =~ "en"
true
iex> alias Text.Language.Classifier.Fasttext.Detection
iex> det = %Detection{language: "zh", confidence: 0.95, script: :Hani,
...> alternatives: [], text: "你好世界"}
iex> {:ok, locale} = Text.Language.Classifier.Fasttext.Locale.resolve(det)
iex> String.starts_with?(locale, "zh")
true