Text.Language.Classifier.Fasttext (Text v0.5.0)


Pure-Elixir port of fastText's lid.176 language identification model.

This module is the public entry point for the fastText classifier. It glues together the lower-level pieces — ModelLoader, Features, Inference, ScriptDetector, Locale — into a small API for end users.

Loading a model

The lid.176.bin model file is approximately 126 MB and is not shipped with this package. Fetch it once after installing the library:

mix text.download_lid176

Then load it at application startup:

{:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load(
  Path.join(:code.priv_dir(:text), "lid_176/lid.176.bin")
)

Loaded models are immutable and safe to share across processes: the matrices live in Nx tensors backed by reference-counted (refc) binaries, so passing the struct between processes does not duplicate the roughly 126 MB payload.
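
One way to load the model once and make it reachable from any process is sketched below. It assumes that :persistent_term suits the application. MyApp.LanguageModel and its loader argument are hypothetical names used for illustration and are not part of this library.

```elixir
defmodule MyApp.LanguageModel do
  # Hypothetical helper: loads the model once and publishes it for all processes.
  @key {__MODULE__, :lid176}

  # `loader` defaults to the ModelLoader.load/1 call shown above. It is
  # injectable so the module can be exercised without the model file.
  def load!(path, loader \\ &Text.Language.Classifier.Fasttext.ModelLoader.load/1) do
    {:ok, model} = loader.(path)
    :persistent_term.put(@key, model)
    :ok
  end

  # Reading from :persistent_term does not copy the stored term.
  # Raises ArgumentError if load!/2 has not been called yet.
  def fetch!, do: :persistent_term.get(@key)
end
```

Calling MyApp.LanguageModel.load!/1 from the application's start callback and MyApp.LanguageModel.fetch!/0 wherever a model is needed would be one possible arrangement; a supervised process holding the struct in its state would work as well.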

Detecting a language

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> {:ok, det} = Text.Language.Classifier.Fasttext.detect("Bonjour le monde", model)
iex> det.language
"fr"
iex> det.script
:Latn
iex> det.confidence > 0.9
true

Just the language code

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> Text.Language.Classifier.Fasttext.classify("Hola mundo", model)
{:ok, "es"}

Resolving to a CLDR locale

When the optional localize dependency is available, detections can be expanded into full CLDR-canonical locale strings via likely-subtags:

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> {:ok, det} = Text.Language.Classifier.Fasttext.detect("你好世界", model)
iex> {:ok, locale} = Text.Language.Classifier.Fasttext.to_locale(det)
iex> String.starts_with?(locale, "zh")
true

Without localize, a small built-in fallback table covers the most common languages.
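
To illustrate what a likely-subtags fallback table might look like, here is a minimal sketch. The module name, the :unknown_language error atom, and the choice of entries are assumptions made for illustration; each value is the CLDR likely-subtags expansion of its key, but the library's real table and output format may differ.

```elixir
defmodule LocaleFallbackSketch do
  # Illustrative entries only: each value is the CLDR likely-subtags
  # expansion of its key. This is not the library's built-in table.
  @likely %{
    "en" => "en-Latn-US",
    "es" => "es-Latn-ES",
    "fr" => "fr-Latn-FR",
    "ru" => "ru-Cyrl-RU",
    "zh" => "zh-Hans-CN"
  }

  # Expands a bare language subtag, or reports that the table has no entry.
  # The :unknown_language atom is a hypothetical error value.
  def expand(language) when is_binary(language) do
    case Map.fetch(@likely, language) do
      {:ok, locale} -> {:ok, locale}
      :error -> {:error, :unknown_language}
    end
  end
end
```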

Confidence and uncertainty

fastText assigns a probability to every label. For very short or ambiguous inputs the top-1 confidence may be modest. Callers that need to gate on confidence should inspect Detection.confidence directly:

case Text.Language.Classifier.Fasttext.detect(text, model) do
  {:ok, %{confidence: c, language: lang}} when c > 0.7 ->
    {:ok, lang}
  {:ok, _} ->
    {:uncertain, "confidence below threshold"}
end
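
The same gate can be wrapped in a reusable function with a configurable threshold. The sketch below uses a hypothetical ConfidenceGate module; it matches any map or struct that exposes :language and :confidence, and it passes error tuples through unchanged, which matters for callers that set the :threshold option on detect/3.

```elixir
defmodule ConfidenceGate do
  # Hypothetical helper, not part of the library. Accepts the result of
  # detect/3 (or any {:ok, map} with :language and :confidence keys).
  def gate(result, min_confidence \\ 0.7)

  # Confident enough: return just the language code.
  def gate({:ok, %{language: lang, confidence: c}}, min) when c > min, do: {:ok, lang}

  # A detection exists but falls at or below the threshold.
  def gate({:ok, %{confidence: _}}, _min), do: {:uncertain, "confidence below threshold"}

  # Errors such as {:error, :no_predictions} are passed through unchanged.
  def gate({:error, reason}, _min), do: {:error, reason}
end
```

Usage would then look like `text |> Text.Language.Classifier.Fasttext.detect(model) |> ConfidenceGate.gate(0.8)`.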

Summary

Functions

classify(text, model)

Convenience wrapper that returns just the top-1 language code.

detect(text, model, options \\ [])

Runs fastText language identification on text and returns a detection struct with the language, script, confidence, and alternatives.

to_locale(detection, options \\ [])

Resolves a Detection into a canonical CLDR locale string.

Functions

classify(text, model)

@spec classify(binary(), Text.Language.Classifier.Fasttext.Model.t()) ::
  {:ok, String.t()} | {:error, atom()}

Convenience wrapper that returns just the top-1 language code.

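Arguments

  • text is the string to classify.

  • model is a loaded Text.Language.Classifier.Fasttext.Model struct.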

Returns

  • {:ok, language} where language is a BCP-47 language subtag.

  • {:error, :empty_input} for empty inputs.

Examples

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> Text.Language.Classifier.Fasttext.classify("Привет мир", model)
{:ok, "ru"}

detect(text, model, options \\ [])

Runs fastText language identification on text and returns a detection struct with the language, script, confidence, and alternatives.

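Arguments

  • text is the string to classify.

  • model is a loaded Text.Language.Classifier.Fasttext.Model struct.

  • options is a keyword list of options.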

Options

  • :k — number of top predictions to record. The first becomes the main detection; the rest become alternatives. Defaults to 5.

  • :threshold — drop predictions below this probability. Defaults to 0.0 (matches fastText's Python wrapper).

Returns

  • {:ok, detection} where detection is a Text.Language.Classifier.Fasttext.Detection struct.

  • {:error, :no_predictions} when the model produces no candidate at all, which only happens if :threshold is set high enough to drop every label. Empty or whitespace-only input is not an error: fastText still produces a low-confidence prediction in that case, matching the reference Python wrapper.
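
How :k and :threshold interact with the two return shapes can be sketched on a plain list of {label, probability} pairs. This is not the library's code: TopKSketch and its return tuple are hypothetical, and the sketch only mirrors the semantics described above (drop anything below the threshold, keep the k most probable, treat the first as the main detection).

```elixir
defmodule TopKSketch do
  # Illustration only: selects predictions from a list of {label, prob} pairs.
  def select(scored, options \\ []) do
    k = Keyword.get(options, :k, 5)
    threshold = Keyword.get(options, :threshold, 0.0)

    scored
    # Drop predictions below the threshold.
    |> Enum.filter(fn {_label, prob} -> prob >= threshold end)
    # Most probable first.
    |> Enum.sort_by(fn {_label, prob} -> prob end, :desc)
    # Keep at most k predictions.
    |> Enum.take(k)
    |> case do
      # Every label was dropped by the threshold.
      [] -> {:error, :no_predictions}
      # The head is the main detection; the rest are alternatives.
      [best | alternatives] -> {:ok, best, alternatives}
    end
  end
end
```

With the default threshold of 0.0 every label survives the filter, so the error branch is reachable only when a caller raises :threshold.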

Examples

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> {:ok, det} = Text.Language.Classifier.Fasttext.detect("Hello world", model)
iex> det.language
"en"

to_locale(detection, options \\ [])

@spec to_locale(
  Text.Language.Classifier.Fasttext.Detection.t(),
  keyword()
) :: {:ok, String.t()} | {:error, term()}

Resolves a Detection into a canonical CLDR locale string.

Delegates to Text.Language.Classifier.Fasttext.Locale.resolve/2. See that module for the resolution algorithm and the available options.

Examples

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> {:ok, det} = Text.Language.Classifier.Fasttext.detect("Hola, ¿cómo estás?", model)
iex> {:ok, locale} = Text.Language.Classifier.Fasttext.to_locale(det, region: :MX)
iex> String.contains?(locale, "MX")
true