Text.Language.Classifier.Fasttext.Detection (Text v0.5.0)


The result of running fastText language identification on a piece of text.

Fields

  • language is the BCP-47 language subtag that fastText reports: typically a two-letter ISO 639-1 code ("en", "fr", "zh"), occasionally a three-letter ISO 639-3 code ("als", "yue").

  • confidence is the probability fastText assigns to the top-1 label, in [0.0, 1.0].

  • script is the dominant Unicode script of the input as detected by Text.Language.Classifier.Fasttext.ScriptDetector. One of the script atoms documented there.

  • alternatives is the remainder of the top-K predictions after the top-1 is removed, each a {language, probability} pair. It is empty when k == 1 was requested.

  • text is the original input. It is kept on the struct so that a downstream Text.Language.Classifier.Fasttext.Locale.resolve/2 call can use it to disambiguate Hans/Hant or to derive other script-dependent locale information.
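
The relationship between the fields can be sketched as follows. This is an illustration only, not the library's implementation: DetectionSketch is a hypothetical stand-in struct so the snippet runs on its own, from_predictions/3 is an invented helper, the :latin atom is an assumed script value (the real atoms are documented in ScriptDetector), and the probabilities are made-up inputs.

```elixir
# Hypothetical stand-in for the real struct so this sketch is self-contained.
# The field names mirror the ones documented above.
defmodule DetectionSketch do
  defstruct [:language, :confidence, :script, :text, alternatives: []]

  # Builds the struct from a non-empty list of {language, probability}
  # pairs, assumed to be sorted by descending probability. The head becomes
  # the top-1 label; the tail becomes `alternatives`.
  def from_predictions([{language, confidence} | rest], script, text) do
    %__MODULE__{
      language: language,
      confidence: confidence,
      script: script,
      alternatives: rest,
      text: text
    }
  end
end

# Illustrative input: three predictions, as if k == 3 had been requested.
detection =
  DetectionSketch.from_predictions(
    [{"en", 0.93}, {"de", 0.04}, {"nl", 0.02}],
    :latin,
    "The quick brown fox"
  )

# With a single prediction (k == 1) the alternatives list is empty.
single = DetectionSketch.from_predictions([{"en", 0.93}], :latin, "Hello")

IO.inspect(detection.language)
IO.inspect(detection.alternatives)
IO.inspect(single.alternatives)
```

Keeping the top-1 label in language and confidence, rather than leaving it at the head of a single list, lets callers read the most likely result without touching alternatives at all.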

Summary

Types

alternative()

@type alternative() :: {String.t(), float()}

t()

@type t() :: %Text.Language.Classifier.Fasttext.Detection{
  alternatives: [alternative()],
  confidence: float(),
  language: String.t(),
  script: Text.Language.Classifier.Fasttext.ScriptDetector.script(),
  text: String.t()
}
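
As a rough sketch of how code might consume a value of this shape, the hypothetical helper below collects every candidate at or above a probability threshold. DetectionFilter and candidates/2 are invented for illustration and are not part of the library. The function matches on a plain map pattern, so it should also accept the real struct, since structs are maps with these keys. The input values are made up.

```elixir
defmodule DetectionFilter do
  # Returns every {language, probability} candidate at or above `min`,
  # starting with the top-1 label. The map pattern matches any struct or
  # map that carries these three keys.
  @spec candidates(map(), float()) :: [{String.t(), float()}]
  def candidates(%{language: lang, confidence: conf, alternatives: alts}, min) do
    Enum.filter([{lang, conf} | alts], fn {_lang, p} -> p >= min end)
  end
end

# Illustrative input using a plain map in place of the real struct.
result =
  DetectionFilter.candidates(
    %{language: "fr", confidence: 0.61, alternatives: [{"it", 0.30}, {"es", 0.05}]},
    0.25
  )

IO.inspect(result)
```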