Text.Language.Classifier.Fasttext.Model (Text v0.5.0)

A fully-loaded fastText model.

Holds the parsed args and dictionary plus the input and output matrices as Nx tensors. The input matrix is the largest piece of memory in the struct: for lid.176.bin it is a {nwords + bucket, dim} tensor of float32 values, approximately 128 MB.
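
That figure follows directly from the shape: (nwords + bucket) rows × dim columns × 4 bytes per float32 value. A minimal sanity-check sketch, assuming a loaded model is bound to model:

rows = model.dictionary.nwords + model.args.bucket
bytes = rows * model.args.dim * 4
bytes == Nx.byte_size(model.input_matrix)
#=> true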

Models are produced by Text.Language.Classifier.Fasttext.ModelLoader.load/2 from a fastText .bin file.
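
A minimal loading sketch. The path and the empty option list are illustrative, and the ok-tuple return shape is an assumption rather than confirmed API:

{:ok, model} =
  Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid.176.bin", [])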

Fields

  • args is the parsed Text.Language.Classifier.Fasttext.Args struct.

  • dictionary is the parsed Text.Language.Classifier.Fasttext.Dictionary struct.

  • input_matrix is an Nx tensor of shape {dictionary.nwords + args.bucket, args.dim} holding the row-major input embeddings (word rows first, then subword n-gram rows).

  • output_matrix is an Nx tensor of shape {dictionary.nlabels, args.dim} holding the per-label output vectors.

  • labels is the list of language label strings (with the __label__ prefix stripped) in row order matching output_matrix. For lid.176 this is a 176-element list such as ["en", "zh-Hans", ...]. That row correspondence is what makes decoding a score vector simple, as sketched below.
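
Because labels shares row order with output_matrix, mapping a score vector back to a language tag is an argmax followed by a list lookup. A minimal sketch, assuming scores is a rank-1 Nx tensor of per-label scores:

best = scores |> Nx.argmax() |> Nx.to_number()
Enum.at(model.labels, best)
#=> "en" (for example)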

Summary

Types

loss_state()

Loss-specific decoding state, built once at load time and reused for every prediction.

t()

Types

loss_state()

@type loss_state() :: Text.Language.Classifier.Fasttext.HuffmanTree.t() | nil

Loss-specific decoding state, built once at load time and reused for every prediction.

  • For :hs (hierarchical softmax) the state is a Text.Language.Classifier.Fasttext.HuffmanTree constructed from the label counts (see the sketch after this list).

  • For :softmax no extra state is needed; the field is nil.

  • :ns and :ova are not yet supported by the inference path.
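
A sketch of how load time could populate the field, dispatching on the parsed loss. Here args is the parsed Args struct, and HuffmanTree.build/1 and the label_counts binding are hypothetical names, not confirmed API:

alias Text.Language.Classifier.Fasttext.HuffmanTree

loss_state =
  case args.loss do
    # hierarchical softmax: tree built once from the label counts
    :hs -> HuffmanTree.build(label_counts)
    # plain softmax needs no extra decoding state
    :softmax -> nil
    # :ns and :ova are rejected by the inference path
    other -> raise ArgumentError, "unsupported loss: #{inspect(other)}"
  end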

t()

@type t() :: %Text.Language.Classifier.Fasttext.Model{
  args: Text.Language.Classifier.Fasttext.Args.t(),
  dictionary: Text.Language.Classifier.Fasttext.Dictionary.t(),
  input_matrix: Nx.Tensor.t(),
  labels: [String.t()],
  loss_state: loss_state(),
  output_matrix: Nx.Tensor.t()
}