Text.Language.Classifier.Fasttext.Inference (Text v0.5.0)


Forward-pass scoring for fastText models.

Given a flat list of input-matrix row indices (produced by Text.Language.Classifier.Fasttext.Features.extract/2), this module computes a hidden vector and projects it to a list of {label, probability} pairs, sorted descending by probability.

Two output projections are supported:

  • :softmax — softmax(output_matrix · hidden). The standard form.

  • :hs (hierarchical softmax) — root-to-leaf DFS over the Huffman tree built at load time. Each internal node carries a learned vector in output_matrix[node - osz]; the score of a leaf is the sum of log(sigmoid(±dot)) decisions along its path. This is the projection lid.176 uses.

Mirrors Model::predict and HierarchicalSoftmaxLoss::dfs from the fastText source (src/model.cc, src/loss.cc).

Numerical conventions

fastText uses std_log(x) = log(x + 1e-5) instead of plain log(x) for numerical stability when probabilities approach zero; this module follows the same convention.
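A minimal Python illustration of the convention (illustrative only, not the module's code):

```python
import math

def std_log(x):
    # fastText's stabilized log: the 1e-5 shift keeps the result finite
    # even when a probability is exactly 0.0.
    return math.log(x + 1e-5)

print(std_log(0.0))  # ≈ -11.51, rather than -inf from math.log(0.0)
```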

Top-k pruning during DFS matches the C++ heap-based approach: a branch is skipped once its accumulated score drops below the lowest score currently in the top-k buffer. For a small model like lid.176 (176 leaves) the speedup is modest, but it preserves bit-equivalence with the reference's traversal order.
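The pruned root-to-leaf traversal can be sketched in plain Python over a toy tree. This is an illustration of the technique only: the tree shape, the node_dots values, and the dfs signature are made up for the example and are not the module's API.

```python
import heapq
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def std_log(x):
    # fastText's stabilized log (see "Numerical conventions" above).
    return math.log(x + 1e-5)

def dfs(node, score, k, heap, tree, node_dots):
    # Prune: log-probabilities only decrease along a path, so once the
    # accumulated score is below the worst entry in a full top-k buffer,
    # no leaf under this branch can make the cut.
    if len(heap) == k and score < heap[0][0]:
        return
    children = tree.get(node)
    if children is None:
        # Leaf: record (score, label index), keeping only the k best.
        heapq.heappush(heap, (score, node))
        if len(heap) > k:
            heapq.heappop(heap)
        return
    left, right = children
    # In the real model: f = sigmoid(dot(hidden, output_matrix[node - osz])).
    f = sigmoid(node_dots[node])
    dfs(left, score + std_log(1.0 - f), k, heap, tree, node_dots)
    dfs(right, score + std_log(f), k, heap, tree, node_dots)

# Toy Huffman tree: leaves 0..2; internal node 3 has children (0, 1),
# root 4 has children (3, 2). node_dots holds made-up dot products.
tree = {4: (3, 2), 3: (0, 1)}
node_dots = {4: -0.5, 3: 1.0}
heap = []
dfs(4, 0.0, 2, heap, tree, node_dots)
print(sorted(heap, reverse=True))  # best (score, leaf) pair first
```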

Summary

Functions

Returns the hidden activation vector for a list of feature indices.

Convenience wrapper: tokenize, extract features, and predict in one step.

Predicts the top-k labels with probabilities for a feature index list.

Functions

compute_hidden(features, input_matrix)

@spec compute_hidden([non_neg_integer()], Nx.Tensor.t()) :: Nx.Tensor.t()

Returns the hidden activation vector for a list of feature indices.

Arguments

  • features — a flat list of input-matrix row indices, as produced by Text.Language.Classifier.Fasttext.Features.extract/2.

  • input_matrix — the model's input embedding matrix, an Nx.Tensor.

Returns

  • A 1-dimensional Nx.Tensor of length args.dim (the model's embedding dimension). Returns a zero vector when the feature list is empty.
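For intuition: in the fastText reference implementation (Model::computeHidden), the hidden vector is the mean of the selected input-matrix rows. A plain-Python sketch of that contract (illustrative; the module itself operates on Nx tensors):

```python
def compute_hidden(features, input_matrix, dim):
    # Mean of the selected embedding rows, as in fastText's
    # Model::computeHidden; an empty feature list yields the zero vector.
    if not features:
        return [0.0] * dim
    rows = [input_matrix[i] for i in features]
    return [sum(col) / len(rows) for col in zip(*rows)]

emb = [[1.0, 2.0], [3.0, 4.0]]
print(compute_hidden([0, 1], emb, dim=2))  # [2.0, 3.0]
```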

predict(text, model, options \\ [])

Convenience wrapper: tokenize, extract features, and predict in one step.

Arguments

  • text — the string to classify.

  • model — a loaded Text.Language.Classifier.Fasttext.Model.t().

Options

Same as predict_features/3.

Returns

  • [{label, probability}, ...], descending by probability.

Examples

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> [{label, _} | _] = Text.Language.Classifier.Fasttext.Inference.predict("Bonjour le monde", model, k: 3)
iex> label
"fr"

predict_features(features, model, options \\ [])

@spec predict_features(
  [non_neg_integer()],
  Text.Language.Classifier.Fasttext.Model.t(),
  keyword()
) :: [
  {String.t(), float()}
]

Predicts the top-k labels with probabilities for a feature index list.

Arguments

  • features — a flat list of input-matrix row indices, as produced by Text.Language.Classifier.Fasttext.Features.extract/2.

  • model — a loaded Text.Language.Classifier.Fasttext.Model.t().

Options

  • :k — number of top predictions to return. Defaults to 1.

  • :threshold — probability cutoff. Predictions below this are dropped. Defaults to 0.0 (matches the fastText Python wrapper default).

Returns

  • A list of {label, probability} pairs, sorted descending by probability. May be shorter than k if :threshold excludes candidates.
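How :k and :threshold interact for the :softmax projection can be sketched as follows (plain Python; the labels and raw scores are made up for illustration):

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def top_k(labels, scores, k=1, threshold=0.0):
    # Pair labels with softmax probabilities, sort descending by
    # probability, keep the top k, then drop anything below the threshold,
    # so the result may be shorter than k.
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return [(label, p) for label, p in ranked[:k] if p >= threshold]

print(top_k(["en", "fr", "de"], [2.0, 1.0, 0.5], k=2))
```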

Examples

iex> {:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load("priv/lid_176/lid.176.bin")
iex> features = Text.Language.Classifier.Fasttext.Features.extract("hello world", model)
iex> [{label, _prob} | _] = Text.Language.Classifier.Fasttext.Inference.predict_features(features, model, k: 3)
iex> label
"en"