Nasty.Statistics.SequenceLabeling.Viterbi (Nasty v0.3.0)

Viterbi algorithm for sequence labeling with linear-chain CRFs.

Finds the most likely label sequence given feature weights and transition scores using dynamic programming.

Algorithm

Initialize scores for first position
For each subsequent position:
- Compute emission score (from features)
- Compute transition score (from previous label)
- Keep track of best previous label (backpointer)
Backtrack from best final label to reconstruct sequence

Complexity

Time: O(n × L²) where n = sequence length, L = number of labels Space: O(n × L)

Summary

Types

feature_vector()

label()

label_sequence()

score()

Functions

backward_probabilities(feature_sequence, feature_weights, transition_weights, labels)

Computes backward probabilities (for training).

decode(feature_sequence, feature_weights, transition_weights, labels, opts \\ [])

Decodes the most likely label sequence using Viterbi algorithm.

emission_score(features, label, feature_weights, log_domain \\ true)

Computes emission score for a label given features.

forward_probabilities(feature_sequence, feature_weights, transition_weights, labels)

Computes forward probabilities (for training).

partition_function(forward, labels, final_pos)

Computes partition function Z(x) for normalization.

transition_score(prev_label, curr_label, transition_weights, log_domain \\ true)

Computes transition score between two labels.

Types

feature_vector()

@type feature_vector() :: [String.t()]

label()

@type label() :: atom()

label_sequence()

@type label_sequence() :: [label()]

score()

@type score() :: float()

Functions

backward_probabilities(feature_sequence, feature_weights, transition_weights, labels)

@spec backward_probabilities([feature_vector()], map(), map(), [label()]) :: map()

Computes backward probabilities (for training).

Returns map of {position, label} → backward probability.

decode(feature_sequence, feature_weights, transition_weights, labels, opts \\ [])

@spec decode([feature_vector()], map(), map(), [label()], keyword()) ::
  {:ok, label_sequence(), score()}

Decodes the most likely label sequence using Viterbi algorithm.

Parameters

feature_sequence - List of feature vectors (one per token)
feature_weights - Map of feature → label → weight
transition_weights - Map of {prev_label, curr_label} → weight
labels - List of all possible labels
opts - Options:
- :log_domain - Use log probabilities (default: true)

Returns

{:ok, label_sequence, score} - Best label sequence and its score

emission_score(features, label, feature_weights, log_domain \\ true)

@spec emission_score(feature_vector(), atom(), map(), boolean()) :: score()

Computes emission score for a label given features.

Sum of all feature weights for features present in the feature vector.

forward_probabilities(feature_sequence, feature_weights, transition_weights, labels)

@spec forward_probabilities([feature_vector()], map(), map(), [label()]) :: map()

Computes forward probabilities (for training).

Used in CRF training to compute feature expectations.

Returns map of {position, label} → forward probability.

partition_function(forward, labels, final_pos)

@spec partition_function(map(), [label()], non_neg_integer()) :: float()

Computes partition function Z(x) for normalization.

Z(x) = sum over all possible label sequences of exp(score(x, y))

transition_score(prev_label, curr_label, transition_weights, log_domain \\ true)

@spec transition_score(atom() | nil, atom(), map(), boolean()) :: score()

Computes transition score between two labels.