Nasty.Statistics.SequenceLabeling.CRF (Nasty v0.3.0)
Conditional Random Field (CRF) for sequence labeling.
Implements a linear-chain CRF with feature-based modeling for tasks such as named entity recognition (NER) and part-of-speech (POS) tagging.
Model
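A linear-chain CRF models the conditional probability of a label sequence y = (y_1, ..., y_n) given a token sequence x:
$$P(y \mid x) = \frac{\exp(\mathrm{score}(x, y))}{Z(x)}$$
where the score adds up the weights of the features active at each position and the weights of the transitions between adjacent labels,
$$\mathrm{score}(x, y) = \sum_{t=1}^{n} \sum_{k} w_k \, f_k(x, t, y_t) + \sum_{t=2}^{n} T_{y_{t-1}, y_t}$$
and Z(x) is the partition function (normalizer), which sums over every possible label sequence y':
$$Z(x) = \sum_{y'} \exp(\mathrm{score}(x, y'))$$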
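To make the score concrete, here is a minimal sketch of score(x, y), assuming the per-position feature sums (the inner sum over k) have already been computed. The module name CRFScoreSketch and the input shapes (one %{label => score} map per token, plus a %{{prev_label, label} => weight} transition map) are hypothetical illustrations, not Nasty's internal representation.

```elixir
defmodule CRFScoreSketch do
  @moduledoc """
  Hypothetical sketch of score(x, y) for a linear-chain CRF.

  `emissions` is a list with one map per token, mapping each label to the
  sum of its active feature weights at that position (assumed precomputed).
  `transitions` maps {prev_label, label} to a weight; missing pairs count as 0.0.
  """

  @spec score([%{atom() => float()}], %{{atom(), atom()} => float()}, [atom()]) :: float()
  def score(emissions, transitions, labels) when length(emissions) == length(labels) do
    # Σ feature weights: emission score of the chosen label at each position
    emission_sum =
      emissions
      |> Enum.zip(labels)
      |> Enum.map(fn {scores, label} -> Map.fetch!(scores, label) end)
      |> Enum.sum()

    # Σ transition weights: one term per adjacent label pair (y_{t-1}, y_t)
    transition_sum =
      labels
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.map(fn [prev, cur] -> Map.get(transitions, {prev, cur}, 0.0) end)
      |> Enum.sum()

    emission_sum + transition_sum
  end
end
```

For example, with emissions [%{person: 2.0, none: 0.5}, %{person: 1.0, none: 1.5}] and a single transition weight of 0.3 on {:person, :none}, the sequence [:person, :none] scores 2.0 + 1.5 + 0.3.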
Training
Training uses the forward-backward algorithm to compute gradients of the log-likelihood, and gradient descent with momentum (the default :method) to optimize the weights.
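The forward half of forward-backward computes log Z(x) without enumerating every label sequence: for each label it carries the log of the summed exp-scores of all prefixes ending in that label. The sketch below assumes the same hypothetical input shapes as above (a non-empty list of %{label => score} emission maps and a %{{prev, label} => weight} transition map) and works in log space to avoid overflow. The backward pass and the gradient itself (observed minus expected feature counts) are omitted, and the module name is hypothetical.

```elixir
defmodule CRFForwardSketch do
  @moduledoc """
  Hypothetical forward pass computing log Z(x) for a linear-chain CRF.
  Missing transition pairs count as 0.0.
  """

  def log_partition([first | rest], transitions, labels) do
    # alpha[y] = log Σ exp(score) over all prefixes ending in label y
    alpha0 = Map.new(labels, fn y -> {y, Map.fetch!(first, y)} end)

    final_alpha =
      Enum.reduce(rest, alpha0, fn emission, alpha ->
        Map.new(labels, fn y ->
          # Combine every predecessor label with its transition into y
          incoming =
            Enum.map(labels, fn prev ->
              alpha[prev] + Map.get(transitions, {prev, y}, 0.0)
            end)

          {y, log_sum_exp(incoming) + Map.fetch!(emission, y)}
        end)
      end)

    # Z(x) sums over every possible final label
    final_alpha |> Map.values() |> log_sum_exp()
  end

  # Numerically stable log(Σ exp(v_i)): shift by the maximum before exponentiating
  defp log_sum_exp(values) do
    m = Enum.max(values)
    m + :math.log(Enum.sum(Enum.map(values, &:math.exp(&1 - m))))
  end
end
```

Subtracting this value from score(x, y) gives log P(y|x). The cost is linear in sequence length and quadratic in the number of labels, instead of exponential in sequence length.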
Prediction
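Prediction uses the Viterbi algorithm to find the most likely label sequence. Because Z(x) does not depend on y, maximizing P(y|x) is the same as maximizing score(x, y).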
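A minimal Viterbi sketch under the same hypothetical input shapes: it keeps, for each label, the best-scoring prefix ending in that label and extends it one token at a time, replacing the sum of the forward pass with a max. For brevity it stores whole (reversed, structurally shared) paths instead of a separate backpointer table. The module name and the {labels, score} return shape are assumptions, not the return value of predict/3.

```elixir
defmodule CRFViterbiSketch do
  @moduledoc """
  Hypothetical Viterbi decoder for a linear-chain CRF.
  `emissions`: non-empty list of %{label => score} maps, one per token.
  `transitions`: %{{prev_label, label} => weight}, missing pairs count as 0.0.
  """

  def decode([first | rest], transitions, labels) do
    # delta[y] = {best score of any prefix ending in y, that prefix reversed}
    init = Map.new(labels, fn y -> {y, {Map.fetch!(first, y), [y]}} end)

    final =
      Enum.reduce(rest, init, fn emission, delta ->
        Map.new(labels, fn y ->
          # Choose the predecessor that maximizes prefix score + transition weight
          {best_score, best_path} =
            labels
            |> Enum.map(fn prev ->
              {score, path} = delta[prev]
              {score + Map.get(transitions, {prev, y}, 0.0), path}
            end)
            |> Enum.max_by(fn {score, _path} -> score end)

          {y, {best_score + Map.fetch!(emission, y), [y | best_path]}}
        end)
      end)

    # Best complete sequence over all possible final labels
    {best_score, reversed_path} =
      final |> Map.values() |> Enum.max_by(fn {score, _path} -> score end)

    {Enum.reverse(reversed_path), best_score}
  end
end
```

With emissions [%{a: 1.0, b: 0.0}, %{a: 0.0, b: 2.0}] and a 0.5 transition weight on {:a, :b}, the decoder returns [:a, :b] with score 3.5.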
Examples
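```elixir
alias Nasty.Statistics.SequenceLabeling.CRF

# Training
model = CRF.new(labels: [:person, :gpe, :org, :none])
# load_annotated_data/0 is a placeholder for your own loader; it should return
# a list of {tokens, labels} tuples (see "Training Data Format")
training_data = load_annotated_data()
{:ok, trained} = CRF.train(model, training_data, iterations: 100)

# Prediction (tokens: a list of %Nasty.AST.Token{} structs)
{:ok, labels} = CRF.predict(trained, tokens, [])
```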
Functions
Loads a trained CRF model from disk.
Returns model metadata.
Creates a new untrained CRF model.
Options
:labels - List of possible labels (required)
:language - Language code (default: :en)
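The two options above can be read with Keyword.fetch!/2, which raises KeyError when the required :labels is missing, and Keyword.get/3, which supplies the :en default. This is a hypothetical sketch: the module name and the plain map it returns stand in for whatever struct the real constructor builds.

```elixir
defmodule CRFOptionsSketch do
  @moduledoc """
  Hypothetical sketch of reading the documented constructor options.
  The returned map is illustrative only, not Nasty's actual model struct.
  """

  def parse_new_opts(opts) do
    # :labels is required, so fail loudly (KeyError) if it is missing
    labels = Keyword.fetch!(opts, :labels)
    # :language falls back to the documented default of :en
    language = Keyword.get(opts, :language, :en)

    %{labels: labels, language: language}
  end
end
```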
@spec predict(t(), [Nasty.AST.Token.t()], keyword()) :: {:ok, [atom()]} | {:error, term()}
Predicts labels for a sequence of tokens using Viterbi decoding.
Parameters
model - Trained CRF model
tokens - List of %Token{} structs
opts - Options (currently unused)
Returns
{:ok, labels} - Predicted label sequence, one label per input token
{:error, reason} - If prediction fails
Saves the trained CRF model to disk.
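This page does not say which on-disk format the save and load functions use. One common Elixir approach, shown here as a hypothetical sketch rather than Nasty's implementation, is a round trip through Erlang's external term format with :erlang.term_to_binary/1 and :erlang.binary_to_term/2. The module name and the map used as a model are assumptions.

```elixir
defmodule CRFPersistenceSketch do
  @moduledoc """
  Hypothetical save/load round trip using Erlang's external term format.
  Not necessarily the format used by Nasty's own save/load functions.
  """

  def save(model, path) do
    # Serialize any Elixir term (maps, tuples, atoms, floats) to a binary;
    # returns :ok or {:error, reason} from File.write/2
    File.write(path, :erlang.term_to_binary(model))
  end

  def load(path) do
    with {:ok, binary} <- File.read(path) do
      # [:safe] refuses to create new atoms (raising ArgumentError instead),
      # which guards against untrusted files; label atoms must already exist
      {:ok, :erlang.binary_to_term(binary, [:safe])}
    end
  end
end
```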
Trains the CRF model on annotated sequence data.
Training Data Format
List of {tokens, labels} tuples where:
tokens is a list of %Token{} structs
labels is a list of label atoms (same length as tokens)
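Because each tokens list must line up one-to-one with its labels list, a shape check before training can catch misaligned examples early. This hypothetical validator treats tokens opaquely, so plain strings stand in for %Token{} structs in the example; the module name and its error tuples are illustrative, not errors the training function is documented to return.

```elixir
defmodule CRFTrainingDataSketch do
  @moduledoc """
  Hypothetical check for the documented training-data shape: a list of
  {tokens, labels} tuples where labels is a list of atoms of the same length.
  """

  def validate(examples) when is_list(examples) do
    examples
    |> Enum.with_index()
    # Return the first error found, or :ok if every example passes
    |> Enum.find_value(:ok, fn
      {{tokens, labels}, i} when is_list(tokens) and is_list(labels) ->
        cond do
          length(tokens) != length(labels) ->
            {:error, {:length_mismatch, i, length(tokens), length(labels)}}

          not Enum.all?(labels, &is_atom/1) ->
            {:error, {:non_atom_label, i}}

          true ->
            nil
        end

      # Anything that is not a {list, list} tuple
      {_other, i} ->
        {:error, {:bad_example, i}}
    end)
  end
end
```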
Options
:iterations - Maximum training iterations (default: 100)
:learning_rate - Initial learning rate (default: 0.1)
:regularization - L2 regularization strength (default: 1.0)
:method - Optimization method (:sgd, :momentum, :adagrad) (default: :momentum)
:convergence_threshold - Gradient norm threshold (default: 0.01)
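The :learning_rate, :regularization, and :convergence_threshold options map naturally onto a single momentum step. The sketch below assumes weights, velocities, and gradients are maps from feature keys to floats, that the supplied gradient is that of the negative log-likelihood, that the L2 penalty is (regularization / 2) times the squared weight norm, and that the momentum coefficient is 0.9. None of these details come from the documentation, the module name is hypothetical, and the :sgd and :adagrad methods are not shown.

```elixir
defmodule CRFMomentumSketch do
  # Assumed momentum coefficient (not documented by Nasty)
  @momentum 0.9

  @doc """
  One regularized momentum update. Assumes `weights` has a key for every
  feature; gradient entries for features absent from `weights` are ignored.
  Returns {new_weights, new_velocity, gradient_norm}.
  """
  def step(weights, velocity, grad, opts \\ []) do
    lr = Keyword.get(opts, :learning_rate, 0.1)
    reg = Keyword.get(opts, :regularization, 1.0)

    # Regularized gradient: dNLL/dw + reg * w for every weight
    full_grad =
      Map.new(weights, fn {k, w} -> {k, Map.get(grad, k, 0.0) + reg * w} end)

    # Velocity keeps a decaying sum of past steps: v = mu * v - lr * g
    new_velocity =
      Map.new(full_grad, fn {k, g} ->
        {k, @momentum * Map.get(velocity, k, 0.0) - lr * g}
      end)

    new_weights = Map.new(weights, fn {k, w} -> {k, w + new_velocity[k]} end)

    {new_weights, new_velocity, norm(full_grad)}
  end

  # Euclidean norm of a gradient map
  def norm(grad), do: grad |> Map.values() |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()

  # Training would stop early once the gradient norm drops below the threshold
  def converged?(grad_norm, opts \\ []),
    do: grad_norm < Keyword.get(opts, :convergence_threshold, 0.01)
end
```

A training loop would call step/4 once per iteration, up to :iterations times, and stop early when converged?/2 returns true.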
Returns
{:ok, trained_model} with learned feature and transition weights