Nasty.Statistics.POSTagging.HMMTagger (Nasty v0.3.0)

Hidden Markov Model (HMM) for Part-of-Speech tagging.

Uses the Viterbi algorithm to decode the most likely tag sequence. Transitions are modeled as trigrams, so each tag is conditioned on the two preceding tags.

Model Components

Emission probabilities: P(word|tag) - likelihood of a word given a tag
Transition probabilities: P(tag_i | tag_{i-1}, tag_{i-2}) - trigram model
Initial probabilities: P(tag) at sentence start
Smoothing: Add-k smoothing for unknown words and unseen transitions
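As an illustration of the emission and smoothing components listed above, here is a minimal sketch of add-k smoothed emission estimates. The module `AddKSketch`, its function names, the default `k`, and the single unknown-word bucket are assumptions made for this example; they are not taken from Nasty's implementation.

```elixir
defmodule AddKSketch do
  @moduledoc "Hypothetical helper, not part of Nasty: add-k smoothed emission estimates."

  # Count {tag, word} pairs, tags, and vocabulary size from training data
  # of the shape [{words, tags}, ...].
  def count(training_data) do
    pairs =
      Enum.flat_map(training_data, fn {words, tags} -> Enum.zip(tags, words) end)

    %{
      pair_counts: Enum.frequencies(pairs),
      tag_counts: Enum.frequencies_by(pairs, fn {tag, _word} -> tag end),
      vocab_size: pairs |> Enum.map(fn {_tag, word} -> word end) |> Enum.uniq() |> length()
    }
  end

  # P(word | tag) = (count(tag, word) + k) / (count(tag) + k * (V + 1))
  # The extra 1 in (V + 1) reserves probability mass for one unknown-word bucket
  # (an assumption of this sketch).
  def emission(counts, word, tag, k \\ 0.01) do
    pair = Map.get(counts.pair_counts, {tag, word}, 0)
    total = Map.get(counts.tag_counts, tag, 0)
    (pair + k) / (total + k * (counts.vocab_size + 1))
  end
end
```

With k = 1 and the two training sentences "The cat" and "The dog", both tagged [:det, :noun], this gives P("The" | :det) = (2 + 1) / (2 + 1 * 4) = 0.5, while a word never seen in training still receives a non-zero probability.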

Training

alias Nasty.Statistics.POSTagging.HMMTagger

# Train from POS-tagged sequences; each item is a {words, tags} pair
model = HMMTagger.new()
training_data = [{["The", "cat"], [:det, :noun]}, ...]
{:ok, trained_model} = HMMTagger.train(model, training_data, [])
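To make the training data format more concrete, the following sketch shows one way trigram transition counts could be collected from `{words, tags}` pairs. The module `TrigramCountSketch` and the `:start` padding markers are assumptions for illustration only; the module description mentions separate initial probabilities, so the actual tagger may handle sentence starts differently.

```elixir
defmodule TrigramCountSketch do
  @moduledoc "Hypothetical helper, not part of Nasty: counts tag trigrams."

  # Pad each tag sequence with two :start markers so the first real tag also
  # has a two-tag history, then count every {prev2, prev1, tag} triple.
  def transition_counts(training_data) do
    training_data
    |> Enum.flat_map(fn {_words, tags} ->
      [:start, :start | tags]
      |> Enum.chunk_every(3, 1, :discard)
      |> Enum.map(&List.to_tuple/1)
    end)
    |> Enum.frequencies()
  end
end
```

The resulting map is keyed by tag triples, for example `{:start, :det, :noun}`, and could then be normalized and smoothed to obtain transition probabilities.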

Prediction

{:ok, tags} = HMMTagger.predict(trained_model, ["The", "cat", "sat"], [])
# => [:det, :noun, :verb]
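The decoding step can be illustrated with a small Viterbi sketch. To stay short, it uses first-order (bigram) transitions rather than the trigram transitions this module describes, and it keeps the best path per tag instead of backpointers. The module `ViterbiSketch` and the three scoring functions passed to it are hypothetical and do not reflect Nasty's internals.

```elixir
defmodule ViterbiSketch do
  @moduledoc "Hypothetical first-order Viterbi decoder; not Nasty's implementation."

  # `initial`, `transition` and `emission` are functions returning log-probabilities:
  #   initial.(tag), transition.(prev_tag, tag), emission.(tag, word)
  # `tags` is assumed to be a non-empty list of candidate tags.
  def decode([], _tags, _initial, _transition, _emission), do: []

  def decode([first | rest], tags, initial, transition, emission) do
    # Each entry maps a tag to {best log-score, reversed best path ending in that tag}.
    start =
      Map.new(tags, fn tag -> {tag, {initial.(tag) + emission.(tag, first), [tag]}} end)

    final =
      Enum.reduce(rest, start, fn word, prev ->
        Map.new(tags, fn tag ->
          # Pick the predecessor that maximizes the score so far plus the transition score.
          {score, path} =
            prev
            |> Enum.map(fn {prev_tag, {s, p}} -> {s + transition.(prev_tag, tag), p} end)
            |> Enum.max_by(fn {s, _p} -> s end)

          {tag, {score + emission.(tag, word), [tag | path]}}
        end)
      end)

    # The best overall path is the highest-scoring entry after the last word.
    {_score, path} = final |> Map.values() |> Enum.max_by(fn {s, _p} -> s end)
    Enum.reverse(path)
  end
end
```

Working in log space turns products of probabilities into sums, which avoids numeric underflow on long sentences. Extending the sketch to trigrams would mean tracking pairs of tags as states instead of single tags.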

@behaviour Nasty.Statistics.Model

Summary

Types

t()

Functions

load(path)

Load a trained model from disk.

metadata(model)

Get model metadata.

new(opts \\ [])

Create a new untrained HMM tagger.

predict(model, words, opts \\ [])

Predict POS tags for a sequence of words using the Viterbi algorithm.

save(model, path)

Save the trained model to disk.

train(model, training_data, opts \\ [])

Train the HMM on POS-tagged sequences.