Nasty.Statistics.POSTagging.HMMTagger (Nasty v0.3.0)
Hidden Markov Model (HMM) for Part-of-Speech tagging.
Uses the Viterbi algorithm to decode the most likely tag sequence. Implements trigram transitions for better context modeling.
Model Components
- Emission probabilities: P(word|tag) - likelihood of a word given a tag
- Transition probabilities: P(tag_i | tag_{i-1}, tag_{i-2}) - trigram model
- Initial probabilities: P(tag) at sentence start
- Smoothing: Add-k smoothing for unknown words and transitions
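The add-k smoothing mentioned above can be illustrated with a small standalone helper. This is a sketch, not part of the HMMTagger API: `emission_prob/4` and the nested count map are hypothetical names used only for illustration.

```elixir
defmodule SmoothingSketch do
  # Illustrative only; not part of Nasty.Statistics.POSTagging.HMMTagger.
  # P(word | tag) = (count(tag, word) + k) / (count(tag) + k * vocab_size)
  def emission_prob(counts, tag, word, opts \\ []) do
    k = Keyword.get(opts, :smoothing_k, 0.001)
    vocab_size = Keyword.get(opts, :vocab_size, 1)

    word_count = counts |> Map.get(tag, %{}) |> Map.get(word, 0)
    tag_total = counts |> Map.get(tag, %{}) |> Map.values() |> Enum.sum()

    (word_count + k) / (tag_total + k * vocab_size)
  end
end

counts = %{noun: %{"cat" => 3, "dog" => 1}}

# A seen word keeps most of its maximum-likelihood estimate.
SmoothingSketch.emission_prob(counts, :noun, "cat", vocab_size: 5)

# An unseen word gets a small but nonzero probability instead of 0.
SmoothingSketch.emission_prob(counts, :noun, "unseen", vocab_size: 5)
```

With k = 0.001, a word never observed with a tag still receives a tiny probability mass, so the Viterbi decoder can recover from unknown words rather than zeroing out entire paths.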
Training
# Train from POS-tagged sequences
model = HMMTagger.new()
training_data = [{["The", "cat"], [:det, :noun]}, ...]
{:ok, trained_model} = HMMTagger.train(model, training_data, [])
Prediction
{:ok, tags} = HMMTagger.predict(model, ["The", "cat", "sat"], [])
# => [:det, :noun, :verb]
@behaviour Nasty.Statistics.Model
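The Viterbi decoding behind predict/3 can be sketched as follows. This is a simplified bigram version for illustration (the module itself uses trigram transitions), and `ViterbiSketch` plus its probability maps are hypothetical, not part of the library:

```elixir
defmodule ViterbiSketch do
  # Minimal bigram Viterbi in log space; illustrative only.
  # init:  %{tag => log P(tag)}          (sentence-start probabilities)
  # trans: %{{prev_tag, tag} => log P}   (bigram transitions)
  # emit:  %{{tag, word} => log P}       (emissions)
  # Missing entries fall back to a log-probability floor.
  @floor -20.0

  def decode([first | rest], tags, init, trans, emit) do
    # Column for the first word: start prob + emission, path so far.
    start =
      for tag <- tags, into: %{} do
        {tag, {Map.get(init, tag, @floor) + emit_lp(emit, tag, first), [tag]}}
      end

    {_tag, {_score, path}} =
      rest
      |> Enum.reduce(start, fn word, prev_col ->
        # For each tag, keep only the best-scoring predecessor path.
        for tag <- tags, into: %{} do
          {best_score, best_path} =
            prev_col
            |> Enum.map(fn {prev, {score, path}} ->
              {score + Map.get(trans, {prev, tag}, @floor), path}
            end)
            |> Enum.max_by(&elem(&1, 0))

          {tag, {best_score + emit_lp(emit, tag, word), [tag | best_path]}}
        end
      end)
      |> Enum.max_by(fn {_tag, {score, _path}} -> score end)

    Enum.reverse(path)
  end

  defp emit_lp(emit, tag, word), do: Map.get(emit, {tag, word}, @floor)
end

tags = [:det, :noun, :verb]
init = %{det: :math.log(0.8), noun: :math.log(0.1), verb: :math.log(0.1)}
trans = %{{:det, :noun} => :math.log(0.9), {:noun, :verb} => :math.log(0.8)}
emit = %{
  {:det, "The"} => :math.log(0.9),
  {:noun, "cat"} => :math.log(0.9),
  {:verb, "sat"} => :math.log(0.9)
}

ViterbiSketch.decode(["The", "cat", "sat"], tags, init, trans, emit)
# => [:det, :noun, :verb]
```

Working in log space avoids floating-point underflow on longer sentences; the trigram variant used by the module extends the state from a single previous tag to a pair of previous tags.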
Summary
Functions
Load a trained model from disk.
Get model metadata.
Create a new untrained HMM tagger.
Predict POS tags for a sequence of words using Viterbi algorithm.
Save the trained model to disk.
Train the HMM on POS-tagged sequences.
Types
Functions
Load a trained model from disk.
Get model metadata.
Create a new untrained HMM tagger.
Options
:smoothing_k - Smoothing constant (default: 0.001)
Predict POS tags for a sequence of words using Viterbi algorithm.
Parameters
model - Trained HMM model
words - List of words to tag
opts - Prediction options (currently unused)
Returns
{:ok, tags}- Most likely tag sequence
Save the trained model to disk.
Train the HMM on POS-tagged sequences.
Parameters
model - Untrained or partially trained model
training_data - List of {words, tags} tuples
opts - Training options (currently unused)
Returns
{:ok, trained_model}- Model with learned probabilities