Nasty.Statistics.POSTagging.HMMTagger (Nasty v0.3.0)


Hidden Markov Model (HMM) for Part-of-Speech tagging.

Uses the Viterbi algorithm to decode the most likely tag sequence. Transitions are modeled as trigrams, so each tag is conditioned on the two preceding tags.

Model Components

  • Emission probabilities: P(word|tag) - likelihood of a word given a tag
  • Transition probabilities: P(tag_i | tag_{i-1}, tag_{i-2}) - trigram model
  • Initial probabilities: P(tag) at sentence start
  • Smoothing: Add-k smoothing for unknown words and transitions
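Taken together, these components match a common formulation of trigram HMM tagging. The formulation below is an assumption, since these docs do not say how the first two positions are handled. Here t_i stands for tag_i, w_i for the i-th word, V for the vocabulary and T for the tag set. Each sentence is padded with two start symbols ⟨s⟩, so P(t_1 | ⟨s⟩, ⟨s⟩) plays the role of the initial probability. The module's separate `initial_probs` field suggests it may keep that table apart instead.

```latex
\hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \, P(w_i \mid t_i),
\qquad t_0 = t_{-1} = \langle s \rangle

P_k(w \mid t) = \frac{C(t, w) + k}{C(t) + k\,|V|},
\qquad
P_k(t_i \mid t_{i-1}, t_{i-2}) = \frac{C(t_{i-2}, t_{i-1}, t_i) + k}{C(t_{i-2}, t_{i-1}) + k\,|T|}
```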

Training

alias Nasty.Statistics.POSTagging.HMMTagger

# Train from POS-tagged sequences
model = HMMTagger.new()
training_data = [{["The", "cat"], [:det, :noun]}, ...]
{:ok, trained_model} = HMMTagger.train(model, training_data, [])
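The following Elixir sketch illustrates what `train/3` has to extract from `training_data` for the transition model. It counts tag trigrams and their two-tag contexts. `TrigramCounts` is a hypothetical helper, not part of Nasty. Padding each sentence with two `:start` symbols is an assumption, because the module's own treatment of sentence boundaries is not documented here.

```elixir
defmodule TrigramCounts do
  @moduledoc """
  Hypothetical helper (not part of Nasty): counts tag trigrams and their
  two-tag contexts from {words, tags} pairs. Each tag sequence is padded
  with two :start symbols (an assumption) so the first tag has a history.
  """

  @spec count([{[String.t()], [atom()]}]) :: {map(), map()}
  def count(training_data) do
    Enum.reduce(training_data, {%{}, %{}}, fn {_words, tags}, acc ->
      [:start, :start | tags]
      # Sliding window of three tags: [tag_{i-2}, tag_{i-1}, tag_i]
      |> Enum.chunk_every(3, 1, :discard)
      |> Enum.reduce(acc, fn [t2, t1, t], {trigrams, contexts} ->
        {Map.update(trigrams, {t2, t1, t}, 1, &(&1 + 1)),
         Map.update(contexts, {t2, t1}, 1, &(&1 + 1))}
      end)
    end)
  end
end
```

Dividing each trigram count by the count of its context gives the unsmoothed estimate of P(tag_i | tag_{i-1}, tag_{i-2}). Add-k smoothing then adjusts that ratio.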

Prediction

{:ok, tags} = HMMTagger.predict(trained_model, ["The", "cat", "sat"], [])
# => [:det, :noun, :verb]

@behaviour Nasty.Statistics.Model

Summary

Functions

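  • load(path) - Load a trained model from disk.
  • metadata(model) - Get model metadata.
  • new(opts \\ []) - Create a new untrained HMM tagger.
  • predict(model, words, opts \\ []) - Predict POS tags for a sequence of words using the Viterbi algorithm.
  • save(model, path) - Save the trained model to disk.
  • train(model, training_data, opts \\ []) - Train the HMM on POS-tagged sequences.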

Types

t()

@type t() :: %Nasty.Statistics.POSTagging.HMMTagger{
  emission_probs: map(),
  initial_probs: map(),
  metadata: map(),
  smoothing_k: float(),
  tag_set: MapSet.t(),
  transition_probs: map(),
  vocabulary: MapSet.t()
}

Functions

load(path)

@spec load(Path.t()) :: {:ok, t()} | {:error, term()}

Load a trained model from disk.

metadata(model)

@spec metadata(t()) :: map()

Get model metadata.

new(opts \\ [])

@spec new(keyword()) :: t()

Create a new untrained HMM tagger.

Options

  • :smoothing_k - Add-k smoothing constant k (default: 0.001)
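The next sketch shows what `:smoothing_k` controls. It is a hypothetical Elixir module that assumes the standard add-k form (count + k) / (total + k · number_of_outcomes). These docs do not state the exact formula HMMTagger applies internally.

```elixir
defmodule AddK do
  @moduledoc """
  Hypothetical illustration of add-k smoothing (not Nasty's code).
  """

  @doc """
  Smoothed estimate of P(outcome | context):
  (count + k) / (context_total + k * num_outcomes).
  The default k mirrors the documented :smoothing_k default.
  """
  @spec prob(non_neg_integer(), non_neg_integer(), pos_integer(), float()) :: float()
  def prob(count, context_total, num_outcomes, k \\ 0.001) do
    (count + k) / (context_total + k * num_outcomes)
  end
end
```

For example, take a tag seen 1,000 times and a 5,000-word vocabulary. An unseen word gets `AddK.prob(0, 1000, 5000)`, which is about 1.0e-6 at the default k. At `k = 0.01` it gets about 9.5e-6. A larger k shifts more probability mass toward unseen events.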

predict(model, words, opts \\ [])

@spec predict(t(), [String.t()], keyword()) :: {:ok, [atom()]} | {:error, term()}

Predict POS tags for a sequence of words using the Viterbi algorithm.

Parameters

  • model - Trained HMM model
  • words - List of words to tag
  • opts - Prediction options (currently unused)

Returns

  • {:ok, tags} - Most likely tag sequence
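The Viterbi step behind `predict/3` can be illustrated with a deliberately simplified Elixir sketch. This is a first-order (bigram) decoder, not the module's trigram implementation. `BigramViterbi`, its map layouts, and the 1.0e-8 floor for unseen events are all assumptions made for this example. Scores are summed in log space to avoid underflow on long sentences.

```elixir
defmodule BigramViterbi do
  @moduledoc """
  Hypothetical first-order (bigram) Viterbi decoder for illustration.
  HMMTagger itself uses trigram transitions, where each state is a tag pair.
  """

  # Assumed floor for events missing from the maps (or with zero probability).
  @floor 1.0e-8

  @doc """
  init:  %{tag => P(tag at sentence start)}
  trans: %{{prev_tag, tag} => P(tag | prev_tag)}
  emit:  %{{tag, word} => P(word | tag)}
  """
  @spec decode([String.t()], [atom()], map(), map(), map()) :: [atom()]
  def decode([], _tags, _init, _trans, _emit), do: []

  def decode([first | rest], tags, init, trans, emit) do
    # For each tag: {best log-score of a path ending in that tag, that path reversed}
    start = Map.new(tags, fn t -> {t, {lp(init, t) + lp(emit, {t, first}), [t]}} end)

    final =
      Enum.reduce(rest, start, fn word, scores ->
        Map.new(tags, fn t ->
          # Best predecessor for t: maximize previous score + log P(t | prev)
          {score, path} =
            scores
            |> Enum.map(fn {prev, {s, p}} -> {s + lp(trans, {prev, t}), p} end)
            |> Enum.max_by(fn {s, _p} -> s end)

          {t, {score + lp(emit, {t, word}), [t | path]}}
        end)
      end)

    # Each entry already carries its best path, so no separate backtrace is needed
    {_score, path} = final |> Map.values() |> Enum.max_by(fn {s, _p} -> s end)
    Enum.reverse(path)
  end

  # Log-probability lookup; missing or smaller-than-floor values are clamped to @floor
  defp lp(map, key), do: :math.log(max(Map.get(map, key, @floor), @floor))
end
```

The sketch carries full paths instead of a backpointer table, trading memory for brevity. A trigram version replaces each state `t` with a pair `{prev_tag, t}`, which squares the number of states per position.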

save(model, path)

@spec save(t(), Path.t()) :: :ok | {:error, term()}

Save the trained model to disk.
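These docs do not specify the on-disk format used by `save/2` and `load/1`. As an assumption-labeled sketch, a model struct could be persisted with Erlang's external term format, because it holds only ordinary terms (maps, MapSets, floats). `ModelFile` and its functions are hypothetical and are not Nasty's implementation.

```elixir
defmodule ModelFile do
  @moduledoc """
  Hypothetical save/load pair using Erlang's external term format.
  Not Nasty's documented on-disk format.
  """

  @spec save(term(), Path.t()) :: :ok | {:error, term()}
  def save(model, path) do
    File.write(path, :erlang.term_to_binary(model, [:compressed]))
  end

  @spec load(Path.t()) :: {:ok, term()} | {:error, term()}
  def load(path) do
    with {:ok, binary} <- File.read(path) do
      # :safe refuses to decode data that would create new atoms
      {:ok, :erlang.binary_to_term(binary, [:safe])}
    end
  rescue
    # Corrupt or unsafe data makes binary_to_term raise ArgumentError (badarg)
    ArgumentError -> {:error, :invalid_model_file}
  end
end
```

The `:safe` option matters if model files may come from untrusted sources, since atoms are never garbage-collected.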

train(model, training_data, opts \\ [])

@spec train(t(), [{[String.t()], [atom()]}], keyword()) ::
  {:ok, t()} | {:error, term()}

Train the HMM on POS-tagged sequences.

Parameters

  • model - Untrained or partially trained model
  • training_data - List of {words, tags} tuples, where words and tags are lists of equal length
  • opts - Training options (currently unused)

Returns

  • {:ok, trained_model} - Model with learned probabilities
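The emission side of training can be sketched the same way. This hypothetical Elixir module is not part of Nasty. It estimates add-k smoothed P(word | tag) from `{words, tags}` pairs and returns an error tuple when a pair's lists differ in length. The error atom and the choice to store only observed pairs are assumptions.

```elixir
defmodule EmissionEstimate do
  @moduledoc """
  Hypothetical sketch: add-k smoothed emission probabilities P(word | tag)
  estimated from {words, tags} training pairs.
  """

  @spec estimate([{[String.t()], [atom()]}], float()) :: {:ok, map()} | {:error, term()}
  def estimate(training_data, k \\ 0.001) do
    if Enum.all?(training_data, fn {words, tags} -> length(words) == length(tags) end) do
      # Flatten every sentence into {tag, word} observations
      pairs = Enum.flat_map(training_data, fn {words, tags} -> Enum.zip(tags, words) end)
      pair_counts = Enum.frequencies(pairs)
      tag_counts = pairs |> Enum.map(&elem(&1, 0)) |> Enum.frequencies()
      vocab_size = pairs |> Enum.map(&elem(&1, 1)) |> Enum.uniq() |> length()

      probs =
        Map.new(pair_counts, fn {{tag, word}, count} ->
          {{tag, word}, (count + k) / (Map.fetch!(tag_counts, tag) + k * vocab_size)}
        end)

      {:ok, probs}
    else
      {:error, :length_mismatch}
    end
  end
end
```

Only observed pairs are stored here. An unseen pair would receive k / (C(tag) + k · |V|) at lookup time.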