Nasty.Statistics.POSTagging.NeuralTagger (Nasty v0.3.0)

Neural POS tagger using a BiLSTM-CRF architecture.

Achieves 97-98% accuracy on standard benchmarks (Penn Treebank, Universal Dependencies). The model is a bidirectional LSTM with an optional CRF output layer and an optional character-level CNN (enabled with :use_char_cnn).
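The CRF layer's role at prediction time can be illustrated with Viterbi decoding. Given per-word emission scores (the kind of output a BiLSTM produces) and tag-to-tag transition scores, Viterbi picks the highest-scoring tag sequence as a whole instead of the best tag for each word in isolation. The module below is a hypothetical, dependency-free sketch on plain maps; ViterbiSketch, its score layout and the example numbers are assumptions, not Nasty's internals.

```elixir
defmodule ViterbiSketch do
  # Hypothetical Viterbi decoder (illustration only, not Nasty's CRF code).
  # emissions:   one map per word, %{tag => score}; every map has the same tags
  # transitions: %{{prev_tag, tag} => score}; missing pairs score 0.0
  def decode([], _transitions), do: []

  def decode([first | rest], transitions) do
    tags = Map.keys(first)

    # For each tag: {best score of any path ending in that tag, that path reversed}
    init = Map.new(first, fn {tag, score} -> {tag, {score, [tag]}} end)

    final =
      Enum.reduce(rest, init, fn emission, best ->
        Map.new(tags, fn tag ->
          # Best predecessor for `tag`, including the transition score
          {score, path} =
            best
            |> Enum.map(fn {prev, {prev_score, prev_path}} ->
              {prev_score + Map.get(transitions, {prev, tag}, 0.0), prev_path}
            end)
            |> Enum.max_by(fn {s, _path} -> s end)

          {tag, {score + Map.fetch!(emission, tag), [tag | path]}}
        end)
      end)

    {_score, best_path} = final |> Map.values() |> Enum.max_by(fn {s, _path} -> s end)
    Enum.reverse(best_path)
  end
end

# Usage with made-up scores for ["The", "cat", "sat"]
emissions = [
  %{det: 2.0, noun: 0.5, verb: 0.1},
  %{det: 0.1, noun: 1.5, verb: 1.4},
  %{det: 0.1, noun: 1.0, verb: 0.9}
]

transitions = %{{:det, :noun} => 1.0, {:noun, :verb} => 1.0, {:noun, :noun} => -1.0}
ViterbiSketch.decode(emissions, transitions)
# => [:det, :noun, :verb]
```

With these scores a per-word argmax would tag "sat" as :noun (1.0 vs 0.9), but the :noun to :verb transition bonus makes [:det, :noun, :verb] the best sequence overall, which is the kind of correction a CRF layer adds.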

Usage

alias Nasty.Statistics.POSTagging.NeuralTagger

# Training
tagger = NeuralTagger.new(vocab_size: 10_000, num_tags: 17)
training_data = [{["The", "cat"], [:det, :noun]}, ...]
{:ok, trained} = NeuralTagger.train(tagger, training_data, epochs: 10)

# Prediction
{:ok, tags} = NeuralTagger.predict(trained, ["The", "cat", "sat"], [])
# tags => [:det, :noun, :verb]

# Persistence
:ok = NeuralTagger.save(trained, "priv/models/en/pos_neural_v1.axon")
{:ok, loaded} = NeuralTagger.load("priv/models/en/pos_neural_v1.axon")

Integration with Existing Pipeline

The neural tagger exposes the same predict/3 call shape as HMMTagger, so POSTagger.tag_pos/2 can dispatch on the model type:

# In POSTagger.tag_pos/2
case model_type do
  :neural -> NeuralTagger.predict(model, words, [])
  :hmm -> HMMTagger.predict(model, words, [])
  :rule_based -> tag_pos_rule_based(tokens)
end

Summary

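Functions

  • load(path) - Loads a trained model from disk.
  • metadata(tagger) - Returns model metadata.
  • new(opts \\ []) - Creates a new untrained neural POS tagger.
  • predict(tagger, words, opts \\ []) - Predicts POS tags for a sequence of words.
  • save(tagger, path) - Saves the trained model to disk.
  • train(tagger, training_data, opts \\ []) - Trains the neural POS tagger on annotated data.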

Types

t()

@type t() :: %Nasty.Statistics.POSTagging.NeuralTagger{
  architecture_opts: keyword(),
  axon_model: Axon.t(),
  embeddings: Nasty.Statistics.Neural.Embeddings.embeddings() | nil,
  metadata: map(),
  model_state: map() | nil,
  tag_vocab: map(),
  vocab: Nasty.Statistics.Neural.Embeddings.vocabulary()
}

Functions

load(path)

@spec load(Path.t()) :: {:ok, t()} | {:error, term()}

Loads a trained model from disk.

Parameters

  • path - File path to load from

Returns

  • {:ok, tagger} - Loaded model
  • {:error, reason} - Load failed

metadata(tagger)

@spec metadata(t()) :: map()

Returns model metadata.

new(opts \\ [])

@spec new(keyword()) :: t()

Creates a new untrained neural POS tagger.

Options

  • :vocab_size - Vocabulary size (required if :vocab is not provided)
  • :num_tags - Number of POS tags (required if :tag_vocab is not provided)
  • :vocab - Pre-built vocabulary (optional)
  • :tag_vocab - Pre-built tag vocabulary (optional)
  • :embedding_dim - Embedding dimension (default: 300)
  • :hidden_size - LSTM hidden size (default: 256)
  • :num_layers - Number of BiLSTM layers (default: 2)
  • :dropout - Dropout rate (default: 0.3)
  • :use_char_cnn - Use character-level CNN (default: false)
  • :pretrained_embeddings - Path to GloVe embeddings (default: nil)
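When :vocab and :tag_vocab are passed instead of :vocab_size and :num_tags, they have to be built from the training corpus beforehand. The sketch below shows one plausible way to do that in plain Elixir: a frequency-ranked word-to-index map with reserved padding and unknown-word entries, plus a tag-to-index map. The module name, the "<pad>"/"<unk>" tokens and the map layout are assumptions; the actual Nasty.Statistics.Neural.Embeddings.vocabulary() type may be structured differently.

```elixir
defmodule VocabSketch do
  # Hypothetical layout: index 0 = padding, index 1 = unknown word.
  # The real Nasty vocabulary type may differ.
  @pad "<pad>"
  @unk "<unk>"

  # Keeps the (max_size - 2) most frequent words; ties broken alphabetically.
  def build_vocab(training_data, max_size) do
    training_data
    |> Enum.flat_map(fn {words, _tags} -> words end)
    |> Enum.frequencies()
    |> Enum.sort_by(fn {word, count} -> {-count, word} end)
    |> Enum.take(max_size - 2)
    |> Enum.map(fn {word, _count} -> word end)
    |> then(&[@pad, @unk | &1])
    |> Enum.with_index()
    |> Map.new()
  end

  # Maps every distinct tag atom to a stable integer index.
  def build_tag_vocab(training_data) do
    training_data
    |> Enum.flat_map(fn {_words, tags} -> tags end)
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.with_index()
    |> Map.new()
  end

  # Out-of-vocabulary words fall back to the <unk> index.
  def word_index(vocab, word), do: Map.get(vocab, word, Map.fetch!(vocab, @unk))
end

# Usage
data = [{["The", "cat", "sat"], [:det, :noun, :verb]}, {["The", "dog"], [:det, :noun]}]
vocab = VocabSketch.build_vocab(data, 5)
# => %{"<pad>" => 0, "<unk>" => 1, "The" => 2, "cat" => 3, "dog" => 4}
VocabSketch.word_index(vocab, "sat")
# => 1 ("sat" fell outside the size limit, so it maps to <unk>)
```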

Returns

Untrained NeuralTagger struct.

predict(tagger, words, opts \\ [])

@spec predict(t(), [String.t()], keyword()) :: {:ok, [atom()]} | {:error, term()}

Predicts POS tags for a sequence of words.

Parameters

  • tagger - Trained neural tagger
  • words - List of words to tag
  • opts - Prediction options

Returns

  • {:ok, tags} - Predicted POS tags (list of atoms)
  • {:error, reason} - Prediction error
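Without a CRF layer, turning the network's per-word tag scores into the atoms returned by predict/3 typically comes down to a per-word argmax followed by a reverse lookup in the tag vocabulary. This hypothetical sketch assumes tag_vocab maps each tag atom to an integer index and that scores arrive as one list of per-tag scores per word; neither layout is confirmed by this page.

```elixir
defmodule DecodeSketch do
  # Hypothetical argmax decoding (illustration only, not Nasty's predict/3).
  # scores:    one list of per-tag scores per word, indexed by tag index
  # tag_vocab: %{tag_atom => index}
  def decode(scores, tag_vocab) do
    # Invert the tag vocabulary: index => tag atom
    index_to_tag = Map.new(tag_vocab, fn {tag, idx} -> {idx, tag} end)

    tags =
      Enum.map(scores, fn row ->
        {_best_score, best_idx} =
          row
          |> Enum.with_index()
          |> Enum.max_by(fn {score, _idx} -> score end)

        Map.fetch!(index_to_tag, best_idx)
      end)

    {:ok, tags}
  end
end

# Usage
DecodeSketch.decode([[2.0, 0.5, 0.1], [0.1, 1.5, 1.4], [0.1, 0.9, 1.0]], %{det: 0, noun: 1, verb: 2})
# => {:ok, [:det, :noun, :verb]}
```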

save(tagger, path)

@spec save(t(), Path.t()) :: :ok | {:error, term()}

Saves the trained model to disk.

Saves both the Axon model architecture and trained parameters, along with vocabulary and metadata.

Parameters

  • tagger - Trained tagger
  • path - File path (e.g., "priv/models/en/pos_neural_v1.axon")

Returns

  • :ok - Successfully saved
  • {:error, reason} - Save failed
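The on-disk layout used by save/2 and load/1 is not documented here. As a rough illustration of the same :ok / {:ok, value} / {:error, reason} contract, the hypothetical sketch below persists a plain map of tagger state with :erlang.term_to_binary/1 and reads it back with :erlang.binary_to_term/1. It does not serialize an Axon model the way the real implementation may.

```elixir
defmodule PersistSketch do
  # Hypothetical format: plain Erlang term serialization of a state map.
  # Not the real NeuralTagger file layout.
  def save(state, path) do
    # Create parent directories (e.g. priv/models/en) before writing
    with :ok <- File.mkdir_p(Path.dirname(path)) do
      File.write(path, :erlang.term_to_binary(state))
    end
  end

  # Only load files you trust: binary_to_term/1 decodes arbitrary terms.
  def load(path) do
    with {:ok, binary} <- File.read(path) do
      {:ok, :erlang.binary_to_term(binary)}
    end
  end
end

# Usage
path = Path.join(System.tmp_dir!(), "pos_sketch/pos_state.bin")
state = %{vocab: %{"The" => 2}, tag_vocab: %{det: 0}, metadata: %{epochs: 10}}
:ok = PersistSketch.save(state, path)
{:ok, ^state} = PersistSketch.load(path)
```

A missing file surfaces as {:error, :enoent} from File.read/1, which matches the {:error, reason} branch documented for load/1.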

train(tagger, training_data, opts \\ [])

@spec train(t(), [{[String.t()], [atom()]}], keyword()) ::
  {:ok, t()} | {:error, term()}

Trains the neural POS tagger on annotated data.

Parameters

  • tagger - Untrained or partially trained tagger
  • training_data - List of {words, tags} tuples, with one tag per word
  • opts - Training options
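Because each training example must carry exactly one tag per word, a quick pre-flight check on training_data can catch misaligned examples before any epochs run. This is a hypothetical helper, not part of the Nasty API; the error tuple shape is an assumption.

```elixir
defmodule TrainingDataSketch do
  # Hypothetical check: returns :ok, or the first example whose word and
  # tag counts differ as {:error, {:length_mismatch, index, n_words, n_tags}}.
  def validate(training_data) do
    training_data
    |> Enum.with_index()
    |> Enum.find(fn {{words, tags}, _i} -> length(words) != length(tags) end)
    |> case do
      nil -> :ok
      {{words, tags}, i} -> {:error, {:length_mismatch, i, length(words), length(tags)}}
    end
  end
end

# Usage
TrainingDataSketch.validate([{["The", "cat"], [:det, :noun]}, {["A", "dog"], [:det]}])
# => {:error, {:length_mismatch, 1, 2, 1}}
```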

Training Options

  • :epochs - Number of training epochs (default: 10)
  • :batch_size - Batch size (default: 32)
  • :learning_rate - Learning rate (default: 0.001)
  • :validation_split - Validation split ratio (default: 0.1)
  • :early_stopping - Early stopping config (default: [patience: 3])
  • :checkpoint_dir - Checkpoint directory (default: nil)
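The :validation_split, :batch_size and :early_stopping options describe standard training-loop mechanics. The sketch below shows one common interpretation in plain Elixir: hold out the last fraction of examples, chunk the rest into batches, and stop once the validation loss has failed to improve for patience consecutive epochs. The module and function names are hypothetical, and the real trainer may shuffle data or measure improvement differently.

```elixir
defmodule TrainingLoopSketch do
  # Hypothetical helpers mirroring the training options; not Nasty's internals.

  # Holds out the last `ratio` fraction of examples (no shuffling here).
  def split(data, ratio) do
    n_val = round(length(data) * ratio)
    Enum.split(data, length(data) - n_val)
  end

  # Groups examples into batches; the final batch may be smaller.
  def batches(data, batch_size), do: Enum.chunk_every(data, batch_size)

  # Given one validation loss per epoch, returns the 1-based epoch at which
  # training stops: the first epoch that completes `patience` epochs in a row
  # without beating the best loss so far, or the last epoch otherwise.
  def stop_epoch(val_losses, patience) do
    val_losses
    |> Enum.with_index(1)
    |> Enum.reduce_while({:infinity, 0}, fn {loss, epoch}, {best, waited} ->
      cond do
        best == :infinity or loss < best -> {:cont, {loss, 0}}
        waited + 1 >= patience -> {:halt, {:stopped, epoch}}
        true -> {:cont, {best, waited + 1}}
      end
    end)
    |> case do
      {:stopped, epoch} -> epoch
      _still_running -> length(val_losses)
    end
  end
end

# Usage with the documented defaults (validation_split 0.1, batch_size 32, patience 3)
{train, val} = TrainingLoopSketch.split(Enum.to_list(1..10), 0.1)
# train => [1, 2, 3, 4, 5, 6, 7, 8, 9], val => [10]
TrainingLoopSketch.stop_epoch([0.9, 0.7, 0.72, 0.71, 0.75, 0.6], 3)
# => 5 (epochs 3, 4 and 5 did not improve on 0.7)
```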

Returns

  • {:ok, trained_tagger} - Trained model
  • {:error, reason} - Training error