Nasty.Statistics.POSTagging.NeuralTagger (Nasty v0.3.0)

Neural POS tagger using a BiLSTM-CRF architecture.

Reported accuracy is 97-98% on standard benchmarks (Penn Treebank, Universal Dependencies). The model is a bidirectional LSTM with an optional CRF layer and an optional character-level CNN (:use_char_cnn, disabled by default).
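The accuracy figure above is presumably token-level accuracy, the usual metric for POS tagging, although this page does not say so. As a rough sketch under that assumption, it can be computed from gold and predicted tag sequences as follows. AccuracySketch and token_accuracy/2 are hypothetical names for illustration and are not part of Nasty.

```elixir
defmodule AccuracySketch do
  # Hypothetical helper, not part of Nasty: fraction of tokens whose
  # predicted tag matches the gold tag, pooled over all sentences.
  def token_accuracy(gold_sentences, predicted_sentences) do
    pairs =
      gold_sentences
      |> Enum.zip(predicted_sentences)
      |> Enum.flat_map(fn {gold, pred} -> Enum.zip(gold, pred) end)

    case pairs do
      [] -> 0.0
      _ -> Enum.count(pairs, fn {g, p} -> g == p end) / length(pairs)
    end
  end
end

gold = [[:det, :noun, :verb], [:pron, :verb]]
pred = [[:det, :noun, :noun], [:pron, :verb]]

# 4 of the 5 tokens match, so this prints 0.8
IO.inspect(AccuracySketch.token_accuracy(gold, pred))
```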

Usage

# Training
alias Nasty.Statistics.POSTagging.NeuralTagger

tagger = NeuralTagger.new(vocab_size: 10_000, num_tags: 17)
training_data = [{["The", "cat"], [:det, :noun]}, ...]
{:ok, trained} = NeuralTagger.train(tagger, training_data, epochs: 10)

# Prediction
{:ok, tags} = NeuralTagger.predict(trained, ["The", "cat", "sat"], [])
# => {:ok, [:det, :noun, :verb]}

# Persistence
:ok = NeuralTagger.save(trained, "priv/models/en/pos_neural_v1.axon")
{:ok, loaded} = NeuralTagger.load("priv/models/en/pos_neural_v1.axon")

Integration with Existing Pipeline

The neural tagger is called the same way as the HMM tagger (predict/3 with a model, a word list, and options), so the existing POS tagging pipeline can dispatch on model type:

# In POSTagger.tag_pos/2
case model_type do
  :neural -> NeuralTagger.predict(model, words, [])
  :hmm -> HMMTagger.predict(model, words, [])
  :rule_based -> tag_pos_rule_based(tokens)
end

Summary

Types

t()

Functions

load(path)

Loads a trained model from disk.

metadata(tagger)

Returns model metadata.

new(opts \\ [])

Creates a new untrained neural POS tagger.

predict(tagger, words, opts \\ [])

Predicts POS tags for a sequence of words.

save(tagger, path)

Saves the trained model to disk.

train(tagger, training_data, opts \\ [])

Trains the neural POS tagger on annotated data.

Types

t()

@type t() :: %Nasty.Statistics.POSTagging.NeuralTagger{
  architecture_opts: keyword(),
  axon_model: Axon.t(),
  embeddings: Nasty.Statistics.Neural.Embeddings.embeddings() | nil,
  metadata: map(),
  model_state: map() | nil,
  tag_vocab: map(),
  vocab: Nasty.Statistics.Neural.Embeddings.vocabulary()
}

Functions

load(path)

@spec load(Path.t()) :: {:ok, t()} | {:error, term()}

Loads a trained model from disk.

Parameters

path - File path to load from

Returns

{:ok, tagger} - Loaded model
{:error, reason} - Load failed

metadata(tagger)

@spec metadata(t()) :: map()

Returns model metadata.

new(opts \\ [])

@spec new(keyword()) :: t()

Creates a new untrained neural POS tagger.

Options

:vocab_size - Vocabulary size (required if :vocab not provided)
:num_tags - Number of POS tags (required if :tag_vocab not provided)
:vocab - Pre-built vocabulary (optional)
:tag_vocab - Pre-built tag vocabulary (optional)
:embedding_dim - Embedding dimension (default: 300)
:hidden_size - LSTM hidden size (default: 256)
:num_layers - Number of BiLSTM layers (default: 2)
:dropout - Dropout rate (default: 0.3)
:use_char_cnn - Use character-level CNN (default: false)
:pretrained_embeddings - Path to GloVe embeddings (default: nil)

Returns

Untrained NeuralTagger struct.
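
The sizes passed as :vocab_size and :num_tags can be derived from the training data. The sketch below builds simple index maps and reads the sizes off them. It assumes a vocabulary is a plain map from item to integer index, with two reserved entries for padding and unknown words. The actual shape of Nasty.Statistics.Neural.Embeddings.vocabulary() is not documented on this page and may differ. VocabSketch is a hypothetical module, not part of Nasty.

```elixir
defmodule VocabSketch do
  # Hypothetical helper, not part of Nasty. Assumes a vocabulary is a plain
  # map from item to integer index; the real Embeddings.vocabulary() type
  # may differ.
  def build(training_data) do
    words = training_data |> Enum.flat_map(fn {ws, _tags} -> ws end) |> Enum.uniq()
    tags = training_data |> Enum.flat_map(fn {_ws, ts} -> ts end) |> Enum.uniq()

    # Reserve indices 0 and 1 for padding and unknown words (an assumption).
    vocab =
      ["<pad>", "<unk>" | words]
      |> Enum.with_index()
      |> Map.new()

    tag_vocab = tags |> Enum.with_index() |> Map.new()
    {vocab, tag_vocab}
  end
end

training_data = [
  {["The", "cat"], [:det, :noun]},
  {["The", "dog", "sat"], [:det, :noun, :verb]}
]

{vocab, tag_vocab} = VocabSketch.build(training_data)

# 4 distinct words plus 2 reserved entries, and 3 distinct tags: prints {6, 3}
IO.inspect({map_size(vocab), map_size(tag_vocab)})
```

The two sizes can then be given to new/1 as :vocab_size and :num_tags. Whether these maps can be passed directly as :vocab and :tag_vocab depends on the format the library expects.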

predict(tagger, words, opts \\ [])

@spec predict(t(), [String.t()], keyword()) :: {:ok, [atom()]} | {:error, term()}

Predicts POS tags for a sequence of words.

Parameters

tagger - Trained neural tagger
words - List of words to tag
opts - Prediction options

Returns

{:ok, tags} - Predicted POS tags (list of atoms)
{:error, reason} - Prediction error
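
Because predict/3 returns a tagged tuple, callers usually match on both outcomes. The following sketch pairs each word with its predicted tag and passes errors through unchanged. PredictResultSketch and the :not_trained and :length_mismatch reasons are hypothetical and used only for illustration. The error reasons Nasty actually returns are not listed on this page.

```elixir
defmodule PredictResultSketch do
  # Hypothetical helper, not part of Nasty: pairs each word with its tag
  # from a predict/3-style result, passing errors through unchanged.
  def pair(words, {:ok, tags}) when length(words) == length(tags) do
    {:ok, Enum.zip(words, tags)}
  end

  def pair(_words, {:ok, _tags}), do: {:error, :length_mismatch}
  def pair(_words, {:error, _reason} = error), do: error
end

words = ["The", "cat", "sat"]

# Stand-ins for the value returned by NeuralTagger.predict(trained, words, [])
IO.inspect(PredictResultSketch.pair(words, {:ok, [:det, :noun, :verb]}))
IO.inspect(PredictResultSketch.pair(words, {:error, :not_trained}))
```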

save(tagger, path)

@spec save(t(), Path.t()) :: :ok | {:error, term()}

Saves the trained model to disk.

Saves both the Axon model architecture and trained parameters, along with vocabulary and metadata.

Parameters

tagger - Trained tagger
path - File path (e.g., "priv/models/en/pos_neural_v1.axon")

Returns

:ok - Successfully saved
{:error, reason} - Save failed

train(tagger, training_data, opts \\ [])

@spec train(t(), [{[String.t()], [atom()]}], keyword()) ::
  {:ok, t()} | {:error, term()}

Trains the neural POS tagger on annotated data.

Parameters

tagger - Untrained or partially trained tagger
training_data - List of {words, tags} tuples
opts - Training options
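
The @spec requires each training example to be a {words, tags} tuple, and sequence labeling needs one tag per word. A pre-flight check along the following lines can catch malformed examples before a long training run. TrainingDataSketch is a hypothetical module and not part of Nasty. Whether train/3 performs a similar check itself is not stated here.

```elixir
defmodule TrainingDataSketch do
  # Hypothetical pre-flight check, not part of Nasty: every example must be
  # a {words, tags} tuple with one atom tag per string word.
  def validate(training_data) when is_list(training_data) do
    bad =
      training_data
      |> Enum.with_index()
      |> Enum.reject(fn {example, _index} -> valid_example?(example) end)
      |> Enum.map(fn {_example, index} -> index end)

    if bad == [], do: :ok, else: {:error, {:invalid_examples, bad}}
  end

  defp valid_example?({words, tags}) when is_list(words) and is_list(tags) do
    length(words) == length(tags) and
      Enum.all?(words, &is_binary/1) and
      Enum.all?(tags, &is_atom/1)
  end

  defp valid_example?(_other), do: false
end

data = [
  {["The", "cat"], [:det, :noun]},
  # Two words but only one tag: reported as invalid (index 1)
  {["The", "dog"], [:det]}
]

IO.inspect(TrainingDataSketch.validate(data))
```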

Training Options

:epochs - Number of training epochs (default: 10)
:batch_size - Batch size (default: 32)
:learning_rate - Learning rate (default: 0.001)
:validation_split - Validation split ratio (default: 0.1)
:early_stopping - Early stopping config (default: [patience: 3])
:checkpoint_dir - Checkpoint directory (default: nil)
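
To illustrate what a patience value such as the default [patience: 3] typically means, the sketch below finds the epoch at which training would stop for a given series of validation losses. It assumes the monitored metric is validation loss and that any strict improvement resets the counter. The page does not specify either, so treat this as an illustration rather than Nasty's implementation. EarlyStoppingSketch is a hypothetical module.

```elixir
defmodule EarlyStoppingSketch do
  # Hypothetical illustration, not Nasty's implementation: returns the
  # 1-based epoch at which training would stop, given per-epoch validation
  # losses and a patience value. Returns the last epoch if patience is
  # never exhausted, and 0 for an empty list.
  def stop_epoch(val_losses, patience) when patience > 0 do
    val_losses
    |> Enum.with_index(1)
    |> Enum.reduce_while({:infinity, 0, 0}, fn {loss, epoch}, {best, waited, _last} ->
      cond do
        # New best loss: remember it and reset the patience counter.
        best == :infinity or loss < best -> {:cont, {loss, 0, epoch}}
        # No improvement and patience is used up: stop here.
        waited + 1 >= patience -> {:halt, {best, waited + 1, epoch}}
        # No improvement, but still within patience.
        true -> {:cont, {best, waited + 1, epoch}}
      end
    end)
    |> elem(2)
  end
end

losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]

# Best loss is at epoch 3; epochs 4, 5 and 6 do not improve, so this prints 6
IO.inspect(EarlyStoppingSketch.stop_epoch(losses, 3))
```

In this example the lower loss at epoch 7 is never reached, which is the trade-off a small patience value makes.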

Returns

{:ok, trained_tagger} - Trained model
{:error, reason} - Training error