Training Neural Models Guide

This guide provides detailed instructions for training neural models in Nasty, from data preparation to deployment.

Table of Contents

  1. Prerequisites
  2. Data Preparation
  3. Training POS Tagging Models
  4. Advanced Training Options
  5. Model Evaluation
  6. Troubleshooting

Prerequisites

System Requirements

  • Memory: Minimum 4GB RAM for training, 8GB+ recommended
  • CPU: Multi-core CPU (4+ cores recommended)
  • GPU: Optional but highly recommended (10-100x speedup with EXLA)
  • Storage: 500MB-2GB for models and training data

Dependencies

All neural dependencies are included in mix.exs:

{:axon, "~> 0.7"},
{:nx, "~> 0.9"},
{:exla, "~> 0.9"},
{:bumblebee, "~> 0.6"}

Install with:

mix deps.get

Enable GPU Acceleration (Optional)

Set environment variable for EXLA to use GPU:

export XLA_TARGET=cuda120  # or cuda118, rocm, etc.
mix deps.compile
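
You may also want to make EXLA the default Nx backend; this is a standard Nx/EXLA setting rather than anything Nasty-specific:

# config/config.exs
import Config

config :nx, default_backend: EXLA.Backend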

Data Preparation

CoNLL-U Format

Neural models train on CoNLL-U formatted data. Each sentence is separated by blank lines, with one token per line:

1	The	the	DET	DT	_	2	det	_	_
2	cat	cat	NOUN	NN	_	3	nsubj	_	_
3	sat	sit	VERB	VBD	_	0	root	_	_

1	Dogs	dog	NOUN	NNS	_	2	nsubj	_	_
2	run	run	VERB	VBP	_	0	root	_	_

Columns (tab-separated):

  1. Index
  2. Word form
  3. Lemma
  4. UPOS tag (used for training)
  5. XPOS tag
  6. Features
  7. Head
  8. Dependency relation
  9-10. Additional annotations
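
To make the column layout concrete, here is a small illustrative Elixir sketch that pulls out the word form (column 2) and UPOS tag (column 4) from CoNLL-U text. In practice DataLoader.load_conllu_file (used later in this guide) handles this for you; the module name here is hypothetical.

defmodule ConlluSketch do
  # Split the text into sentences on blank lines, skip comment lines,
  # and keep only {form, upos} pairs from each token line.
  def sentences(text) do
    text
    |> String.split(~r/\n\s*\n/, trim: true)
    |> Enum.map(&parse_sentence/1)
  end

  defp parse_sentence(block) do
    block
    |> String.split("\n", trim: true)
    |> Enum.reject(&String.starts_with?(&1, "#"))
    |> Enum.map(fn line ->
      [_id, form, _lemma, upos | _rest] = String.split(line, "\t")
      {form, upos}
    end)
  end
end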

Where to Get Training Data

Universal Dependencies corpora are the standard source: https://universaldependencies.org/

Download and extract:

cd data
git clone https://github.com/UniversalDependencies/UD_English-EWT

Data Split Recommendations

  • Training: 80% (or use provided train split)
  • Validation: 10% (or use provided dev split)
  • Test: 10% (or use provided test split)

The training pipeline handles splitting automatically if you provide a single file.

Training POS Tagging Models

Quick Start - CLI Training

The easiest way to train is using the Mix task:

mix nasty.train.neural_pos \
  --corpus data/UD_English-EWT/en_ewt-ud-train.conllu \
  --output models/pos_neural_v1.axon \
  --epochs 10 \
  --batch-size 32

CLI Options Reference

mix nasty.train.neural_pos [options]

Required:
  --corpus PATH          Path to CoNLL-U training corpus

Optional:
  --output PATH          Model save path (default: pos_neural.axon)
  --validation PATH      Path to validation corpus (auto-split if not provided)
  --epochs N             Number of training epochs (default: 10)
  --batch-size N         Batch size (default: 32)
  --learning-rate F      Learning rate (default: 0.001)
  --hidden-size N        LSTM hidden size (default: 256)
  --embedding-dim N      Word embedding dimension (default: 300)
  --num-layers N         Number of LSTM layers (default: 2)
  --dropout F            Dropout rate (default: 0.3)
  --use-char-cnn         Enable character CNN (default: enabled; disable with --no-char-cnn)
  --char-embedding-dim N Character embedding dim (default: 50)
  --optimizer NAME       Optimizer: adam, sgd, adamw (default: adam)
  --early-stopping N     Early stopping patience (default: 3)
  --checkpoint-dir PATH  Save checkpoints during training
  --min-freq N           Min word frequency for vocab (default: 1)
  --validation-split F   Validation split fraction (default: 0.1)

Programmatic Training

For more control, train programmatically:

alias Nasty.Statistics.POSTagging.NeuralTagger
alias Nasty.Statistics.Neural.DataLoader

# Load training data
{:ok, sentences} = DataLoader.load_conllu_file("data/train.conllu")

# Split into train/validation
{train_data, valid_data} = DataLoader.split_data(sentences, validation_split: 0.1)

# Create and configure tagger
tagger = NeuralTagger.new(training_data: train_data)

# Train with custom options
{:ok, trained_tagger} = NeuralTagger.train(tagger, train_data,
  epochs: 20,
  batch_size: 32,
  learning_rate: 0.001,
  hidden_size: 512,
  embedding_dim: 300,
  num_lstm_layers: 3,
  dropout: 0.5,
  use_char_cnn: true,
  validation_data: valid_data,
  early_stopping_patience: 5
)

# Save trained model
:ok = NeuralTagger.save(trained_tagger, "models/pos_advanced.axon")

Advanced Training Options

Hyperparameter Tuning

Hidden Size (--hidden-size):

  • Small (128-256): Faster training, less memory, slightly lower accuracy
  • Medium (256-512): Balanced performance (default: 256)
  • Large (512-1024): Best accuracy, requires more memory/time

Embedding Dimension (--embedding-dim):

  • Small (50-100): Fast, low memory
  • Medium (300): Good balance (default, matches GloVe)
  • Large (300-1024): For very large corpora

Number of LSTM Layers (--num-layers):

  • 1 layer: Fast, simple patterns
  • 2 layers: Balanced (default, recommended)
  • 3+ layers: Complex patterns, risk overfitting

Dropout (--dropout):

  • 0.0: No regularization (risk overfitting)
  • 0.3: Good default
  • 0.5: Strong regularization for small datasets

Batch Size (--batch-size):

  • Small (8-16): Better generalization, slower
  • Medium (32): Good balance (default)
  • Large (64-128): Faster training, needs more memory
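
These flags map onto the keyword options accepted by NeuralTagger.train in the programmatic example above. As an illustration only, a larger configuration expressed programmatically (option names taken from that example) might look like:

# Illustrative: a "large model" configuration for NeuralTagger.train
large_opts = [
  epochs: 15,
  batch_size: 64,
  learning_rate: 1.0e-3,
  hidden_size: 512,
  embedding_dim: 300,
  num_lstm_layers: 3,
  dropout: 0.3,
  use_char_cnn: true
]

{:ok, trained_tagger} = NeuralTagger.train(tagger, train_data, large_opts)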

Character CNN Configuration

Character-level CNN helps with out-of-vocabulary words:

mix nasty.train.neural_pos \
  --corpus data/train.conllu \
  --use-char-cnn \
  --char-embedding-dim 50 \
  --char-vocab-size 150

Disable if training is too slow:

mix nasty.train.neural_pos \
  --corpus data/train.conllu \
  --no-char-cnn

Using Pre-trained Embeddings

Load GloVe embeddings for better initialization:

alias Nasty.Statistics.Neural.Embeddings

# Load GloVe vectors
glove_embeddings = Embeddings.load_glove("data/glove.6B.300d.txt", word_vocab)

# Train with pre-trained embeddings
{:ok, tagger} = NeuralTagger.train(base_tagger, train_data,
  pretrained_embeddings: glove_embeddings,
  freeze_embeddings: false  # Allow fine-tuning
)

Note: GloVe loading is currently a placeholder. Full implementation coming soon.

Optimizer Selection

Adam (default):

  • Adaptive learning rates
  • Works well out-of-the-box
  • Good for most use cases

SGD:

  • Simple, stable
  • May need learning rate scheduling
  • Good baseline

AdamW:

  • Adam with weight decay
  • Better generalization
  • Recommended for large models

mix nasty.train.neural_pos \
  --corpus data/train.conllu \
  --optimizer adamw \
  --learning-rate 0.0001
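
How the --optimizer flag is wired internally is not shown here; as background, Axon training loops accept optimizers built with the Polaris library, so a hypothetical helper mapping the three names onto Polaris constructors could look like:

defmodule OptimizerChoice do
  # Hypothetical helper (not part of Nasty): builds a Polaris optimizer
  # of the kind an Axon training loop accepts.
  def build(:adam, lr), do: Polaris.Optimizers.adam(learning_rate: lr)
  def build(:sgd, lr), do: Polaris.Optimizers.sgd(learning_rate: lr)
  def build(:adamw, lr), do: Polaris.Optimizers.adamw(learning_rate: lr)
end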

Early Stopping

Automatically stop training when validation performance plateaus:

mix nasty.train.neural_pos \
  --corpus data/train.conllu \
  --validation data/dev.conllu \
  --early-stopping 5  # Stop after 5 epochs without improvement

Checkpointing

Save model checkpoints during training:

mix nasty.train.neural_pos \
  --corpus data/train.conllu \
  --checkpoint-dir checkpoints/ \
  --checkpoint-frequency 2  # Save every 2 epochs

Checkpoints are named: checkpoint_epoch_001.axon, checkpoint_epoch_002.axon, etc.

Model Evaluation

During Training

The training task prints per-tag metrics:

Epoch 1/10
  Loss: 0.456
  Accuracy: 0.923
  
Per-tag accuracy:
  NOUN: 0.957
  VERB: 0.942
  DET: 0.989
  ...
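
The same per-tag breakdown can be computed offline; a minimal sketch, assuming parallel lists gold_tags and pred_tags for a tagged corpus:

# Per-tag accuracy from parallel gold/predicted tag lists (illustrative)
per_tag =
  gold_tags
  |> Enum.zip(pred_tags)
  |> Enum.group_by(fn {gold, _pred} -> gold end)
  |> Map.new(fn {tag, pairs} ->
    correct = Enum.count(pairs, fn {gold, pred} -> gold == pred end)
    {tag, correct / length(pairs)}
  end)

Enum.each(per_tag, fn {tag, acc} -> IO.puts("  #{tag}: #{Float.round(acc, 3)}") end)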

Post-Training Evaluation

Evaluate on test set:

mix nasty.eval.neural_pos \
  --model models/pos_neural_v1.axon \
  --test data/en_ewt-ud-test.conllu

Or programmatically:

{:ok, model} = NeuralTagger.load("models/pos_neural_v1.axon")
{:ok, test_sentences} = DataLoader.load_conllu_file("data/test.conllu")

# Evaluate: accumulate counts with Enum.reduce (rebinding inside a `for`
# comprehension would not carry across iterations)
{correct, total} =
  Enum.reduce(test_sentences, {0, 0}, fn {words, gold_tags}, {correct, total} ->
    {:ok, pred_tags} = NeuralTagger.predict(model, words, [])

    matches = Enum.count(Enum.zip(pred_tags, gold_tags), fn {p, g} -> p == g end)
    {correct + matches, total + length(gold_tags)}
  end)

accuracy = correct / total
IO.puts("Accuracy: #{Float.round(accuracy * 100, 2)}%")

Metrics to Track

  • Overall Accuracy: Percentage of correctly tagged tokens
  • Per-Tag Accuracy: Accuracy for each POS tag
  • Per-Tag Precision/Recall: For detailed error analysis
  • OOV Accuracy: Performance on out-of-vocabulary words (see the sketch after this list)
  • Training Time: Total time and time per epoch
  • Convergence: Number of epochs to best validation score
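
OOV accuracy needs a little extra bookkeeping. A minimal sketch, assuming train_vocab is a MapSet of word forms seen during training and test_sentences yields {words, gold_tags} pairs as above:

# OOV accuracy (illustrative): only count tokens whose form was not seen in training
{oov_correct, oov_total} =
  Enum.reduce(test_sentences, {0, 0}, fn {words, gold_tags}, acc ->
    {:ok, pred_tags} = NeuralTagger.predict(model, words, [])

    words
    |> Enum.zip(Enum.zip(gold_tags, pred_tags))
    |> Enum.reduce(acc, fn {word, {gold, pred}}, {correct, total} ->
      cond do
        MapSet.member?(train_vocab, word) -> {correct, total}
        gold == pred -> {correct + 1, total + 1}
        true -> {correct, total + 1}
      end
    end)
  end)

IO.puts("OOV accuracy: #{Float.round(oov_correct / oov_total * 100, 2)}%")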

Troubleshooting

Out of Memory

Symptoms: Process crashes with memory error

Solutions:

  1. Reduce batch size: --batch-size 16 or --batch-size 8
  2. Reduce hidden size: --hidden-size 128
  3. Reduce embedding dimension: --embedding-dim 100
  4. Disable character CNN: --no-char-cnn
  5. Use smaller training corpus subset

Training Too Slow

Symptoms: Hours per epoch

Solutions:

  1. Enable EXLA GPU support (see Prerequisites)
  2. Increase batch size: --batch-size 64
  3. Disable character CNN if not needed
  4. Use fewer LSTM layers: --num-layers 1
  5. Reduce hidden size: --hidden-size 128

Overfitting

Symptoms: High training accuracy, low validation accuracy

Solutions:

  1. Increase dropout: --dropout 0.5
  2. Use more training data
  3. Enable early stopping: --early-stopping 3
  4. Reduce model complexity (fewer layers, smaller hidden size)
  5. Add L2 regularization

Underfitting

Symptoms: Low training and validation accuracy

Solutions:

  1. Increase model capacity: --hidden-size 512 --num-layers 3
  2. Train longer: --epochs 20
  3. Lower dropout: --dropout 0.2
  4. Increase learning rate: --learning-rate 0.01
  5. Check data quality (wrong labels, formatting issues)

Validation Loss Not Decreasing

Symptoms: Validation loss stays flat or increases

Solutions:

  1. Lower learning rate: --learning-rate 0.0001
  2. Add early stopping
  3. Check for data issues (train/validation overlap, different distributions)
  4. Try different optimizer: --optimizer adamw

CoNLL-U Loading Errors

Symptoms: Parser errors, wrong tag counts

Solutions:

  1. Verify file format (tab-separated, 10 columns; see the sanity check after this list)
  2. Check for empty lines between sentences
  3. Ensure UTF-8 encoding
  4. Remove or fix malformed lines
  5. Validate with UD validator: https://universaldependencies.org/tools.html
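
A quick way to spot malformed token lines (illustrative; adjust the path):

# Print the first few non-comment lines without exactly 10 tab-separated columns
"data/train.conllu"
|> File.stream!()
|> Stream.map(&String.trim_trailing(&1, "\n"))
|> Stream.reject(&(&1 == "" or String.starts_with?(&1, "#")))
|> Stream.filter(fn line -> length(String.split(line, "\t")) != 10 end)
|> Enum.take(5)
|> Enum.each(&IO.puts("Malformed: #{&1}"))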

Model Not Learning

Symptoms: Loss stays constant, accuracy at baseline

Solutions:

  1. Check data quality (are labels correct?)
  2. Verify vocabulary is being built correctly
  3. Increase learning rate: --learning-rate 0.01
  4. Remove or reduce dropout initially
  5. Check for bugs in data preprocessing

Best Practices

For Small Datasets (<5K sentences)

mix nasty.train.neural_pos \
  --corpus data/small_corpus.conllu \
  --epochs 20 \
  --batch-size 16 \
  --hidden-size 128 \
  --embedding-dim 100 \
  --dropout 0.5 \
  --early-stopping 5 \
  --no-char-cnn

For Medium Datasets (5K-50K sentences)

mix nasty.train.neural_pos \
  --corpus data/medium_corpus.conllu \
  --epochs 15 \
  --batch-size 32 \
  --hidden-size 256 \
  --embedding-dim 300 \
  --dropout 0.3 \
  --use-char-cnn \
  --early-stopping 3

For Large Datasets (50K+ sentences)

mix nasty.train.neural_pos \
  --corpus data/large_corpus.conllu \
  --epochs 10 \
  --batch-size 64 \
  --hidden-size 512 \
  --embedding-dim 300 \
  --num-layers 3 \
  --dropout 0.3 \
  --use-char-cnn \
  --optimizer adamw \
  --learning-rate 0.0001

Production Deployment

After training, deploy your model:

  1. Save the trained model:

    # Model is already saved by training task
    ls -lh models/pos_neural_v1.axon
    
  2. Load in production:

    {:ok, model} = NeuralTagger.load("models/pos_neural_v1.axon")

  3. Integrate with POSTagger:

    # Use neural mode
    {:ok, ast} = Nasty.parse(text, language: :en, model: :neural, neural_model: model)
    
    # Or use ensemble mode
    {:ok, ast} = Nasty.parse(text, language: :en, model: :neural_ensemble, neural_model: model)

  4. Monitor performance:

    • Track accuracy on representative sample
    • Monitor latency (should be <100ms per sentence on CPU; see the sketch below)
    • Watch memory usage
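
A rough latency check (illustrative sketch, reusing model and test_sentences from the evaluation section):

# Average per-sentence prediction latency over a small sample
sample = Enum.take(test_sentences, 100)

{micros, _} =
  :timer.tc(fn ->
    Enum.each(sample, fn {words, _gold_tags} ->
      {:ok, _tags} = NeuralTagger.predict(model, words, [])
    end)
  end)

IO.puts("Avg latency: #{Float.round(micros / 1000 / length(sample), 1)} ms/sentence")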

Next Steps