Training Neural Models Guide
This guide provides detailed instructions for training neural models in Nasty, from data preparation to deployment.
Table of Contents
- Prerequisites
- Data Preparation
- Training POS Tagging Models
- Advanced Training Options
- Model Evaluation
- Troubleshooting
- Best Practices
- Production Deployment
Prerequisites
System Requirements
- Memory: Minimum 4GB RAM for training, 8GB+ recommended
- CPU: Multi-core CPU (4+ cores recommended)
- GPU: Optional but highly recommended (10-100x speedup with EXLA)
- Storage: 500MB-2GB for models and training data
Dependencies
All neural dependencies are included in mix.exs:
{:axon, "~> 0.7"},
{:nx, "~> 0.9"},
{:exla, "~> 0.9"},
{:bumblebee, "~> 0.6"}
Install with:
mix deps.get
Enable GPU Acceleration (Optional)
Set environment variable for EXLA to use GPU:
export XLA_TARGET=cuda120 # or cuda118, rocm, etc.
mix deps.compile
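With EXLA compiled, Nx also needs to be pointed at the EXLA backend. A minimal sketch of the standard Nx configuration, assuming Nasty does not already set this for you:

```elixir
# config/config.exs: route Nx tensor operations through EXLA.
# Standard Nx/EXLA setup; adjust if Nasty configures its own backend.
import Config

config :nx, default_backend: EXLA.Backend
```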
Data Preparation
CoNLL-U Format
Neural models train on CoNLL-U formatted data. Sentences are separated by blank lines, with one token per line:
1 The the DET DT _ 2 det _ _
2 cat cat NOUN NN _ 3 nsubj _ _
3 sat sit VERB VBD _ 0 root _ _
1 Dogs dog NOUN NNS _ 2 nsubj _ _
2 run run VERB VBP _ 0 root _ _
Columns (tab-separated):
1. Index
2. Word form
3. Lemma
4. UPOS tag (used for training)
5. XPOS tag
6. Features
7. Head
8. Dependency relation
9-10. Additional annotations (DEPS and MISC)
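Nasty's DataLoader (used later in this guide) parses this format for you. Purely for illustration, here is a minimal standalone sketch that extracts {form, UPOS} pairs:

```elixir
# Minimal CoNLL-U reader: one list of {form, upos} tuples per sentence.
# Skips comment lines; multiword-token ranges (e.g. "1-2") are not handled.
defmodule ConlluSketch do
  def parse(path) do
    path
    |> File.read!()
    |> String.split(~r/\n\s*\n/, trim: true)
    |> Enum.map(fn sentence_block ->
      sentence_block
      |> String.split("\n", trim: true)
      |> Enum.reject(&String.starts_with?(&1, "#"))
      |> Enum.map(fn line ->
        [_id, form, _lemma, upos | _rest] = String.split(line, "\t")
        {form, upos}
      end)
    end)
  end
end
```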
Where to Get Training Data
Universal Dependencies corpora:
- English: UD_English-EWT
- Spanish: UD_Spanish-GSD
- Catalan: UD_Catalan-AnCora
Download and extract:
cd data
git clone https://github.com/UniversalDependencies/UD_English-EWT
Data Split Recommendations
- Training: 80% (or use provided train split)
- Validation: 10% (or use provided dev split)
- Test: 10% (or use provided test split)
The training pipeline handles splitting automatically if you provide a single file.
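If the treebank ships predefined splits (UD_English-EWT does), load them directly with the DataLoader rather than re-splitting; the paths below follow the standard UD file layout:

```elixir
alias Nasty.Statistics.Neural.DataLoader

{:ok, train_data} = DataLoader.load_conllu_file("data/UD_English-EWT/en_ewt-ud-train.conllu")
{:ok, valid_data} = DataLoader.load_conllu_file("data/UD_English-EWT/en_ewt-ud-dev.conllu")
{:ok, test_data} = DataLoader.load_conllu_file("data/UD_English-EWT/en_ewt-ud-test.conllu")
```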
Training POS Tagging Models
Quick Start - CLI Training
The easiest way to train is using the Mix task:
mix nasty.train.neural_pos \
--corpus data/UD_English-EWT/en_ewt-ud-train.conllu \
--output models/pos_neural_v1.axon \
--epochs 10 \
--batch-size 32
CLI Options Reference
mix nasty.train.neural_pos [options]
Required:
--corpus PATH Path to CoNLL-U training corpus
Optional:
--output PATH Model save path (default: pos_neural.axon)
--validation PATH Path to validation corpus (auto-split if not provided)
--epochs N Number of training epochs (default: 10)
--batch-size N Batch size (default: 32)
--learning-rate F Learning rate (default: 0.001)
--hidden-size N LSTM hidden size (default: 256)
--embedding-dim N Word embedding dimension (default: 300)
--num-layers N Number of LSTM layers (default: 2)
--dropout F Dropout rate (default: 0.3)
--use-char-cnn Enable character CNN (default: enabled)
--char-embedding-dim N Character embedding dim (default: 50)
--optimizer NAME Optimizer: adam, sgd, adamw (default: adam)
--early-stopping N Early stopping patience (default: 3)
--checkpoint-dir PATH Save checkpoints during training
--min-freq N Min word frequency for vocab (default: 1)
--validation-split F Validation split fraction (default: 0.1)
Programmatic Training
For more control, train programmatically:
alias Nasty.Statistics.POSTagging.NeuralTagger
alias Nasty.Statistics.Neural.DataLoader
# Load training data
{:ok, sentences} = DataLoader.load_conllu_file("data/train.conllu")
# Split into train/validation
{train_data, valid_data} = DataLoader.split_data(sentences, validation_split: 0.1)
# Create and configure tagger
tagger = NeuralTagger.new(training_data: train_data)
# Train with custom options
{:ok, trained_tagger} = NeuralTagger.train(tagger, train_data,
epochs: 20,
batch_size: 32,
learning_rate: 0.001,
hidden_size: 512,
embedding_dim: 300,
num_lstm_layers: 3,
dropout: 0.5,
use_char_cnn: true,
validation_data: valid_data,
early_stopping_patience: 5
)
# Save trained model
:ok = NeuralTagger.save(trained_tagger, "models/pos_advanced.axon")
Advanced Training Options
Hyperparameter Tuning
Hidden Size (--hidden-size):
- Small (128-256): Faster training, less memory, slightly lower accuracy
- Medium (256-512): Balanced performance (default: 256)
- Large (512-1024): Best accuracy, requires more memory/time
Embedding Dimension (--embedding-dim):
- Small (50-100): Fast, low memory
- Medium (300): Good balance (default, matches GloVe)
- Large (512-1024): For very large corpora
Number of LSTM Layers (--num-layers):
- 1 layer: Fast, simple patterns
- 2 layers: Balanced (default, recommended)
- 3+ layers: Complex patterns, risk overfitting
Dropout (--dropout):
- 0.0: No regularization (risk overfitting)
- 0.3: Good default
- 0.5: Strong regularization for small datasets
Batch Size (--batch-size):
- Small (8-16): Better generalization, slower
- Medium (32): Good balance (default)
- Large (64-128): Faster training, needs more memory
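To compare a few of these settings systematically, you can sweep them with the programmatic API from the previous section. A hedged sketch; `accuracy/2` is a hypothetical helper, computed as in "Post-Training Evaluation" below:

```elixir
# Small grid sweep over hidden size and dropout.
# accuracy/2 is hypothetical: score a tagger on held-out data.
sweep =
  for hidden_size <- [128, 256, 512], dropout <- [0.3, 0.5] do
    {:ok, tagger} =
      NeuralTagger.train(NeuralTagger.new(training_data: train_data), train_data,
        epochs: 5,
        hidden_size: hidden_size,
        dropout: dropout,
        validation_data: valid_data
      )

    {hidden_size, dropout, accuracy(tagger, valid_data)}
  end

# Pick the configuration with the best validation accuracy.
Enum.max_by(sweep, fn {_hidden, _dropout, acc} -> acc end)
```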
Character CNN Configuration
Character-level CNN helps with out-of-vocabulary words:
mix nasty.train.neural_pos \
--corpus data/train.conllu \
--use-char-cnn \
--char-embedding-dim 50 \
--char-vocab-size 150
Disable if training is too slow:
mix nasty.train.neural_pos \
--corpus data/train.conllu \
--no-char-cnn
Using Pre-trained Embeddings
Load GloVe embeddings for better initialization:
alias Nasty.Statistics.Neural.Embeddings
# Load GloVe vectors
glove_embeddings = Embeddings.load_glove("data/glove.6B.300d.txt", word_vocab)
# Train with pre-trained embeddings
{:ok, tagger} = NeuralTagger.train(base_tagger, train_data,
pretrained_embeddings: glove_embeddings,
freeze_embeddings: false # Allow fine-tuning
)
Note: GloVe loading is currently a placeholder. Full implementation coming soon.
Optimizer Selection
Adam (default):
- Adaptive learning rates
- Works well out-of-the-box
- Good for most use cases
SGD:
- Simple, stable
- May need learning rate scheduling
- Good baseline
AdamW:
- Adam with weight decay
- Better generalization
- Recommended for large models
mix nasty.train.neural_pos \
--corpus data/train.conllu \
--optimizer adamw \
--learning-rate 0.0001
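For reference, if you drop down to Axon's training loop yourself, these optimizers are constructed via Polaris (bundled with Axon). A sketch of the library API, not Nasty's internal wiring:

```elixir
# Polaris optimizer constructors; :decay is AdamW's weight-decay option.
adam = Polaris.Optimizers.adam(learning_rate: 1.0e-3)
adamw = Polaris.Optimizers.adamw(learning_rate: 1.0e-4, decay: 0.01)
sgd = Polaris.Optimizers.sgd(learning_rate: 1.0e-2)

# Plugged into a training loop, e.g.:
# loop = Axon.Loop.trainer(model, :categorical_cross_entropy, adamw)
```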
Early Stopping
Automatically stop training when validation performance plateaus:
mix nasty.train.neural_pos \
--corpus data/train.conllu \
--validation data/dev.conllu \
--early-stopping 5 # Stop after 5 epochs without improvement
Checkpointing
Save model checkpoints during training:
mix nasty.train.neural_pos \
--corpus data/train.conllu \
--checkpoint-dir checkpoints/ \
--checkpoint-frequency 2 # Save every 2 epochs
Checkpoints are named: checkpoint_epoch_001.axon, checkpoint_epoch_002.axon, etc.
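Because epoch numbers are zero-padded, the newest checkpoint sorts last lexically, which makes it easy to resume from or evaluate. A sketch, assuming NeuralTagger.load/1 accepts checkpoint files:

```elixir
# Find the most recent checkpoint by name and load it.
latest =
  "checkpoints"
  |> File.ls!()
  |> Enum.filter(&String.starts_with?(&1, "checkpoint_epoch_"))
  |> Enum.sort()
  |> List.last()

{:ok, tagger} = NeuralTagger.load(Path.join("checkpoints", latest))
```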
Model Evaluation
During Training
The training task prints per-tag metrics:
Epoch 1/10
Loss: 0.456
Accuracy: 0.923
Per-tag accuracy:
NOUN: 0.957
VERB: 0.942
DET: 0.989
...
Post-Training Evaluation
Evaluate on test set:
mix nasty.eval.neural_pos \
--model models/pos_neural_v1.axon \
--test data/en_ewt-ud-test.conllu
Or programmatically:
{:ok, model} = NeuralTagger.load("models/pos_neural_v1.axon")
{:ok, test_sentences} = DataLoader.load_conllu_file("data/test.conllu")
# Evaluate: accumulate counts with Enum.reduce/3 (rebinding inside a `for`
# comprehension does not accumulate in Elixir)
{correct, total} =
  Enum.reduce(test_sentences, {0, 0}, fn {words, gold_tags}, {correct, total} ->
    {:ok, pred_tags} = NeuralTagger.predict(model, words, [])
    matches = Enum.count(Enum.zip(pred_tags, gold_tags), fn {p, g} -> p == g end)
    {correct + matches, total + length(gold_tags)}
  end)

accuracy = correct / total
IO.puts("Accuracy: #{Float.round(accuracy * 100, 2)}%")
Metrics to Track
- Overall Accuracy: Percentage of correctly tagged tokens
- Per-Tag Accuracy: Accuracy for each POS tag
- Per-Tag Precision/Recall: For detailed error analysis
- OOV Accuracy: Performance on out-of-vocabulary words
- Training Time: Total time and time per epoch
- Convergence: Number of epochs to best validation score
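Per-tag accuracy can be derived from the same prediction loop; a minimal sketch building on the variables above:

```elixir
# Collect {predicted, gold} pairs across the test set, then score per gold tag.
pairs =
  Enum.flat_map(test_sentences, fn {words, gold_tags} ->
    {:ok, pred_tags} = NeuralTagger.predict(model, words, [])
    Enum.zip(pred_tags, gold_tags)
  end)

pairs
|> Enum.group_by(fn {_pred, gold} -> gold end)
|> Enum.map(fn {tag, tag_pairs} ->
  hits = Enum.count(tag_pairs, fn {pred, gold} -> pred == gold end)
  {tag, hits / length(tag_pairs)}
end)
|> Enum.sort_by(fn {_tag, acc} -> acc end)
```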
Troubleshooting
Out of Memory
Symptoms: Process crashes with memory error
Solutions:
- Reduce batch size: --batch-size 16 or --batch-size 8
- Reduce hidden size: --hidden-size 128
- Reduce embedding dimension: --embedding-dim 100
- Disable character CNN: --no-char-cnn
- Use a smaller subset of the training corpus
Training Too Slow
Symptoms: Hours per epoch
Solutions:
- Enable EXLA GPU support (see Prerequisites)
- Increase batch size: --batch-size 64
- Disable character CNN if not needed
- Use fewer LSTM layers: --num-layers 1
- Reduce hidden size: --hidden-size 128
Overfitting
Symptoms: High training accuracy, low validation accuracy
Solutions:
- Increase dropout: --dropout 0.5
- Use more training data
- Enable early stopping: --early-stopping 3
- Reduce model complexity (fewer layers, smaller hidden size)
- Add L2 regularization
Underfitting
Symptoms: Low training and validation accuracy
Solutions:
- Increase model capacity: --hidden-size 512 --num-layers 3
- Train longer: --epochs 20
- Lower dropout: --dropout 0.2
- Increase learning rate: --learning-rate 0.01
- Check data quality (wrong labels, formatting issues)
Validation Loss Not Decreasing
Symptoms: Validation loss stays flat or increases
Solutions:
- Lower learning rate: --learning-rate 0.0001
- Add early stopping
- Check for data issues (train/validation overlap, different distributions)
- Try a different optimizer: --optimizer adamw
CoNLL-U Loading Errors
Symptoms: Parser errors, wrong tag counts
Solutions:
- Verify file format (tab-separated, 10 columns)
- Check for empty lines between sentences
- Ensure UTF-8 encoding
- Remove or fix malformed lines
- Validate with UD validator: https://universaldependencies.org/tools.html
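Before reaching for the full validator, a quick structural check catches the most common problem (wrong column count). A minimal sketch, independent of Nasty:

```elixir
# Flag token lines that do not have exactly 10 tab-separated columns.
"data/train.conllu"
|> File.stream!()
|> Stream.with_index(1)
|> Enum.each(fn {line, n} ->
  line = String.trim_trailing(line, "\n")

  unless line == "" or String.starts_with?(line, "#") or
           length(String.split(line, "\t")) == 10 do
    IO.puts("Malformed line #{n}: #{inspect(line)}")
  end
end)
```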
Model Not Learning
Symptoms: Loss stays constant, accuracy at baseline
Solutions:
- Check data quality (are labels correct?)
- Verify vocabulary is being built correctly
- Increase learning rate: --learning-rate 0.01
- Remove or reduce dropout initially
- Check for bugs in data preprocessing
Best Practices
For Small Datasets (<5K sentences)
mix nasty.train.neural_pos \
--corpus data/small_corpus.conllu \
--epochs 20 \
--batch-size 16 \
--hidden-size 128 \
--embedding-dim 100 \
--dropout 0.5 \
--early-stopping 5 \
--no-char-cnn
For Medium Datasets (5K-50K sentences)
mix nasty.train.neural_pos \
--corpus data/medium_corpus.conllu \
--epochs 15 \
--batch-size 32 \
--hidden-size 256 \
--embedding-dim 300 \
--dropout 0.3 \
--use-char-cnn \
--early-stopping 3
For Large Datasets (50K+ sentences)
mix nasty.train.neural_pos \
--corpus data/large_corpus.conllu \
--epochs 10 \
--batch-size 64 \
--hidden-size 512 \
--embedding-dim 300 \
--num-layers 3 \
--dropout 0.3 \
--use-char-cnn \
--optimizer adamw \
--learning-rate 0.0001
Production Deployment
After training, deploy your model:
Save the trained model:
# Model is already saved by the training task
ls -lh models/pos_neural_v1.axon
Load in production:
{:ok, model} = NeuralTagger.load("models/pos_neural_v1.axon")
Integrate with POSTagger:
# Use neural mode
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural, neural_model: model)

# Or use ensemble mode
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural_ensemble, neural_model: model)
Monitor performance:
- Track accuracy on representative sample
- Monitor latency (should be <100ms per sentence on CPU)
- Watch memory usage
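A quick latency check using Erlang's :timer.tc/1 (microsecond resolution), assuming model and a tokenized words list as in the evaluation section:

```elixir
# Time a single prediction; :timer.tc/1 returns {microseconds, result}.
{micros, {:ok, _tags}} = :timer.tc(fn -> NeuralTagger.predict(model, words, []) end)
IO.puts("Latency: #{micros / 1000} ms")
```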
Next Steps
- Read NEURAL_MODELS.md for architecture details
- See PRETRAINED_MODELS.md for using Bumblebee transformers
- Check examples/ for complete training scripts
- Explore UD treebanks for more training data