Neural Models in Nasty
Complete guide to using neural network models in Nasty for state-of-the-art NLP performance.
Overview
Nasty integrates neural network models using Axon, Elixir's neural network library, providing:
- BiLSTM-CRF architecture for sequence tagging (POS, NER)
- 97-98% accuracy on standard POS tagging benchmarks
- EXLA JIT compilation for 10-100x speedup
- Seamless integration with existing pipeline
- Pre-trained embedding support (GloVe, FastText)
- Model persistence and loading
- Graceful fallbacks to HMM and rule-based models
Quick Start
Installation
Neural dependencies are already included in mix.exs:
# Already added
{:axon, "~> 0.7"}, # Neural networks
{:nx, "~> 0.9"}, # Numerical computing
{:exla, "~> 0.9"}, # XLA compiler (GPU/CPU acceleration)
{:bumblebee, "~> 0.6"}, # Pre-trained models
{:tokenizers, "~> 0.5"} # Fast tokenization
Basic Usage
# Parse text with neural POS tagger
{:ok, ast} = Nasty.parse("The cat sat on the mat.",
language: :en,
model: :neural
)
# Tokens will have POS tags predicted by the neural model
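Nasty.parse/2 returns a result tuple, so callers typically branch on it. Below is a minimal sketch using only the call shown above; the {:error, reason} shape and the fallback choice are assumptions, not documented behaviour:
case Nasty.parse("The cat sat on the mat.", language: :en, model: :neural) do
  {:ok, ast} ->
    # Work with the POS-tagged AST
    ast

  {:error, reason} ->
    # Assumed failure shape: log it and retry with the HMM tagger
    IO.inspect(reason, label: "neural parse failed")
    Nasty.parse("The cat sat on the mat.", language: :en, model: :hmm)
end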
Training Your Own Model
# Download Universal Dependencies corpus
# https://universaldependencies.org/
# Train neural POS tagger
mix nasty.train.neural_pos \
--corpus data/en_ewt-ud-train.conllu \
--test-corpus data/en_ewt-ud-test.conllu \
--epochs 10 \
--hidden-size 256
# Model saved to priv/models/en/pos_neural_v1.axon
Using Trained Models
alias Nasty.Statistics.POSTagging.NeuralTagger
# Load model
{:ok, model} = NeuralTagger.load("priv/models/en/pos_neural_v1.axon")
# Predict
words = ["The", "cat", "sat"]
{:ok, tags} = NeuralTagger.predict(model, words, [])
# => {:ok, [:det, :noun, :verb]}
Architecture
BiLSTM-CRF
The default architecture is a bidirectional LSTM with a CRF (Conditional Random Field) output layer:
flowchart TD
A[Input Words]
B["Word Embeddings (300d)"]
C["BiLSTM Layer 1 (256 hidden units)"]
D["Dropout (0.3)"]
E["BiLSTM Layer 2 (256 hidden units)"]
F["Dense Projection → POS Tags"]
G[Softmax/CRF]
H[Output Tags]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
Key Features (an Axon sketch follows this list):
- Bidirectional context (forward + backward)
- Optional character-level CNN for out-of-vocabulary (OOV) word handling
- Dropout regularization
- 2-3 LSTM layers (configurable)
- 256-512 hidden units (configurable)
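To make the data flow concrete, here is a deliberately simplified Axon sketch of the tagger body. It is not Nasty's implementation: it uses a single unidirectional LSTM with a softmax head, omits the backward direction, character CNN, and CRF layer, and vocab_size/num_tags are placeholder arguments.
# Simplified sketch (not Nasty's code): embedding -> LSTM -> dropout -> per-token softmax
defmodule TaggerSketch do
  def build(vocab_size, num_tags) do
    input = Axon.input("tokens", shape: {nil, nil})

    # Axon.lstm/2 returns {output_sequence, final_state}; keep the per-token sequence
    {sequence, _state} =
      input
      |> Axon.embedding(vocab_size, 300)
      |> Axon.lstm(256)

    sequence
    |> Axon.dropout(rate: 0.3)
    |> Axon.dense(num_tags, activation: :softmax)
  end
end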
Performance
Accuracy:
- POS Tagging: 97-98% (vs 95% HMM, 85% rule-based)
- NER: 88-92% F1 (future)
- Dependency Parsing: 94-96% UAS (future)
Speed (on UD English, 12k sentences):
- CPU: ~30-60 minutes training
- GPU (EXLA): ~5-10 minutes training
- Inference: ~1000-5000 tokens/second (CPU)
- Inference: ~10000+ tokens/second (GPU)
Model Integration Modes
Nasty provides multiple integration modes:
1. Neural Only (:neural)
Uses only the neural model:
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural)
Fallback: If the neural model is unavailable, Nasty falls back to the HMM tagger, then to the rule-based tagger.
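To control the fallback yourself instead of relying on the automatic chain, here is a minimal sketch using only functions shown in this guide (NeuralTagger.load/1 and Nasty.parse/2); the {:error, _reason} clause is an assumed failure shape:
alias Nasty.Statistics.POSTagging.NeuralTagger

# Choose the model mode up front based on whether a neural model can be loaded
model_mode =
  case NeuralTagger.load("priv/models/en/pos_neural_v1.axon") do
    {:ok, _model} -> :neural
    {:error, _reason} -> :hmm
  end

{:ok, ast} = Nasty.parse(text, language: :en, model: model_mode)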
2. Neural Ensemble (:neural_ensemble)
Combines neural + HMM + rule-based:
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural_ensemble)
Strategy (sketched below):
- Use rule-based for punctuation and numbers (high confidence)
- Use neural predictions for content words
- Best accuracy overall
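The per-token combination is roughly the following; this is a hypothetical sketch with illustrative names and logic, not Nasty's actual implementation:
# Hypothetical sketch of the ensemble strategy (names and regex are illustrative)
defmodule EnsembleSketch do
  # Prefer the rule-based tag for punctuation/numerals, the neural tag otherwise
  def combine(word, neural_tag, rule_tag) do
    if punctuation_or_number?(word), do: rule_tag, else: neural_tag
  end

  defp punctuation_or_number?(word) do
    String.match?(word, ~r/^(\p{P}+|\d+([.,]\d+)?)$/u)
  end
end

# EnsembleSketch.combine(",", :noun, :punct)  # => :punct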
3. Traditional Modes
Still available:
- :rule_based - Fast, 85% accuracy
- :hmm - 95% accuracy
- :ensemble - HMM + rules
Training Guide
1. Prepare Data
Download a Universal Dependencies corpus:
# English
wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu
# Or other languages
# Spanish, Catalan, etc.
2. Train Model
mix nasty.train.neural_pos \
--corpus en_ewt-ud-train.conllu \
--test-corpus en_ewt-ud-test.conllu \
--output priv/models/en/pos_neural_v1.axon \
--epochs 10 \
--batch-size 32 \
--learning-rate 0.001 \
--hidden-size 256 \
--num-layers 2 \
--dropout 0.3 \
--use-char-cnn false
3. Evaluate
The training task automatically evaluates on the test set and reports the following (metric definitions are sketched after this list):
- Overall accuracy
- Per-tag precision, recall, F1
- Confusion matrix (if requested)
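For reference, the per-tag metrics follow the standard definitions; here is a small illustrative helper (not part of Nasty's API) that computes them from raw counts:
# Illustrative only: standard precision/recall/F1 from per-tag counts
defmodule TagMetrics do
  def summarize(true_pos, false_pos, false_neg) do
    precision = safe_div(true_pos, true_pos + false_pos)
    recall = safe_div(true_pos, true_pos + false_neg)
    f1 = safe_div(2 * precision * recall, precision + recall)
    %{precision: precision, recall: recall, f1: f1}
  end

  defp safe_div(_num, denom) when denom == 0, do: 0.0
  defp safe_div(num, denom), do: num / denom
end

# TagMetrics.summarize(90, 5, 10)
# => %{precision: ~0.947, recall: 0.9, f1: ~0.923}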
4. Deploy
Models are automatically saved with:
- Model weights (.axon file)
- Metadata (.meta.json file)
- Vocabulary and tag mappings
Load via ModelLoader.load_latest(:en, :pos_tagging_neural) or directly with NeuralTagger.load/1.
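For example (the full module path of ModelLoader and its return shape are not shown in this guide, so both are assumptions here):
alias Nasty.Statistics.POSTagging.NeuralTagger

# Assumed return shape: {:ok, model} on success
{:ok, model} = ModelLoader.load_latest(:en, :pos_tagging_neural)
{:ok, tags} = NeuralTagger.predict(model, ["The", "cat", "sat"], [])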
Programmatic Training
alias Nasty.Statistics.POSTagging.NeuralTagger
alias Nasty.Statistics.Neural.DataLoader
# Load corpus
{:ok, sentences} = DataLoader.load_conllu("train.conllu")
# Split data
{train, valid} = DataLoader.split(sentences, [0.9, 0.1])
# Build vocabularies
{:ok, vocab, tag_vocab} = DataLoader.build_vocabularies(train, min_freq: 2)
# Create model
tagger = NeuralTagger.new(
vocab: vocab,
tag_vocab: tag_vocab,
embedding_dim: 300,
hidden_size: 256,
num_layers: 2,
dropout: 0.3
)
# Train
{:ok, trained} = NeuralTagger.train(tagger, train,
epochs: 10,
batch_size: 32,
learning_rate: 0.001,
validation_split: 0.1
)
# Save
NeuralTagger.save(trained, "my_model.axon")
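A quick round-trip sanity check after saving, reusing only functions shown earlier in this guide (load/1 and predict/3 from the Using Trained Models example):
# Reload the saved model and tag a short sentence
{:ok, reloaded} = NeuralTagger.load("my_model.axon")
{:ok, tags} = NeuralTagger.predict(reloaded, ["The", "cat", "sat"], [])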
Pre-trained Embeddings
Using GloVe
alias Nasty.Statistics.Neural.Embeddings
# Load GloVe embeddings
{:ok, embeddings} = Embeddings.load_glove("glove.6B.300d.txt", vocab)
# Use during training
tagger = NeuralTagger.new(
vocab: vocab,
tag_vocab: tag_vocab,
pretrained_embeddings: embeddings
)
Download GloVe:
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
Advanced Features
Character-Level CNN
For better OOV handling:
mix nasty.train.neural_pos \
--corpus train.conllu \
--use-char-cnn \
--char-filters 3,4,5 \
--char-num-filters 30
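If you train programmatically rather than via the Mix task, the options would presumably mirror the CLI flags; the option names below are hypothetical, not documented API:
# Hypothetical option names mirroring the CLI flags above
tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  use_char_cnn: true,
  char_filters: [3, 4, 5],
  char_num_filters: 30
)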
Custom Architectures
Extend Nasty.Statistics.Neural.Architectures.BiLSTMCRF:
defmodule MyArchitecture do
def build(opts) do
# Custom Axon model
Axon.input("tokens")
|> Axon.embedding(opts[:vocab_size], opts[:embedding_dim])
# ... your architecture (pipe additional layers here)
end
end
Streaming Training
For large datasets:
DataLoader.stream_batches("huge_corpus.conllu", vocab, tag_vocab, batch_size: 64)
|> Stream.take(1000) # Optional: limit to the first 1000 batches
|> Enum.each(&train_batch/1)
Troubleshooting
EXLA Compilation Issues
If EXLA fails to compile:
# Install XLA dependencies
# Ubuntu/Debian:
sudo apt-get install build-essential
# Set compiler flags
export ELIXIR_ERL_OPTIONS="+fnu"
mix deps.clean exla --build
mix deps.get
Out of Memory
Reduce batch size:
mix nasty.train.neural_pos --batch-size 16 # Instead of 32
Or use gradient accumulation:
# In training opts
accumulation_steps: 4
Slow Training
Enable EXLA:
# Should be automatic, but verify:
compiler: EXLA
Use GPU if available:
export XLA_TARGET=cuda
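To confirm EXLA is picked up project-wide, the standard Nx/EXLA configuration can also be set in config/config.exs (this is generic Nx configuration, not a Nasty-specific setting):
# config/config.exs
import Config

# Use EXLA both as the tensor backend and as the defn compiler
config :nx, :default_backend, EXLA.Backend
config :nx, :default_defn_options, compiler: EXLA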
Future Enhancements
- Transformers: BERT, RoBERTa via Bumblebee
- NER models: BiLSTM-CRF for named entity recognition
- Dependency parsing: Biaffine attention parser
- Multilingual: mBERT, XLM-R support
- Model quantization: INT8 for faster inference
- Knowledge distillation: Compress large models
See Also
- TRAINING_NEURAL.md - Detailed training guide
- PRETRAINED_MODELS.md - Using transformers
- API.md - Full API documentation
- BiLSTM-CRF paper
- Axon documentation