Neural Models in Nasty
Complete guide to using neural network models in Nasty for state-of-the-art NLP performance.
Overview
Nasty integrates neural network models using Axon, Elixir's neural network library, providing:
- BiLSTM-CRF architecture for sequence tagging (POS, NER)
- 97-98% accuracy on standard POS tagging benchmarks
- EXLA JIT compilation for 10-100x speedup
- Seamless integration with existing pipeline
- Pre-trained embedding support (GloVe, FastText)
- Model persistence and loading
- Graceful fallbacks to HMM and rule-based models
Quick Start
Installation
Neural dependencies are already included in mix.exs:
# Already added
{:axon, "~> 0.7"}, # Neural networks
{:nx, "~> 0.9"}, # Numerical computing
{:exla, "~> 0.9"}, # XLA compiler (GPU/CPU acceleration)
{:bumblebee, "~> 0.6"}, # Pre-trained models
{:tokenizers, "~> 0.5"} # Fast tokenization
Basic Usage
# Parse text with neural POS tagger
{:ok, ast} = Nasty.parse("The cat sat on the mat.",
language: :en,
model: :neural
)
# Tokens will have POS tags predicted by the neural model
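Nasty.parse/2 returns a result tuple, so callers typically branch on it. Below is a minimal sketch using only the call shown above; the {:error, reason} shape and the fallback choice are assumptions, not documented behaviour:
case Nasty.parse("The cat sat on the mat.", language: :en, model: :neural) do
  {:ok, ast} ->
    # Work with the POS-tagged AST
    ast

  {:error, reason} ->
    # Assumed failure shape: log it and retry with the HMM tagger
    IO.inspect(reason, label: "neural parse failed")
    Nasty.parse("The cat sat on the mat.", language: :en, model: :hmm)
end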
Training Your Own Model
# Download Universal Dependencies corpus
# https://universaldependencies.org/
# Train neural POS tagger
mix nasty.train.neural_pos \
--corpus data/en_ewt-ud-train.conllu \
--test-corpus data/en_ewt-ud-test.conllu \
--epochs 10 \
--hidden-size 256
# Model saved to priv/models/en/pos_neural_v1.axon
Using Trained Models
alias Nasty.Statistics.POSTagging.NeuralTagger
# Load model
{:ok, model} = NeuralTagger.load("priv/models/en/pos_neural_v1.axon")
# Predict
words = ["The", "cat", "sat"]
{:ok, tags} = NeuralTagger.predict(model, words, [])
# => {:ok, [:det, :noun, :verb]}
Architecture
BiLSTM-CRF
The default architecture is a bidirectional LSTM with a CRF (Conditional Random Field) output layer:
flowchart TD
A[Input Words]
B["Word Embeddings (300d)"]
C["BiLSTM Layer 1 (256 hidden units)"]
D["Dropout (0.3)"]
E["BiLSTM Layer 2 (256 hidden units)"]
F["Dense Projection → POS Tags"]
G[Softmax/CRF]
H[Output Tags]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
Key Features (an Axon sketch follows this list):
- Bidirectional context (forward + backward)
- Optional character-level CNN for out-of-vocabulary (OOV) word handling
- Dropout regularization
- 2-3 LSTM layers (configurable)
- 256-512 hidden units (configurable)
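To make the data flow concrete, here is a deliberately simplified Axon sketch of the tagger body. It is not Nasty's implementation: it uses a single unidirectional LSTM with a softmax head, omits the backward direction, character CNN, and CRF layer, and vocab_size/num_tags are placeholder arguments.
# Simplified sketch (not Nasty's code): embedding -> LSTM -> dropout -> per-token softmax
defmodule TaggerSketch do
  def build(vocab_size, num_tags) do
    input = Axon.input("tokens", shape: {nil, nil})

    # Axon.lstm/2 returns {output_sequence, final_state}; keep the per-token sequence
    {sequence, _state} =
      input
      |> Axon.embedding(vocab_size, 300)
      |> Axon.lstm(256)

    sequence
    |> Axon.dropout(rate: 0.3)
    |> Axon.dense(num_tags, activation: :softmax)
  end
end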
Performance
Accuracy:
- POS Tagging: 97-98% (vs 95% HMM, 85% rule-based)
- NER: 88-92% F1 (future)
- Dependency Parsing: 94-96% UAS (future)
Speed (on UD English, 12k sentences):
- CPU: ~30-60 minutes training
- GPU (EXLA): ~5-10 minutes training
- Inference: ~1000-5000 tokens/second (CPU)
- Inference: ~10000+ tokens/second (GPU)
Model Integration Modes
Nasty provides multiple integration modes:
1. Neural Only (:neural)
Uses only the neural model:
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural)
Fallback: If the neural model is unavailable, Nasty falls back to the HMM tagger, then to the rule-based tagger.
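To control the fallback yourself instead of relying on the automatic chain, here is a minimal sketch using only functions shown in this guide (NeuralTagger.load/1 and Nasty.parse/2); the {:error, _reason} clause is an assumed failure shape:
alias Nasty.Statistics.POSTagging.NeuralTagger

# Choose the model mode up front based on whether a neural model can be loaded
model_mode =
  case NeuralTagger.load("priv/models/en/pos_neural_v1.axon") do
    {:ok, _model} -> :neural
    {:error, _reason} -> :hmm
  end

{:ok, ast} = Nasty.parse(text, language: :en, model: model_mode)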
2. Neural Ensemble (:neural_ensemble)
Combines neural + HMM + rule-based:
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural_ensemble)
Strategy (sketched below):
- Use rule-based for punctuation and numbers (high confidence)
- Use neural predictions for content words
- Best accuracy overall
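The per-token combination is roughly the following; this is a hypothetical sketch with illustrative names and logic, not Nasty's actual implementation:
# Hypothetical sketch of the ensemble strategy (names and regex are illustrative)
defmodule EnsembleSketch do
  # Prefer the rule-based tag for punctuation/numerals, the neural tag otherwise
  def combine(word, neural_tag, rule_tag) do
    if punctuation_or_number?(word), do: rule_tag, else: neural_tag
  end

  defp punctuation_or_number?(word) do
    String.match?(word, ~r/^(\p{P}+|\d+([.,]\d+)?)$/u)
  end
end

# EnsembleSketch.combine(",", :noun, :punct)  # => :punct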
3. Traditional Modes
Still available:
- :rule_based - Fast, 85% accuracy
- :hmm - 95% accuracy
- :ensemble - HMM + rules
Training Guide
1. Prepare Data
Download a Universal Dependencies corpus:
# English
wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu
# Or other languages
# Spanish, Catalan, etc.
2. Train Model
mix nasty.train.neural_pos \
--corpus en_ewt-ud-train.conllu \
--test-corpus en_ewt-ud-test.conllu \
--output priv/models/en/pos_neural_v1.axon \
--epochs 10 \
--batch-size 32 \
--learning-rate 0.001 \
--hidden-size 256 \
--num-layers 2 \
--dropout 0.3 \
--use-char-cnn false
3. Evaluate
The training task automatically evaluates on the test set and reports the following (metric definitions are sketched after this list):
- Overall accuracy
- Per-tag precision, recall, F1
- Confusion matrix (if requested)
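For reference, the per-tag metrics follow the standard definitions; here is a small illustrative helper (not part of Nasty's API) that computes them from raw counts:
# Illustrative only: standard precision/recall/F1 from per-tag counts
defmodule TagMetrics do
  def summarize(true_pos, false_pos, false_neg) do
    precision = safe_div(true_pos, true_pos + false_pos)
    recall = safe_div(true_pos, true_pos + false_neg)
    f1 = safe_div(2 * precision * recall, precision + recall)
    %{precision: precision, recall: recall, f1: f1}
  end

  defp safe_div(_num, denom) when denom == 0, do: 0.0
  defp safe_div(num, denom), do: num / denom
end

# TagMetrics.summarize(90, 5, 10)
# => %{precision: ~0.947, recall: 0.9, f1: ~0.923}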
4. Deploy
Models are automatically saved with:
- Model weights (.axon file)
- Metadata (.meta.json file)
- Vocabulary and tag mappings
Load via ModelLoader.load_latest(:en, :pos_tagging_neural) or directly with NeuralTagger.load/1.
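For example (the full module path of ModelLoader and its return shape are not shown in this guide, so both are assumptions here):
alias Nasty.Statistics.POSTagging.NeuralTagger

# Assumed return shape: {:ok, model} on success
{:ok, model} = ModelLoader.load_latest(:en, :pos_tagging_neural)
{:ok, tags} = NeuralTagger.predict(model, ["The", "cat", "sat"], [])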
Programmatic Training
alias Nasty.Statistics.POSTagging.NeuralTagger
alias Nasty.Statistics.Neural.DataLoader
# Load corpus
{:ok, sentences} = DataLoader.load_conllu("train.conllu")
# Split data
{train, valid} = DataLoader.split(sentences, [0.9, 0.1])
# Build vocabularies
{:ok, vocab, tag_vocab} = DataLoader.build_vocabularies(train, min_freq: 2)
# Create model
tagger = NeuralTagger.new(
vocab: vocab,
tag_vocab: tag_vocab,
embedding_dim: 300,
hidden_size: 256,
num_layers: 2,
dropout: 0.3
)
# Train
{:ok, trained} = NeuralTagger.train(tagger, train,
epochs: 10,
batch_size: 32,
learning_rate: 0.001,
validation_split: 0.1
)
# Save
NeuralTagger.save(trained, "my_model.axon")
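A quick round-trip sanity check after saving, reusing only functions shown earlier in this guide (load/1 and predict/3 from the Using Trained Models example):
# Reload the saved model and tag a short sentence
{:ok, reloaded} = NeuralTagger.load("my_model.axon")
{:ok, tags} = NeuralTagger.predict(reloaded, ["The", "cat", "sat"], [])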
Pre-trained Embeddings
Using GloVe
alias Nasty.Statistics.Neural.Embeddings
# Load GloVe embeddings
{:ok, embeddings} = Embeddings.load_glove("glove.6B.300d.txt", vocab)
# Use during training
tagger = NeuralTagger.new(
vocab: vocab,
tag_vocab: tag_vocab,
pretrained_embeddings: embeddings
)
Download GloVe:
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
Advanced Features
Character-Level CNN
For better OOV handling:
mix nasty.train.neural_pos \
--corpus train.conllu \
--use-char-cnn \
--char-filters 3,4,5 \
--char-num-filters 30
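If you train programmatically rather than via the Mix task, the options would presumably mirror the CLI flags; the option names below are hypothetical, not documented API:
# Hypothetical option names mirroring the CLI flags above
tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  use_char_cnn: true,
  char_filters: [3, 4, 5],
  char_num_filters: 30
)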
Custom Architectures
Extend Nasty.Statistics.Neural.Architectures.BiLSTMCRF:
defmodule MyArchitecture do
def build(opts) do
# Custom Axon model
Axon.input("tokens")
|> Axon.embedding(opts[:vocab_size], opts[:embedding_dim])
# ... your architecture (pipe additional layers here)
end
end
Streaming Training
For large datasets:
DataLoader.stream_batches("huge_corpus.conllu", vocab, tag_vocab, batch_size: 64)
|> Stream.take(1000) # Optional: limit to the first 1000 batches
|> Enum.each(&train_batch/1)
Troubleshooting
EXLA Compilation Issues
If EXLA fails to compile:
# Install XLA dependencies
# Ubuntu/Debian:
sudo apt-get install build-essential
# Set compiler flags
export ELIXIR_ERL_OPTIONS="+fnu"
mix deps.clean exla --build
mix deps.get
Out of Memory
Reduce batch size:
mix nasty.train.neural_pos --batch-size 16 # Instead of 32
Or use gradient accumulation:
# In training opts
accumulation_steps: 4
Slow Training
Enable EXLA:
# Should be automatic, but verify:
compiler: EXLA
Use GPU if available:
export XLA_TARGET=cuda
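To confirm EXLA is picked up project-wide, the standard Nx/EXLA configuration can also be set in config/config.exs (this is generic Nx configuration, not a Nasty-specific setting):
# config/config.exs
import Config

# Use EXLA both as the tensor backend and as the defn compiler
config :nx, :default_backend, EXLA.Backend
config :nx, :default_defn_options, compiler: EXLA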
Future Enhancements
- Transformers: BERT, RoBERTa via Bumblebee
- NER models: BiLSTM-CRF for named entity recognition
- Dependency parsing: Biaffine attention parser
- Multilingual: mBERT, XLM-R support
- Model quantization: INT8 for faster inference
- Knowledge distillation: Compress large models
See Also
- TRAINING_NEURAL.md - Detailed training guide
- PRETRAINED_MODELS.md - Using transformers
- API.md - Full API documentation
- BiLSTM-CRF paper
- Axon documentation