Neural Coreference Resolution


Advanced neural coreference resolution using a BiLSTM mention encoder with attention and a feedforward mention-pair scorer.

Overview

This implementation provides neural coreference resolution, improving on the rule-based resolver's ~70% CoNLL F1 to 75-80% CoNLL F1 with the neural pair model.

Architecture

Phase 1: Neural Pair Model (Implemented)

Components:

  1. Mention Encoder - BiLSTM with attention over context
  2. Pair Scorer - Feedforward network with 20 hand-crafted features
  3. Neural Resolver - Integration with existing mention detection
  4. Evaluator - MUC, B³, CEAF metrics

Workflow:

Document → Mention Detection → Neural Encoding → Pairwise Scoring → Clustering → Coreference Chains
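
The clustering stage turns pairwise probabilities into chains. A standard approach for pair models is greedy best-antecedent linking, sketched below; the score function, the 0.5 threshold, and the map-based chain bookkeeping are illustrative assumptions, not this project's API:

defmodule GreedyClustering do
  # Greedy best-antecedent linking over pair-model probabilities.
  # `mentions` must be in document order; `score` is a 2-arity function
  # returning the pair model's sigmoid probability. Returns a map from
  # mention to chain id (ids are unique but not contiguous in this sketch).
  def cluster(mentions, score, threshold \\ 0.5) do
    Enum.reduce(mentions, %{}, fn mention, chain_of ->
      best =
        chain_of
        |> Map.keys()
        |> Enum.map(&{&1, score.(&1, mention)})
        |> Enum.filter(fn {_, s} -> s >= threshold end)
        |> Enum.max_by(fn {_, s} -> s end, fn -> nil end)

      case best do
        # link to the highest-scoring antecedent's chain
        {antecedent, _s} -> Map.put(chain_of, mention, chain_of[antecedent])
        # no antecedent above threshold: start a new chain
        nil -> Map.put(chain_of, mention, map_size(chain_of))
      end
    end)
  end
end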

Quick Start

Training

mix nasty.train.coref \
  --corpus data/ontonotes/train \
  --dev data/ontonotes/dev \
  --output priv/models/en/coref \
  --epochs 20 \
  --batch-size 32

Evaluation

mix nasty.eval.coref \
  --model priv/models/en/coref \
  --test data/ontonotes/test

Using in Code

alias Nasty.Semantic.Coreference.Neural.{Resolver, Trainer}

# Load models
{:ok, models, params, vocab} = Trainer.load_models("priv/models/en/coref")

# Resolve coreferences
{:ok, document} = Resolver.resolve(document, models, params, vocab)

# Access chains
document.coref_chains
|> Enum.each(fn chain ->
  IO.puts("Chain #{chain.id}: #{chain.representative}")
  IO.puts("  Mentions: #{length(chain.mentions)}")
end)

Data Format

OntoNotes CoNLL-2012

The system expects CoNLL-2012 format with coreference annotations:

doc1  0  0  John   NNP  ...  (0)
doc1  0  1  works  VBZ  ...  -
doc1  0  2  at     IN   ...  -
doc1  0  3  Google NNP  ...  (1)
...
doc1  0  10 He     PRP  ...  (0)
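
In the coreference column, `(0` opens a multi-token mention of chain 0 and `0)` closes it, `(1)` marks a single-token mention of chain 1, `-` means no annotation, and multiple fields are separated by `|`. A minimal parsing sketch (function names are illustrative, not this project's API):

defmodule CorefColumn do
  # Parses one coreference field from a CoNLL-2012 line into tagged tuples.
  def parse("-"), do: []

  def parse(field) do
    field
    |> String.split("|")
    |> Enum.map(&classify/1)
  end

  defp classify(part) do
    cond do
      Regex.match?(~r/^\(\d+\)$/, part) -> {:single, chain_id(part)}
      String.starts_with?(part, "(") -> {:open, chain_id(part)}
      String.ends_with?(part, ")") -> {:close, chain_id(part)}
    end
  end

  defp chain_id(part),
    do: part |> String.replace(~r/[()]/, "") |> String.to_integer()
end

For example, CorefColumn.parse("(0") returns [{:open, 0}].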

Modules

Core Neural Components

  • Nasty.Semantic.Coreference.Neural.Resolver
  • Nasty.Semantic.Coreference.Neural.Trainer

Evaluation

  • MUC, B³, and CEAF scorers (see Evaluation Metrics below)

Mix Tasks

  • mix nasty.train.coref
  • mix nasty.eval.coref

Model Architecture Details

Mention Encoder

  • Input: Token IDs + mention mask
  • Embedding: 100d (GloVe compatible)
  • BiLSTM: 128 hidden units
  • Attention: Over mention span
  • Output: 256d mention representation
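
The attention step pools the BiLSTM states over the mention span into the fixed 256d representation. A minimal Nx sketch of that pooling; the learned scoring vector `weights` and the 0/1 span `mask` are assumptions about shapes, not this project's parameter layout:

defmodule AttentionPooling do
  import Nx.Defn

  # hidden:  {seq_len, 256} BiLSTM outputs (2 x 128 hidden units)
  # weights: {256}          learned attention scoring vector (assumed)
  # mask:    {seq_len}      1.0 inside the mention span, 0.0 outside
  defn pool(hidden, weights, mask) do
    scores = Nx.dot(hidden, weights)
    # push out-of-span tokens to -inf so softmax gives them zero weight
    masked = Nx.select(mask > 0, scores, Nx.Constants.neg_infinity())
    alpha = Axon.Activations.softmax(masked)
    # weighted sum of hidden states -> fixed-size span representation
    Nx.dot(alpha, hidden)
  end
end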

Pair Scorer

  • Input: [m1_encoding (256d), m2_encoding (256d), features (20d)]
  • Hidden layers: [512, 256] with ReLU + dropout
  • Output: Sigmoid probability
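
A minimal Axon sketch of this head; it takes the already-concatenated 532d vector (256 + 256 + 20) as a single input, which is an assumption about where concatenation happens:

# Pair-scoring head: two ReLU layers with dropout, sigmoid output.
pair_scorer =
  Axon.input("pair", shape: {nil, 532})
  |> Axon.dense(512, activation: :relu)
  |> Axon.dropout(rate: 0.3)
  |> Axon.dense(256, activation: :relu)
  |> Axon.dropout(rate: 0.3)
  |> Axon.dense(1, activation: :sigmoid)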

Features (20 total)

  • 1-3: Distance features (sentence, token, mention)
  • 4-6: String match (exact, partial, head)
  • 7-12: Mention types (pronoun, name, definite NP, for each mention)
  • 13-15: Agreement (gender, number, entity type)
  • 16-20: Positional (same sentence, first mentions, pronoun-name pair)
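
Several of these reduce to one-liners. A sketch, assuming mentions are maps with :text, :head, and :sentence_index keys (not this project's struct):

defmodule PairFeatures do
  # Illustrative computations for a few of the 20 features.
  def sentence_distance(m1, m2), do: abs(m1.sentence_index - m2.sentence_index)
  def same_sentence?(m1, m2), do: m1.sentence_index == m2.sentence_index
  def exact_match?(m1, m2), do: String.downcase(m1.text) == String.downcase(m2.text)
  def head_match?(m1, m2), do: String.downcase(m1.head) == String.downcase(m2.head)
end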

Training

Hyperparameters

  • Epochs: 20 (with early stopping)
  • Batch size: 32
  • Learning rate: 0.001 (Adam)
  • Dropout: 0.3
  • Patience: 3 epochs
  • Max distance: 3 sentences
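
As a sketch of how these settings wire into an Axon training loop; `model` and `train_data` are assumed to exist, and the Adam learning rate and 3-epoch patience are configured through the optimizer constructor and Axon's early-stop handler, whose modules vary by Axon version:

# Minimal Axon training-loop sketch. `train_data` is assumed to be a
# stream of {input, label} batches.
loop =
  model
  |> Axon.Loop.trainer(:binary_cross_entropy, :adam)
  |> Axon.Loop.metric(:accuracy)

trained_params = Axon.Loop.run(loop, train_data, %{}, epochs: 20, compiler: EXLA)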

Data Preparation

  • Positive pairs: Mentions in same chain
  • Negative pairs: Mentions in different chains
  • Ratio: 1:1 (configurable)
  • Shuffling: Enabled
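
Concretely, that preparation can be sketched as follows; treating the gold `chains` as a list of mention lists is an assumption:

# Build balanced, shuffled training pairs from gold chains.
positives =
  for chain <- chains,
      {m1, i} <- Enum.with_index(chain),
      m2 <- Enum.drop(chain, i + 1),
      do: {m1, m2, 1}

negatives =
  for {c1, i} <- Enum.with_index(chains),
      {c2, j} <- Enum.with_index(chains),
      i < j,
      m1 <- c1,
      m2 <- c2,
      do: {m1, m2, 0}

# 1:1 ratio via downsampling negatives, then shuffle.
pairs = Enum.shuffle(positives ++ Enum.take_random(negatives, length(positives)))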

Evaluation Metrics

MUC (Mention-based)

Measures minimum links needed to connect mentions.

B³ (Entity-based)

Averages precision/recall per mention.

CEAF (Entity alignment)

Optimal alignment between gold and predicted chains.

CoNLL F1

Average of MUC, B³, and CEAF F1 scores.
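
In formulas (standard definitions, not project-specific):

\text{CoNLL F1} = \frac{F_1^{\text{MUC}} + F_1^{B^3} + F_1^{\text{CEAF}}}{3}

% B^3: G(m) is the gold cluster containing mention m, S(m) the system
% cluster; precision swaps the denominator to |S(m)|.
R_{B^3} = \frac{1}{N} \sum_{m} \frac{|G(m) \cap S(m)|}{|G(m)|}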

Performance

Expected Results

  • Rule-based baseline: ~70% CoNLL F1
  • Neural pair model: 75-80% CoNLL F1
  • Improvement: +5-10 F1 points

Speed

  • Encoding: ~100 mentions/sec
  • Scoring: ~1000 pairs/sec
  • End-to-end: ~50-100ms per document
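
At these rates encoding dominates, so the 50-100ms end-to-end figure corresponds to documents with roughly ten mentions or fewer; for longer documents, the 3-sentence max distance (see Hyperparameters) keeps the number of scored pairs roughly linear in mention count rather than quadratic.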

Future Enhancements

Phase 2: Span-Based End-to-End (Planned)

  • Joint mention detection + coreference
  • Span enumeration with pruning
  • End-to-end optimization
  • Target: 82-85% CoNLL F1

Phase 3: Transformer Fine-tuning (Planned)

  • SpanBERT or Longformer
  • Pre-trained contextual embeddings
  • Target: 88-90% CoNLL F1

Troubleshooting

Out of Memory

  • Reduce batch size: --batch-size 16
  • Use smaller hidden dim: --hidden-dim 64
  • Process fewer documents at once

Low Accuracy

  • Check data format (CoNLL-2012)
  • Increase training epochs: --epochs 30
  • Add more training data
  • Tune hyperparameters

Slow Training

  • Use GPU acceleration (EXLA)
  • Increase batch size: --batch-size 64
  • Reduce max distance: --max-distance 2

References

  • Lee et al. (2017). "End-to-end Neural Coreference Resolution"
  • Vilain et al. (1995). "A model-theoretic coreference scoring scheme"
  • Pradhan et al. (2012). "CoNLL-2012 shared task"
