Neural Coreference Resolution
Advanced neural coreference resolution using a BiLSTM-based neural pair model.
Overview
This implementation provides neural coreference resolution that improves accuracy from ~70% F1 (rule-based) to 75-80% F1 (neural pair model).
Architecture
Phase 1: Neural Pair Model (Implemented)
Components:
- Mention Encoder - BiLSTM with attention over context
- Pair Scorer - Feedforward network with 20 hand-crafted features
- Neural Resolver - Integration with existing mention detection
- Evaluator - MUC, B³, CEAF metrics
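The final Clustering step in the workflow below turns pairwise scores into chains. As a rough illustration (in Python; the project itself is Elixir), here is a greedy best-antecedent pass — the 0.5 threshold, string mentions, and the `score(a, b)` interface are assumptions for the sketch, not the project's actual API:

```python
# Sketch: greedy best-antecedent clustering over pair scores.
# Each mention links to its highest-scoring earlier mention, if any
# candidate beats the (assumed) 0.5 threshold; links form chains.
def cluster(mentions, score):
    """score(a, b) -> probability that mentions a and b corefer."""
    chain_of = {}   # mention -> its chain (shared list)
    chains = []
    for j, m in enumerate(mentions):
        best, best_s = None, 0.5           # antecedent must beat threshold
        for a in mentions[:j]:             # candidates appear before m
            s = score(a, m)
            if s > best_s:
                best, best_s = a, s
        if best is None:
            chains.append([m])
            chain_of[m] = chains[-1]
        else:
            chain_of[best].append(m)
            chain_of[m] = chain_of[best]
    return [c for c in chains if len(c) > 1]   # drop singletons

def score(a, b):
    return 0.9 if {a, b} == {"John", "He"} else 0.1

print(cluster(["John", "Google", "He"], score))  # [['John', 'He']]
```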
Workflow:
Document → Mention Detection → Neural Encoding → Pairwise Scoring → Clustering → Coreference Chains
Quick Start
Training
mix nasty.train.coref \
--corpus data/ontonotes/train \
--dev data/ontonotes/dev \
--output priv/models/en/coref \
--epochs 20 \
--batch-size 32
Evaluation
mix nasty.eval.coref \
--model priv/models/en/coref \
--test data/ontonotes/test
Using in Code
alias Nasty.Semantic.Coreference.Neural.{Resolver, Trainer}
# Load models
{:ok, models, params, vocab} = Trainer.load_models("priv/models/en/coref")
# Resolve coreferences
{:ok, document} = Resolver.resolve(document, models, params, vocab)
# Access chains
document.coref_chains
|> Enum.each(fn chain ->
  IO.puts("Chain #{chain.id}: #{chain.representative}")
  IO.puts("  Mentions: #{length(chain.mentions)}")
end)
Data Format
OntoNotes CoNLL-2012
The system expects CoNLL-2012 format with coreference annotations:
doc1 0 0 John NNP ... (0)
doc1 0 1 works VBZ ... -
doc1 0 2 at IN ... -
doc1 0 3 Google NNP ... (1)
...
doc1 0 10 He PRP ... (0)
Modules
Core Neural Components
- Nasty.Data.OntoNotes - CoNLL-2012 data loader
- Nasty.Semantic.Coreference.Neural.MentionEncoder - BiLSTM mention encoder
- Nasty.Semantic.Coreference.Neural.PairScorer - Neural pair scoring
- Nasty.Semantic.Coreference.Neural.Trainer - Training pipeline
- Nasty.Semantic.Coreference.Neural.Resolver - Integration layer
Evaluation
- Nasty.Semantic.Coreference.Evaluator - Standard coreference metrics
Mix Tasks
- mix nasty.train.coref - Train models
- mix nasty.eval.coref - Evaluate models
Model Architecture Details
Mention Encoder
- Input: Token IDs + mention mask
- Embedding: 100d (GloVe compatible)
- BiLSTM: 128 hidden units
- Attention: Over mention span
- Output: 256d mention representation
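The attention-over-span pooling can be sketched as follows (in Python/NumPy for illustration; the project itself is Elixir). The BiLSTM states and the attention vector are random stand-ins here — only the shapes match the stated architecture:

```python
# Sketch: attention pooling over a mention span, producing the 256-d
# mention representation from BiLSTM outputs (2 * 128 hidden units).
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 12, 256                  # 2 * 128 BiLSTM units
H = rng.normal(size=(seq_len, hidden))     # stand-in BiLSTM outputs
w = rng.normal(size=(hidden,))             # learned attention vector (stand-in)

span = slice(3, 6)                         # mention covers tokens 3..5
scores = H[span] @ w                       # unnormalized attention scores
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                       # softmax over the span only
mention_vec = alpha @ H[span]              # weighted sum -> 256-d vector

print(mention_vec.shape)  # (256,)
```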
Pair Scorer
- Input: [m1_encoding (256d), m2_encoding (256d), features (20d)]
- Hidden layers: [512, 256] with ReLU + dropout
- Output: Sigmoid probability
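The forward pass implied by these shapes can be written out explicitly (Python/NumPy sketch with random stand-in weights; the project itself is Elixir, and dropout applies only during training):

```python
# Sketch: pair scorer forward pass matching the stated shapes:
# [m1 (256) ; m2 (256) ; features (20)] -> 512 -> 256 -> sigmoid.
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

m1, m2 = rng.normal(size=256), rng.normal(size=256)
feats = rng.normal(size=20)
x = np.concatenate([m1, m2, feats])          # 532-d input

W1, b1 = rng.normal(size=(512, 532)) * 0.05, np.zeros(512)
W2, b2 = rng.normal(size=(256, 512)) * 0.05, np.zeros(256)
w3, b3 = rng.normal(size=256) * 0.05, 0.0

h1 = relu(W1 @ x + b1)                       # hidden layer 1 (dropout omitted)
h2 = relu(W2 @ h1 + b2)                      # hidden layer 2
p = sigmoid(w3 @ h2 + b3)                    # coreference probability
print(float(p))
```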
Features (20 total)
1-3. Distance features (sentence, token, mention)
4-6. String match (exact, partial, head)
7-12. Mention types (pronoun, name, definite NP for each)
13-15. Agreement (gender, number, entity type)
16-20. Positional (same sentence, first mentions, pronoun-name pair)
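A few of these features are simple enough to sketch directly (Python for illustration; the mention dicts and field names below are hypothetical, not the project's actual structs):

```python
# Sketch: computing a handful of the 20 hand-crafted pair features.
# Mention representation is an illustrative dict, not the real struct.
def pair_features(m1, m2):
    return {
        "sent_dist": m2["sent"] - m1["sent"],                       # 1. distance
        "token_dist": m2["start"] - m1["end"],                      # 2. distance
        "exact_match": m1["text"].lower() == m2["text"].lower(),    # 4. string
        "head_match": m1["head"].lower() == m2["head"].lower(),     # 6. string
        "same_sentence": m1["sent"] == m2["sent"],                  # 16. positional
        "pronoun_name": m1["type"] == "name" and m2["type"] == "pronoun",  # 20.
    }

john = {"sent": 0, "start": 0, "end": 0, "text": "John", "head": "John", "type": "name"}
he = {"sent": 0, "start": 10, "end": 10, "text": "He", "head": "He", "type": "pronoun"}
print(pair_features(john, he))
```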
Training
Hyperparameters
- Epochs: 20 (with early stopping)
- Batch size: 32
- Learning rate: 0.001 (Adam)
- Dropout: 0.3
- Patience: 3 epochs
- Max distance: 3 sentences
Data Preparation
- Positive pairs: Mentions in same chain
- Negative pairs: Mentions in different chains
- Ratio: 1:1 (configurable)
- Shuffling: Enabled
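The pair-construction recipe above can be sketched as follows (Python for illustration; the project itself is Elixir, and the 3-sentence max-distance filter is omitted for brevity):

```python
# Sketch: building balanced training pairs from gold chains with a
# 1:1 positive:negative ratio and shuffling, per the recipe above.
import random

def build_pairs(chains, ratio=1.0, seed=0):
    """chains: {chain_id: [mention, ...]}; returns [((m1, m2), label), ...]."""
    rng = random.Random(seed)
    positives = [(a, b) for ms in chains.values()
                 for i, a in enumerate(ms) for b in ms[i + 1:]]
    all_m = [(cid, m) for cid, ms in chains.items() for m in ms]
    negatives = []
    while len(negatives) < ratio * len(positives):
        (c1, m1), (c2, m2) = rng.sample(all_m, 2)
        if c1 != c2:                       # different chains -> negative pair
            negatives.append((m1, m2))
    pairs = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    rng.shuffle(pairs)
    return pairs

chains = {0: [(0, 0), (0, 10)], 1: [(0, 3)]}
print(build_pairs(chains))
```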
Evaluation Metrics
MUC (Link-based)
Measures the minimum number of links needed to connect the mentions in each chain.
B³ (Mention-based)
Averages per-mention precision/recall over all mentions.
CEAF (Entity alignment)
Optimal alignment between gold and predicted chains.
CoNLL F1
Average of MUC, B³, and CEAF F1 scores.
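The recall side of MUC and B³ fits in a few lines (Python sketch for illustration; the project's evaluator is Elixir). Precision swaps the roles of gold and predicted chains, F1 is their harmonic mean, and CEAF's optimal alignment is omitted for brevity:

```python
# Sketch: MUC and B-cubed recall on toy chains (lists of mention ids).
def muc_recall(gold, pred):
    """Sum over gold chains of |chain| - |partition induced by pred|."""
    num = den = 0
    for chain in gold:
        parts = {frozenset(c) & frozenset(chain) for c in pred}
        parts = {p for p in parts if p}                 # non-empty intersections
        missing = set(chain) - set().union(*pred)       # unresolved = singletons
        num += len(chain) - (len(parts) + len(missing))
        den += len(chain) - 1
    return num / den

def b3_recall(gold, pred):
    """Per-mention overlap |G_m ∩ P_m| / |G_m|, averaged over mentions."""
    num = total = 0
    for chain in gold:
        for m in chain:
            c = next((set(c) for c in pred if m in c), {m})
            num += len(c & set(chain)) / len(chain)
            total += 1
    return num / total

gold = [["a", "b", "c"]]
pred = [["a", "b"], ["c", "d"]]
print(muc_recall(gold, pred))  # 0.5
print(b3_recall(gold, pred))   # 0.555... (5/9)
```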
Performance
Expected Results
- Rule-based baseline: ~70% CoNLL F1
- Neural pair model: 75-80% CoNLL F1
- Improvement: +5-10 F1 points
Speed
- Encoding: ~100 mentions/sec
- Scoring: ~1000 pairs/sec
- End-to-end: ~50-100ms per document
Future Enhancements
Phase 2: Span-Based End-to-End (Planned)
- Joint mention detection + coreference
- Span enumeration with pruning
- End-to-end optimization
- Target: 82-85% CoNLL F1
Phase 3: Transformer Fine-tuning (Planned)
- SpanBERT or Longformer
- Pre-trained contextual embeddings
- Target: 88-90% CoNLL F1
Troubleshooting
Out of Memory
- Reduce batch size: --batch-size 16
- Use smaller hidden dim: --hidden-dim 64
- Process fewer documents at once
Low Accuracy
- Check data format (CoNLL-2012)
- Increase training epochs: --epochs 30
- Add more training data
- Tune hyperparameters
Slow Training
- Use GPU acceleration (EXLA)
- Increase batch size: --batch-size 64
- Reduce max distance: --max-distance 2
References
- Lee et al. (2017). "End-to-end Neural Coreference Resolution"
- Vilain et al. (1995). "A model-theoretic coreference scoring scheme"
- Pradhan et al. (2012). "CoNLL-2012 shared task"
See Also
- COREFERENCE_TRAINING.md - Detailed training guide
- Plan - Complete implementation roadmap
- API.md - Full API reference