Neural Coreference Resolution


Advanced neural coreference resolution using a BiLSTM mention encoder with attention and a feedforward mention-pair scorer.

Overview

This implementation provides neural coreference resolution, improving on the rule-based resolver's ~70% CoNLL F1 to 75-80% CoNLL F1 with the neural pair model.

Architecture

Phase 1: Neural Pair Model (Implemented)

Components:

  1. Mention Encoder - BiLSTM with attention over context
  2. Pair Scorer - Feedforward network with 20 hand-crafted features
  3. Neural Resolver - Integration with existing mention detection
  4. Evaluator - MUC, B³, CEAF metrics

Workflow:

Document → Mention Detection → Neural Encoding → Pairwise Scoring → Clustering → Coreference Chains
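
The clustering stage turns pairwise probabilities into chains. A standard approach for pair models is greedy best-antecedent linking, sketched below; the score function, the 0.5 threshold, and the map-based chain bookkeeping are illustrative assumptions, not this project's API:

defmodule GreedyClustering do
  # Greedy best-antecedent linking over pair-model probabilities.
  # `mentions` must be in document order; `score` is a 2-arity function
  # returning the pair model's sigmoid probability. Returns a map from
  # mention to chain id (ids are unique but not contiguous in this sketch).
  def cluster(mentions, score, threshold \\ 0.5) do
    Enum.reduce(mentions, %{}, fn mention, chain_of ->
      best =
        chain_of
        |> Map.keys()
        |> Enum.map(&{&1, score.(&1, mention)})
        |> Enum.filter(fn {_, s} -> s >= threshold end)
        |> Enum.max_by(fn {_, s} -> s end, fn -> nil end)

      case best do
        # link to the highest-scoring antecedent's chain
        {antecedent, _s} -> Map.put(chain_of, mention, chain_of[antecedent])
        # no antecedent above threshold: start a new chain
        nil -> Map.put(chain_of, mention, map_size(chain_of))
      end
    end)
  end
end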

Quick Start

Training

mix nasty.train.coref \
  --corpus data/ontonotes/train \
  --dev data/ontonotes/dev \
  --output priv/models/en/coref \
  --epochs 20 \
  --batch-size 32

Evaluation

mix nasty.eval.coref \
  --model priv/models/en/coref \
  --test data/ontonotes/test

Using in Code

alias Nasty.Semantic.Coreference.Neural.{Resolver, Trainer}

# Load models
{:ok, models, params, vocab} = Trainer.load_models("priv/models/en/coref")

# Resolve coreferences
{:ok, document} = Resolver.resolve(document, models, params, vocab)

# Access chains
document.coref_chains
|> Enum.each(fn chain ->
  IO.puts("Chain #{chain.id}: #{chain.representative}")
  IO.puts("  Mentions: #{length(chain.mentions)}")
end)

Data Format

OntoNotes CoNLL-2012

The system expects CoNLL-2012 format with coreference annotations:

doc1  0  0  John   NNP  ...  (0)
doc1  0  1  works  VBZ  ...  -
doc1  0  2  at     IN   ...  -
doc1  0  3  Google NNP  ...  (1)
...
doc1  0  10 He     PRP  ...  (0)
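
In the coreference column, `(0` opens a multi-token mention of chain 0 and `0)` closes it, `(1)` marks a single-token mention of chain 1, `-` means no annotation, and multiple fields are separated by `|`. A minimal parsing sketch (function names are illustrative, not this project's API):

defmodule CorefColumn do
  # Parses one coreference field from a CoNLL-2012 line into tagged tuples.
  def parse("-"), do: []

  def parse(field) do
    field
    |> String.split("|")
    |> Enum.map(&classify/1)
  end

  defp classify(part) do
    cond do
      Regex.match?(~r/^\(\d+\)$/, part) -> {:single, chain_id(part)}
      String.starts_with?(part, "(") -> {:open, chain_id(part)}
      String.ends_with?(part, ")") -> {:close, chain_id(part)}
    end
  end

  defp chain_id(part),
    do: part |> String.replace(~r/[()]/, "") |> String.to_integer()
end

For example, CorefColumn.parse("(0") returns [{:open, 0}].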

Modules

Core Neural Components

  • Nasty.Semantic.Coreference.Neural.Resolver
  • Nasty.Semantic.Coreference.Neural.Trainer

Evaluation

  • MUC, B³, and CEAF scorers (see Evaluation Metrics below)

Mix Tasks

  • mix nasty.train.coref
  • mix nasty.eval.coref

Model Architecture Details

Mention Encoder

  • Input: Token IDs + mention mask
  • Embedding: 100d (GloVe compatible)
  • BiLSTM: 128 hidden units
  • Attention: Over mention span
  • Output: 256d mention representation
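
The attention step pools the BiLSTM states over the mention span into the fixed 256d representation. A minimal Nx sketch of that pooling; the learned scoring vector `weights` and the 0/1 span `mask` are assumptions about shapes, not this project's parameter layout:

defmodule AttentionPooling do
  import Nx.Defn

  # hidden:  {seq_len, 256} BiLSTM outputs (2 x 128 hidden units)
  # weights: {256}          learned attention scoring vector (assumed)
  # mask:    {seq_len}      1.0 inside the mention span, 0.0 outside
  defn pool(hidden, weights, mask) do
    scores = Nx.dot(hidden, weights)
    # push out-of-span tokens to -inf so softmax gives them zero weight
    masked = Nx.select(mask > 0, scores, Nx.Constants.neg_infinity())
    alpha = Axon.Activations.softmax(masked)
    # weighted sum of hidden states -> fixed-size span representation
    Nx.dot(alpha, hidden)
  end
end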

Pair Scorer

  • Input: [m1_encoding (256d), m2_encoding (256d), features (20d)]
  • Hidden layers: [512, 256] with ReLU + dropout
  • Output: Sigmoid probability
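
A minimal Axon sketch of this head; it takes the already-concatenated 532d vector (256 + 256 + 20) as a single input, which is an assumption about where concatenation happens:

# Pair-scoring head: two ReLU layers with dropout, sigmoid output.
pair_scorer =
  Axon.input("pair", shape: {nil, 532})
  |> Axon.dense(512, activation: :relu)
  |> Axon.dropout(rate: 0.3)
  |> Axon.dense(256, activation: :relu)
  |> Axon.dropout(rate: 0.3)
  |> Axon.dense(1, activation: :sigmoid)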

Features (20 total)

  • 1-3: Distance features (sentence, token, mention)
  • 4-6: String match (exact, partial, head)
  • 7-12: Mention types (pronoun, name, definite NP, for each mention)
  • 13-15: Agreement (gender, number, entity type)
  • 16-20: Positional (same sentence, first mentions, pronoun-name pair)
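
Several of these reduce to one-liners. A sketch, assuming mentions are maps with :text, :head, and :sentence_index keys (not this project's struct):

defmodule PairFeatures do
  # Illustrative computations for a few of the 20 features.
  def sentence_distance(m1, m2), do: abs(m1.sentence_index - m2.sentence_index)
  def same_sentence?(m1, m2), do: m1.sentence_index == m2.sentence_index
  def exact_match?(m1, m2), do: String.downcase(m1.text) == String.downcase(m2.text)
  def head_match?(m1, m2), do: String.downcase(m1.head) == String.downcase(m2.head)
end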

Training

Hyperparameters

  • Epochs: 20 (with early stopping)
  • Batch size: 32
  • Learning rate: 0.001 (Adam)
  • Dropout: 0.3
  • Patience: 3 epochs
  • Max distance: 3 sentences
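
As a sketch of how these settings wire into an Axon training loop; `model` and `train_data` are assumed to exist, and the Adam learning rate and 3-epoch patience are configured through the optimizer constructor and Axon's early-stop handler, whose modules vary by Axon version:

# Minimal Axon training-loop sketch. `train_data` is assumed to be a
# stream of {input, label} batches.
loop =
  model
  |> Axon.Loop.trainer(:binary_cross_entropy, :adam)
  |> Axon.Loop.metric(:accuracy)

trained_params = Axon.Loop.run(loop, train_data, %{}, epochs: 20, compiler: EXLA)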

Data Preparation

  • Positive pairs: Mentions in same chain
  • Negative pairs: Mentions in different chains
  • Ratio: 1:1 (configurable)
  • Shuffling: Enabled
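
Concretely, that preparation can be sketched as follows; treating the gold `chains` as a list of mention lists is an assumption:

# Build balanced, shuffled training pairs from gold chains.
positives =
  for chain <- chains,
      {m1, i} <- Enum.with_index(chain),
      m2 <- Enum.drop(chain, i + 1),
      do: {m1, m2, 1}

negatives =
  for {c1, i} <- Enum.with_index(chains),
      {c2, j} <- Enum.with_index(chains),
      i < j,
      m1 <- c1,
      m2 <- c2,
      do: {m1, m2, 0}

# 1:1 ratio via downsampling negatives, then shuffle.
pairs = Enum.shuffle(positives ++ Enum.take_random(negatives, length(positives)))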

Evaluation Metrics

MUC (Mention-based)

Measures minimum links needed to connect mentions.

B³ (Entity-based)

Averages precision/recall per mention.

CEAF (Entity alignment)

Optimal alignment between gold and predicted chains.

CoNLL F1

Average of MUC, B³, and CEAF F1 scores.
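
In formulas (standard definitions, not project-specific):

\text{CoNLL F1} = \frac{F_1^{\text{MUC}} + F_1^{B^3} + F_1^{\text{CEAF}}}{3}

% B^3: G(m) is the gold cluster containing mention m, S(m) the system
% cluster; precision swaps the denominator to |S(m)|.
R_{B^3} = \frac{1}{N} \sum_{m} \frac{|G(m) \cap S(m)|}{|G(m)|}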

Performance

Expected Results

  • Rule-based baseline: ~70% CoNLL F1
  • Neural pair model: 75-80% CoNLL F1
  • Improvement: +5-10 F1 points

Speed

  • Encoding: ~100 mentions/sec
  • Scoring: ~1000 pairs/sec
  • End-to-end: ~50-100ms per document
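
At these rates encoding dominates, so the 50-100ms end-to-end figure corresponds to documents with roughly ten mentions or fewer; for longer documents, the 3-sentence max distance (see Hyperparameters) keeps the number of scored pairs roughly linear in mention count rather than quadratic.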

Future Enhancements

Phase 2: Span-Based End-to-End (Planned)

  • Joint mention detection + coreference
  • Span enumeration with pruning
  • End-to-end optimization
  • Target: 82-85% CoNLL F1

Phase 3: Transformer Fine-tuning (Planned)

  • SpanBERT or Longformer
  • Pre-trained contextual embeddings
  • Target: 88-90% CoNLL F1

Troubleshooting

Out of Memory

  • Reduce batch size: --batch-size 16
  • Use smaller hidden dim: --hidden-dim 64
  • Process fewer documents at once

Low Accuracy

  • Check data format (CoNLL-2012)
  • Increase training epochs: --epochs 30
  • Add more training data
  • Tune hyperparameters

Slow Training

  • Use GPU acceleration (EXLA)
  • Increase batch size: --batch-size 64
  • Reduce max distance: --max-distance 2

References

  • Lee et al. (2017). "End-to-end Neural Coreference Resolution"
  • Vilain et al. (1995). "A model-theoretic coreference scoring scheme"
  • Pradhan et al. (2012). "CoNLL-2012 shared task"
