Nasty.Semantic.Coreference.Evaluator (Nasty v0.3.0)
Coreference resolution evaluation metrics.
Implements standard coreference evaluation metrics:
- MUC (Vilain et al., 1995) - Link-based
- B³ (Bagga & Baldwin, 1998) - Mention-based
- CEAF (Luo, 2005) - Entity-based with optimal alignment
- CoNLL F1 - Average of the MUC, B³, and CEAF F1 scores
Example
# Evaluate predictions
metrics = Evaluator.evaluate(gold_chains, predicted_chains)
# Access individual metrics
muc_f1 = metrics.muc.f1
b3_f1 = metrics.b3.f1
ceaf_f1 = metrics.ceaf.f1
conll_f1 = metrics.conll_f1
References
- MUC: Vilain et al. (1995). "A model-theoretic coreference scoring scheme"
- B³: Bagga & Baldwin (1998). "Algorithms for scoring coreference chains"
- CEAF: Luo (2005). "On coreference resolution performance metrics"
- CoNLL: Pradhan et al. (2012). "CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes"
Summary
Functions
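compute_b3(gold_chains, predicted_chains)
Compute B³ metric (mention-based).
compute_ceaf(gold_chains, predicted_chains)
Compute CEAF metric (entity-based with optimal alignment).
compute_muc(gold_chains, predicted_chains)
Compute MUC metric (link-based).
conll_f1(gold_chains, predicted_chains)
Compute CoNLL F1 score.
evaluate(gold_chains, predicted_chains)
Evaluate predicted coreference chains against gold standard.
format_results(metrics)
Format evaluation results as a string.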
Types
Functions
compute_b3(gold_chains, predicted_chains)
@spec compute_b3([Nasty.AST.Semantic.CorefChain.t()], [Nasty.AST.Semantic.CorefChain.t()]) :: metric()
Compute B³ metric (mention-based).
B³ computes precision and recall for each mention individually, then averages across all mentions.
Parameters
- gold_chains - Gold standard chains
- predicted_chains - Predicted chains
Returns
Map with precision, recall, and F1
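The per-mention averaging described above can be sketched as follows. This is a minimal illustration, not Nasty's implementation: `B3Sketch` is a hypothetical module, chains are plain lists of mention ids rather than `CorefChain` structs, and a mention that is missing from the other side is assumed to score 0.

```elixir
defmodule B3Sketch do
  # Assumption: a chain is a non-empty list of unique mention ids.

  def score(gold, predicted) do
    precision = average_overlap(predicted, gold)
    recall = average_overlap(gold, predicted)
    %{precision: precision, recall: recall, f1: f1(precision, recall)}
  end

  # For every mention in `chains`, compute |chain ∩ other| / |chain|, where
  # `other` is the chain holding the same mention on the other side, then
  # average these per-mention scores.
  defp average_overlap(chains, other_chains) do
    index = for chain <- other_chains, m <- chain, into: %{}, do: {m, MapSet.new(chain)}

    scores =
      for chain <- chains, m <- chain do
        set = MapSet.new(chain)
        other = Map.get(index, m, MapSet.new())
        MapSet.size(MapSet.intersection(set, other)) / MapSet.size(set)
      end

    if scores == [], do: 0.0, else: Enum.sum(scores) / length(scores)
  end

  defp f1(p, r) when p + r == 0, do: 0.0
  defp f1(p, r), do: 2 * p * r / (p + r)
end
```

Calling `B3Sketch.score(gold, predicted)` with identical chain lists gives 1.0 for all three values; splitting or merging chains lowers recall or precision respectively.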
compute_ceaf(gold_chains, predicted_chains)
@spec compute_ceaf([Nasty.AST.Semantic.CorefChain.t()], [Nasty.AST.Semantic.CorefChain.t()]) :: metric()
Compute CEAF metric (entity-based with optimal alignment).
CEAF finds the optimal alignment between gold and predicted chains using the Kuhn-Munkres algorithm (Hungarian algorithm).
Parameters
- gold_chains - Gold standard chains
- predicted_chains - Predicted chains
Returns
Map with precision, recall, and F1
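To make the alignment idea concrete, here is a small sketch of entity-based CEAF. It rests on several assumptions: `CeafSketch` is a hypothetical module, chains are plain lists of mention ids, the similarity is Luo's φ4 (`2|K ∩ R| / (|K| + |R|)`), and the optimal one-to-one alignment is found by exhaustive search rather than the Kuhn-Munkres algorithm, which is only practical for a handful of chains.

```elixir
defmodule CeafSketch do
  # Assumption: a chain is a non-empty list of unique mention ids.

  def score(gold, predicted) do
    gold_sets = Enum.map(gold, &MapSet.new/1)
    pred_sets = Enum.map(predicted, &MapSet.new/1)
    best = best_alignment(gold_sets, pred_sets)

    # phi4 of a chain with itself is 1, so the normalisers are the chain counts.
    precision = ratio(best, length(pred_sets))
    recall = ratio(best, length(gold_sets))
    %{precision: precision, recall: recall, f1: f1(precision, recall)}
  end

  # phi4 similarity between a gold chain k and a predicted chain r.
  defp phi4(k, r) do
    2 * MapSet.size(MapSet.intersection(k, r)) / (MapSet.size(k) + MapSet.size(r))
  end

  # Exhaustive search over one-to-one alignments: each gold chain is either
  # left unaligned or paired with one of the still-unused predicted chains.
  defp best_alignment([], _preds), do: 0.0

  defp best_alignment([k | rest], preds) do
    unaligned = best_alignment(rest, preds)

    preds
    |> Enum.with_index()
    |> Enum.reduce(unaligned, fn {r, i}, best ->
      max(best, phi4(k, r) + best_alignment(rest, List.delete_at(preds, i)))
    end)
  end

  defp ratio(_total, 0), do: 0.0
  defp ratio(total, count), do: total / count

  defp f1(p, r) when p + r == 0, do: 0.0
  defp f1(p, r), do: 2 * p * r / (p + r)
end
```

The exhaustive search returns the same total similarity an assignment solver would, so it is useful for checking small cases by hand; a production scorer would replace `best_alignment/2` with a polynomial-time assignment algorithm.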
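compute_muc(gold_chains, predicted_chains)
@spec compute_muc([Nasty.AST.Semantic.CorefChain.t()], [Nasty.AST.Semantic.CorefChain.t()]) :: metric()
Compute MUC metric (link-based).
MUC counts the minimum number of links needed to connect the mentions in each chain, then scores how many of those links are also present in the other set of chains.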
Parameters
- gold_chains - Gold standard chains
- predicted_chains - Predicted chains
Returns
Map with precision, recall, and F1
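The link counting behind MUC can be sketched as below. This is an illustration under stated assumptions rather than Nasty's code: `MucSketch` is a hypothetical module, chains are non-empty lists of unique mention ids, and a mention absent from the other side is treated as its own singleton piece. A chain of n mentions needs n - 1 links; if the other side splits it into k pieces, n - k of those links survive.

```elixir
defmodule MucSketch do
  def score(gold, predicted) do
    precision = link_ratio(predicted, gold)
    recall = link_ratio(gold, predicted)
    %{precision: precision, recall: recall, f1: f1(precision, recall)}
  end

  # Ratio of surviving links to required links, summed over all `chains`.
  defp link_ratio(chains, other_chains) do
    # Map each mention on the other side to the index of its chain.
    index =
      for {chain, id} <- Enum.with_index(other_chains), m <- chain, into: %{}, do: {m, id}

    {kept, needed} =
      Enum.reduce(chains, {0, 0}, fn chain, {kept, needed} ->
        # Number of pieces this chain is split into by the other side.
        parts =
          chain
          |> Enum.map(fn m -> Map.get(index, m, {:unaligned, m}) end)
          |> Enum.uniq()
          |> length()

        {kept + length(chain) - parts, needed + length(chain) - 1}
      end)

    if needed == 0, do: 0.0, else: kept / needed
  end

  defp f1(p, r) when p + r == 0, do: 0.0
  defp f1(p, r), do: 2 * p * r / (p + r)
end
```

For one gold chain of four mentions predicted as two chains of two, every predicted link is correct (precision 1.0) but one of the three gold links is missing (recall 2/3).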
conll_f1(gold_chains, predicted_chains)
@spec conll_f1([Nasty.AST.Semantic.CorefChain.t()], [Nasty.AST.Semantic.CorefChain.t()]) :: float()
Compute CoNLL F1 score.
CoNLL F1 is the average of MUC, B³, and CEAF F1 scores.
Parameters
- gold_chains - Gold standard chains
- predicted_chains - Predicted chains
Returns
CoNLL F1 score (0.0 to 1.0)
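As a quick illustration of the averaging, the sketch below takes already-computed metrics and returns their unweighted mean. `ConllSketch` is a hypothetical helper; the `muc`, `b3`, `ceaf`, and `f1` keys mirror the field access shown in the module example, but the exact map shape is an assumption.

```elixir
defmodule ConllSketch do
  # Unweighted mean of the three F1 scores.
  def conll_f1(%{muc: %{f1: muc}, b3: %{f1: b3}, ceaf: %{f1: ceaf}}) do
    (muc + b3 + ceaf) / 3
  end
end
```

Because the mean is unweighted, a system that does well on only one metric is pulled down by the other two.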
evaluate(gold_chains, predicted_chains)
@spec evaluate([Nasty.AST.Semantic.CorefChain.t()], [Nasty.AST.Semantic.CorefChain.t()]) :: evaluation()
Evaluate predicted coreference chains against gold standard.
Parameters
- gold_chains - Gold standard coreference chains
- predicted_chains - Predicted coreference chains
Returns
Map with all evaluation metrics: muc, b3, and ceaf (each with precision, recall, and F1) plus conll_f1
format_results(metrics)
@spec format_results(evaluation()) :: String.t()
Format evaluation results as a string.
Parameters
- metrics - Evaluation metrics, as returned by evaluate/2
Returns
Formatted string with all metrics