Nasty.Semantic.Coreference.Evaluator (Nasty v0.3.0)


Coreference resolution evaluation metrics.

Implements standard coreference evaluation metrics:

  • MUC (Vilain et al., 1995) - Link-based
  • B³ (Bagga & Baldwin, 1998) - Mention-based
  • CEAF (Luo, 2005) - Entity-based with optimal alignment
  • CoNLL F1 - Average of MUC, B³, and CEAF

Example

# Evaluate predictions
metrics = Evaluator.evaluate(gold_chains, predicted_chains)

# Access individual metrics
muc_f1 = metrics.muc.f1
b3_f1 = metrics.b3.f1
ceaf_f1 = metrics.ceaf.f1
conll_f1 = metrics.conll_f1

References

  • MUC: Vilain et al. (1995). "A model-theoretic coreference scoring scheme"
  • B³: Bagga & Baldwin (1998). "Algorithms for scoring coreference chains"
  • CEAF: Luo (2005). "On coreference resolution performance metrics"
  • CoNLL: Pradhan et al. (2012). "CoNLL-2012 shared task"

Summary

Functions

compute_b3(gold_chains, predicted_chains) - Compute B³ metric (mention-based).

compute_ceaf(gold_chains, predicted_chains) - Compute CEAF metric (entity-based with optimal alignment).

compute_muc(gold_chains, predicted_chains) - Compute MUC metric (link-based).

conll_f1(gold_chains, predicted_chains) - Compute CoNLL F1 score.

evaluate(gold_chains, predicted_chains) - Evaluate predicted coreference chains against gold standard.

format_results(metrics) - Format evaluation results as a string.

Types

evaluation()

@type evaluation() :: %{
  muc: metric(),
  b3: metric(),
  ceaf: metric(),
  conll_f1: float()
}

metric()

@type metric() :: %{precision: float(), recall: float(), f1: float()}

Functions

compute_b3(gold_chains, predicted_chains)

Compute B³ metric (mention-based).

B³ computes precision and recall for each mention individually, then averages them across all mentions.

Parameters

  • gold_chains - Gold standard chains
  • predicted_chains - Predicted chains

Returns

Map with precision, recall, and F1
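The per-mention computation can be sketched language-agnostically (Python here, purely to illustrate the math, not this Elixir module's implementation; chains are assumed to be collections of hashable mention identifiers, and each side's chains are assumed disjoint):

```python
def b3(gold_chains, predicted_chains):
    """B-cubed: per-mention precision and recall, averaged over mentions."""
    gold = [set(c) for c in gold_chains]
    pred = [set(c) for c in predicted_chains]

    def score(system, reference):
        # For each mention in each system chain: the fraction of that
        # chain which also lies in the mention's reference chain.
        total, n = 0.0, 0
        for chain in system:
            for m in chain:
                ref = next((r for r in reference if m in r), set())
                total += len(chain & ref) / len(chain)
                n += 1
        return total / n if n else 0.0

    p = score(pred, gold)  # precision: iterate over predicted mentions
    r = score(gold, pred)  # recall: iterate over gold mentions
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f1}
```

For example, splitting the gold chain {1, 2, 3} into {1, 2} and {3} keeps precision at 1.0 (every predicted chain is pure) but lowers recall, since each mention's predicted chain covers only part of its gold chain.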

compute_ceaf(gold_chains, predicted_chains)

Compute CEAF metric (entity-based with optimal alignment).

CEAF finds the optimal alignment between gold and predicted chains using the Kuhn-Munkres algorithm (Hungarian algorithm).

Parameters

  • gold_chains - Gold standard chains
  • predicted_chains - Predicted chains

Returns

Map with precision, recall, and F1
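The alignment idea can be sketched as follows (Python, for illustration only; this uses Luo's φ₄ entity similarity and brute-forces the optimal one-to-one alignment, whereas a real scorer, like the Kuhn-Munkres step this module describes, solves the assignment in polynomial time):

```python
from itertools import permutations

def phi4(a, b):
    # Luo (2005) entity similarity: 2|A ∩ B| / (|A| + |B|)
    return 2 * len(a & b) / (len(a) + len(b))

def ceaf(gold_chains, predicted_chains):
    gold = [set(c) for c in gold_chains]
    pred = [set(c) for c in predicted_chains]
    if not gold or not pred:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    # Optimal one-to-one alignment by brute force over the larger side.
    # Fine for tiny inputs; the Hungarian algorithm scales this up.
    small, large = (gold, pred) if len(gold) <= len(pred) else (pred, gold)
    best = max(
        sum(phi4(chain, large[j]) for chain, j in zip(small, perm))
        for perm in permutations(range(len(large)), len(small))
    )
    p = best / len(pred)  # total similarity over number of predicted entities
    r = best / len(gold)  # total similarity over number of gold entities
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f1}
```

Merging two gold chains into one predicted chain is penalized on both sides: the single predicted entity can align with only one gold entity, so the unaligned gold chain contributes nothing to recall.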

compute_muc(gold_chains, predicted_chains)

Compute MUC metric (link-based).

MUC scores chains by the minimum number of links needed to connect the mentions in each cluster, counting how many of those links the other side recovers.

Parameters

  • gold_chains - Gold standard chains
  • predicted_chains - Predicted chains

Returns

Map with precision, recall, and F1
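The link-counting can be sketched like this (Python, illustrating the Vilain et al. formula rather than this module's Elixir code; chains are assumed to be disjoint collections of hashable mention identifiers):

```python
def muc(gold_chains, predicted_chains):
    gold = [set(c) for c in gold_chains]
    pred = [set(c) for c in predicted_chains]

    def score(key, response):
        # Links recovered / minimum links needed, summed over key chains.
        num = den = 0
        for chain in key:
            covered = set()
            parts = 0
            for r in response:
                overlap = chain & r
                if overlap:
                    parts += 1
                    covered |= overlap
            parts += len(chain - covered)  # unresolved mentions are singletons
            num += len(chain) - parts      # a chain split into k parts keeps |chain| - k links
            den += len(chain) - 1          # |chain| - 1 links connect the chain
        return num / den if den else 0.0

    r = score(gold, pred)  # recall: partition gold chains by predictions
    p = score(pred, gold)  # precision: partition predictions by gold chains
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f1}
```

Note the denominator: a chain of n mentions contributes n − 1 links, so singleton chains contribute nothing, which is why MUC cannot reward correctly leaving a mention unlinked.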

conll_f1(gold_chains, predicted_chains)

Compute CoNLL F1 score.

CoNLL F1 is the average of MUC, B³, and CEAF F1 scores.

Parameters

  • gold_chains - Gold standard chains
  • predicted_chains - Predicted chains

Returns

CoNLL F1 score (0.0 to 1.0)
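The combination itself is just an unweighted mean (shown in Python for illustration; the inputs are the three F1 scores computed above):

```python
def conll_f1(muc_f1, b3_f1, ceaf_f1):
    # CoNLL-2012 official score: unweighted mean of the three F1 values.
    return (muc_f1 + b3_f1 + ceaf_f1) / 3
```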

evaluate(gold_chains, predicted_chains)

Evaluate predicted coreference chains against gold standard.

Parameters

  • gold_chains - Gold standard coreference chains
  • predicted_chains - Predicted coreference chains

Returns

Map with all evaluation metrics

format_results(metrics)

@spec format_results(evaluation()) :: String.t()

Format evaluation results as a string.

Parameters

  • metrics - Evaluation metrics

Returns

Formatted string with all metrics