Nasty.Statistics.Evaluator (Nasty v0.3.0)


Model evaluation and performance metrics.

Provides standard NLP evaluation metrics for various tasks:

  • Classification: Accuracy, precision, recall, F1
  • Sequence tagging: Token-level and entity-level metrics
  • Parsing: PARSEVAL metrics

Examples

# POS tagging evaluation
gold = [:noun, :verb, :det, :noun]
pred = [:noun, :verb, :adj, :noun]
metrics = Evaluator.classification_metrics(gold, pred)
# => %{accuracy: 0.75, ...}

# Confusion matrix
matrix = Evaluator.confusion_matrix(gold, pred)

Summary

Functions

accuracy(gold, predicted)
  Calculate accuracy: correct predictions / total predictions.

classification_metrics(gold, predicted, opts \\ [])
  Calculate classification metrics (accuracy, precision, recall, F1).

confusion_matrix(gold, predicted, labels \\ nil)
  Build a confusion matrix.

entity_metrics(gold_entities, pred_entities)
  Entity-level evaluation for NER.

per_class_metrics(gold, predicted, label)
  Calculate per-class precision, recall, and F1.

Print a formatted confusion matrix.

Print a formatted classification report.

Functions

accuracy(gold, predicted)

@spec accuracy([atom()], [atom()]) :: float()

Calculate accuracy: correct predictions / total predictions.

Examples

iex> gold = [:a, :b, :c, :a]
iex> pred = [:a, :b, :b, :a]
iex> Evaluator.accuracy(gold, pred)
0.75

classification_metrics(gold, predicted, opts \\ [])

@spec classification_metrics([atom()], [atom()], keyword()) :: map()

Calculate classification metrics (accuracy, precision, recall, F1).

Parameters

  • gold - List of gold-standard labels
  • predicted - List of predicted labels
  • opts - Options
    • :average - Averaging method: :micro, :macro, :weighted (default: :macro)
    • :labels - Specific labels to include (default: all)

Returns

  • Map with metrics:
    • :accuracy - Overall accuracy
    • :precision - Precision score
    • :recall - Recall score
    • :f1 - F1 score
    • :support - Number of true instances per class

confusion_matrix(gold, predicted, labels \\ nil)

@spec confusion_matrix([atom()], [atom()], [atom()] | nil) :: map()

Build a confusion matrix.

Parameters

  • gold - Gold-standard labels
  • predicted - Predicted labels
  • labels - Optional list of labels to include (default: all unique labels)

Returns

  • Map of maps: %{true_label => %{pred_label => count}}

Examples

iex> gold = [:a, :b, :b, :a]
iex> pred = [:a, :a, :b, :a]
iex> Evaluator.confusion_matrix(gold, pred)
%{a: %{a: 2, b: 0}, b: %{a: 1, b: 1}}
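The nested-map shape above can be built by zipping the two label lists and incrementing one counter per pair. This is a hedged sketch of the idea, not the library's code:

```elixir
# Illustrative sketch: building %{true_label => %{pred_label => count}}
# from parallel gold/predicted label lists.
defmodule ConfusionSketch do
  def build(gold, pred) do
    labels = Enum.uniq(gold ++ pred)

    # Start every cell at zero so absent pairs still appear as 0
    empty_row = Map.new(labels, fn l -> {l, 0} end)
    init = Map.new(labels, fn l -> {l, empty_row} end)

    gold
    |> Enum.zip(pred)
    |> Enum.reduce(init, fn {g, p}, acc ->
      # Increment the cell at row g (true label), column p (predicted label)
      update_in(acc, [g, p], &(&1 + 1))
    end)
  end
end
```

With the inputs from the example above, this reproduces the same structure: `%{a: %{a: 2, b: 0}, b: %{a: 1, b: 1}}`.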

entity_metrics(gold_entities, pred_entities)

@spec entity_metrics([tuple()], [tuple()]) :: map()

Entity-level evaluation for NER.

Compares predicted and gold entity spans using strict matching.

Parameters

  • gold_entities - List of gold entities: [{type, start, end}, ...]
  • pred_entities - List of predicted entities: [{type, start, end}, ...]

Returns

  • Map with :precision, :recall, :f1
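Strict matching means an entity counts as a true positive only when its type, start, and end all agree exactly, which makes set intersection a natural fit. A minimal sketch of that idea (illustrative names, not the library's internals):

```elixir
# Illustrative sketch: strict span matching for NER evaluation.
# An entity {type, start, end} is correct only on an exact tuple match.
defmodule EntityMetricsSketch do
  def strict_prf(gold_entities, pred_entities) do
    gold_set = MapSet.new(gold_entities)
    pred_set = MapSet.new(pred_entities)

    # Exact tuple matches are the true positives
    tp = MapSet.size(MapSet.intersection(gold_set, pred_set))

    precision = safe_div(tp, MapSet.size(pred_set))
    recall = safe_div(tp, MapSet.size(gold_set))

    f1 =
      if precision + recall == 0,
        do: 0.0,
        else: 2 * precision * recall / (precision + recall)

    %{precision: precision, recall: recall, f1: f1}
  end

  defp safe_div(_, 0), do: 0.0
  defp safe_div(a, b), do: a / b
end
```

Note that under strict matching a predicted span that overlaps a gold entity but differs in a single boundary, e.g. `{:loc, 5, 7}` against gold `{:loc, 5, 6}`, is counted as both a false positive and a false negative.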

per_class_metrics(gold, predicted, label)

@spec per_class_metrics([atom()], [atom()], atom()) :: map()

Calculate per-class precision, recall, and F1.

Parameters

  • gold - Gold-standard labels
  • predicted - Predicted labels
  • label - The label/class to evaluate

Returns

  • Map with :precision, :recall, :f1, :support