Nasty.Statistics.Evaluator (Nasty v0.3.0)
Model evaluation and performance metrics.
Provides standard NLP evaluation metrics for various tasks:
- Classification: Accuracy, precision, recall, F1
- Sequence tagging: Token-level and entity-level metrics
- Parsing: PARSEVAL metrics
Examples
# POS tagging evaluation
gold = [:noun, :verb, :det, :noun]
pred = [:noun, :verb, :adj, :noun]
metrics = Evaluator.classification_metrics(gold, pred)
# => %{accuracy: 0.75, ...}
# Confusion matrix
matrix = Evaluator.confusion_matrix(gold, pred)
Summary
Functions
Calculate accuracy: correct predictions / total predictions.
Calculate classification metrics (accuracy, precision, recall, F1).
Build a confusion matrix.
Entity-level evaluation for NER.
Calculate per-class precision, recall, and F1.
Print a formatted confusion matrix.
Print a formatted classification report.
Functions
Calculate accuracy: correct predictions / total predictions.
Examples
iex> gold = [:a, :b, :c, :a]
iex> pred = [:a, :b, :b, :a]
iex> Evaluator.accuracy(gold, pred)
0.75
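Token-level accuracy is simply the fraction of positions where the two lists agree. A minimal standalone sketch of that computation (an approximation for illustration, not Nasty's actual implementation):

```elixir
# Sketch: accuracy = matching positions / total positions.
# Standalone approximation; Nasty's implementation may differ.
defmodule AccuracySketch do
  def accuracy(gold, pred) when length(gold) == length(pred) do
    correct =
      gold
      |> Enum.zip(pred)
      |> Enum.count(fn {g, p} -> g == p end)

    correct / length(gold)
  end
end

AccuracySketch.accuracy([:a, :b, :c, :a], [:a, :b, :b, :a])
# => 0.75
```

The guard rejects length-mismatched inputs early, which is usually an upstream alignment bug rather than something to score around.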
Calculate classification metrics (accuracy, precision, recall, F1).
Parameters
- gold - List of gold-standard labels
- predicted - List of predicted labels
- opts - Options:
  - :average - Averaging method: :micro, :macro, :weighted (default: :macro)
  - :labels - Specific labels to include (default: all)
Returns
- Map with metrics:
  - :accuracy - Overall accuracy
  - :precision - Precision score
  - :recall - Recall score
  - :f1 - F1 score
  - :support - Number of true instances per class
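With the default :macro averaging, each label's precision, recall, and F1 are computed independently and then averaged without weighting by support. A hypothetical standalone sketch of that behavior (not Nasty's source; in particular, the zero-division handling here is an assumption):

```elixir
# Sketch of macro-averaged classification metrics.
# Hypothetical standalone code; Nasty's implementation may differ.
defmodule MetricsSketch do
  def classification_metrics(gold, pred) do
    pairs = Enum.zip(gold, pred)
    labels = Enum.uniq(gold ++ pred)

    per_class =
      for label <- labels do
        tp = Enum.count(pairs, fn {g, p} -> g == label and p == label end)
        fp = Enum.count(pairs, fn {g, p} -> g != label and p == label end)
        fn_ = Enum.count(pairs, fn {g, p} -> g == label and p != label end)

        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn_)
        f1 = safe_div(2 * precision * recall, precision + recall)
        %{precision: precision, recall: recall, f1: f1}
      end

    n = length(labels)

    %{
      accuracy: Enum.count(pairs, fn {g, p} -> g == p end) / length(pairs),
      precision: Enum.sum(Enum.map(per_class, & &1.precision)) / n,
      recall: Enum.sum(Enum.map(per_class, & &1.recall)) / n,
      f1: Enum.sum(Enum.map(per_class, & &1.f1)) / n
    }
  end

  # Return 0.0 instead of raising when a class has no predictions.
  defp safe_div(_num, denom) when denom == 0, do: 0.0
  defp safe_div(num, denom), do: num / denom
end

MetricsSketch.classification_metrics(
  [:noun, :verb, :det, :noun],
  [:noun, :verb, :adj, :noun]
)
# => %{accuracy: 0.75, precision: 0.5, recall: 0.5, f1: 0.5}
```

Note how macro averaging penalizes the two mistaken classes (:det and :adj) as heavily as the frequent ones, which is why it differs from :micro on imbalanced label sets.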
Build a confusion matrix.
Parameters
- gold - Gold-standard labels
- predicted - Predicted labels
- labels - Optional list of labels to include (default: all unique labels)
Returns
- Map of maps:
%{true_label => %{pred_label => count}}
Examples
iex> gold = [:a, :b, :b, :a]
iex> pred = [:a, :a, :b, :a]
iex> Evaluator.confusion_matrix(gold, pred)
%{a: %{a: 2, b: 0}, b: %{a: 1, b: 1}}
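The nested-map shape above can be built by seeding a zeroed grid over all labels and incrementing one cell per gold/predicted pair. A standalone sketch that mirrors the documented return shape (not Nasty's source):

```elixir
# Sketch: rows are gold labels, columns are predicted labels.
# Hypothetical standalone code mirroring the documented shape.
defmodule ConfusionSketch do
  def confusion_matrix(gold, pred) do
    labels = Enum.uniq(gold ++ pred)
    empty_row = Map.new(labels, fn l -> {l, 0} end)
    base = Map.new(labels, fn l -> {l, empty_row} end)

    gold
    |> Enum.zip(pred)
    |> Enum.reduce(base, fn {g, p}, acc ->
      update_in(acc, [g, p], &(&1 + 1))
    end)
  end
end

ConfusionSketch.confusion_matrix([:a, :b, :b, :a], [:a, :a, :b, :a])
# => %{a: %{a: 2, b: 0}, b: %{a: 1, b: 1}}
```

Seeding every cell with 0 up front keeps absent confusions explicit (e.g. b: 0 above), which makes downstream table printing simpler.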
Entity-level evaluation for NER.
Compares predicted and gold entity spans using strict matching.
Parameters
- gold_entities - List of gold entities: [{type, start, end}, ...]
- pred_entities - List of predicted entities: [{type, start, end}, ...]
Returns
- Map with :precision, :recall, and :f1
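Strict matching means a predicted span counts as a true positive only when its type, start, and end all equal some gold span. A minimal sketch under that assumption, using the {type, start, end} tuple shape documented above (standalone approximation, not Nasty's implementation):

```elixir
# Sketch of strict-match entity scoring: exact {type, start, end}
# agreement required. Hypothetical code, not Nasty's source.
defmodule EntitySketch do
  def entity_metrics(gold_entities, pred_entities) do
    gold_set = MapSet.new(gold_entities)
    tp = Enum.count(pred_entities, &MapSet.member?(gold_set, &1))

    precision = safe_div(tp, length(pred_entities))
    recall = safe_div(tp, length(gold_entities))
    f1 = safe_div(2 * precision * recall, precision + recall)

    %{precision: precision, recall: recall, f1: f1}
  end

  defp safe_div(_num, denom) when denom == 0, do: 0.0
  defp safe_div(num, denom), do: num / denom
end

gold = [{:per, 0, 2}, {:loc, 5, 6}]
pred = [{:per, 0, 2}, {:loc, 4, 6}]
EntitySketch.entity_metrics(gold, pred)
# => %{precision: 0.5, recall: 0.5, f1: 0.5}
```

The second prediction has the right type but a boundary off by one, so under strict matching it scores as both a false positive and a false negative.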
Calculate per-class precision, recall, and F1.
Parameters
- gold - Gold-standard labels
- predicted - Predicted labels
- label - The label/class to evaluate
Returns
- Map with :precision, :recall, :f1, and :support
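For a single class, support is the number of gold instances of that label, and precision/recall follow from the usual true/false positive counts. A hypothetical sketch (names mirror the docs above, not Nasty's source):

```elixir
# Sketch of single-class precision/recall/F1 with support.
# Hypothetical standalone code; Nasty's implementation may differ.
defmodule PerClassSketch do
  def per_class_metrics(gold, pred, label) do
    pairs = Enum.zip(gold, pred)
    tp = Enum.count(pairs, fn {g, p} -> g == label and p == label end)
    fp = Enum.count(pairs, fn {g, p} -> g != label and p == label end)
    fn_ = Enum.count(pairs, fn {g, p} -> g == label and p != label end)
    support = tp + fn_

    precision = if tp + fp == 0, do: 0.0, else: tp / (tp + fp)
    recall = if support == 0, do: 0.0, else: tp / support

    f1 =
      if precision + recall == 0.0,
        do: 0.0,
        else: 2 * precision * recall / (precision + recall)

    %{precision: precision, recall: recall, f1: f1, support: support}
  end
end

PerClassSketch.per_class_metrics([:a, :b, :b, :a], [:a, :a, :b, :a], :b)
# precision 1.0 (the one :b prediction is right),
# recall 0.5 (one of two gold :b found), support 2
```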
@spec print_confusion_matrix(map()) :: :ok
Print a formatted confusion matrix.
Examples
iex> matrix = Evaluator.confusion_matrix(gold, pred)
iex> Evaluator.print_confusion_matrix(matrix)
# Prints a nicely formatted table
@spec print_report(map()) :: :ok
Print a formatted classification report.
Examples
iex> metrics = Evaluator.classification_metrics(gold, pred)
iex> Evaluator.print_report(metrics)
# Prints precision, recall, F1 for each class