ExFairness.Metrics.Calibration (ExFairness v0.5.1)
Calibration fairness metric.
Measures whether predicted probabilities are well-calibrated across groups. A model is calibrated if outcomes predicted with probability p actually occur about p% of the time.
Mathematical Definition
For predicted probability ŝ(x) and outcome y:
P(Y = 1 | ŝ(X) = s, A = a) ≈ s for all s, a
Fairness requires that calibration holds across all groups.
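For example, if 200 applicants in group a receive the score ŝ(x) = 0.7 and about 140 of them (70%) actually have Y = 1, the model is calibrated at s = 0.7 for that group; the same must hold at every score level in every group.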
Expected Calibration Error (ECE)
ECE measures the weighted average of calibration error across bins:
ECE = Σ_b (n_b / n) · |acc(b) - conf(b)|
where:
- b = bin index
- n_b = number of samples in bin b
- acc(b) = accuracy in bin b
- conf(b) = average confidence in bin b
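To make the formula concrete, here is a minimal sketch of ECE over uniform bins written directly against Nx. It is not the library's implementation: the module name ECESketch is made up, and acc(b) is taken to be the observed positive rate in each bin, a common convention for binary risk scores.
defmodule ECESketch do
  # Expected Calibration Error over uniform bins (illustrative sketch only).
  def ece(probs, labels, n_bins \\ 10) do
    n = Nx.size(probs)

    0..(n_bins - 1)
    |> Enum.map(fn b ->
      lo = b / n_bins
      hi = (b + 1) / n_bins

      # Mask for samples whose probability falls in bin b; the last bin also includes 1.0.
      upper = if b == n_bins - 1, do: Nx.less_equal(probs, hi), else: Nx.less(probs, hi)
      in_bin = Nx.logical_and(Nx.greater_equal(probs, lo), upper)
      n_b = in_bin |> Nx.sum() |> Nx.to_number()

      if n_b == 0 do
        0.0
      else
        mask = Nx.select(in_bin, 1.0, 0.0)
        # acc(b): observed positive rate in the bin; conf(b): mean predicted probability.
        acc = Nx.to_number(Nx.sum(Nx.multiply(mask, labels))) / n_b
        conf = Nx.to_number(Nx.sum(Nx.multiply(mask, probs))) / n_b
        n_b / n * abs(acc - conf)
      end
    end)
    |> Enum.sum()
  end
end
The group disparity Δ_ECE below is then the absolute difference between this quantity computed on group A's samples and on group B's samples.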
Group Fairness
Calibration fairness requires similar ECE across groups:
Δ_ECE = |ECE_A - ECE_B|
Use Cases
- Medical risk scores (predicted risk should match actual risk)
- Credit scoring (approval probability should match default rate)
- Hiring (interview likelihood should match success rate)
- Any application where users rely on prediction confidence
References
- Kleinberg, J., et al. (2017). "Inherent trade-offs in algorithmic fairness."
- Pleiss, G., et al. (2017). "On fairness and calibration." NeurIPS.
- Chouldechova, A. (2017). "Fair prediction with disparate impact."
- Guo, C., et al. (2017). "On calibration of modern neural networks." ICML.
Examples
iex> # Perfect calibration example
iex> probs = Nx.tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
iex> labels = Nx.tensor([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Metrics.Calibration.compute(probs, labels, sensitive, n_bins: 5)
iex> is_float(result.disparity)
true
Summary
Functions
Computes calibration fairness disparity between groups.
Generates reliability diagram data for calibration plotting.
Functions
@spec compute(Nx.Tensor.t(), Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: result()
Computes calibration fairness disparity between groups.
Parameters
- probabilities - Predicted probabilities (0.0 to 1.0)
- labels - Binary labels (0 or 1)
- sensitive_attr - Binary sensitive attribute (0 or 1)
- opts - Options:
  - :n_bins - Number of probability bins (default: 10)
  - :strategy - Binning strategy (:uniform or :quantile, default: :uniform)
  - :threshold - Max acceptable ECE disparity (default: 0.1)
  - :min_per_group - Minimum samples per group (default: 5)
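For example, quantile binning with a tighter disparity threshold can be requested like this (illustrative values; probs, labels, and sensitive are tensors as in the examples below):
result = ExFairness.Metrics.Calibration.compute(probs, labels, sensitive,
  n_bins: 5, strategy: :quantile, threshold: 0.05)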
Returns
Map with ECE for each group, disparity, and detailed calibration metrics:
- :group_a_ece - Expected Calibration Error for group A
- :group_b_ece - Expected Calibration Error for group B
- :disparity - Absolute difference in ECE
- :passes - Whether disparity is within threshold
- :threshold - Threshold used
- :group_a_mce - Maximum Calibration Error for group A
- :group_b_mce - Maximum Calibration Error for group B
- :n_bins - Number of bins used
- :strategy - Binning strategy used
- :interpretation - Plain language explanation
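A minimal sketch of acting on the returned map, using only the keys documented above (the printed messages are illustrative):
result = ExFairness.Metrics.Calibration.compute(probs, labels, sensitive)

if result.passes do
  IO.puts("ECE disparity #{result.disparity} is within threshold #{result.threshold}")
else
  IO.puts(result.interpretation)
end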
Examples
iex> probs = Nx.tensor([0.1, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.8, 0.5, 0.3, 0.1, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.8, 0.5, 0.3])
iex> labels = Nx.tensor([0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Metrics.Calibration.compute(probs, labels, sensitive, n_bins: 5)
iex> result.n_bins
5
@spec reliability_diagram(Nx.Tensor.t(), Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: %{ bins: [map()], n_bins: integer(), strategy: :uniform | :quantile }
Generates reliability diagram data for calibration plotting.
Returns bin-level accuracy, confidence, and counts per group using the same
binning strategy as compute/4.
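A minimal usage sketch (the exact keys inside each per-bin map are not listed here, so the code inspects one entry rather than assuming them):
diagram = ExFairness.Metrics.Calibration.reliability_diagram(probs, labels, sensitive, n_bins: 10)
diagram.n_bins
#=> 10

# Each element of diagram.bins is a map of per-bin, per-group statistics
# (accuracy, confidence, counts); inspect one to see its shape.
diagram.bins |> List.first() |> IO.inspect()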