ExFairness.Metrics.Calibration (ExFairness v0.5.1)


Calibration fairness metric.

Measures whether predicted probabilities are well-calibrated across groups. A model is calibrated if events predicted with probability p actually occur with relative frequency p.

Mathematical Definition

For predicted probability ŝ(x) and outcome y:

P(Y = 1 | ŝ(X) = s, A = a) = s    for all s, a

Fairness requires that calibration hold across all groups.
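To make the condition concrete, here is a minimal sketch (illustrative data and names, not part of this module's API) that estimates the empirical positive rate among samples scored near a given s within one group:

probs = Nx.tensor([0.3, 0.3, 0.3, 0.3, 0.7, 0.7])
labels = Nx.tensor([0, 0, 0, 1, 1, 1])

s = 0.3
# Samples whose predicted probability falls near s.
mask = Nx.less(Nx.abs(Nx.subtract(probs, s)), 0.05)
# Observed positive rate among those samples; calibration requires this ≈ s.
rate = Nx.divide(Nx.sum(Nx.multiply(labels, mask)), Nx.sum(mask))

Here the observed rate is 0.25 against a predicted 0.3, so this group is close to, but not exactly, calibrated at s = 0.3.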

Expected Calibration Error (ECE)

ECE measures the weighted average of calibration error across bins (a runnable sketch follows the definitions below):

ECE = Σ_b (n_b / n) · |acc(b) - conf(b)|

where:

  • b = bin index
  • n_b = number of samples in bin b
  • acc(b) = accuracy in bin b
  • conf(b) = average confidence in bin b
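The following is a minimal sketch of this formula in plain Elixir, using lists for clarity (ECESketch and its names are illustrative, not this library's implementation; the module itself operates on Nx tensors):

defmodule ECESketch do
  def ece(probs, labels, n_bins \\ 10) do
    n = length(probs)

    probs
    |> Enum.zip(labels)
    # Assign each {prob, label} pair to a uniform bin index in 0..n_bins-1.
    |> Enum.group_by(fn {p, _y} -> min(trunc(p * n_bins), n_bins - 1) end)
    |> Enum.map(fn {_bin, pairs} ->
      n_b = length(pairs)
      # acc(b): fraction of positives in the bin.
      acc = Enum.count(pairs, fn {_p, y} -> y == 1 end) / n_b
      # conf(b): mean predicted probability in the bin.
      conf = Enum.sum(Enum.map(pairs, fn {p, _y} -> p end)) / n_b
      n_b / n * abs(acc - conf)
    end)
    |> Enum.sum()
  end
end

For probs [0.2, 0.2, 0.8, 0.8] and labels [0, 1, 1, 1], the two occupied bins contribute 0.15 and 0.10, so ECESketch.ece/2 returns an ECE of 0.25.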

Group Fairness

Calibration fairness requires similar ECE across groups:

Δ_ECE = |ECE_A - ECE_B|
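Continuing the sketch above (the split helper is hypothetical; compute/4 below does this for you on tensors):

# Partition a list by a binary sensitive attribute.
split = fn xs, sa, g -> for {x, ^g} <- Enum.zip(xs, sa), do: x end

probs = [0.2, 0.8, 0.3, 0.7]
labels = [0, 1, 0, 1]
sensitive = [0, 0, 1, 1]

ece_a = ECESketch.ece(split.(probs, sensitive, 0), split.(labels, sensitive, 0))
ece_b = ECESketch.ece(split.(probs, sensitive, 1), split.(labels, sensitive, 1))
disparity = abs(ece_a - ece_b)  # ≈ 0.1 on this toy data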

Use Cases

  • Medical risk scores (predicted risk should match actual risk)
  • Credit scoring (approval probability should match default rate)
  • Hiring (interview likelihood should match success rate)
  • Any application where users rely on prediction confidence

References

  • Kleinberg, J., et al. (2017). "Inherent trade-offs in algorithmic fairness." ITCS.
  • Pleiss, G., et al. (2017). "On fairness and calibration." NeurIPS.
  • Chouldechova, A. (2017). "Fair prediction with disparate impact." Big Data.
  • Guo, C., et al. (2017). "On calibration of modern neural networks." ICML.

Examples

iex> # Two groups with identical score/label distributions (zero disparity)
iex> probs = Nx.tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
iex> labels = Nx.tensor([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Metrics.Calibration.compute(probs, labels, sensitive, n_bins: 5)
iex> is_float(result.disparity)
true

Summary

Functions

compute(probabilities, labels, sensitive_attr, opts \\ [])

Computes calibration fairness disparity between groups.

reliability_diagram(probabilities, labels, sensitive_attr, opts \\ [])

Generates reliability diagram data for calibration plotting.

Types

result()

@type result() :: %{
  group_a_ece: float(),
  group_b_ece: float(),
  disparity: float(),
  passes: boolean(),
  threshold: float(),
  group_a_mce: float(),
  group_b_mce: float(),
  n_bins: integer(),
  strategy: :uniform | :quantile,
  interpretation: String.t()
}

Functions

compute(probabilities, labels, sensitive_attr, opts \\ [])

@spec compute(Nx.Tensor.t(), Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: result()

Computes calibration fairness disparity between groups.

Parameters

  • probabilities - Predicted probabilities (0.0 to 1.0)
  • labels - Binary labels (0 or 1)
  • sensitive_attr - Binary sensitive attribute (0 or 1)
  • opts:
    • :n_bins - Number of probability bins (default: 10)
    • :strategy - Binning strategy (:uniform or :quantile, default: :uniform)
    • :threshold - Max acceptable ECE disparity (default: 0.1)
    • :min_per_group - Minimum samples per group (default: 5)

Returns

Map with ECE for each group, disparity, and detailed calibration metrics:

  • :group_a_ece - Expected Calibration Error for group A
  • :group_b_ece - Expected Calibration Error for group B
  • :disparity - Absolute difference in ECE
  • :passes - Whether disparity is within threshold
  • :threshold - Threshold used
  • :group_a_mce - Maximum Calibration Error for group A
  • :group_b_mce - Maximum Calibration Error for group B
  • :n_bins - Number of bins used
  • :strategy - Binning strategy used
  • :interpretation - Plain language explanation

Examples

iex> probs = Nx.tensor([0.1, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.8, 0.5, 0.3, 0.1, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.8, 0.5, 0.3])
iex> labels = Nx.tensor([0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Metrics.Calibration.compute(probs, labels, sensitive, n_bins: 5)
iex> result.n_bins
5
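The options compose; the sketch below is illustrative (not a doctest), reuses the tensors above, and relies only on the result keys documented in Returns:

result =
  ExFairness.Metrics.Calibration.compute(probs, labels, sensitive,
    n_bins: 5,
    strategy: :quantile,
    threshold: 0.05
  )

result.strategy  #=> :quantile
result.passes    # true only if the ECE disparity is within 0.05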

reliability_diagram(probabilities, labels, sensitive_attr, opts \\ [])

@spec reliability_diagram(Nx.Tensor.t(), Nx.Tensor.t(), Nx.Tensor.t(), keyword()) ::
  %{
    bins: [map()],
    n_bins: integer(),
    strategy: :uniform | :quantile
  }

Generates reliability diagram data for calibration plotting.

Returns bin-level accuracy, confidence, and counts per group using the same binning strategy as compute/4.
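For example, an illustrative sketch, assuming reliability_diagram/4 accepts the same :n_bins and :strategy options as compute/4 (which the shared return fields suggest); the exact keys inside each bin map are not documented here, so this only inspects the returned shape:

diagram =
  ExFairness.Metrics.Calibration.reliability_diagram(probs, labels, sensitive,
    n_bins: 10,
    strategy: :uniform
  )

diagram.n_bins    #=> 10
diagram.strategy  #=> :uniform
Enum.each(diagram.bins, &IO.inspect/1)  # one map per bin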