ExFairness.Utils.StatisticalTests (ExFairness v0.5.1)

Hypothesis testing for fairness metrics.

Provides parametric and non-parametric tests for assessing whether observed disparities in fairness metrics are statistically significant.

Statistical Tests

  • Two-Proportion Z-Test: Tests demographic parity differences
  • Chi-Square Test: Tests independence in confusion matrices
  • Permutation Test: Non-parametric test for any metric

References

  • Agresti, A. (2018). "Statistical methods for the social sciences."
  • Good, P. (2013). "Permutation tests: A practical guide to resampling methods for testing hypotheses."
  • Cohen, J. (1988). "Statistical power analysis for the behavioral sciences."

Examples

iex> predictions = Nx.tensor([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Utils.StatisticalTests.two_proportion_test(predictions, sensitive)
iex> is_float(result.p_value) and result.p_value >= 0.0 and result.p_value <= 1.0
true

Summary

Functions

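chi_square_test(predictions, labels, sensitive_attr, opts \\ [])

Chi-square test for equalized odds.

cohens_h(p1, p2)

Computes Cohen's h effect size for two proportions.

permutation_test(data, metric_fn, opts \\ [])

Permutation test for any fairness metric.

two_proportion_test(predictions, sensitive_attr, opts \\ [])

Two-proportion Z-test for demographic parity.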

Types

test_result()

@type test_result() :: %{
  statistic: float(),
  p_value: float(),
  significant: boolean(),
  alpha: float(),
  effect_size: float() | nil,
  test_name: String.t(),
  interpretation: String.t()
}
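
Because every test returns the same map shape, calling code can pattern-match on these keys. The following is a minimal sketch only: `ResultSketch.summarize/1` is a hypothetical helper that is not part of ExFairness, and the values in `example` are made up for illustration.

```elixir
defmodule ResultSketch do
  # Hypothetical helper; only the map keys come from the test_result() type.
  def summarize(%{test_name: name, statistic: stat, p_value: p, alpha: alpha, significant: sig}) do
    verdict = if sig, do: "significant", else: "not significant"
    "#{name}: statistic=#{Float.round(stat, 3)}, p=#{Float.round(p, 4)} (#{verdict} at alpha=#{alpha})"
  end
end

# Made-up values for illustration only; not output from the library.
example = %{
  statistic: 2.5,
  p_value: 0.0124,
  significant: true,
  alpha: 0.05,
  effect_size: nil,
  test_name: "Two-Proportion Z-Test",
  interpretation: "illustrative placeholder"
}

IO.puts(ResultSketch.summarize(example))
```

Matching on the map in the function head means a result missing one of these keys fails fast with a `FunctionClauseError` instead of silently printing `nil`.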

Functions

chi_square_test(predictions, labels, sensitive_attr, opts \\ [])

@spec chi_square_test(Nx.Tensor.t(), Nx.Tensor.t(), Nx.Tensor.t(), keyword()) ::
  test_result()

Chi-square test for equalized odds.

Tests whether confusion matrices are independent of group membership.

Hypotheses

  • H₀: Confusion matrix is independent of sensitive attribute
  • H₁: Confusion matrix depends on sensitive attribute

Test Statistic

χ² = Σ (O_ij - E_ij)² / E_ij

where O_ij = observed count, E_ij = expected count under independence
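
To make the statistic concrete, here is a minimal sketch of the sum above on a plain table of counts. It assumes one row per group and one column per outcome category, with expected counts taken as row total × column total / grand total. The module name `ChiSquareSketch` and this table layout are assumptions for illustration, not a description of how ExFairness builds its table internally.

```elixir
defmodule ChiSquareSketch do
  # Hypothetical helper; the table layout is an assumption, not ExFairness internals.

  # `table` is a list of rows of observed counts (one row per group,
  # one column per outcome category). Returns the chi-square statistic.
  # Note: a column whose total is zero would make `expected` zero and raise.
  def statistic(table) do
    row_totals = Enum.map(table, &Enum.sum/1)

    col_totals =
      table
      |> Enum.zip()
      |> Enum.map(fn column -> column |> Tuple.to_list() |> Enum.sum() end)

    grand_total = Enum.sum(row_totals)

    for {row, r_total} <- Enum.zip(table, row_totals),
        {observed, c_total} <- Enum.zip(row, col_totals),
        reduce: 0.0 do
      acc ->
        # Expected count under independence of row and column.
        expected = r_total * c_total / grand_total
        acc + :math.pow(observed - expected, 2) / expected
    end
  end
end
```

For example, `ChiSquareSketch.statistic([[10, 20], [20, 40]])` is zero because both rows have the same proportions, while `[[10, 20], [20, 10]]` gives a positive statistic.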

Parameters

  • predictions - Binary predictions tensor
  • labels - Binary labels tensor
  • sensitive_attr - Binary sensitive attribute tensor
  • opts:
    • :alpha - Significance level (default: 0.05)

Returns

Test result map with chi-square statistic and p-value.

Examples

iex> predictions = Nx.tensor([1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1])
iex> labels = Nx.tensor([1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Utils.StatisticalTests.chi_square_test(predictions, labels, sensitive)
iex> result.test_name
"Chi-Square Test"

cohens_h(p1, p2)

@spec cohens_h(float(), float()) :: float()

Computes Cohen's h effect size for two proportions.

Cohen's h is the difference between two proportions after each has been arcsine square-root transformed.

Effect Size Guidelines

  • Small: h ≈ 0.2
  • Medium: h ≈ 0.5
  • Large: h ≈ 0.8

Formula

h = 2 * (arcsin(sqrt(p1)) - arcsin(sqrt(p2)))
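
As a sketch of the computation, Cohen's h can be written with Erlang's `:math` module as twice the difference of the arcsines of the square-rooted proportions. `CohensHSketch` is a hypothetical module, not the library's implementation. Treating the guideline values as lower cutoffs, and the `:negligible` label, are assumptions made here for illustration.

```elixir
defmodule CohensHSketch do
  # Hypothetical module; not part of ExFairness.

  # Cohen's h for two proportions in [0, 1]. The sign follows p1 - p2.
  def h(p1, p2) do
    2.0 * (:math.asin(:math.sqrt(p1)) - :math.asin(:math.sqrt(p2)))
  end

  # Labels |h| by treating the guideline values 0.2 / 0.5 / 0.8 as lower
  # cutoffs (an assumption; the guidelines are approximate, not hard bounds).
  def label(h) do
    magnitude = abs(h)

    cond do
      magnitude >= 0.8 -> :large
      magnitude >= 0.5 -> :medium
      magnitude >= 0.2 -> :small
      true -> :negligible
    end
  end
end
```

With this sketch, `CohensHSketch.h(0.5, 0.3)` falls between 0.4 and 0.5, in line with the doctest below, and swapping the arguments flips the sign.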

Examples

iex> h = ExFairness.Utils.StatisticalTests.cohens_h(0.5, 0.3)
iex> h > 0.4 and h < 0.5
true

permutation_test(data, metric_fn, opts \\ [])

@spec permutation_test([Nx.Tensor.t()], function(), keyword()) :: test_result()

Permutation test for any fairness metric.

Non-parametric test that does not assume the metric is normally distributed.

Algorithm

  1. Compute observed metric on actual data
  2. For i = 1 to n_permutations:
     a. Randomly permute sensitive attributes
     b. Compute metric on permuted data
     c. Store permuted_statistics[i]
  3. P-value = proportion of permuted statistics ≥ observed
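
The three steps can be sketched on plain lists as follows. This is an illustration under stated assumptions, not the library's code: `PermutationSketch`, `rate_gap/2`, and `p_value/4` are hypothetical names, the metric is fixed to the absolute gap in positive-prediction rate, and only the one-sided "≥ observed" comparison from step 3 is shown. It also assumes both groups are non-empty.

```elixir
defmodule PermutationSketch do
  # Hypothetical helper on plain lists; not the ExFairness implementation.

  # Absolute difference in positive-prediction rate between groups 0 and 1.
  def rate_gap(preds, groups) do
    rate = fn g ->
      picked = for {p, ^g} <- Enum.zip(preds, groups), do: p
      Enum.sum(picked) / length(picked)
    end

    abs(rate.(0) - rate.(1))
  end

  def p_value(preds, groups, n_permutations, seed) do
    # Seed the process-local generator so Enum.shuffle/1 is reproducible.
    :rand.seed(:exsss, {seed, seed, seed})

    # Step 1: metric on the actual data.
    observed = rate_gap(preds, groups)

    # Step 2: shuffle the group labels and recompute the metric each time.
    hits =
      Enum.count(1..n_permutations, fn _ ->
        rate_gap(preds, Enum.shuffle(groups)) >= observed
      end)

    # Step 3: proportion of permuted statistics >= observed.
    hits / n_permutations
  end
end
```

Shuffling only the group labels keeps each group's size fixed, so the permuted metrics are draws from the distribution expected if predictions were unrelated to group membership.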

Parameters

  • data - List of data tensors [predictions, labels?, sensitive_attr]
  • metric_fn - Function computing metric (returns numeric value)
  • opts:
    • :n_permutations - Number of permutations (default: 10000)
    • :alpha - Significance level (default: 0.05)
    • :alternative - Test direction (:two_sided, :greater, :less)
    • :seed - Random seed for reproducibility

Returns

Test result map with permutation statistics and p-value.

Examples

iex> predictions = Nx.tensor([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> metric_fn = fn [preds, sens] ->
...>   result = ExFairness.demographic_parity(preds, sens)
...>   result.disparity
...> end
iex> result = ExFairness.Utils.StatisticalTests.permutation_test(
...>   [predictions, sensitive],
...>   metric_fn,
...>   n_permutations: 100,
...>   seed: 42
...> )
iex> result.test_name
"Permutation Test"

two_proportion_test(predictions, sensitive_attr, opts \\ [])

@spec two_proportion_test(Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: test_result()

Two-proportion Z-test for demographic parity.

Tests whether positive prediction rates differ significantly between groups.

Hypotheses

  • H₀: p_A = p_B (no disparity between groups)
  • H₁: p_A ≠ p_B (disparity exists)

Test Statistic

Under H₀, the standard error is:

SE = sqrt(p̂ * (1 - p̂) * (1/n_A + 1/n_B))

where p̂ = (n_A p_A + n_B p_B) / (n_A + n_B) is the pooled proportion of positive predictions across both groups

Z-statistic:

Z = (p_A - p_B) / SE

P-value (two-tailed):

p = P(|Z| > |z_observed|) = 2 * P(Z > |z_observed|)
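
A minimal sketch of this calculation from raw counts, assuming a hypothetical `TwoProportionSketch.z_test/4` helper (not the library's implementation). The two-sided p-value uses the identity that, for a standard normal variable, the probability of exceeding |z| in absolute value equals erfc(|z| / sqrt(2)), which Erlang exposes as `:math.erfc/1`.

```elixir
defmodule TwoProportionSketch do
  # Hypothetical helper on plain integer counts; not the ExFairness implementation.

  # Returns {z, two_sided_p_value} given the number of positive predictions
  # and the group size for groups A and B.
  # Note: if the pooled proportion is 0 or 1 the standard error is zero and
  # the division raises; this sketch does not handle that case.
  def z_test(pos_a, n_a, pos_b, n_b) do
    p_a = pos_a / n_a
    p_b = pos_b / n_b

    # Pooled proportion under H0 (equal rates in both groups).
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = :math.sqrt(pooled * (1.0 - pooled) * (1 / n_a + 1 / n_b))

    z = (p_a - p_b) / se

    # Two-sided tail probability of the standard normal distribution.
    p_value = :math.erfc(abs(z) / :math.sqrt(2.0))

    {z, p_value}
  end
end
```

For instance, 8 positives out of 10 in group A against 0 out of 10 in group B gives a pooled proportion of 0.4 and a Z-statistic of roughly 3.65, although groups this small do not satisfy the large-sample conditions this test relies on.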

Assumptions

  • Large sample sizes (n_A, n_B > 30 recommended)
  • Independent observations
  • n * p > 5 and n * (1 - p) > 5 within each group

Parameters

  • predictions - Binary predictions tensor (0 or 1)
  • sensitive_attr - Binary sensitive attribute tensor (0 or 1)
  • opts:
    • :alpha - Significance level (default: 0.05)
    • :alternative - Test direction (:two_sided, :greater, :less)

Returns

Test result map with statistic, p-value, and significance.

Examples

iex> predictions = Nx.tensor([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Utils.StatisticalTests.two_proportion_test(predictions, sensitive)
iex> result.test_name
"Two-Proportion Z-Test"