ExFairness.Utils.StatisticalTests (ExFairness v0.5.1)
Hypothesis testing for fairness metrics.
Provides parametric and non-parametric tests for assessing whether observed disparities in fairness metrics are statistically significant.
Statistical Tests
- Two-Proportion Z-Test: Tests demographic parity differences
- Chi-Square Test: Tests whether confusion-matrix outcomes are independent of group membership (equalized odds)
- Permutation Test: Non-parametric test for any metric
References
- Agresti, A. (2018). "Statistical methods for the social sciences."
- Good, P. (2013). "Permutation tests: A practical guide to resampling methods for testing hypotheses."
- Cohen, J. (1988). "Statistical power analysis for the behavioral sciences."
Examples
iex> predictions = Nx.tensor([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Utils.StatisticalTests.two_proportion_test(predictions, sensitive)
iex> is_float(result.p_value) and result.p_value >= 0.0 and result.p_value <= 1.0
true
Summary
Functions
chi_square_test/4
Chi-square test for equalized odds.
cohens_h/2
Computes Cohen's h effect size for two proportions.
permutation_test/3
Permutation test for any fairness metric.
two_proportion_test/3
Two-proportion Z-test for demographic parity.
Types
test_result()
Functions
@spec chi_square_test(Nx.Tensor.t(), Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: test_result()
Chi-square test for equalized odds.
Tests whether confusion-matrix outcomes are independent of group membership.
Hypotheses
- H₀: Confusion-matrix outcomes are independent of the sensitive attribute
- H₁: Confusion-matrix outcomes depend on the sensitive attribute
Test Statistic
$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed count in cell $(i, j)$ and $E_{ij}$ is the expected count under independence.
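To make the statistic concrete, here is a minimal sketch in plain Elixir (lists, no Nx) that computes Pearson's χ² for any r × c table of counts, using the standard independence expectation E_ij = R_i · C_j / N (row total times column total over the grand total). ChiSquareSketch and statistic/1 are hypothetical names, not ExFairness functions. Encoding equalized odds as a table with one row per group and one column per confusion-matrix cell (TP, FP, TN, FN) is one plausible layout; whether ExFairness builds exactly this table is an assumption. The sketch returns the statistic and degrees of freedom only; turning χ² into a p-value requires the chi-square CDF, which is not shown.

```elixir
defmodule ChiSquareSketch do
  # Hypothetical helper, not part of ExFairness: Pearson's chi-square statistic
  # for an r x c contingency table given as a list of rows of counts.
  # Assumes every row total and column total is positive (otherwise E_ij = 0).
  def statistic(table) do
    row_totals = Enum.map(table, &Enum.sum/1)

    col_totals =
      table
      |> Enum.zip()
      |> Enum.map(fn column -> column |> Tuple.to_list() |> Enum.sum() end)

    n = Enum.sum(row_totals)

    chi2 =
      for {row, row_total} <- Enum.zip(table, row_totals),
          {observed, col_total} <- Enum.zip(row, col_totals),
          reduce: 0.0 do
        acc ->
          # Expected count under independence: E_ij = R_i * C_j / N
          expected = row_total * col_total / n
          diff = observed - expected
          acc + diff * diff / expected
      end

    # Degrees of freedom for an r x c table: (r - 1)(c - 1)
    df = (length(table) - 1) * (length(col_totals) - 1)
    {chi2, df}
  end
end
```

For example, the 2 × 2 table [[10, 20], [20, 10]] has every expected count equal to 15, giving χ² = 4 · 25/15 ≈ 6.67 with 1 degree of freedom, while a table whose rows are proportional gives χ² = 0.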
Parameters
- predictions: Binary predictions tensor
- labels: Binary labels tensor
- sensitive_attr: Binary sensitive attribute tensor
- opts: Keyword options:
  - :alpha: Significance level (default: 0.05)
Returns
Test result map with chi-square statistic and p-value.
Examples
iex> predictions = Nx.tensor([1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1])
iex> labels = Nx.tensor([1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Utils.StatisticalTests.chi_square_test(predictions, labels, sensitive)
iex> result.test_name
"Chi-Square Test"
cohens_h/2
Computes Cohen's h effect size for two proportions.
Cohen's h is the difference between the arcsine-square-root transforms of the two proportions, each scaled by 2.
Effect Size Guidelines (Cohen's conventions, applied to |h|)
- Small: |h| ≈ 0.2
- Medium: |h| ≈ 0.5
- Large: |h| ≈ 0.8
Formula
$$h = 2\left(\arcsin\sqrt{p_1} - \arcsin\sqrt{p_2}\right)$$
Examples
iex> h = ExFairness.Utils.StatisticalTests.cohens_h(0.5, 0.3)
iex> h > 0.4 and h < 0.5
true
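The formula can be checked with a few lines of plain Elixir using Erlang's :math module. This is a sketch of the documented formula, not the library's code; CohensHSketch and magnitude/1 are hypothetical names. The labelling in magnitude/1 is an assumed convention: it treats Cohen's reference values 0.2, 0.5, and 0.8 as lower bounds, which Cohen offered only as rough guides.

```elixir
defmodule CohensHSketch do
  # Hypothetical re-implementation of the documented formula; not ExFairness's code.
  # h = 2 * (arcsin(sqrt(p1)) - arcsin(sqrt(p2))), with p1 and p2 in [0, 1].
  def cohens_h(p1, p2) when p1 >= 0 and p1 <= 1 and p2 >= 0 and p2 <= 1 do
    2 * (:math.asin(:math.sqrt(p1)) - :math.asin(:math.sqrt(p2)))
  end

  # Assumed convention: treat Cohen's reference values 0.2 / 0.5 / 0.8 as
  # lower bounds for "small" / "medium" / "large" when labelling |h|.
  def magnitude(h) do
    abs_h = abs(h)

    cond do
      abs_h >= 0.8 -> :large
      abs_h >= 0.5 -> :medium
      abs_h >= 0.2 -> :small
      true -> :negligible
    end
  end
end
```

With p₁ = 0.5 and p₂ = 0.3, h ≈ 0.41, which falls between the small and medium reference points; swapping the arguments flips the sign, which is why the guidelines apply to |h|.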
@spec permutation_test([Nx.Tensor.t()], function(), keyword()) :: test_result()
Permutation test for any fairness metric.
A non-parametric test: it makes no assumption about the sampling distribution of the metric (such as normality).
Algorithm
- Compute the observed metric on the actual data.
- For i = 1 to n_permutations:
  a. Randomly permute the sensitive attribute.
  b. Compute the metric on the permuted data.
  c. Store the result as permuted_statistics[i].
- Compute the p-value as the proportion of permuted statistics at least as extreme as the observed statistic.
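The steps above can be sketched in plain Elixir on lists rather than Nx tensors. PermutationSketch, test/4, and parity_gap/2 are hypothetical names, not ExFairness functions. The sketch assumes a two-sided comparison (|permuted| ≥ |observed|), uses a default of 1000 permutations rather than the documented 10000, and reports the raw proportion without the +1 correction some implementations add. Seeding :rand makes Enum.shuffle/1 reproducible within the process.

```elixir
defmodule PermutationSketch do
  # Hypothetical sketch of a two-sided permutation test; not ExFairness's implementation.
  # `metric_fn` receives (values, groups) as plain lists and returns a number.
  def test(values, groups, metric_fn, opts \\ []) do
    n_permutations = Keyword.get(opts, :n_permutations, 1000)
    seed = Keyword.get(opts, :seed, 42)
    # Seed the process-local RNG so Enum.shuffle/1 is reproducible.
    :rand.seed(:exsss, {seed, seed, seed})

    observed = metric_fn.(values, groups)

    # Shuffle the group labels, recompute the metric, and count how often
    # |permuted| >= |observed| (two-sided comparison).
    extreme =
      Enum.count(1..n_permutations, fn _ ->
        permuted = Enum.shuffle(groups)
        abs(metric_fn.(values, permuted)) >= abs(observed)
      end)

    %{observed: observed, p_value: extreme / n_permutations}
  end

  # Hypothetical metric: positive rate of group 1 minus positive rate of group 0.
  def parity_gap(preds, groups) do
    pairs = Enum.zip(preds, groups)

    rate = fn g ->
      selected = for {p, ^g} <- pairs, do: p
      Enum.sum(selected) / length(selected)
    end

    rate.(1) - rate.(0)
  end
end
```

Shuffling the group labels keeps each group's size fixed, so the metric is always computed on groups of the original sizes. When the predictions are perfectly separated by group, only the two perfectly aligned arrangements out of C(10, 5) = 252 match the observed gap, so the p-value is small; when both groups have the same positive rate, every permutation is at least as extreme and the p-value is 1.0.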
Parameters
- data: List of data tensors, either [predictions, sensitive_attr] or [predictions, labels, sensitive_attr]
- metric_fn: Function that receives the data list and returns the metric as a number
- opts: Keyword options:
  - :n_permutations: Number of permutations (default: 10000)
  - :alpha: Significance level (default: 0.05)
  - :alternative: Test direction (:two_sided, :greater, :less)
  - :seed: Random seed for reproducibility
Returns
Test result map with permutation statistics and p-value.
Examples
iex> predictions = Nx.tensor([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> metric_fn = fn [preds, sens] ->
...> result = ExFairness.demographic_parity(preds, sens)
...> result.disparity
...> end
iex> result = ExFairness.Utils.StatisticalTests.permutation_test(
...> [predictions, sensitive],
...> metric_fn,
...> n_permutations: 100,
...> seed: 42
...> )
iex> result.test_name
"Permutation Test"
@spec two_proportion_test(Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: test_result()
Two-proportion Z-test for demographic parity.
Tests whether positive prediction rates differ significantly between groups.
Hypotheses
- H₀: p_A = p_B (no disparity between groups)
- H₁: p_A ≠ p_B (disparity exists)
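Test Statistic
Under H₀, the pooled standard error is:
$$SE = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}, \qquad \hat{p} = \frac{n_A \hat{p}_A + n_B \hat{p}_B}{n_A + n_B}$$
where $\hat{p}_A$ and $\hat{p}_B$ are the observed positive prediction rates in groups A and B, and $\hat{p}$ is the pooled rate.
Z-statistic:
$$Z = \frac{\hat{p}_A - \hat{p}_B}{SE}$$
P-value (two-tailed):
$$p = P(|Z| \ge |z_{\text{obs}}|) = 2\left(1 - \Phi(|z_{\text{obs}}|)\right)$$
where $\Phi$ is the standard normal CDF.
Assumptions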
- Large sample sizes (n_A, n_B > 30 recommended)
- Independent observations
- np and n(1-p) > 5 for both groups
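The statistic and p-value can be sketched in plain Elixir from raw counts, where x_A = n_A p̂_A is the number of positive predictions in group A. ZTestSketch and two_proportion/5 are hypothetical names, not ExFairness functions. The sketch uses the identity 2(1 − Φ(|z|)) = erfc(|z| / √2) with Erlang's :math.erfc/1. Reading :greater as H₁: p_A > p_B, and reporting z = 0.0 when SE = 0, are assumptions rather than documented library behavior.

```elixir
defmodule ZTestSketch do
  @sqrt2 :math.sqrt(2)

  # Hypothetical sketch of the pooled two-proportion z-test; not ExFairness's code.
  # x_a, x_b: positive predictions per group; n_a, n_b: group sizes.
  def two_proportion(x_a, n_a, x_b, n_b, alternative \\ :two_sided) do
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Pooled rate under H0: p_A = p_B
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = :math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

    # If every prediction is 0 (or every one is 1), SE is 0 and Z is undefined;
    # this sketch reports z = 0.0 in that case (an arbitrary convention).
    z = if se == 0.0, do: 0.0, else: (p_a - p_b) / se

    p_value =
      case alternative do
        # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
        :two_sided -> :math.erfc(abs(z) / @sqrt2)
        # Assumed meaning: H1 is p_A > p_B (upper tail, 1 - Phi(z))
        :greater -> 0.5 * :math.erfc(z / @sqrt2)
        # Assumed meaning: H1 is p_A < p_B (lower tail, Phi(z))
        :less -> 0.5 * :math.erfc(-z / @sqrt2)
      end

    %{z: z, p_value: p_value}
  end
end
```

With 8 of 10 positives in one group and 0 of 10 in the other, the pooled rate is 0.4, SE = √0.048 ≈ 0.219, and Z ≈ 3.65, giving a two-tailed p-value well below 0.001. Note that with only 10 observations per group this input violates the sample-size assumptions listed above, so the normal approximation is rough here. For a 2 × 2 table, Z² equals the Pearson χ² statistic without continuity correction.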
Parameters
- predictions: Binary predictions tensor (0 or 1)
- sensitive_attr: Binary sensitive attribute tensor (0 or 1)
- opts: Keyword options:
  - :alpha: Significance level (default: 0.05)
  - :alternative: Test direction (:two_sided, :greater, :less)
Returns
Test result map with statistic, p-value, and significance.
Examples
iex> predictions = Nx.tensor([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
iex> result = ExFairness.Utils.StatisticalTests.two_proportion_test(predictions, sensitive)
iex> result.test_name
"Two-Proportion Z-Test"