Changelog
View SourceAll notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
0.5.1 - 2025-12-28
Changed
- Updated
crucible_frameworkdependency from~> 0.5.0to~> 0.5.2
0.5.0 - 2025-12-27
Changed
- Normalized describe/1 to canonical schema format - The
describe/1callback now returns a schema conforming to the Crucible Stage contract specification v1.0.- Changed
:stagekey to:namekey (atom value) - Added
__schema_version__: "1.0.0"marker for schema evolution - Added
requiredfield (list of required option keys) - Added
optionalfield (list of optional option keys) - Added
typesfield (type specifications for all options) - Added
defaultsfield (default values for optional options) - Moved
metricslist to__extensions__.fairness.supported_metrics - Added
data_sourcesandoutput_locationto extensions
- Changed
Added
- Conformance tests - New
test/ex_fairness/conformance_test.exsvalidates Stage contract compliance - Extended describe/1 tests - Comprehensive tests for canonical schema format
Dependencies
- Updated
crucible_frameworkdependency to~> 0.5.0(required for new describe/1 contract)
0.4.0 - 2025-12-25
Added
- ExFairness.CrucibleStage implementing
Crucible.Stagefor crucible_framework pipelines. - Environment-specific config files (
config/*.exs) to disable the CrucibleFramework repo by default. - Documentation snapshot and gap analysis in
docs/20251225/. - Crucible stage integration test suite.
Changed
- Refactored
ExFairness.evaluate/5into smaller helpers for metric computation and violations. - Improved chi-square computation structure in
ExFairness.Utils.StatisticalTests. - Updated project logo in
assets/ExFairness.svg. - Dependencies: add
crucible_framework, updatecrucible_irto~> 0.2.0, addecto_sqlandpostgrex.
0.3.1 - 2025-11-26
Added - CrucibleIR Integration
Pipeline Stage:
- ExFairness.Stage - Pipeline stage for Crucible framework integration
- Seamless integration with CrucibleIR experiment orchestration
- Accepts
CrucibleIR.Reliability.Fairnessconfiguration - Extracts predictions, labels, and sensitive attributes from model outputs
- Supports all ExFairness metrics (demographic parity, equalized odds, equal opportunity, predictive parity, calibration)
- Configurable threshold and fail-on-violation behavior
- Comprehensive error handling and validation
- Returns structured fairness results with violations tracking
Main API Enhancement:
- ExFairness.evaluate/5 - New function for CrucibleIR config-based evaluation
- Direct evaluation using
CrucibleIR.Reliability.Fairnessstruct - Optional probabilities parameter for calibration metrics
- Returns structured results with metrics, violations, and overall pass/fail status
- Conditionally compiled (only when crucible_ir is available)
- Direct evaluation using
Configuration Support
The integration supports the following CrucibleIR.Reliability.Fairness structure:
%CrucibleIR.Reliability.Fairness{
enabled: true, # Enable/disable fairness evaluation
metrics: [:demographic_parity, :equalized_odds, :equal_opportunity, :predictive_parity, :calibration],
group_by: :gender, # Sensitive attribute field name
threshold: 0.1, # Maximum acceptable disparity
fail_on_violation: false, # Whether to fail on violations
options: %{} # Additional metric-specific options
}Testing
New Test Suite:
- ExFairness.StageTest - 15 comprehensive tests
- Stage description validation
- Disabled fairness pass-through
- Single and multiple metric evaluation
- Calibration with/without probabilities
- Custom threshold configuration
- Violation detection and reporting
- Fail-on-violation behavior
- Invalid context handling
- Unknown metric handling
- Custom options pass-through
Test Coverage: 174 (v0.3.0) → 189 (v0.4.0) = +15 tests (+8.6%)
Dependencies
New Dependencies:
{:crucible_ir, "~> 0.1.1"}- CrucibleIR configuration structs
Documentation
Updated Documentation:
- mix.exs - Added ExFairness.Stage to Pipeline module group
- README.md - Added Stage usage examples and CrucibleIR integration guide
- ExFairness.Stage - Comprehensive module documentation with examples
- ExFairness.evaluate/5 - Full API documentation with examples
Quality Metrics
- Zero compilation warnings (enforced via warnings_as_errors)
- Zero Dialyzer errors (type-safe)
- All tests passing (189 total tests)
- Backward compatible: All v0.3.0 code works without modification
Integration Benefits
- Seamless Crucible Integration: ExFairness can now be used as a pipeline stage in Crucible experiments
- Standardized Configuration: Uses CrucibleIR configuration structs for consistency
- Experiment Orchestration: Fairness evaluation can be automated as part of experiment pipelines
- Flexible Violation Handling: Choose whether fairness violations should fail experiments
- Comprehensive Results: Structured output suitable for experiment reporting
Example Usage
# Configure fairness evaluation
config = %CrucibleIR.Reliability.Fairness{
enabled: true,
metrics: [:demographic_parity, :equalized_odds],
group_by: :gender,
threshold: 0.1,
fail_on_violation: false
}
# In a Crucible pipeline
context = %{
experiment: %{reliability: %{fairness: config}},
outputs: model_outputs # List of maps with :prediction, :label, :gender
}
{:ok, result_context} = ExFairness.Stage.run(context)
# result_context.fairness contains fairness evaluation results
# Or use the direct evaluation API
result = ExFairness.evaluate(predictions, labels, sensitive_attr, config)
# Returns %{metrics: ..., overall_passes: ..., violations: ...}Breaking Changes
None - This is a backward compatible release. All existing code continues to work unchanged.
Migration from v0.3.0
No code changes required. The new CrucibleIR integration is opt-in and does not affect existing usage patterns.
0.3.0 - 2025-11-25
Added - Statistical Inference and Calibration
Statistical Inference Framework:
- ExFairness.Utils.Bootstrap - Bootstrap confidence interval computation
- Stratified bootstrap to preserve group proportions
- Parallel and sequential computation modes
- Percentile and basic bootstrap methods
- Configurable number of samples (default: 1000)
- Reproducible with seed parameter
- GPU-accelerated metric computation via Nx.Defn
- ExFairness.Utils.StatisticalTests - Hypothesis testing for fairness metrics
- Two-proportion Z-test for demographic parity
- Chi-square test for equalized odds
- Permutation test for any fairness metric (non-parametric)
- Cohen's h effect size computation
- Configurable significance levels (default: α=0.05)
- Statistical interpretation generation
Calibration Fairness Metric:
- ExFairness.Metrics.Calibration - Calibration fairness for probability predictions
- Expected Calibration Error (ECE) computation
- Maximum Calibration Error (MCE) computation
- Uniform and quantile binning strategies
- Configurable number of bins (default: 10)
- Group-wise calibration comparison
- Validation for probability ranges [0, 1]
Enhanced - Existing Metrics
All existing fairness metrics can now optionally include:
- Bootstrap confidence intervals
- Statistical hypothesis tests
- Effect size measures
- Enhanced interpretations with statistical significance
Example usage:
result = ExFairness.demographic_parity(predictions, sensitive_attr,
include_ci: true, # NEW: Bootstrap CI
statistical_test: :z_test, # NEW: Hypothesis testing
bootstrap_samples: 1000, # NEW: Configurable bootstrap
confidence_level: 0.95 # NEW: CI level
)
# Returns enhanced result with :confidence_interval and :p_valueTesting
New Test Suites:
- ExFairness.Utils.BootstrapTest - 11 comprehensive tests
- Bootstrap interval validation
- Stratified sampling verification
- Method comparison (percentile vs basic)
- Reproducibility testing
- Parallel vs sequential equivalence
- ExFairness.Utils.StatisticalTestsTest - 14 comprehensive tests
- Two-proportion Z-test validation
- Chi-square test verification
- Permutation test correctness
- Effect size computation
- P-value range validation
- ExFairness.Metrics.CalibrationTest - 15 comprehensive tests
- ECE/MCE computation validation
- Binning strategy verification
- Probability range validation
- Edge case handling
Total Tests: 134 (v0.2.0) → 174 (v0.3.0) = +40 tests (+30%)
Documentation
Design Documentation:
- docs/20251125/enhancements_design.md (comprehensive 8-week implementation plan)
- Statistical inference algorithms and formulas
- Calibration metric mathematical foundation
- Implementation roadmap and success criteria
- API examples and migration guide
- Research citations (10+ additional papers)
Updated Documentation:
- mix.exs - Version updated to 0.3.0, new modules added to docs
- README.md - Version badge and installation instructions updated
- CHANGELOG.md - Complete v0.3.0 release notes
Quality Metrics
- Zero compilation warnings (enforced via warnings_as_errors)
- Zero Dialyzer errors (type-safe)
- Test coverage target: >90% (expected)
- Backward compatible: All v0.2.0 code works without modification
Performance
- Bootstrap: ~1-2 seconds for 1000 samples on standard metrics
- Permutation test: ~2-3 seconds for 10,000 permutations
- Parallel bootstrap: 4-8x speedup on multi-core systems
- Calibration: <100ms for typical datasets
Research Foundations
New Academic Citations:
- Efron, B., & Tibshirani, R. J. (1994). "An introduction to the bootstrap." CRC press.
- Davison, A. C., & Hinkley, D. V. (1997). "Bootstrap methods and their application."
- Good, P. (2013). "Permutation tests: A practical guide to resampling methods."
- Agresti, A. (2018). "Statistical methods for the social sciences."
- Cohen, J. (1988). "Statistical power analysis for the behavioral sciences."
- Pleiss, G., et al. (2017). "On fairness and calibration." NeurIPS.
- Guo, C., et al. (2017). "On calibration of modern neural networks." ICML.
Breaking Changes
None - This is a backward compatible release. All existing code continues to work unchanged.
Migration from v0.2.0
No code changes required. All new features are opt-in via additional parameters.
See docs/20251125/enhancements_design.md for detailed migration examples.
0.2.0 - 2025-10-20
Added - Comprehensive Technical Documentation
- future_directions.md (1,941 lines) - Complete roadmap to v1.0.0
- Detailed specifications for statistical inference
- Calibration metric with complete algorithm
- Intersectional analysis implementation plan
- Threshold optimization algorithm
- 6-month development timeline
- 12+ additional research citations
- implementation_report.md (1,288 lines) - Technical implementation details
- Module-by-module analysis of all 14 modules
- Algorithm documentation with pseudocode
- Design decisions and rationale
- Performance characteristics
- Code statistics and metrics
- testing_and_qa_strategy.md (1,220 lines) - QA methodology
- TDD philosophy and evidence
- Complete test coverage matrix (134 tests)
- Edge case testing strategy
- Future testing enhancements (property testing, integration testing)
- Quality gates and CI/CD specifications
Enhanced - README.md
- Expanded from ~660 to 1,437 lines (+118%)
- Added Mathematical Foundations section (200+ lines)
- Complete mathematical definitions for all 4 metrics
- Formal probability notation
- Disparity measures
- Comprehensive citations with DOI numbers
- Added Theoretical Background section (300+ lines)
- Types of fairness (group, individual, causal)
- Measurement problem discussion
- Impossibility theorem with proof intuition
- Fairness-accuracy tradeoff analysis
- Added Advanced Usage section (200+ lines)
- Axon integration example (neural networks)
- Scholar integration example (classical ML)
- Batch fairness analysis
- Production monitoring with GenServer
- Expanded Research Foundations (150+ lines)
- 15+ peer-reviewed papers with full bibliographic details
- DOI numbers for all citations
- Framework comparisons (AIF360, Fairlearn, etc.)
- Added API Reference section
- Updated real-world use cases with legal compliance checks
Documentation
- Total documentation: ~9,120 lines
- Academic citations: 27+ peer-reviewed papers
- Working code examples: 20+
- Integration patterns documented
0.1.0 - 2025-10-20
Added - Core Implementation
Infrastructure:
ExFairness.Error- Custom exception handling with type safetyExFairness.Validation- Comprehensive input validation- Binary tensor validation
- Shape matching validation
- Multiple groups requirement (min 2 groups)
- Sufficient samples validation (default: 10 per group)
- Helpful error messages with actionable suggestions
ExFairness.Utils- GPU-accelerated tensor operationspositive_rate/2- Positive prediction rate with maskingcreate_group_mask/2- Binary mask generationgroup_count/2- Sample counting per groupgroup_positive_rates/2- Batch rate computation
ExFairness.Utils.Metrics- Classification metricsconfusion_matrix/3- TP, FP, TN, FN with maskingtrue_positive_rate/3- TPR/Recallfalse_positive_rate/3- FPRpositive_predictive_value/3- PPV/Precision
Fairness Metrics:
ExFairness.Metrics.DemographicParity- P(Ŷ=1|A=0) = P(Ŷ=1|A=1)- Configurable threshold (default: 0.1)
- Plain language interpretations
- Citations: Dwork et al. (2012), Feldman et al. (2015)
ExFairness.Metrics.EqualizedOdds- Equal TPR and FPR across groups- Both error rates checked
- Combined pass/fail determination
- Citations: Hardt et al. (2016)
ExFairness.Metrics.EqualOpportunity- Equal TPR across groups- Relaxed version of equalized odds
- Focus on false negative parity
- Citations: Hardt et al. (2016)
ExFairness.Metrics.PredictiveParity- Equal PPV across groups- Precision parity
- Consistent prediction meaning
- Citations: Chouldechova (2017)
Detection Algorithms:
ExFairness.Detection.DisparateImpact- EEOC 80% rule- Legal standard for adverse impact
- 4/5ths rule implementation
- Legal interpretation with EEOC context
- Citations: EEOC (1978), Biddle (2006)
Mitigation Techniques:
ExFairness.Mitigation.Reweighting- Sample weighting for fairness- Supports demographic parity and equalized odds targets
- Formula: w(a,y) = P(Y=y) / P(A=a,Y=y)
- Normalized weights (mean = 1.0)
- GPU-accelerated via Nx.Defn
- Citations: Kamiran & Calders (2012)
Reporting System:
ExFairness.Report- Multi-metric fairness assessment- Aggregate pass/fail counts
- Overall assessment generation
- Markdown export (human-readable)
- JSON export (machine-readable)
Main API:
ExFairness.demographic_parity/3- Convenience functionExFairness.equalized_odds/4- Convenience functionExFairness.equal_opportunity/4- Convenience functionExFairness.predictive_parity/4- Convenience functionExFairness.fairness_report/4- Comprehensive reporting
Testing
- 134 total tests (102 unit tests + 32 doctests)
- 100% pass rate
- Comprehensive edge case coverage
- Strict TDD approach (Red-Green-Refactor)
- All tests async (parallel execution)
Quality Gates
- Zero compiler warnings (enforced)
- Zero Dialyzer errors (type-safe)
- Credo strict mode configured
- Code formatting enforced (100 char lines)
- ExCoveralls configured for coverage reports
Documentation
- Comprehensive README.md with examples
- Complete module documentation (@moduledoc)
- Complete function documentation (@doc)
- Working examples (verified by doctests)
- Research citations in all metrics
- Mathematical definitions included
Dependencies
- Production:
nx ~> 0.7(only production dependency) - Development:
ex_doc,dialyxir,excoveralls,credo,stream_data,jason