Contributing to ExFairness

Thank you for your interest in contributing to ExFairness! This document provides guidelines for contributing to the project.

Code of Conduct

Our Pledge

We are committed to providing a welcoming and inclusive environment for all contributors, regardless of background or identity.

Expected Behavior

  • Be respectful and considerate in all interactions
  • Provide constructive feedback
  • Focus on what's best for the project and community
  • Show empathy towards other contributors

Unacceptable Behavior

  • Harassment or discriminatory language
  • Personal attacks or trolling
  • Publishing others' private information
  • Other conduct inappropriate in a professional setting

Getting Started

Prerequisites

  • Elixir 1.14 or higher
  • Erlang/OTP 25 or higher
  • Git
  • Basic understanding of fairness in machine learning (optional but helpful)

Setting Up Development Environment

# 1. Fork the repository on GitHub
# 2. Clone your fork
git clone https://github.com/YOUR_USERNAME/ExFairness.git
cd ExFairness

# 3. Add upstream remote
git remote add upstream https://github.com/North-Shore-AI/ExFairness.git

# 4. Install dependencies
mix deps.get

# 5. Verify tests pass
mix test

# 6. Verify quality checks pass
mix format --check-formatted
mix compile --warnings-as-errors
mix credo --strict

Development Workflow

Strict Test-Driven Development (TDD)

ExFairness follows strict TDD. All contributions must follow the Red-Green-Refactor cycle:

1. RED Phase - Write Failing Tests

# test/ex_fairness/metrics/new_metric_test.exs
defmodule ExFairness.Metrics.NewMetricTest do
  use ExUnit.Case, async: true
  doctest ExFairness.Metrics.NewMetric

  alias ExFairness.Metrics.NewMetric

  describe "compute/3" do
    test "computes metric correctly" do
      predictions = Nx.tensor([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
      sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

      result = NewMetric.compute(predictions, sensitive)

      assert result.metric_value == expected_value
      assert result.passes == expected_pass_fail
    end

    # Add more tests...
  end
end

Run tests to verify they fail:

mix test test/ex_fairness/metrics/new_metric_test.exs
# Should show compilation error or test failures

2. GREEN Phase - Implement to Pass

# lib/ex_fairness/metrics/new_metric.ex
defmodule ExFairness.Metrics.NewMetric do
  @moduledoc """
  Documentation for new metric.

  ## Mathematical Definition

  [Include formal definition]

  ## When to Use

  [Explain appropriate use cases]

  ## Limitations

  [Discuss limitations]

  ## References

  [Include research citations]
  """

  alias ExFairness.Validation

  @spec compute(Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: map()
  def compute(predictions, sensitive_attr, opts \\ []) do
    # Validate inputs
    Validation.validate_predictions!(predictions)
    # ... implement logic
  end
end

Run tests to verify they pass:

mix test test/ex_fairness/metrics/new_metric_test.exs
# Should show all tests passing

3. REFACTOR Phase - Optimize and Document

  • Add comprehensive documentation
  • Add type specifications
  • Optimize performance
  • Add doctests (see the sketch after the commands below)
  • Ensure code formatting

mix format
mix compile --warnings-as-errors
mix credo --strict
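
A minimal sketch of a doctest added during this phase (values illustrative; mirrors the doctest format shown under Documentation Standards):

@doc """
Computes the metric for binary predictions and a binary sensitive attribute.

## Examples

    iex> predictions = Nx.tensor([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
    iex> sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
    iex> result = ExFairness.Metrics.NewMetric.compute(predictions, sensitive)
    iex> result.passes
    true
"""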

Contribution Guidelines

Types of Contributions

We welcome:

  1. Bug Fixes - Fix issues in existing code
  2. New Metrics - Implement additional fairness metrics
  3. New Detection Algorithms - Add bias detection methods
  4. New Mitigation Techniques - Add fairness mitigation approaches
  5. Documentation Improvements - Enhance docs, examples, guides
  6. Performance Optimizations - Improve speed/memory usage
  7. Test Additions - Add edge cases, property tests, integration tests

Before Starting

  1. Check existing issues - Avoid duplicate work
  2. Open an issue - Discuss your proposal first
  3. Get approval - Especially for large changes
  4. Follow the roadmap - See docs/20251020/future_directions.md

Coding Standards

Code Style

Naming Conventions

# Modules: CamelCase
defmodule ExFairness.Metrics.DemographicParity

# Functions: snake_case
def compute_disparity(predictions, sensitive_attr)

# Variables: snake_case
group_a_rate = 0.5

# Module attributes (constants): snake_case with @ prefix
@default_threshold 0.1

# Private functions: prefix with defp
defp generate_interpretation(disparity, threshold)

Type Specifications

Required for all public functions:

@type result :: %{
  disparity: float(),
  passes: boolean(),
  threshold: float()
}

@spec compute(Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: result()
def compute(predictions, sensitive_attr, opts \\ []) do
  # ...
end

Testing Requirements

Minimum Test Coverage

Every new feature must include:

  1. At least 5 unit tests:

    • Happy path (normal case)
    • Edge case #1
    • Edge case #2
    • Error case (validation)
    • Configuration test (custom options)
  2. At least 1 doctest:

    • Working example in @doc
    • Verified to execute correctly
  3. Property tests (if applicable):

    • For metrics: symmetry, boundedness, monotonicity (see the sketch below)

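A property test sketch for the symmetry case, assuming StreamData ({:stream_data, "~> 1.0", only: [:dev, :test]}) and a hypothetical YourMetric that returns a :disparity field:

defmodule ExFairness.Metrics.YourMetricPropertyTest do
  use ExUnit.Case, async: true
  use ExUnitProperties

  alias ExFairness.Metrics.YourMetric

  property "disparity is unchanged when group labels are swapped" do
    check all bits <- list_of(member_of([0, 1]), length: 20) do
      predictions = Nx.tensor(bits)
      sensitive = Nx.tensor(List.duplicate(0, 10) ++ List.duplicate(1, 10))
      swapped = Nx.subtract(1, sensitive)

      a = YourMetric.compute(predictions, sensitive)
      b = YourMetric.compute(predictions, swapped)

      # Symmetry: relabeling the groups must not change the disparity
      assert_in_delta(a.disparity, b.disparity, 1.0e-9)
    end
  end
end
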
Test Data Requirements

  • Minimum 10 samples per group (statistical reliability)
  • Use 20-element patterns for consistency
  • Explicit calculations in comments
  • Realistic scenarios (not trivial 1-2 samples)

Example:

test "computes metric correctly" do
  # Group A: 5/10 = 0.5, Group B: 3/10 = 0.3
  # Expected disparity: 0.2
  predictions = Nx.tensor([1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
  sensitive = Nx.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

  result = YourMetric.compute(predictions, sensitive)

  assert_in_delta(result.disparity, 0.2, 0.01)
end

Running Tests

# Run all tests
mix test

# Run specific test file
mix test test/ex_fairness/metrics/your_metric_test.exs

# Run with coverage
mix coveralls

# Run specific test
mix test test/ex_fairness/metrics/your_metric_test.exs:42

Documentation Standards

Module Documentation (@moduledoc)

Every module must include:

defmodule ExFairness.Metrics.YourMetric do
  @moduledoc """
  Brief description of the metric.

  ## Mathematical Definition

  [Include formal probability notation]

  ## When to Use

  - Use case 1
  - Use case 2

  ## Limitations

  - Limitation 1
  - Limitation 2

  ## References

  - Author (Year). "Paper title." *Venue*.

  ## Examples

      iex> # Working example
      iex> result = ExFairness.Metrics.YourMetric.compute(...)
      iex> result.passes
      true

  """
end

Function Documentation (@doc)

Every public function must include:

@doc """
Brief description.

## Parameters

  * `param1` - Description
  * `param2` - Description
  * `opts` - Options:
    * `:option1` - Description (default: value)

## Returns

A map containing:
  * `:field1` - Description
  * `:field2` - Description

## Examples

    iex> result = function(arg1, arg2)
    iex> result.field1
    expected_value

"""
@spec function(type1(), type2(), keyword()) :: return_type()
def function(param1, param2, opts \\ []) do
  # Implementation
end

Citation Format

Follow academic citation standards:

Author, A., Author, B., & Author, C. (Year). "Title of paper."
*Journal/Conference Name*, volume(issue), pages.
DOI: xx.xxxx/xxxxx

Example:

Hardt, M., Price, E., & Srebro, N. (2016). "Equality of Opportunity
in Supervised Learning." In *Advances in Neural Information Processing
Systems* (NeurIPS '16), pp. 3315-3323.

Submitting Changes

Pull Request Process

  1. Create a feature branch

    git checkout -b feature/your-feature-name
    
  2. Make your changes

    • Follow TDD (tests first)
    • Follow coding standards
    • Update documentation
  3. Verify quality

    mix format
    mix test
    mix compile --warnings-as-errors
    mix credo --strict
    mix dialyzer  # If PLT already built
    
  4. Commit with clear messages

    git commit -m "Add calibration fairness metric
    
    Implements calibration metric as specified in Pleiss et al. (2017).
    Includes binning, ECE computation, and calibration curves.
    
    - 15 unit tests
    - 2 doctests
    - Complete documentation with mathematical definition
    - Citations included
    "
    
  5. Push to your fork

    git push origin feature/your-feature-name
    
  6. Open Pull Request

    • Use clear PR title
    • Reference any related issues
    • Describe what you changed and why
    • Include test results

Pull Request Template

## Description
[Describe your changes]

## Motivation
[Why is this change needed?]

## Related Issues
Fixes #123

## Changes
- [ ] New feature / bug fix / documentation
- [ ] Tests added/updated
- [ ] Documentation added/updated
- [ ] CHANGELOG.md updated

## Testing
- [ ] All tests pass (`mix test`)
- [ ] No warnings (`mix compile --warnings-as-errors`)
- [ ] Credo passes (`mix credo --strict`)
- [ ] Code formatted (`mix format --check-formatted`)

## Checklist
- [ ] Followed TDD (tests written first)
- [ ] Added type specs (@spec)
- [ ] Added documentation (@doc)
- [ ] Included research citations (if applicable)
- [ ] Updated CHANGELOG.md

Commit Message Guidelines

Format:

<type>: <subject>

<body>

<footer>

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation only
  • test: Test additions/changes
  • refactor: Code refactoring
  • perf: Performance improvements
  • chore: Maintenance tasks

Example:

feat: Add calibration fairness metric

Implements calibration metric with binning and ECE computation.
Based on Pleiss et al. (2017) "On fairness and calibration."

- 15 unit tests for binning strategies and edge cases
- 2 doctests with working examples
- Complete mathematical documentation
- Citations: Pleiss et al. (2017)

Closes #42

Adding New Fairness Metrics

Step-by-Step Guide

1. Research Phase

  • [ ] Find peer-reviewed paper defining the metric
  • [ ] Understand mathematical definition
  • [ ] Identify when to use and limitations
  • [ ] Check if similar metric exists

2. Design Phase

  • [ ] Write specification document (in docs/)
  • [ ] Define function signature and return type
  • [ ] Plan test cases (minimum 10)
  • [ ] Get approval via GitHub issue

3. Implementation Phase (TDD)

RED - Write tests first:

# Create test file
touch test/ex_fairness/metrics/your_metric_test.exs

# Write comprehensive tests
# Run and verify they fail
mix test test/ex_fairness/metrics/your_metric_test.exs

GREEN - Implement:

# Create implementation file
touch lib/ex_fairness/metrics/your_metric.ex

# Implement minimum code to pass tests
# Run and verify tests pass
mix test test/ex_fairness/metrics/your_metric_test.exs

REFACTOR - Polish:

# Add documentation
# Add type specs
# Optimize if needed
# Add to main API (lib/ex_fairness.ex)

# Verify everything passes
mix test
mix format
mix compile --warnings-as-errors
mix credo --strict

4. Documentation Phase

  • [ ] Add to README.md examples section
  • [ ] Add to mathematical foundations section
  • [ ] Include in metrics reference table
  • [ ] Add research citations with DOI
  • [ ] Update CHANGELOG.md

5. Validation Phase

  • [ ] Test against reference implementation (if available)
  • [ ] Verify on real dataset (if applicable)
  • [ ] Performance benchmark (see the sketch after this list)
  • [ ] Code review
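
A quick benchmark sketch, assuming Benchee ({:benchee, "~> 1.0", only: :dev}) as a dev dependency; the file name, metric name, and input sizes are illustrative:

# bench/your_metric_bench.exs
predictions = Nx.tensor(Enum.map(1..10_000, fn i -> rem(i, 2) end))
sensitive = Nx.tensor(Enum.map(1..10_000, fn i -> if i <= 5_000, do: 0, else: 1 end))

Benchee.run(%{
  "YourMetric.compute/2" => fn ->
    ExFairness.Metrics.YourMetric.compute(predictions, sensitive)
  end
})

Run it with mix run bench/your_metric_bench.exs.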

Metric Template

Use this template for new metrics:

defmodule ExFairness.Metrics.YourMetric do
  @moduledoc """
  Brief description.

  ## Mathematical Definition

  [Formal definition with notation]

  ## When to Use

  - Use case 1
  - Use case 2

  ## Limitations

  - Limitation 1
  - Limitation 2

  ## References

  - Citation 1
  - Citation 2

  ## Examples

      iex> # Working example
  """

  alias ExFairness.{Utils, Validation}

  @default_threshold 0.1
  @default_min_per_group 10

  @type result :: %{
    # Define return type fields
  }

  @spec compute(Nx.Tensor.t(), Nx.Tensor.t(), keyword()) :: result()
  def compute(predictions, sensitive_attr, opts \\ []) do
    # 1. Extract options
    # 2. Validate inputs
    # 3. Compute metric
    # 4. Generate interpretation
    # 5. Return result map
  end

  defp generate_interpretation(disparity, threshold) do
    # Plain language explanation
  end
end
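
A minimal sketch of how the template's five steps might be filled in for a rate-based metric (group_rate/3 is a hypothetical private helper; the result fields mirror the Type Specifications section):

def compute(predictions, sensitive_attr, opts \\ []) do
  threshold = Keyword.get(opts, :threshold, @default_threshold)
  Validation.validate_predictions!(predictions)

  rate_a = group_rate(predictions, sensitive_attr, 0)
  rate_b = group_rate(predictions, sensitive_attr, 1)
  disparity = abs(rate_a - rate_b)

  %{
    disparity: disparity,
    passes: disparity <= threshold,
    threshold: threshold,
    interpretation: generate_interpretation(disparity, threshold)
  }
end

# Positive rate within one group: sum of predictions where the
# sensitive attribute equals `group`, divided by the group size.
defp group_rate(predictions, sensitive_attr, group) do
  mask = Nx.equal(sensitive_attr, group)
  positives = Nx.to_number(Nx.sum(Nx.multiply(predictions, mask)))
  total = Nx.to_number(Nx.sum(mask))
  positives / total
end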

Testing Requirements

Test File Structure

defmodule ExFairness.Metrics.YourMetricTest do
  use ExUnit.Case, async: true
  doctest ExFairness.Metrics.YourMetric

  alias ExFairness.Metrics.YourMetric

  describe "compute/3" do
    test "computes perfect fairness" do
      # Test with zero disparity
    end

    test "detects disparity" do
      # Test with known disparity
    end

    test "accepts custom threshold" do
      # Test configuration options
    end

    test "validates inputs" do
      # Test input validation
    end

    test "handles edge case: all zeros" do
      # Edge case testing
    end

    test "handles edge case: all ones" do
      # Edge case testing
    end

    test "returns interpretation" do
      # Test interpretation generation
    end
  end
end
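
One of the edge cases filled in, as a sketch (all-zero predictions give both groups a 0.0 positive rate, so a rate-based disparity is 0.0):

test "handles edge case: all zeros" do
  predictions = Nx.tensor(List.duplicate(0, 20))
  sensitive = Nx.tensor(List.duplicate(0, 10) ++ List.duplicate(1, 10))

  result = YourMetric.compute(predictions, sensitive)

  assert_in_delta(result.disparity, 0.0, 0.01)
  assert result.passes == true
end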

Mandatory Test Coverage

  • [ ] Happy path (normal operation)
  • [ ] Perfect fairness (disparity = 0)
  • [ ] Maximum disparity
  • [ ] Custom threshold
  • [ ] Input validation (invalid inputs raise errors)
  • [ ] Edge case: all zeros
  • [ ] Edge case: all ones
  • [ ] Edge case: single value
  • [ ] Unbalanced groups
  • [ ] Interpretation generation

Assertion Guidelines

For floating point values:

# Use assert_in_delta with 0.01 tolerance
assert_in_delta(result.disparity, 0.5, 0.01)

For exact values:

# Use exact equality
assert result.passes == true
assert Nx.to_number(count) == 10

For errors:

# Use assert_raise with regex
assert_raise ExFairness.Error, ~r/must be binary/, fn ->
  YourMetric.compute(invalid_input, sensitive)
end

Documentation Standards

Required Documentation Elements

Every new module must include:

  1. @moduledoc with:

    • Brief description
    • Mathematical definition (formal notation)
    • When to use (3+ bullet points)
    • Limitations (2+ bullet points)
    • Research citations (full bibliographic info)
    • Working example (doctest)
  2. @doc for every public function with:

    • Description
    • Parameters section (with types and defaults)
    • Returns section (with structure)
    • Examples section (with doctest)
  3. @spec for every public function

  4. Inline comments for complex logic

Documentation Verification

# Generate docs locally
mix docs

# Open in browser
open doc/index.html

# Check for warnings
mix docs 2>&1 | grep warning

# Verify doctests pass
mix test --only doctest

Code Review Checklist

Before submitting PR, verify:

Code Quality

  • [ ] No compiler warnings (mix compile --warnings-as-errors)
  • [ ] No Credo issues (mix credo --strict)
  • [ ] Code formatted (mix format --check-formatted)
  • [ ] No Dialyzer errors (mix dialyzer)

Testing

  • [ ] All new code has tests
  • [ ] All tests pass (mix test)
  • [ ] Test coverage is comprehensive
  • [ ] Edge cases covered
  • [ ] Doctests work

Documentation

  • [ ] @moduledoc added to new modules
  • [ ] @doc added to new public functions
  • [ ] @spec added to all public functions
  • [ ] Examples work (verified by doctests)
  • [ ] Research citations included
  • [ ] README.md updated (if user-facing change)
  • [ ] CHANGELOG.md updated

Quality

  • [ ] Follows existing code patterns
  • [ ] No code duplication
  • [ ] Appropriate use of Nx.Defn (GPU acceleration)
  • [ ] Error messages are helpful
  • [ ] Comments explain "why" not "what"

Development Commands

Essential Commands

# Install dependencies
mix deps.get

# Run tests
mix test

# Run specific test
mix test test/path/to/test.exs:line_number

# Run with coverage
mix coveralls
mix coveralls.html  # HTML report in cover/

# Format code
mix format

# Check formatting
mix format --check-formatted

# Compile with warnings as errors
mix compile --warnings-as-errors

# Run linter
mix credo --strict

# Type checking (requires PLT build)
mix dialyzer

# Generate documentation
mix docs

# Full quality check (run before PR)
mix format --check-formatted && \
mix compile --warnings-as-errors && \
mix test && \
mix credo --strict

Building PLT for Dialyzer (One-time)

# This takes a few minutes the first time
mix dialyzer --plt

# Then run analysis
mix dialyzer

Performance Considerations

When to Use Nx.Defn

Use for:

  • Numerical computations
  • Operations on tensors
  • Code that benefits from GPU acceleration

Don't use for:

  • String manipulation
  • Control flow with dynamic decisions
  • I/O operations

Example

# Good: Numerical computation with defn
import Nx.Defn

defn compute_disparity(rate_a, rate_b) do
  Nx.abs(Nx.subtract(rate_a, rate_b))
end

# Good: Validation in regular Elixir, numerics in the defn kernel
# (group_rates/2 is a hypothetical helper returning per-group rates)
def compute(predictions, sensitive_attr, _opts \\ []) do
  Validation.validate_predictions!(predictions)            # Regular Elixir
  {rate_a, rate_b} = group_rates(predictions, sensitive_attr)
  compute_disparity(rate_a, rate_b)                        # Nx.Defn kernel
end
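
A defn kernel is called like an ordinary function and returns a tensor; for scalar inputs the result is a scalar tensor (illustrative):

disparity = compute_disparity(Nx.tensor(0.5), Nx.tensor(0.3))
Nx.to_number(disparity)  # ≈ 0.2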

Adding Research Citations

Citation Requirements

For new metrics or algorithms:

  1. Find the original paper that proposed the technique

  2. Include full citation with:

    • Authors (all, or first 3 + "et al.")
    • Year
    • Title (in quotes)
    • Venue (journal or conference)
    • Volume/issue/pages (for journals)
    • DOI (if available)
  3. Add to module @moduledoc

  4. Add to README.md Research Foundations section

Citation Format Example

@moduledoc """
Your metric description.

## References

- Hardt, M., Price, E., & Srebro, N. (2016). "Equality of Opportunity
  in Supervised Learning." In *Advances in Neural Information Processing
  Systems* (NeurIPS '16), pp. 3315-3323.
"""

Common Pitfalls to Avoid

Don't

  • ❌ Write implementation before tests
  • ❌ Change tests to make them pass (fix code instead)
  • ❌ Skip edge case testing
  • ❌ Use floating point equality (use assert_in_delta)
  • ❌ Forget to update CHANGELOG.md
  • ❌ Add compiler warnings
  • ❌ Skip documentation
  • ❌ Use trivial test data (2-3 samples)
  • ❌ Forget type specifications
  • ❌ Copy-paste without attribution

Do

  • ✅ Write tests first (TDD)
  • ✅ Use assert_in_delta for floats
  • ✅ Test edge cases explicitly
  • ✅ Update CHANGELOG.md
  • ✅ Add comprehensive documentation
  • ✅ Include research citations
  • ✅ Use realistic test data (10+ per group)
  • ✅ Add type specifications
  • ✅ Format code before committing
  • ✅ Run full quality check before PR


Getting Help

Asking Questions

Good question:

"I want to add the calibration metric from Pleiss et al. (2017). I've read the paper and understand the math. Should I use uniform binning or quantile binning for the default? The paper uses uniform but some implementations use quantile."

Contains:

  • Specific feature
  • Research reference
  • Shows you've done homework
  • Asks specific question

Not helpful:

"How do I add a new metric?"

Too vague:

  • No specific metric mentioned
  • No research reference
  • No specific question

Response Time

  • Simple questions: 24-48 hours
  • Feature proposals: 3-7 days for review
  • Pull requests: 1-2 weeks for review

Release Process (Maintainers Only)

Version Numbering

Follows Semantic Versioning:

  • MAJOR (1.0.0): Breaking changes
  • MINOR (0.2.0): New features, backward compatible
  • PATCH (0.1.1): Bug fixes only
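
The version is bumped in the project/0 keyword list in mix.exs (illustrative excerpt; the field values are assumptions):

def project do
  [
    app: :ex_fairness,
    version: "0.2.0",
    elixir: "~> 1.14",
    deps: deps()
  ]
end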

Release Checklist

  • [ ] All tests pass
  • [ ] CHANGELOG.md updated
  • [ ] Version bumped in mix.exs
  • [ ] Documentation generated successfully
  • [ ] Git tag created (git tag -a v0.2.0 -m "Release v0.2.0")
  • [ ] Pushed to GitHub (git push --tags)
  • [ ] Published to Hex.pm (mix hex.publish)
  • [ ] HexDocs generated
  • [ ] GitHub release created with notes

Recognition

Contributors will be:

  • Listed in release notes
  • Mentioned in CHANGELOG.md
  • Credited in git commit history
  • Thanked in project documentation

Significant contributions may lead to:

  • Co-authorship on academic papers
  • Maintainer status
  • Conference presentation opportunities

Questions?

If you have questions about contributing, please:

  1. Check this document first
  2. Search existing issues
  3. Open a new issue with the question label
  4. Be patient - we're a small team!

Thank You!

Your contributions help make ML fairer for everyone. We appreciate your effort to improve ExFairness!

Happy Contributing! 🚀


Last Updated: October 20, 2025
Version: 1.0
Maintainers: North Shore AI Research Team