CrucibleXAI.Validation.Axioms (CrucibleXAI v0.4.0)

Theoretical axiom verification for explanation methods.

Tests whether explanations satisfy key mathematical properties that define good attribution methods. Based on Shapley value axioms and attribution theory.

Supported Axioms

  1. Completeness (Efficiency): Sum of attributions equals prediction difference
  2. Symmetry: Features with identical marginal contributions receive identical attributions
  3. Dummy (Null Player): Features with no impact receive zero attribution
  4. Linearity: For linear models, each attribution equals wᵢ(xᵢ - E[xᵢ]), the coefficient times the feature's deviation from its baseline

Applicable Methods

  • SHAP: Should satisfy completeness, symmetry, and dummy
  • Integrated Gradients: Should satisfy completeness
  • LinearSHAP: Should satisfy all axioms for linear models
  • LIME: Approximate, may violate axioms

Usage

alias CrucibleXAI.Validation.Axioms

shap_values = CrucibleXAI.explain_shap(instance, background, predict_fn)

result = Axioms.validate_all_axioms(
  shap_values,
  instance,
  predict_fn,
  method: :shap,
  baseline: background
)

IO.inspect(result.all_satisfied)
# => true (for well-implemented SHAP)

References

Based on:

  • Shapley (1953) "A Value for N-person Games"
  • Lundberg & Lee (2017) "A Unified Approach to Interpreting Model Predictions"
  • Sundararajan et al. (2017) "Axiomatic Attribution for Deep Networks"

Types

axioms_result()

@type axioms_result() :: %{
  completeness: completeness_result(),
  symmetry: symmetry_result(),
  dummy: dummy_result(),
  linearity: linearity_result(),
  all_satisfied: boolean(),
  overall_score: float(),
  summary: String.t()
}

completeness_result()

@type completeness_result() ::
  completeness_test_result() | %{skipped: true, reason: String.t()}

completeness_test_result()

@type completeness_test_result() :: %{
  satisfies_completeness: boolean(),
  attribution_sum: number(),
  expected_sum: number(),
  error: number(),
  relative_error: float(),
  interpretation: String.t()
}

dummy_result()

@type dummy_result() :: %{
  satisfies_dummy: boolean(),
  dummy_features: [integer()],
  violations: [integer()],
  interpretation: String.t()
}

linearity_result()

@type linearity_result() ::
  linearity_test_result() | %{skipped: true, reason: String.t()}

linearity_test_result()

@type linearity_test_result() :: %{
  satisfies_linearity: boolean(),
  errors_by_feature: map(),
  expected_shap: map(),
  max_error: number(),
  interpretation: String.t()
}

symmetry_result()

@type symmetry_result() ::
  symmetry_test_result() | %{skipped: true, reason: String.t()}

symmetry_test_result()

@type symmetry_test_result() :: %{
  satisfies_symmetry: boolean(),
  violations: list(),
  max_violation: number(),
  interpretation: String.t()
}
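
Several of the result types above are unions of a test result and a skipped marker (%{skipped: true, reason: String.t()}), so callers may want to branch on the shape before reading fields. The following is a minimal sketch of that pattern; ResultSketch and describe/1 are hypothetical names used only for illustration and are not part of CrucibleXAI.

```elixir
defmodule ResultSketch do
  # Hypothetical helper for illustration; not part of CrucibleXAI.

  # A skipped test carries only :skipped and :reason.
  def describe(%{skipped: true, reason: reason}), do: "skipped: " <> reason

  # Every *_test_result type above includes an :interpretation string.
  def describe(%{interpretation: interpretation}), do: interpretation
end

skipped = ResultSketch.describe(%{skipped: true, reason: "no symmetric pairs given"})

tested =
  ResultSketch.describe(%{
    satisfies_symmetry: true,
    violations: [],
    max_violation: 0.0,
    interpretation: "Symmetry holds"
  })
```

Matching the skipped clause first keeps the second clause from having to guard against missing fields.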

Functions

test_completeness(attributions, instance, predict_fn, opts \\ [])

@spec test_completeness(map(), list(), (any() -> any()), keyword()) ::
  completeness_test_result()

Test completeness (efficiency) axiom.

  • For SHAP: Σφᵢ should equal f(x) - E[f(x)], where E[f(x)] is the baseline prediction
  • For Integrated Gradients: Σφᵢ should equal f(x) - f(baseline)

Parameters

  • attributions - Attribution map (feature_index => value)
  • instance - Instance that was explained
  • predict_fn - Prediction function
  • opts - Options:
    • :method - :shap, :integrated_gradients, :other
    • :baseline - Baseline instance(s) or value
    • :tolerance - Acceptable error (default: 0.1)

Returns

Map with:

  • :satisfies_completeness - Boolean
  • :attribution_sum - Actual sum of attributions
  • :expected_sum - Expected sum (f(x) - baseline)
  • :error - Absolute difference
  • :relative_error - |error| / |expected|
  • :interpretation - Assessment string
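
To make the SHAP case concrete, here is a minimal sketch of how such a check could be computed. It is not the library's implementation: CompletenessSketch is a hypothetical module, and it assumes the baseline is a list of background instances whose mean prediction estimates E[f(x)]. The :relative_error and :interpretation fields are left out.

```elixir
defmodule CompletenessSketch do
  # Hypothetical sketch; not the CrucibleXAI implementation.
  # Assumes `baseline` is a non-empty list of background instances.
  def check(attributions, instance, predict_fn, baseline, tolerance \\ 0.1) do
    attribution_sum = attributions |> Map.values() |> Enum.sum()

    # Estimate E[f(x)] as the mean prediction over the background instances.
    baseline_prediction =
      baseline
      |> Enum.map(predict_fn)
      |> then(&(Enum.sum(&1) / length(&1)))

    expected_sum = predict_fn.(instance) - baseline_prediction
    error = abs(attribution_sum - expected_sum)

    %{
      satisfies_completeness: error <= tolerance,
      attribution_sum: attribution_sum,
      expected_sum: expected_sum,
      error: error
    }
  end
end

# Linear model, so the exact SHAP values are 2 * (5 - 1) and 3 * (10 - 1).
predict_fn = fn [x, y] -> 2.0 * x + 3.0 * y end
attributions = %{0 => 8.0, 1 => 27.0}
background = [[0.0, 0.0], [2.0, 2.0]]

result = CompletenessSketch.check(attributions, [5.0, 10.0], predict_fn, background)
```

Here f(x) = 40.0 and the mean background prediction is 5.0, so the expected sum is 35.0, which matches the attribution sum.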

Examples

# SHAP should satisfy completeness.
# Assumes predict_fn is already defined and Axioms is aliased.
attributions = %{0 => 2.0, 1 => 3.0}
instance = [5.0, 10.0]
baseline = [[0.0, 0.0], [1.0, 1.0]]

result = Axioms.test_completeness(
  attributions,
  instance,
  predict_fn,
  method: :shap,
  baseline: baseline
)
# => %{satisfies_completeness: true, error: 0.01, ...}

test_dummy(attributions, instance, predict_fn, opts \\ [])

@spec test_dummy(map(), list(), (any() -> any()), keyword()) :: dummy_result()

Test dummy (null player) axiom.

Features that don't affect model output should have zero attribution.

Algorithm

  1. For each feature i:
     a. Vary feature i while fixing others
     b. Measure prediction change
  2. If Δf ≈ 0 for all variations, then φᵢ should ≈ 0
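
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: DummySketch is a hypothetical module, and the perturbation scheme (shifting the feature by 1.0, 2.0, and so on) is an assumption, since the documentation does not say how variations are generated.

```elixir
defmodule DummySketch do
  # Hypothetical sketch; the perturbation scheme is an assumption.

  # Returns the indices of features whose variation never changes the prediction.
  def dummy_features(instance, predict_fn, opts \\ []) do
    num_variations = Keyword.get(opts, :num_variations, 10)
    prediction_tolerance = Keyword.get(opts, :prediction_tolerance, 0.01)
    original = predict_fn.(instance)

    instance
    |> Enum.with_index()
    |> Enum.filter(fn {value, index} ->
      Enum.all?(1..num_variations, fn step ->
        # Vary feature `index` by +step while keeping the others fixed.
        varied = List.replace_at(instance, index, value + step * 1.0)
        abs(predict_fn.(varied) - original) <= prediction_tolerance
      end)
    end)
    |> Enum.map(fn {_value, index} -> index end)
  end

  # Dummy features whose attribution is not approximately zero.
  def violations(attributions, dummy_features, tolerance \\ 0.1) do
    Enum.filter(dummy_features, fn index ->
      abs(Map.get(attributions, index, 0.0)) > tolerance
    end)
  end
end

predict_fn = fn [x, y, _z] -> 2.0 * x + 3.0 * y end
dummies = DummySketch.dummy_features([5.0, 10.0, 7.0], predict_fn)
violations = DummySketch.violations(%{0 => 2.0, 1 => 3.0, 2 => 0.0}, dummies)
```

A finite set of perturbations can only suggest that a feature is a dummy; a feature that matters outside the sampled range would be misclassified.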

Parameters

  • attributions - Attribution map
  • instance - Instance explained
  • predict_fn - Prediction function
  • opts - Options:
    • :num_variations - Variations to test per feature (default: 10)
    • :tolerance - Attribution tolerance for dummy features (default: 0.1)
    • :prediction_tolerance - Prediction change tolerance (default: 0.01)

Returns

Map with:

  • :satisfies_dummy - Boolean
  • :dummy_features - Features identified as dummy
  • :violations - Dummy features with non-zero attribution
  • :interpretation - Assessment string

Examples

# Feature 2 is a dummy (doesn't affect prediction)
predict_fn = fn [x, y, _z] -> 2.0 * x + 3.0 * y end
attributions = %{0 => 2.0, 1 => 3.0, 2 => 0.0}

result = Axioms.test_dummy(attributions, [5.0, 10.0, 7.0], predict_fn)
# => %{satisfies_dummy: true, dummy_features: [2], ...}

test_linearity(shap_values, instance, model_coefficients, opts \\ [])

@spec test_linearity(map(), list(), map(), keyword()) :: linearity_test_result()

Test linearity axiom (for linear models only).

For a linear model f(x) = wᵀx + b, the SHAP value of each feature should equal φᵢ = wᵢ(xᵢ - E[xᵢ]). The intercept b cancels and does not appear in any attribution.
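
As an illustration of the formula, the expected values and the largest deviation could be computed as below. LinearitySketch is a hypothetical module, and the baseline is assumed to be a list of per-feature means E[xᵢ] in the same order as the instance.

```elixir
defmodule LinearitySketch do
  # Hypothetical sketch; not the CrucibleXAI implementation.

  # Expected attribution for each feature: weight * (x_i - E[x_i]).
  def expected_shap(instance, coefficients, baseline) do
    Map.new(coefficients, fn {index, weight} ->
      {index, weight * (Enum.at(instance, index) - Enum.at(baseline, index))}
    end)
  end

  # Largest absolute gap between observed and expected attributions.
  def max_error(shap_values, expected) do
    expected
    |> Enum.map(fn {index, value} -> abs(Map.get(shap_values, index, 0.0) - value) end)
    |> Enum.reduce(0.0, &max/2)
  end
end

coefficients = %{0 => 2.0, 1 => 3.0}
expected = LinearitySketch.expected_shap([5.0, 10.0], coefficients, [1.0, 2.0])
max_error = LinearitySketch.max_error(%{0 => 8.0, 1 => 24.05}, expected)
```

With a non-zero baseline the expected values are 2 * (5 - 1) = 8.0 and 3 * (10 - 2) = 24.0, so the second attribution above is off by roughly 0.05.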

Parameters

  • shap_values - SHAP attribution map
  • instance - Instance explained
  • model_coefficients - Map of feature_index => weight
  • opts - Options:
    • :baseline - Baseline values for features (E[x])
    • :tolerance - Acceptable error (default: 0.1)

Returns

Map with:

  • :satisfies_linearity - Boolean
  • :errors_by_feature - Error for each feature
  • :expected_shap - Expected SHAP value for each feature
  • :max_error - Maximum observed error
  • :interpretation - Assessment string

Examples

# Linear model: f(x) = 2x₁ + 3x₂
coefficients = %{0 => 2.0, 1 => 3.0}
instance = [5.0, 10.0]
baseline = [0.0, 0.0]  # E[x]
shap_values = %{0 => 10.0, 1 => 30.0}  # 2*(5-0), 3*(10-0)

result = Axioms.test_linearity(
  shap_values,
  instance,
  coefficients,
  baseline: baseline
)
# => %{satisfies_linearity: true, ...}

test_symmetry(attributions, instance, predict_fn, opts \\ [])

@spec test_symmetry(map(), list(), (any() -> any()), keyword()) ::
  symmetry_test_result()

Test symmetry axiom.

Features with identical marginal contributions should receive identical attributions. This is difficult to test in general, so the test applies a heuristic to the feature pairs supplied in :symmetric_pairs.
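
One simple form such a check could take is to compare the attributions of each declared pair against a tolerance. The sketch below assumes exactly that; SymmetrySketch is a hypothetical module, and it does not verify with the prediction function that the pairs really are symmetric, which the library may do.

```elixir
defmodule SymmetrySketch do
  # Hypothetical sketch; compares attributions only and trusts the caller's pairs.
  def check(attributions, symmetric_pairs, tolerance \\ 0.1) do
    differences =
      Enum.map(symmetric_pairs, fn {i, j} ->
        {{i, j}, abs(Map.fetch!(attributions, i) - Map.fetch!(attributions, j))}
      end)

    # Pairs whose attributions differ by more than the tolerance.
    violations = for {pair, diff} <- differences, diff > tolerance, do: pair

    max_violation =
      differences
      |> Enum.map(fn {_pair, diff} -> diff end)
      |> Enum.reduce(0.0, &max/2)

    %{
      satisfies_symmetry: violations == [],
      violations: violations,
      max_violation: max_violation
    }
  end
end

result = SymmetrySketch.check(%{0 => 2.0, 1 => 2.0, 2 => 5.0}, [{0, 1}, {0, 2}])
```

In this run the pair {0, 1} passes and {0, 2} is reported as a violation with a difference of 3.0.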

Parameters

  • attributions - SHAP values or attributions
  • instance - Instance explained
  • predict_fn - Prediction function
  • opts - Options:
    • :symmetric_pairs - List of {idx1, idx2} feature pairs to test
    • :tolerance - Acceptable difference (default: 0.1)

Returns

Map with:

  • :satisfies_symmetry - Boolean
  • :violations - List of feature pairs that violate symmetry
  • :max_violation - Maximum observed violation
  • :interpretation - Assessment string

Examples

# Test two features known to be symmetric.
# Assumes instance and predict_fn are already defined.
attributions = %{0 => 2.0, 1 => 2.0}

result = Axioms.test_symmetry(
  attributions,
  instance,
  predict_fn,
  symmetric_pairs: [{0, 1}]
)
# => %{satisfies_symmetry: true, violations: [], ...}

validate_all_axioms(attributions, instance, predict_fn, opts \\ [])

@spec validate_all_axioms(map(), list(), (any() -> any()), keyword()) ::
  axioms_result()

Comprehensive axiom validation suite.

Runs all applicable axiom tests for the given method and returns a complete validation report.

Parameters

  • attributions - Attribution map
  • instance - Instance explained
  • predict_fn - Prediction function
  • opts - Options:
    • :method - Method type (:shap, :integrated_gradients, :lime, etc.)
    • :baseline - Baseline for completeness test
    • :model_coefficients - For linearity test (optional)
    • :symmetric_pairs - For symmetry test (optional)

Returns

Map with:

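  • :completeness - Completeness test results, or %{skipped: true, reason: ...} if not applicable
  • :symmetry - Symmetry test results, or %{skipped: true, reason: ...} if not applicable
  • :dummy - Dummy test results
  • :linearity - Linearity test results, or %{skipped: true, reason: ...} if not applicable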
  • :all_satisfied - Whether all applicable axioms are satisfied
  • :overall_score - 0-1 score (fraction of axioms satisfied)
  • :summary - Human-readable summary
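
The score can be pictured as the fraction of applicable tests that passed. The sketch below is one plausible reading, not the library's code: ScoreSketch is a hypothetical module, and it assumes skipped tests are excluded from both numerator and denominator, which the documentation does not state.

```elixir
defmodule ScoreSketch do
  # Hypothetical sketch; excluding skipped tests from the score is an assumption.
  @flags [
    completeness: :satisfies_completeness,
    symmetry: :satisfies_symmetry,
    dummy: :satisfies_dummy,
    linearity: :satisfies_linearity
  ]

  def overall_score(results) do
    # Collect the pass/fail flag of every test that is present and not skipped.
    applicable =
      for {axiom, flag} <- @flags,
          result = Map.get(results, axiom),
          not Map.get(result, :skipped, false),
          do: Map.fetch!(result, flag)

    case applicable do
      [] -> 0.0
      flags -> Enum.count(flags, & &1) / length(flags)
    end
  end
end

score =
  ScoreSketch.overall_score(%{
    completeness: %{satisfies_completeness: true},
    symmetry: %{skipped: true, reason: "no symmetric pairs given"},
    dummy: %{satisfies_dummy: false},
    linearity: %{skipped: true, reason: "no model coefficients given"}
  })
```

Two tests are applicable here and one passes, so the sketch yields 0.5.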

Examples

result = Axioms.validate_all_axioms(
  shap_values,
  instance,
  predict_fn,
  method: :shap,
  baseline: background
)

IO.puts(result.summary)