ExDataCheck.Drift (ExDataCheck v0.2.1)
Data drift detection for ML model monitoring.
Drift detection identifies when the distribution of production data differs significantly from training data, which can degrade model performance.
Drift Detection Methods
- Kolmogorov-Smirnov (KS): For continuous numerical features
- Chi-Square: For categorical features
- Population Stability Index (PSI): A widely used shift metric, common in credit scoring and ML monitoring
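As a rough illustration of the chi-square approach for categorical features, the sketch below computes Pearson's chi-square statistic for current category counts against expected counts derived from baseline proportions. The `ChiSquareSketch` module is hypothetical and not part of ExDataCheck. The library's own test may be formulated differently (for example, as a contingency-table test, or with a p-value from the chi-square distribution). The sketch assumes every baseline proportion is greater than zero.

```elixir
defmodule ChiSquareSketch do
  @moduledoc "Hypothetical chi-square drift sketch (not ExDataCheck code)."

  # baseline_props: %{category => proportion}, each proportion assumed > 0.
  # current_values: list of category values observed in current data.
  # Categories seen only in current data are ignored by this sketch.
  def statistic(baseline_props, current_values) do
    n = length(current_values)
    observed = Enum.frequencies(current_values)

    Enum.reduce(baseline_props, 0.0, fn {category, prop}, acc ->
      expected = n * prop
      obs = Map.get(observed, category, 0)
      # Pearson term: (observed - expected)^2 / expected
      acc + (obs - expected) * (obs - expected) / expected
    end)
  end
end
```

A larger statistic means current counts sit further from what the baseline proportions predict; turning it into a p-value requires the chi-square distribution with (number of categories - 1) degrees of freedom.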
Workflow
- Create a baseline from training (reference) data
- Detect drift in production (current) data
- Monitor drift scores over time
- Retrain the model when significant drift is detected
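A minimal sketch of the monitoring and retraining steps, assuming a drift score has already been computed for each batch of production data. The hypothetical `DriftMonitorSketch` module flags batches whose score reaches a threshold. The default of 0.2 borrows the "significant shift" PSI band described under psi; it is an assumption, so pick a threshold that matches the score you actually track.

```elixir
defmodule DriftMonitorSketch do
  @moduledoc "Hypothetical drift-score monitor (not ExDataCheck code)."

  # history: list of {batch_id, score} pairs, oldest first.
  # Returns the ids of batches whose score is >= threshold, in order.
  def flag_batches(history, threshold \\ 0.2) do
    for {batch_id, score} <- history, score >= threshold, do: batch_id
  end

  # True when at least one batch crossed the threshold.
  def retrain?(history, threshold \\ 0.2) do
    flag_batches(history, threshold) != []
  end
end
```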
Examples
```elixir
# Create a baseline from training data
baseline = ExDataCheck.Drift.create_baseline(training_data)

# Check production data for drift
drift_result = ExDataCheck.Drift.detect(production_data, baseline)

case drift_result do
  %{drifted: true} = result ->
    IO.inspect(result, label: "Drift detected")
    # trigger_model_retraining/0 is a placeholder for your own retraining hook
    trigger_model_retraining()

  _ ->
    :ok
end
```
Summary
Functions
- create_baseline - Creates a baseline distribution from a reference dataset.
- detect - Detects drift between current data and baseline.
- ks_test - Performs a two-sample Kolmogorov-Smirnov test.
- psi - Calculates the Population Stability Index (PSI) between two distributions.
Types
@type baseline() :: %{optional(atom() | String.t()) => baseline_column()}
Complete baseline for all columns.
@type baseline_column() :: map()
Baseline distribution for a column.
For numeric columns:
- `:type` - `:numeric`
- `:values` - List of baseline values
- `:mean` - Baseline mean
- `:stdev` - Baseline standard deviation
For categorical columns:
- `:type` - `:categorical`
- `:frequencies` - Map of value frequencies
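The sketch below builds a map with the same shape as `baseline_column()` from a non-empty list of column values. It is a hypothetical illustration, not ExDataCheck's implementation: it assumes the population (not sample) standard deviation and stores categorical frequencies as proportions, and the library may use a different convention for either.

```elixir
defmodule BaselineColumnSketch do
  @moduledoc "Hypothetical baseline_column() builder (not ExDataCheck code)."

  # Numeric when every value is a number; categorical otherwise.
  def build([_ | _] = values) do
    n = length(values)

    if Enum.all?(values, &is_number/1) do
      mean = Enum.sum(values) / n
      # Population variance (divide by n), an assumption of this sketch
      variance = Enum.reduce(values, 0.0, fn v, acc -> acc + (v - mean) * (v - mean) end) / n
      %{type: :numeric, values: values, mean: mean, stdev: :math.sqrt(variance)}
    else
      # Frequencies stored as proportions of n, an assumption of this sketch
      frequencies =
        values
        |> Enum.frequencies()
        |> Map.new(fn {value, count} -> {value, count / n} end)

      %{type: :categorical, frequencies: frequencies}
    end
  end
end
```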
Functions
create_baseline
Creates a baseline distribution from a reference dataset.
The baseline captures the distribution of each column for later comparison.
Parameters
- `dataset` - Reference dataset (typically training data)
Returns
Map of column names to baseline statistics.
Examples
```elixir
iex> dataset = [%{age: 25}, %{age: 30}, %{age: 35}]
iex> baseline = ExDataCheck.Drift.create_baseline(dataset)
iex> baseline[:age].type
:numeric
```
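One way the per-column type in the example above could be inferred: the hypothetical `ColumnsSketch` module regroups a list of row maps into one value list per column, then labels a column `:numeric` when every value is a number and `:categorical` otherwise. Both rules are assumptions made for illustration; ExDataCheck's own inference (including its handling of missing or nil values) may differ.

```elixir
defmodule ColumnsSketch do
  @moduledoc "Hypothetical column extraction and typing (not ExDataCheck code)."

  # Regroups row maps into %{column => [values in row order]}.
  # A row that lacks a key contributes nothing to that column.
  def columns(rows) do
    rows
    |> Enum.flat_map(&Map.to_list/1)
    |> Enum.group_by(fn {col, _value} -> col end, fn {_col, value} -> value end)
  end

  # Labels each column :numeric if every value is a number, else :categorical.
  def column_types(rows) do
    rows
    |> columns()
    |> Map.new(fn {col, values} ->
      type = if Enum.all?(values, &is_number/1), do: :numeric, else: :categorical
      {col, type}
    end)
  end
end
```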
@spec detect([map()], baseline(), keyword()) :: ExDataCheck.DriftResult.t()
Detects drift between current data and baseline.
Compares the distribution of current data against the baseline and identifies columns that have drifted significantly.
Parameters
- `dataset` - Current dataset to check for drift
- `baseline` - Baseline created from reference data
- `opts` - Options:
  - `:threshold` - Drift score threshold (default: `0.05`)
  - `:method` - Detection method (`:auto`, `:ks`, or `:psi`; default: `:auto`)
Returns
An `ExDataCheck.DriftResult` struct containing the drift detection results.
Examples
```elixir
iex> baseline = ExDataCheck.Drift.create_baseline(training_data)
iex> result = ExDataCheck.Drift.detect(production_data, baseline)
iex> result.drifted
false
```
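A sketch of how per-column test results might roll up into a single drift flag, assuming `:threshold` acts as a significance level on p-values (a column counts as drifted when p < threshold). This reading of `:threshold` is an assumption. The `DetectSketch` module and its `drifted_columns` key are hypothetical and are not the fields of `ExDataCheck.DriftResult`.

```elixir
defmodule DetectSketch do
  @moduledoc "Hypothetical drift roll-up (not ExDataCheck code)."

  # p_values: %{column => p_value}, e.g. from a per-column KS test.
  # Assumes :threshold is a significance level: p < threshold => drifted.
  def summarize(p_values, opts \\ []) do
    threshold = Keyword.get(opts, :threshold, 0.05)
    drifted_columns = for {col, p} <- p_values, p < threshold, do: col
    %{drifted: drifted_columns != [], drifted_columns: Enum.sort(drifted_columns)}
  end
end
```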
ks_test
Performs a two-sample Kolmogorov-Smirnov (KS) test.
Tests whether two samples come from the same distribution.
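For reference, the two-sample KS statistic is the largest vertical gap between the two empirical cumulative distribution functions (ECDFs), where $F_{1,n}$ and $F_{2,m}$ are the ECDFs of samples of sizes $n$ and $m$:

$$
D_{n,m} = \sup_{x} \left| F_{1,n}(x) - F_{2,m}(x) \right|
$$

A statistic of 0 means the two ECDFs coincide everywhere, which is why two identical samples yield `0.0`.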
Returns
Tuple of {ks_statistic, p_value}.
Examples
```elixir
iex> dist1 = [1, 2, 3, 4, 5]
iex> dist2 = [1, 2, 3, 4, 5]
iex> {stat, p} = ExDataCheck.Drift.ks_test(dist1, dist2)
iex> stat
0.0
```
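A self-contained sketch of the same computation, independent of ExDataCheck. Because both ECDFs only jump at observed values, evaluating the gap at every observed point is enough to find the maximum. The p-value uses the asymptotic Kolmogorov distribution with the effective-sample-size correction given in Numerical Recipes. That is one common approximation, and ExDataCheck may compute its p-value differently. The hypothetical `KSSketch` module uses a quadratic-time loop to stay short; a sort-and-merge pass would scale better.

```elixir
defmodule KSSketch do
  @moduledoc "Hypothetical two-sample KS sketch (not ExDataCheck code)."

  # Returns {statistic, approximate_p_value}, mirroring the documented tuple shape.
  def ks_test([_ | _] = sample1, [_ | _] = sample2) do
    d = statistic(sample1, sample2)
    {d, p_value(d, length(sample1), length(sample2))}
  end

  # D = max over observed points x of |F1(x) - F2(x)|.
  def statistic([_ | _] = sample1, [_ | _] = sample2) do
    n1 = length(sample1)
    n2 = length(sample2)

    (sample1 ++ sample2)
    |> Enum.uniq()
    |> Enum.map(fn x -> abs(ecdf(sample1, x, n1) - ecdf(sample2, x, n2)) end)
    |> Enum.max()
  end

  # Empirical CDF: fraction of the sample that is <= x.
  defp ecdf(sample, x, n), do: Enum.count(sample, &(&1 <= x)) / n

  # Asymptotic p-value with the effective sample size ne = n1 * n2 / (n1 + n2).
  def p_value(d, n1, n2) do
    sqrt_ne = :math.sqrt(n1 * n2 / (n1 + n2))
    lambda = (sqrt_ne + 0.12 + 0.11 / sqrt_ne) * d
    kolmogorov_q(lambda)
  end

  # Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lambda^2).
  # Below lambda = 0.2, Q equals 1.0 to about 12 digits and the series
  # converges poorly, so return 1.0 directly.
  defp kolmogorov_q(lambda) when lambda < 0.2, do: 1.0

  defp kolmogorov_q(lambda) do
    sum =
      Enum.reduce(1..100, 0.0, fn j, acc ->
        sign = if rem(j, 2) == 1, do: 1.0, else: -1.0
        acc + sign * :math.exp(-2.0 * j * j * lambda * lambda)
      end)

    # Clamp to [0, 1] to absorb floating-point noise
    (2.0 * sum) |> max(0.0) |> min(1.0)
  end
end
```

Identical samples give a statistic of `0.0` and a p-value of `1.0`; completely separated samples such as `[1, 2, 3, 4, 5]` and `[6, 7, 8, 9, 10]` give a statistic of `1.0` and a small p-value.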
psi
Calculates the Population Stability Index (PSI) between two distributions.
PSI measures distribution shift, commonly used in credit scoring and ML monitoring.
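The standard PSI formula, with $b_i$ and $c_i$ the baseline and current proportions of category (or bin) $i$:

$$
\mathrm{PSI} = \sum_{i} (c_i - b_i) \ln\frac{c_i}{b_i}
$$

Each term is non-negative, since $c_i - b_i$ and $\ln(c_i / b_i)$ always share a sign, so PSI is 0 only when the two distributions match. A category with zero proportion on either side makes the logarithm undefined, which is why implementations commonly substitute a small floor value.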
PSI Interpretation:
- PSI < 0.1: No significant shift
- 0.1 <= PSI < 0.2: Moderate shift
- PSI >= 0.2: Significant shift
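The interpretation bands above translate directly into guard clauses. In this sketch the `PsiBandSketch` module and its atom labels are hypothetical and are not part of ExDataCheck.

```elixir
defmodule PsiBandSketch do
  @moduledoc "Hypothetical PSI band classifier (not ExDataCheck code)."

  # PSI < 0.1 -> no significant shift
  def band(psi) when psi < 0.1, do: :no_significant_shift
  # 0.1 <= PSI < 0.2 -> moderate shift
  def band(psi) when psi < 0.2, do: :moderate_shift
  # PSI >= 0.2 -> significant shift
  def band(_psi), do: :significant_shift
end
```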
Parameters
- `baseline_dist` - Map of categories to baseline proportions
- `current_dist` - Map of categories to current proportions
Examples
```elixir
iex> baseline = %{"A" => 0.5, "B" => 0.5}
iex> current = %{"A" => 0.5, "B" => 0.5}
iex> ExDataCheck.Drift.psi(baseline, current)
0.0
```
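A self-contained sketch of the PSI formula over two proportion maps shaped like `baseline_dist` and `current_dist`. The hypothetical `PsiSketch` module treats a category missing from one map as proportion 0.0 and floors every proportion at `1.0e-6` so the logarithm stays finite. That floor value is an assumption; ExDataCheck may handle empty categories differently.

```elixir
defmodule PsiSketch do
  @moduledoc "Hypothetical PSI sketch (not ExDataCheck code)."

  # Floor for zero proportions (assumed value) so ln(c / b) stays finite.
  @epsilon 1.0e-6

  # PSI = sum over all categories of (c - b) * ln(c / b).
  def psi(baseline_dist, current_dist) do
    categories = Enum.uniq(Map.keys(baseline_dist) ++ Map.keys(current_dist))

    Enum.reduce(categories, 0.0, fn category, acc ->
      b = max(Map.get(baseline_dist, category, 0.0), @epsilon)
      c = max(Map.get(current_dist, category, 0.0), @epsilon)
      acc + (c - b) * :math.log(c / b)
    end)
  end
end
```

For example, moving from `%{"A" => 0.5, "B" => 0.5}` to `%{"A" => 0.6, "B" => 0.4}` gives a PSI of roughly 0.041, which falls in the "no significant shift" band.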