ML scoring pipeline: enrich -> featurize -> predict -> route by confidence.
Demonstrates session-scoped Python object references and the full Slither pipeline lifecycle:
- enrich (beam) -- checks the ETS feature cache for pre-computed features; on a miss, packages raw data for Python featurization. Attaches the model_id from context metadata to every item.
- featurize (python) -- extracts numeric features from raw data dicts, or passes through items that already have cached features.
- predict (python) -- runs batch prediction using a scikit-learn model stored in the Python session. Session affinity ensures that the model trained by train_model is accessible to this stage.
- route_by_confidence (router) -- splits results into :high_confidence (>= 0.9) and :low_confidence (< 0.6) buckets; mid-range items (0.6 <= confidence < 0.9) go to :default.
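The bucketing rule in the route_by_confidence stage can be sketched as a pure Elixir function. This is a hypothetical illustration, not Slither's router API. The module name ConfidenceRouterSketch and the assumption that each result is a map with a :confidence key are both invented here.

```elixir
defmodule ConfidenceRouterSketch do
  # Hypothetical sketch of the route_by_confidence thresholds.
  # Not Slither's router API; result maps are assumed to carry :confidence.

  # Classifies a single confidence score into one of the three buckets.
  def bucket(confidence) when confidence >= 0.9, do: :high_confidence
  def bucket(confidence) when confidence < 0.6, do: :low_confidence
  # Anything in [0.6, 0.9) falls through to the default bucket.
  def bucket(_confidence), do: :default

  # Groups a list of result maps by bucket, preserving input order within each bucket.
  def route(results) do
    Enum.group_by(results, fn %{confidence: c} -> bucket(c) end)
  end
end
```

Boundary values follow the documented thresholds. A score of exactly 0.9 is high confidence. A score of exactly 0.6 lands in :default, because it is not strictly below 0.6.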
Concurrent Session Isolation
The run_demo/0 function trains TWO models on TWO separate sessions
simultaneously, then scores test records through each session's
pipeline independently. This demonstrates that:
- Session A's model is isolated to its worker process
- Session B's model is isolated to its worker process
- Predictions through session A use model A (not B)
- Predictions through session B use model B (not A)
Under free-threaded Python, a shared _models dict mutated from
multiple threads could be corrupted, and one session could
silently overwrite another's model. Slither's process-per-session
design rules this out by construction: each session's models live
in that session's own worker process, so no dict is shared.
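The process-per-session argument has a direct BEAM-side analogy, sketched below. It assumes each session is modeled as its own process holding a private map of models. An Elixir Agent stands in for a Slither worker process, and the module and function names are hypothetical. Each Agent owns its state, so storing a model under the same key in session B cannot overwrite session A's copy.

```elixir
defmodule SessionIsolationSketch do
  # Hypothetical stand-in for a session worker: each "session" is a separate
  # process (an Agent) holding its own map of model_id => model.
  # This is NOT Slither's API; it only illustrates per-process state.

  # Starts a new session process with an empty model registry.
  def start_session, do: Agent.start_link(fn -> %{} end)

  # Stores a model in this session's private state only.
  def put_model(session, model_id, model),
    do: Agent.update(session, &Map.put(&1, model_id, model))

  # Reads a model from this session's private state only.
  def get_model(session, model_id),
    do: Agent.get(session, &Map.get(&1, model_id))
end
```

The test starts two sessions and stores :model_a and :model_b under the same "clf" key. A lookup in each session then returns only that session's own model, mirroring the "Predictions through session A use model A (not B)" guarantee above.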
Requires scikit-learn and numpy. Run with:
Slither.Examples.MlScoring.ScoringPipe.run_demo()
Functions
Enrich a record with cached features or prepare it for featurization.
Checks the ETS feature cache for the record's ID. On a cache hit the pre-computed feature vector is used directly; on a miss the raw data map is passed through so the Python featurize stage can extract features. The model ID from context metadata is attached in both cases so the predict stage knows which model to use.
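The cache-hit/miss branching described above might look like the following with the standard :ets API. This is a sketch under several assumptions made for illustration: the record shape (%{id: ..., data: ...}), the :model_id metadata key, the output keys :features and :raw, and the module name. The real enrich function's signature and item format are not shown in this doc.

```elixir
defmodule EnrichSketch do
  # Hypothetical sketch of the enrich stage's cache logic. Record shape,
  # metadata key, and output keys are assumptions, not Slither's layout.

  # Creates an unnamed ETS set mapping record id => feature vector.
  def new_cache do
    :ets.new(:feature_cache_sketch, [:set, :public])
  end

  # Attaches model_id in both branches so a later predict stage can pick the model.
  def enrich(record, cache, metadata) do
    model_id = Map.fetch!(metadata, :model_id)

    case :ets.lookup(cache, record.id) do
      # Cache hit: use the pre-computed features; featurize can pass them through.
      [{_id, features}] ->
        %{id: record.id, features: features, model_id: model_id}

      # Cache miss: forward raw data so the Python featurize stage can extract features.
      [] ->
        %{id: record.id, raw: record.data, model_id: model_id}
    end
  end
end
```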
Run the ML scoring demo with concurrent session isolation.
Trains two logistic regression models on separate sessions with different data distributions, then scores test records through each session independently. Demonstrates that Slither's session affinity prevents cross-contamination of model state.
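The concurrent part of the demo can be sketched as two Tasks scoring the same records against two different models at the same time. In this sketch the models are plain {weights, bias} tuples scored with the logistic function sigmoid(dot(w, x) + b). They stand in for the scikit-learn models that actually live in the Python sessions. Nothing here calls Slither, and all names are hypothetical.

```elixir
defmodule ConcurrentScoringSketch do
  # Hypothetical: a "model" is {weights, bias} for logistic regression,
  # standing in for the sklearn models held by each Python session.

  # Logistic function mapping a real score to (0, 1).
  def sigmoid(z), do: 1.0 / (1.0 + :math.exp(-z))

  # Probability of the positive class: sigmoid(dot(weights, features) + bias).
  def score({weights, bias}, features) do
    weights
    |> Enum.zip(features)
    |> Enum.reduce(bias, fn {w, x}, acc -> acc + w * x end)
    |> sigmoid()
  end

  # Scores the records against each model in its own Task (a separate process),
  # then waits for both results.
  def score_concurrently(model_a, model_b, records) do
    [task_a, task_b] =
      for model <- [model_a, model_b] do
        Task.async(fn -> Enum.map(records, &score(model, &1)) end)
      end

    {Task.await(task_a), Task.await(task_b)}
  end
end
```

Take model A = {[2.0], 0.0} and model B = {[-2.0], 0.0}, and score the single record [1.0]. Model A gives about 0.881 and model B gives about 0.119, so each task's output reflects only its own model.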