Slither.Examples.MlScoring.ScoringPipe (Slither v0.1.0)


ML scoring pipeline: enrich -> featurize -> predict -> route by confidence.

Demonstrates session-scoped Python object references and the full Slither pipeline lifecycle:

  1. enrich (beam) -- checks the ETS feature cache for pre-computed features; on a miss, packages raw data for Python featurization. Attaches the model_id from context metadata to every item.
  2. featurize (python) -- extracts numeric features from raw data dicts, or passes through items that already have cached features.
  3. predict (python) -- runs batch prediction using a scikit-learn model stored in the Python session. Session affinity ensures the model trained by train_model is accessible.
  4. route_by_confidence (router) -- splits results into :high_confidence (>= 0.9) and :low_confidence (< 0.6) buckets; items with confidence in between (>= 0.6 and < 0.9) go to :default.
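
The routing rule in step 4 can be sketched in Python as below. This is only an illustration of the thresholds stated in the list, not Slither's router implementation; the function names and the 'confidence' key on each result are assumptions made for the example.

```python
from typing import Dict, List

# Thresholds taken from the stage description; the constant names are illustrative.
HIGH_THRESHOLD = 0.9
LOW_THRESHOLD = 0.6


def route_by_confidence(confidence: float) -> str:
    """Return the bucket name for a prediction confidence in [0, 1]."""
    if confidence >= HIGH_THRESHOLD:
        return "high_confidence"
    if confidence < LOW_THRESHOLD:
        return "low_confidence"
    return "default"  # mid-range: 0.6 <= confidence < 0.9


def split_by_route(results: List[dict]) -> Dict[str, List[dict]]:
    """Group results (each with a hypothetical 'confidence' key) by bucket."""
    buckets: Dict[str, List[dict]] = {
        "high_confidence": [],
        "low_confidence": [],
        "default": [],
    }
    for result in results:
        buckets[route_by_confidence(result["confidence"])].append(result)
    return buckets
```

Note that both boundary values are inclusive on the upper side of each cut: a confidence of exactly 0.9 is high-confidence, and exactly 0.6 falls into the default bucket.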

Concurrent Session Isolation

The run_demo/0 function trains two models in two separate sessions concurrently, then scores test records through each session's pipeline independently. This demonstrates that:

  • Session A's model is isolated to its worker process
  • Session B's model is isolated to its worker process
  • Predictions through session A use model A (not B)
  • Predictions through session B use model B (not A)

Under free-threaded Python, a shared _models dict mutated from multiple threads invites races: one session could silently overwrite another's model. Slither's process-per-session design eliminates this by construction.
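
The difference between a shared registry and per-session state can be illustrated with a small single-threaded Python sketch. Here, plain objects stand in for Slither's separate worker processes, and the SessionWorker class, train_shared helper, and "scorer" model id are all hypothetical names chosen for the example; only _models and train_model echo names mentioned above.

```python
from typing import Any, Dict

# Anti-pattern: one registry shared by every session in the interpreter.
SHARED_MODELS: Dict[str, Any] = {}


def train_shared(model_id: str, model: Any) -> None:
    # A second session using the same id replaces the first session's model.
    SHARED_MODELS[model_id] = model


class SessionWorker:
    """Stand-in for one worker: each instance owns a private model registry."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._models: Dict[str, Any] = {}

    def train_model(self, model_id: str, model: Any) -> None:
        self._models[model_id] = model

    def get_model(self, model_id: str) -> Any:
        return self._models[model_id]


# Shared registry: session B silently clobbers session A's model.
train_shared("scorer", "model trained by A")
train_shared("scorer", "model trained by B")
shared_result = SHARED_MODELS["scorer"]

# Per-session registries: same model id, no interference.
worker_a = SessionWorker("session_a")
worker_b = SessionWorker("session_b")
worker_a.train_model("scorer", "model trained by A")
worker_b.train_model("scorer", "model trained by B")
```

After this runs, shared_result holds session B's model even though session A stored one first, while worker_a and worker_b each still return their own. Separate operating-system processes give the same guarantee without relying on application code to keep the registries apart.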

Requires scikit-learn and numpy. Run with:

Slither.Examples.MlScoring.ScoringPipe.run_demo()

Summary

Functions

enrich(item, ctx) -- Enrich a record with cached features or prepare it for featurization.

run_demo() -- Run the ML scoring demo with concurrent session isolation.

Functions

enrich(item, ctx)

Enrich a record with cached features or prepare it for featurization.

Checks the ETS feature cache for the record's ID. On a cache hit the pre-computed feature vector is used directly; on a miss the raw data map is passed through so the Python featurize stage can extract features. The model ID from context metadata is attached in both cases so the predict stage knows which model to use.
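
The hit/miss branching described above might look like the following, written in Python for illustration only. The real enrich/2 is an Elixir function backed by ETS; the dict-based cache, the item keys ("id", "data", "features", "raw"), and the location of model_id under ctx["metadata"] are all assumptions, since the source does not show the actual data shapes.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for the ETS feature cache: record id -> feature vector.
FEATURE_CACHE: Dict[str, List[float]] = {"rec-1": [0.2, 1.5, 3.0]}


def enrich(item: Dict[str, Any], ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Attach cached features on a hit, or pass raw data through on a miss."""
    model_id = ctx["metadata"]["model_id"]  # assumed location of the model id
    cached = FEATURE_CACHE.get(item["id"])
    if cached is not None:
        # Cache hit: the pre-computed vector is used and featurization is skipped.
        return {"id": item["id"], "features": cached, "model_id": model_id}
    # Cache miss: forward the raw data for the Python featurize stage.
    return {"id": item["id"], "raw": item["data"], "model_id": model_id}
```

Either branch returns an item carrying model_id, which matches the description that the predict stage can always tell which model to use.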

run_demo()

Run the ML scoring demo with concurrent session isolation.

Trains two logistic regression models on separate sessions with different data distributions, then scores test records through each session independently. Demonstrates that Slither's session affinity prevents cross-contamination of model state.
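
To see why different training distributions make cross-contamination observable, consider this dependency-free sketch. A tiny hand-rolled one-dimensional logistic regression stands in for scikit-learn's model so the example runs with the standard library alone; the training data, session names, and helper functions are invented for illustration and are not taken from run_demo/0.

```python
import math
from typing import Dict, List, Tuple

Model = Tuple[float, float]  # (weight, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: List[float], ys: List[int],
                   epochs: int = 500, lr: float = 0.5) -> Model:
    """Fit weight and bias for 1-D logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict_proba(model: Model, x: float) -> float:
    w, b = model
    return sigmoid(w * x + b)


xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels_a = [0, 0, 0, 0, 1, 1, 1, 1]  # session A: positive class when x > 0
labels_b = [1, 1, 1, 1, 0, 0, 0, 0]  # session B: the opposite labelling

# One model per (hypothetical) session, each trained on its own distribution.
sessions: Dict[str, Model] = {
    "session_a": train_logistic(xs, labels_a),
    "session_b": train_logistic(xs, labels_b),
}

test_record = 1.2
scores = {name: predict_proba(model, test_record)
          for name, model in sessions.items()}
```

Because the two models disagree on the same test record (session A scores it above 0.5, session B below), a prediction served by the wrong session's model would show up immediately in the output.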