Snakepit.Worker.LifecycleManager (Snakepit v0.6.10)

Worker lifecycle manager for automatic recycling and health monitoring.

Manages worker lifecycle events:

  • TTL-based recycling: Recycle workers after configured time
  • Request-count recycling: Recycle after N requests
  • Memory monitoring: Recycle when the memory of the BEAM worker process exceeds a configurable threshold (optional)
  • Health checks: Monitor worker health and restart if needed

Why Worker Recycling?

Long-running Python processes can accumulate memory due to:

  • Memory fragmentation
  • Cache growth
  • Subtle memory leaks in C libraries
  • ML model weight accumulation

Automatic recycling keeps these issues from impacting production. Note that the current implementation samples the memory of the BEAM Snakepit.GRPCWorker process via :get_memory_usage; the memory of the Python child process is not yet measured directly.

Configuration

config :snakepit,
  pools: [
    %{
      name: :hpc_pool,
      worker_profile: :thread,
      worker_ttl: {3600, :seconds},      # Recycle hourly
      worker_max_requests: 1000,          # Or after 1000 requests
      memory_threshold_mb: 2048           # Or at 2 GB (optional)
    }
  ]
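
To illustrate how the three limits above might combine, here is a minimal sketch. It is not Snakepit's actual implementation: the module RecycleSketch, its functions, and the shape of the worker map are hypothetical, and only the config keys come from the example above. The sketch assumes that a worker is recycled as soon as any one configured limit is reached, that a missing key disables the corresponding check, and that TTL units other than :seconds exist.

```elixir
defmodule RecycleSketch do
  @moduledoc "Hypothetical illustration; not part of Snakepit."

  # Convert a {value, unit} TTL tuple to seconds. Only :seconds appears in
  # the documented config; the other units are assumptions.
  def ttl_seconds({n, :seconds}), do: n
  def ttl_seconds({n, :minutes}), do: n * 60
  def ttl_seconds({n, :hours}), do: n * 3600

  # Returns {:recycle, reason} when any configured limit is reached,
  # otherwise :keep. `worker` is a hypothetical map with the keys
  # :age_seconds, :request_count and :memory_mb.
  def decide(config, worker) do
    cond do
      ttl_expired?(config[:worker_ttl], worker.age_seconds) ->
        {:recycle, :ttl_expired}

      at_limit?(config[:worker_max_requests], worker.request_count) ->
        {:recycle, :max_requests}

      at_limit?(config[:memory_threshold_mb], worker.memory_mb) ->
        {:recycle, :memory_threshold}

      true ->
        :keep
    end
  end

  # A nil setting means the check is not configured.
  defp ttl_expired?(nil, _age), do: false
  defp ttl_expired?(ttl, age), do: age >= ttl_seconds(ttl)

  defp at_limit?(nil, _value), do: false
  defp at_limit?(limit, value), do: value >= limit
end
```

With the pool config shown above, a worker that has served 1000 requests would yield {:recycle, :max_requests} under these assumptions, while an empty config map would always yield :keep.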

Usage

The LifecycleManager runs automatically when started in the supervision tree. It monitors all workers across all pools.

# Manual worker recycling
Snakepit.Worker.LifecycleManager.recycle_worker(pool_name, worker_id)

# Get lifecycle statistics
Snakepit.Worker.LifecycleManager.get_stats()

Implementation

  • Runs periodic health checks (every 60 seconds)
  • Tracks worker metadata (start time, request count)
  • Gracefully replaces workers when recycling
  • Emits telemetry events for monitoring
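
The periodic check and metadata tracking described above can be sketched with a plain GenServer. This is an illustration under stated assumptions, not the module's real code: LifecycleSketch, its message names, and its state layout are hypothetical, and it only records which workers are due for recycling rather than replacing them or emitting telemetry.

```elixir
defmodule LifecycleSketch do
  @moduledoc "Hypothetical illustration; not Snakepit's implementation."
  use GenServer

  # The documented check interval is 60 seconds.
  @check_interval_ms 60_000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  def track(server, worker_id, max_requests),
    do: GenServer.cast(server, {:track, worker_id, max_requests})

  def increment(server, worker_id), do: GenServer.cast(server, {:increment, worker_id})

  def due_for_recycle(server), do: GenServer.call(server, :due)

  @impl true
  def init(opts) do
    interval = Keyword.get(opts, :interval_ms, @check_interval_ms)
    schedule(interval)
    {:ok, %{interval: interval, workers: %{}, due: []}}
  end

  @impl true
  def handle_cast({:track, id, max_requests}, state) do
    # Per-worker metadata: start time and request count.
    meta = %{
      started_at: System.monotonic_time(:millisecond),
      requests: 0,
      max_requests: max_requests
    }

    {:noreply, put_in(state.workers[id], meta)}
  end

  def handle_cast({:increment, id}, state) do
    # Ignore increments for workers that are not tracked.
    if Map.has_key?(state.workers, id) do
      {:noreply, update_in(state.workers[id].requests, &(&1 + 1))}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_call(:due, _from, state), do: {:reply, state.due, state}

  @impl true
  def handle_info(:check, state) do
    # A real manager would replace these workers; the sketch only lists them.
    due = for {id, meta} <- state.workers, meta.requests >= meta.max_requests, do: id
    schedule(state.interval)
    {:noreply, %{state | due: due}}
  end

  # Re-arm the timer so the check repeats at a fixed interval.
  defp schedule(interval), do: Process.send_after(self(), :check, interval)
end
```

Re-arming the timer inside handle_info, instead of using a fixed-rate timer, means a slow check delays the next one rather than letting checks pile up in the mailbox.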

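Summary

Functions

child_spec(init_arg)
Returns a specification to start this module under a supervisor.

get_stats()
Get lifecycle statistics.

increment_request_count(worker_id)
Increment request count for a worker.

memory_recycle_counts()
Returns a map of pools to the number of memory-threshold-based recycles observed since the lifecycle manager started.

recycle_worker(pool_name, worker_id)
Manually recycle a worker.

start_link(opts \\ [])
Start the lifecycle manager.

track_worker(pool_name, worker_id, worker_pid, config)
Track a worker for lifecycle management.

untrack_worker(worker_id)
Untrack a worker (called when worker stops).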

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

get_stats()

Get lifecycle statistics.

increment_request_count(worker_id)

Increment request count for a worker.

Called after each successful request.
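
A minimal sketch of the "count only on success" pattern follows. The RequestSketch module is hypothetical, and treating {:ok, result} as the success shape is an assumption. The counter is passed in as a function so the example runs on its own; in a real caller it might be &Snakepit.Worker.LifecycleManager.increment_request_count/1.

```elixir
defmodule RequestSketch do
  @moduledoc "Hypothetical illustration; not part of Snakepit."

  # Runs `request_fun` and calls `count_fun` with the worker id only when
  # the request succeeds. The {:ok, _} success shape is an assumption.
  def run(worker_id, request_fun, count_fun) do
    case request_fun.() do
      {:ok, _result} = ok ->
        count_fun.(worker_id)
        ok

      other ->
        # Failed requests are returned unchanged and not counted.
        other
    end
  end
end
```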

memory_recycle_counts()

Returns a map of pools to the number of memory-threshold-based recycles observed since the lifecycle manager started.
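
As a rough sketch of how such a map could be consumed, the helper below summarises per-pool counts. The RecycleCountsSketch module and the sample map are hypothetical, and the assumption that pools are keyed by name with integer counts is inferred from the description, not confirmed by it.

```elixir
defmodule RecycleCountsSketch do
  @moduledoc "Hypothetical illustration; not part of Snakepit."

  # Total number of memory-based recycles across all pools.
  def total(counts), do: counts |> Map.values() |> Enum.sum()

  # Names of pools whose recycle count is at or above `limit`, sorted.
  def pools_at_or_above(counts, limit) do
    counts
    |> Enum.filter(fn {_pool, n} -> n >= limit end)
    |> Enum.map(fn {pool, _n} -> pool end)
    |> Enum.sort()
  end
end

# Hypothetical sample shaped like the documented return value.
sample = %{hpc_pool: 3, default: 0}
RecycleCountsSketch.total(sample)
RecycleCountsSketch.pools_at_or_above(sample, 1)
```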

recycle_worker(pool_name, worker_id)

Manually recycle a worker.

start_link(opts \\ [])

Start the lifecycle manager.

track_worker(pool_name, worker_id, worker_pid, config)

Track a worker for lifecycle management.

Called automatically when workers start.

untrack_worker(worker_id)

Untrack a worker (called when worker stops).