Snakepit Configuration Guide

This guide covers all configuration options for Snakepit, from simple single-pool setups to advanced multi-pool deployments with different worker profiles.


Table of Contents

  1. Configuration Formats
  2. Global Options
  3. Pool Configuration
  4. Heartbeat Configuration
  5. Logging Configuration
  6. Python Runtime Configuration
  7. Optional Features
  8. Complete Configuration Example

Configuration Formats

Snakepit supports two configuration formats: legacy (single-pool) and multi-pool (v0.6+).

Simple (Legacy) Configuration

For backward compatibility with v0.5.x and single-pool deployments:

# config/config.exs
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  pool_size: 100,
  pool_config: %{
    startup_batch_size: 8,
    startup_batch_delay_ms: 750,
    max_workers: 1000
  }

This format creates a single pool named :default with the specified settings.

Multi-Pool Configuration (v0.6+)

For advanced deployments with multiple pools, each with different profiles:

# config/config.exs
config :snakepit,
  pools: [
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 100,
      adapter_module: Snakepit.Adapters.GRPCPython
    },
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 16,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"]
    }
  ]

This creates two pools: :default for general tasks and :ml_inference for CPU-bound ML workloads.


Global Options

These options apply to all pools or the Snakepit application as a whole.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| pooling_enabled | boolean() | false | Enable or disable worker pooling. Set to true for normal operation. |
| adapter_module | module() | nil | Default adapter module for pools that do not specify one. |
| pool_size | pos_integer() | System.schedulers_online() * 2 | Default pool size. Typically 2x CPU cores. |
| capacity_strategy | :pool \| :profile \| :hybrid | :pool | How worker capacity is managed across pools. |
| pool_startup_timeout | pos_integer() | 10000 | Maximum time (ms) to wait for a worker to start. |
| pool_queue_timeout | pos_integer() | 5000 | Maximum time (ms) a request waits in queue. |
| pool_max_queue_size | pos_integer() | 1000 | Maximum queued requests before rejecting new ones. |
| grpc_port | pos_integer() | 50051 | Port for the Elixir gRPC server (Python-to-Elixir calls). |
| grpc_host | String.t() | "localhost" | Host for gRPC connections. |
| graceful_shutdown_timeout_ms | pos_integer() | 6000 | Time (ms) to wait for Python to terminate gracefully before SIGKILL. |
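
Putting several of these together, a minimal top-level configuration might look like the following sketch (the timeout and queue values are illustrative, not recommendations):

# config/config.exs
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  pool_size: System.schedulers_online() * 2,
  pool_startup_timeout: 15_000,      # allow slower Python startups
  pool_queue_timeout: 5_000,
  pool_max_queue_size: 2_000,
  grpc_port: 50051,
  graceful_shutdown_timeout_ms: 6_000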

Capacity Strategies

| Strategy | Description |
| --- | --- |
| :pool | Each pool manages its own capacity independently. Default and simplest option. |
| :profile | Workers of the same profile share capacity across pools. |
| :hybrid | Combination of pool and profile strategies for complex deployments. |
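
For example, to let workers of the same profile share capacity across two pools, the strategy is set at the top level next to the pool list. A minimal sketch (pool names and sizes are illustrative):

config :snakepit,
  capacity_strategy: :profile,
  pools: [
    # adapter_module falls back to the global default for both pools
    %{name: :default, worker_profile: :process, pool_size: 50},
    %{name: :batch, worker_profile: :process, pool_size: 50}
  ]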

Pool Configuration

Each pool can be configured independently with these options.

Required Fields

| Option | Type | Description |
| --- | --- | --- |
| name | atom() | Unique pool identifier. Use :default for the primary pool. |

Profile Selection

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| worker_profile | :process \| :thread | :process | Worker execution model. See Worker Profiles Guide. |

Common Pool Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| pool_size | pos_integer() | Global setting | Number of workers in this pool. |
| adapter_module | module() | Global setting | Adapter module for this pool. |
| adapter_args | list(String.t()) | [] | CLI arguments passed to the Python server. |
| adapter_env | list({String.t(), String.t()}) | [] | Environment variables for Python processes. |
| adapter_spec | String.t() | nil | Python adapter module path (e.g., "myapp.adapters.MyAdapter"). |
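
A sketch of a single pool entry combining these options; the :reports pool name and the myapp.adapters.MyAdapter path are illustrative placeholders:

%{
  name: :reports,
  pool_size: 8,
  adapter_module: Snakepit.Adapters.GRPCPython,
  adapter_spec: "myapp.adapters.MyAdapter",
  adapter_env: [{"PYTHONUNBUFFERED", "1"}]
}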

Process Profile Options

These options apply when worker_profile: :process:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| startup_batch_size | pos_integer() | 8 | Workers started per batch during pool initialization. |
| startup_batch_delay_ms | non_neg_integer() | 750 | Delay between startup batches (ms). Reduces system load during startup. |
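
In the multi-pool format these options sit directly in the pool map, as in the complete example later in this guide. A sketch of a large process pool that starts its workers in smaller, slower batches (batch values are illustrative):

%{
  name: :default,
  worker_profile: :process,
  pool_size: 200,
  adapter_module: Snakepit.Adapters.GRPCPython,
  startup_batch_size: 4,         # start 4 workers at a time
  startup_batch_delay_ms: 1_000  # wait 1 second between batches
}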

Thread Profile Options

These options apply when worker_profile: :thread:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| threads_per_worker | pos_integer() | 10 | Thread pool size per Python process. Total capacity = pool_size * threads_per_worker. |
| thread_safety_checks | boolean() | false | Enable runtime thread safety validation. Useful for development. |
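
As a concrete illustration of the capacity formula, the sketch below yields 4 * 16 = 64 concurrent request slots (pool name and sizes are illustrative):

%{
  name: :ml_inference,
  worker_profile: :thread,
  pool_size: 4,
  threads_per_worker: 16,       # total capacity = 4 * 16 = 64 concurrent requests
  thread_safety_checks: true    # extra validation, useful during development
}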

Worker Lifecycle Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| worker_ttl | :infinity \| {value, unit} | :infinity | Maximum worker lifetime before recycling. |
| worker_max_requests | :infinity \| pos_integer() | :infinity | Maximum requests before recycling a worker. |

TTL Units:

| Unit | Example |
| --- | --- |
| :seconds | {3600, :seconds} - 1 hour |
| :minutes | {60, :minutes} - 1 hour |
| :hours | {1, :hours} - 1 hour |

Worker recycling helps prevent memory leaks and ensures fresh worker state.
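
For example, a pool that recycles each worker after one hour of lifetime or after 5,000 served requests could be sketched as follows (limits are illustrative):

%{
  name: :background,
  worker_profile: :process,
  pool_size: 10,
  adapter_module: Snakepit.Adapters.GRPCPython,
  worker_ttl: {60, :minutes},     # equivalent to {3600, :seconds}
  worker_max_requests: 5_000
}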


Heartbeat Configuration

Heartbeats detect unresponsive workers and trigger automatic restarts.

Global Heartbeat Config

config :snakepit,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 2000,
    timeout_ms: 10000,
    max_missed_heartbeats: 3,
    initial_delay_ms: 0,
    dependent: true
  }

Per-Pool Heartbeat Config

%{
  name: :ml_pool,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 10000,
    timeout_ms: 30000,
    max_missed_heartbeats: 2
  }
}

Heartbeat Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean() | true | Enable heartbeat monitoring. |
| ping_interval_ms | pos_integer() | 2000 | Interval between heartbeat pings. |
| timeout_ms | pos_integer() | 10000 | Maximum time to wait for heartbeat response. |
| max_missed_heartbeats | pos_integer() | 3 | Missed heartbeats before declaring worker dead. |
| initial_delay_ms | non_neg_integer() | 0 | Delay before first heartbeat ping. |
| dependent | boolean() | true | Whether worker terminates if heartbeat monitor dies. |

Tuning Guidelines

  • Fast detection: Lower ping_interval_ms and max_missed_heartbeats (see the sketch after this list)
  • Reduce overhead: Higher ping_interval_ms for stable workloads
  • Long operations: Increase timeout_ms if workers run long computations
  • ML workloads: Use ping_interval_ms: 10000 or higher since inference can block
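
As an illustration of these trade-offs, a per-pool heartbeat tuned for fast failure detection might look like the sketch below (values are illustrative, not recommendations):

%{
  name: :latency_sensitive,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 1_000,     # ping frequently for quick detection
    timeout_ms: 3_000,
    max_missed_heartbeats: 2     # tolerate fewer misses before restart
  }
}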

Logging Configuration

Snakepit uses its own logger for internal operations.

Log Level

config :snakepit,
  log_level: :info  # :debug | :info | :warning | :error | :none

| Level | Description |
| --- | --- |
| :debug | Verbose output including worker lifecycle, gRPC calls, heartbeats |
| :info | Normal operation messages |
| :warning | Potential issues that do not stop operation |
| :error | Errors that affect functionality |
| :none | Disable all Snakepit logging |

Log Categories

Fine-grained control over logging categories:

config :snakepit,
  log_level: :info,
  log_categories: %{
    pool: :debug,      # Pool operations
    worker: :debug,    # Worker lifecycle
    heartbeat: :info,  # Heartbeat monitoring
    grpc: :warning     # gRPC communication
  }

Python-Side Logging

The Python bridge respects the SNAKEPIT_LOG_LEVEL environment variable:

%{
  name: :default,
  adapter_env: [{"SNAKEPIT_LOG_LEVEL", "info"}]
}

Python Runtime Configuration

Configure how Python interpreters are discovered and managed.

Interpreter Selection

config :snakepit,
  python_executable: "/path/to/python3"

Or set the SNAKEPIT_PYTHON environment variable, which takes precedence:

export SNAKEPIT_PYTHON="/path/to/python3"

Runtime Strategy

config :snakepit,
  python_runtime: %{
    strategy: :venv,  # :system | :venv | :managed
    managed: false,
    version: "3.12"
  }

| Strategy | Description |
| --- | --- |
| :system | Use system Python interpreter |
| :venv | Use project virtual environment (.venv/bin/python3) |
| :managed | Let Snakepit manage Python version (experimental) |

Environment Variables per Pool

%{
  name: :ml_pool,
  adapter_env: [
    # Control threading in numerical libraries
    {"OPENBLAS_NUM_THREADS", "1"},
    {"MKL_NUM_THREADS", "1"},
    {"OMP_NUM_THREADS", "1"},
    {"NUMEXPR_NUM_THREADS", "1"},

    # GPU configuration
    {"CUDA_VISIBLE_DEVICES", "0"},

    # Python settings
    {"PYTHONUNBUFFERED", "1"},
    {"SNAKEPIT_LOG_LEVEL", "warning"}
  ]
}

Optional Features

Zero-Copy Data Transfer

Enable zero-copy for large binary data:

config :snakepit,
  zero_copy: %{
    enabled: true,
    threshold_bytes: 1_048_576  # 1 MB
  }

Zero-copy is beneficial for ML workloads with large tensors.

Crash Barrier

Limit restart attempts for frequently crashing workers:

config :snakepit,
  crash_barrier: %{
    enabled: true,
    max_restarts: 5,
    window_seconds: 60
  }

If a worker restarts more than max_restarts times within window_seconds, it is permanently removed from the pool.

Circuit Breaker

Prevent cascading failures:

config :snakepit,
  circuit_breaker: %{
    enabled: true,
    failure_threshold: 5,
    reset_timeout_ms: 30000
  }

After failure_threshold consecutive failures, the circuit opens and requests fail fast for reset_timeout_ms.


Complete Configuration Example

Here is a production-ready configuration demonstrating all major options:

# config/config.exs
config :snakepit,
  # Global settings
  pooling_enabled: true,
  pool_startup_timeout: 30_000,
  pool_queue_timeout: 10_000,
  pool_max_queue_size: 5000,
  grpc_port: 50051,

  # Logging
  log_level: :info,
  log_categories: %{
    pool: :info,
    worker: :warning,
    heartbeat: :warning,
    grpc: :warning
  },

  # Global heartbeat defaults
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 5000,
    timeout_ms: 15000,
    max_missed_heartbeats: 3
  },

  # Multiple pools
  pools: [
    # Default pool for I/O-bound tasks
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 50,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.GeneralAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "1"},
        {"OMP_NUM_THREADS", "1"}
      ],
      startup_batch_size: 10,
      startup_batch_delay_ms: 500
    },

    # ML inference pool (CPU-bound, thread profile)
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 8,  # 32 total capacity
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "8"},
        {"OMP_NUM_THREADS", "8"},
        {"CUDA_VISIBLE_DEVICES", "0"},
        {"PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512"}
      ],
      thread_safety_checks: false,
      worker_ttl: {1800, :seconds},
      worker_max_requests: 10000,
      heartbeat: %{
        enabled: true,
        ping_interval_ms: 10000,
        timeout_ms: 60000,
        max_missed_heartbeats: 2
      }
    },

    # Background processing pool
    %{
      name: :background,
      worker_profile: :process,
      pool_size: 10,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.BackgroundAdapter"],
      adapter_env: [
        {"SNAKEPIT_LOG_LEVEL", "warning"}
      ],
      worker_ttl: {3600, :seconds}
    }
  ],

  # Optional features
  crash_barrier: %{
    enabled: true,
    max_restarts: 10,
    window_seconds: 300
  }

Environment-Specific Overrides

# config/prod.exs
config :snakepit,
  log_level: :warning,
  pool_max_queue_size: 10000

# config/dev.exs
config :snakepit,
  log_level: :debug,
  pool_size: 4

# config/test.exs
config :snakepit,
  pooling_enabled: false

Validation

Verify your configuration with the doctor task:

mix snakepit.doctor

At runtime, check pool status:

iex> Snakepit.get_stats()
%{
  requests: 15432,
  queued: 5,
  errors: 12,
  queue_timeouts: 3,
  pool_saturated: 0,
  workers: 54,
  available: 49,
  busy: 5
}