This guide covers all configuration options for Snakepit, from simple single-pool setups to advanced multi-pool deployments with different worker profiles.
Table of Contents
- Configuration Formats
- Global Options
- Pool Configuration
- Heartbeat Configuration
- Logging Configuration
- Python Runtime Configuration
- Optional Features
- Complete Configuration Example
Configuration Formats
Snakepit supports two configuration formats: legacy (single-pool) and multi-pool (v0.6+).
Simple (Legacy) Configuration
For backward compatibility with v0.5.x and single-pool deployments:
```elixir
# config/config.exs
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  pool_size: 100,
  pool_config: %{
    startup_batch_size: 8,
    startup_batch_delay_ms: 750,
    max_workers: 1000
  }
```

This format creates a single pool named :default with the specified settings.

If both the top-level :pool_size and pool_config.pool_size are set, Snakepit uses the top-level :pool_size.
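To make the precedence concrete, here is an illustrative configuration that sets both values; under the rule above, the pool runs 100 workers:

```elixir
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  # top-level value wins: the pool runs 100 workers, not 50
  pool_size: 100,
  pool_config: %{pool_size: 50}
```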
Multi-Pool Configuration (v0.6+)
For advanced deployments with multiple pools, each with different profiles:
```elixir
# config/config.exs
config :snakepit,
  pools: [
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 100,
      adapter_module: Snakepit.Adapters.GRPCPython
    },
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 16,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"]
    }
  ]
```

This creates two pools: :default for general tasks and :ml_inference for CPU-bound ML workloads.
Global Options
These options apply to all pools or the Snakepit application as a whole.
| Option | Type | Default | Description |
|---|---|---|---|
pooling_enabled | boolean() | false | Enable or disable worker pooling. Set to true for normal operation. |
adapter_module | module() | nil | Default adapter module for pools that do not specify one (including adapter timeout fallback). |
pool_size | pos_integer() | System.schedulers_online() * 2 | Default pool size. Typically 2x CPU cores. |
capacity_strategy | :pool \| :profile \| :hybrid | :pool | How worker capacity is managed across pools. |
affinity | :hint \| :strict_queue \| :strict_fail_fast | :hint | Default session affinity mode for pools. |
pool_startup_timeout | pos_integer() | 10000 | Maximum time (ms) to wait for a worker to start. |
pool_queue_timeout | pos_integer() | 5000 | Maximum time (ms) a request waits in queue. |
pool_max_queue_size | pos_integer() | 1000 | Maximum queued requests before rejecting new ones. |
pool_reconcile_interval_ms | non_neg_integer() | 1000 | Interval (ms) for pool reconciliation to restore worker count (0 disables). |
pool_reconcile_batch_size | pos_integer() | 2 | Max workers respawned per reconciliation tick (ignored if reconcile disabled). |
grpc_worker_health_check_timeout_ms | pos_integer() | 5000 | Timeout (ms) for periodic worker health-check RPCs. |
worker_starter_max_restarts | non_neg_integer() | 3 | Restart intensity: max restarts for worker starter supervisor. |
worker_starter_max_seconds | pos_integer() | 5 | Restart intensity window (seconds) for worker starter supervisor. |
worker_supervisor_max_restarts | non_neg_integer() | 3 | Restart intensity: max restarts for worker supervisor. |
worker_supervisor_max_seconds | pos_integer() | 5 | Restart intensity window (seconds) for worker supervisor. |
grpc_listener | map() | %{mode: :internal} | gRPC listener configuration (mode/host/port). |
grpc_internal_host | String.t() | "127.0.0.1" | Default host for internal-only gRPC listeners. |
grpc_port_pool_size | pos_integer() | 32 | Default pool size for :external_pool listeners. |
grpc_listener_ready_timeout_ms | pos_integer() | 5000 | Time (ms) to wait for gRPC listener to publish its port before pool startup. |
grpc_listener_port_check_interval_ms | pos_integer() | 25 | Interval (ms) between port readiness checks when reusing an existing listener. |
grpc_listener_reuse_attempts | pos_integer() | 3 | Number of attempts to reuse or rebind a listener before failing. |
grpc_listener_reuse_wait_timeout_ms | pos_integer() | 500 | Max wait (ms) for an already-started listener to publish its port before retrying. |
grpc_listener_reuse_retry_delay_ms | pos_integer() | 100 | Delay (ms) between listener reuse retries. |
instance_name | String.t() | nil | Instance identifier for isolating runtime state. |
instance_token | String.t() | nil | Unique per-running-instance token for strong process cleanup isolation. |
data_dir | String.t() | priv/data | Directory for runtime persistence (DETS, cleanup state). |
graceful_shutdown_timeout_ms | pos_integer() | 6000 | Time (ms) to wait for Python to terminate gracefully before SIGKILL. |
grpc_port and grpc_host remain supported for legacy configurations, but
new deployments should use grpc_listener.
gRPC Listener Modes
Internal-only mode binds to an ephemeral port and advertises localhost to workers:
```elixir
config :snakepit,
  grpc_listener: %{
    mode: :internal
  }
```

External bindings require explicit host/port configuration:
```elixir
config :snakepit,
  grpc_listener: %{
    mode: :external,
    host: "localhost",
    bind_host: "0.0.0.0",
    port: 50051
  }
```

To run multiple instances on the same host, use pooled external ports:
```elixir
config :snakepit,
  grpc_listener: %{
    mode: :external_pool,
    host: "localhost",
    bind_host: "0.0.0.0",
    base_port: 50051,
    pool_size: 32
  }
```

Use instance_name, instance_token, and data_dir to isolate registry state when sharing a deployment directory.
instance_name is for environment-level grouping (for example prod-us-east-1).
instance_token must be unique for each concurrently running VM (for example deploy slot, CI job, terminal session).
Without unique tokens, concurrent instances from the same codebase can treat each other as rogue/orphan processes during cleanup.
Example:
```elixir
config :snakepit,
  instance_name: "my-app",
  instance_token: "job-1234",
  data_dir: "/var/lib/snakepit"
```

Capacity Strategies
| Strategy | Description |
|---|---|
:pool | Each pool manages its own capacity independently. Default and simplest option. |
:profile | Workers of the same profile share capacity across pools. |
:hybrid | Combination of pool and profile strategies for complex deployments. |
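As a sketch, a deployment that wants same-profile workers to share capacity across pools could set (illustrative, using the option from the Global Options table):

```elixir
config :snakepit,
  capacity_strategy: :profile
```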
Pool Configuration
Each pool can be configured independently with these options.
Required Fields
| Option | Type | Description |
|---|---|---|
name | atom() | Unique pool identifier. Use :default for the primary pool. |
Profile Selection
| Option | Type | Default | Description |
|---|---|---|---|
worker_profile | :process \| :thread | :process | Worker execution model. See Worker Profiles Guide. |
Common Pool Options
| Option | Type | Default | Description |
|---|---|---|---|
pool_size | pos_integer() | Global setting | Number of workers in this pool. |
adapter_module | module() | Global setting | Adapter module for this pool. |
adapter_args | list(String.t()) | [] | CLI arguments passed to the Python server. |
adapter_env | list({String.t(), String.t()}) | [] | Environment variables for Python processes. |
adapter_spec | String.t() | nil | Python adapter module path (e.g., "myapp.adapters.MyAdapter"). |
affinity | :hint \| :strict_queue \| :strict_fail_fast | Global setting | Session affinity behavior for this pool. |
When adapters implement command_timeout/2, Snakepit resolves command timeout
from the selected worker's pool adapter_module first. The global
config :snakepit, adapter_module: ... value is used only if a pool does not
declare its own adapter.
Session Affinity Modes
```elixir
# global default
config :snakepit,
  affinity: :hint

# per-pool override
config :snakepit,
  pools: [
    %{name: :default, affinity: :strict_queue, pool_size: 4}
  ]
```

- :hint (default): prefer the session's last worker if available; otherwise fall back to any worker.
- :strict_queue: queue when the preferred worker is busy; guarantees same-worker routing but can increase latency and queue timeouts.
- :strict_fail_fast: return {:error, %Snakepit.Error{category: :pool, details: %{reason: :worker_busy}}} when the preferred worker is busy.
If the preferred worker is tainted or missing, strict modes return
{:error, %Snakepit.Error{category: :pool, details: %{reason: :session_worker_unavailable}}}.
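Callers using the strict modes should be prepared for these error shapes. A minimal sketch, assuming the call site receives the tuples shown above (the retry policy is illustrative, not prescribed by Snakepit):

```elixir
handle_result = fn
  {:ok, value} ->
    value

  {:error, %Snakepit.Error{category: :pool, details: %{reason: :worker_busy}}} ->
    # :strict_fail_fast and the preferred worker is busy; retry later or drop affinity
    {:retry, :worker_busy}

  {:error, %Snakepit.Error{category: :pool, details: %{reason: :session_worker_unavailable}}} ->
    # preferred worker is tainted or missing; re-establish the session
    {:retry, :session_worker_unavailable}
end
```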
Process Profile Options
These options apply when worker_profile: :process:
| Option | Type | Default | Description |
|---|---|---|---|
startup_batch_size | pos_integer() | 8 | Workers started per batch during pool initialization. |
startup_batch_delay_ms | non_neg_integer() | 750 | Delay between startup batches (ms). Reduces system load during startup. |
Thread Profile Options
These options apply when worker_profile: :thread:
| Option | Type | Default | Description |
|---|---|---|---|
threads_per_worker | pos_integer() | 10 | Thread pool size per Python process. Total capacity = pool_size * threads_per_worker. |
thread_safety_checks | boolean() | false | Enable runtime thread safety validation. Useful for development. |
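The capacity formula from the table is worth working through once. Using the :ml_inference pool from the multi-pool example earlier:

```elixir
pool_size = 4
threads_per_worker = 16

# concurrent requests the thread-profile pool can serve at once
total_capacity = pool_size * threads_per_worker
# => 64
```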
Worker Lifecycle Options
| Option | Type | Default | Description |
|---|---|---|---|
worker_ttl | :infinity \| {value, unit} | :infinity | Maximum worker lifetime before recycling. |
worker_max_requests | :infinity \| pos_integer() | :infinity | Maximum requests before recycling a worker. |
TTL Units:
| Unit | Example |
|---|---|
:seconds | {3600, :seconds} - 1 hour |
:minutes | {60, :minutes} - 1 hour |
:hours | {1, :hours} - 1 hour |
Worker recycling helps prevent memory leaks and ensures fresh worker state.
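For example, a pool that bounds both worker lifetime and served requests (illustrative values):

```elixir
%{
  name: :default,
  worker_profile: :process,
  pool_size: 50,
  # recycle each worker after an hour of lifetime
  worker_ttl: {1, :hours},
  # and cap each worker at 5_000 handled requests
  worker_max_requests: 5_000
}
```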
Heartbeat Configuration
Heartbeats detect unresponsive workers and trigger automatic restarts.
Global Heartbeat Config
```elixir
config :snakepit,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 2000,
    timeout_ms: 10000,
    max_missed_heartbeats: 3,
    initial_delay_ms: 0,
    dependent: true
  }
```

Per-Pool Heartbeat Config
```elixir
%{
  name: :ml_pool,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 10000,
    timeout_ms: 30000,
    max_missed_heartbeats: 2
  }
}
```

Heartbeat Options
| Option | Type | Default | Description |
|---|---|---|---|
enabled | boolean() | true | Enable heartbeat monitoring. |
ping_interval_ms | pos_integer() | 2000 | Interval between heartbeat pings. |
timeout_ms | pos_integer() | 10000 | Maximum time to wait for heartbeat response. |
max_missed_heartbeats | pos_integer() | 3 | Missed heartbeats before declaring worker dead. |
initial_delay_ms | non_neg_integer() | 0 | Delay before first heartbeat ping. |
dependent | boolean() | true | Whether worker terminates if heartbeat monitor dies. |
Tuning Guidelines
- Fast detection: lower ping_interval_ms and max_missed_heartbeats.
- Reduce overhead: raise ping_interval_ms for stable workloads.
- Long operations: increase timeout_ms if workers run long computations.
- ML workloads: use ping_interval_ms: 10000 or higher, since inference can block.
Logging Configuration
Snakepit uses its own logger for internal operations.
Log Level
```elixir
config :snakepit,
  log_level: :info # :debug | :info | :warning | :error | :none
```

| Level | Description |
|---|---|
:debug | Verbose output including worker lifecycle, gRPC calls, heartbeats |
:info | Normal operation messages |
:warning | Potential issues that do not stop operation |
:error | Errors that affect functionality |
:none | Disable all Snakepit logging |
Log Categories
Fine-grained control over logging categories:
```elixir
config :snakepit,
  log_level: :info,
  log_categories: %{
    pool: :debug,      # Pool operations
    worker: :debug,    # Worker lifecycle
    heartbeat: :info,  # Heartbeat monitoring
    grpc: :warning     # gRPC communication
  }
```

Python-Side Logging
The Python bridge respects the SNAKEPIT_LOG_LEVEL environment variable:
```elixir
%{
  name: :default,
  adapter_env: [{"SNAKEPIT_LOG_LEVEL", "info"}]
}
```

Python Runtime Configuration
Configure how Python interpreters are discovered and managed.
Interpreter Selection
```elixir
config :snakepit,
  python_executable: "/path/to/python3"
```

Or set the environment variable, which takes precedence:

```shell
export SNAKEPIT_PYTHON="/path/to/python3"
```
Runtime Strategy
```elixir
config :snakepit,
  python_runtime: %{
    strategy: :venv, # :system | :venv | :managed
    managed: false,
    version: "3.12"
  }
```

| Strategy | Description |
|---|---|
:system | Use system Python interpreter |
:venv | Use project virtual environment (.venv/bin/python3) |
:managed | Let Snakepit manage Python version (experimental) |
Environment Variables per Pool
```elixir
%{
  name: :ml_pool,
  adapter_env: [
    # Control threading in numerical libraries
    {"OPENBLAS_NUM_THREADS", "1"},
    {"MKL_NUM_THREADS", "1"},
    {"OMP_NUM_THREADS", "1"},
    {"NUMEXPR_NUM_THREADS", "1"},
    # GPU configuration
    {"CUDA_VISIBLE_DEVICES", "0"},
    # Python settings
    {"PYTHONUNBUFFERED", "1"},
    {"SNAKEPIT_LOG_LEVEL", "warning"}
  ]
}
```

Optional Features
Zero-Copy Data Transfer
Enable zero-copy for large binary data:
```elixir
config :snakepit,
  zero_copy: %{
    enabled: true,
    threshold_bytes: 1_048_576 # 1 MB
  }
```

Zero-copy is beneficial for ML workloads with large tensors.
Crash Barrier
Limit restart attempts for frequently crashing workers:
```elixir
config :snakepit,
  crash_barrier: %{
    enabled: true,
    max_restarts: 5,
    window_seconds: 60
  }
```

If a worker restarts more than max_restarts times within window_seconds, it is permanently removed from the pool.
Circuit Breaker
Prevent cascading failures:
```elixir
config :snakepit,
  circuit_breaker: %{
    enabled: true,
    failure_threshold: 5,
    reset_timeout_ms: 30000
  }
```

After failure_threshold consecutive failures, the circuit opens and requests fail fast for reset_timeout_ms.
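The open/fail-fast behavior can be sketched as a small state function. This is an illustrative model of the description above, not Snakepit's implementation; the half-open probe after the reset window is an assumption:

```elixir
defmodule CircuitSketch do
  # Tracks consecutive failures; opens after the threshold is reached.
  defstruct failures: 0, opened_at_ms: nil

  @failure_threshold 5
  @reset_timeout_ms 30_000

  # Closed circuit: requests pass through.
  def allow?(%__MODULE__{opened_at_ms: nil}, _now_ms), do: true

  # Open circuit: fail fast until the reset window elapses,
  # then let a probe request through (half-open).
  def allow?(%__MODULE__{opened_at_ms: opened}, now_ms) do
    now_ms - opened >= @reset_timeout_ms
  end

  def record_success(circuit), do: %{circuit | failures: 0, opened_at_ms: nil}

  def record_failure(%__MODULE__{failures: f} = circuit, now_ms) do
    failures = f + 1

    if failures >= @failure_threshold do
      %{circuit | failures: failures, opened_at_ms: now_ms}
    else
      %{circuit | failures: failures}
    end
  end
end
```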
Rogue Cleanup
Control startup orphan-process cleanup:
```elixir
config :snakepit, :rogue_cleanup, enabled: false

# equivalent map form:
config :snakepit, rogue_cleanup: %{enabled: false}
```

enabled: false is treated as an explicit disable and is not replaced by defaults.
Complete Configuration Example
Here is a production-ready configuration demonstrating all major options:
```elixir
# config/config.exs
config :snakepit,
  # Global settings
  pooling_enabled: true,
  pool_startup_timeout: 30_000,
  pool_queue_timeout: 10_000,
  pool_max_queue_size: 5000,
  grpc_listener: %{
    mode: :external,
    host: "snakepit.internal",
    bind_host: "0.0.0.0",
    port: 50051
  },

  # Logging
  log_level: :info,
  log_categories: %{
    pool: :info,
    worker: :warning,
    heartbeat: :warning,
    grpc: :warning
  },

  # Global heartbeat defaults
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 5000,
    timeout_ms: 15000,
    max_missed_heartbeats: 3
  },

  # Multiple pools
  pools: [
    # Default pool for I/O-bound tasks
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 50,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.GeneralAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "1"},
        {"OMP_NUM_THREADS", "1"}
      ],
      startup_batch_size: 10,
      startup_batch_delay_ms: 500
    },

    # ML inference pool (CPU-bound, thread profile)
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 8, # 32 total capacity
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "8"},
        {"OMP_NUM_THREADS", "8"},
        {"CUDA_VISIBLE_DEVICES", "0"},
        {"PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512"}
      ],
      thread_safety_checks: false,
      worker_ttl: {1800, :seconds},
      worker_max_requests: 10000,
      heartbeat: %{
        enabled: true,
        ping_interval_ms: 10000,
        timeout_ms: 60000,
        max_missed_heartbeats: 2
      }
    },

    # Background processing pool
    %{
      name: :background,
      worker_profile: :process,
      pool_size: 10,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.BackgroundAdapter"],
      adapter_env: [
        {"SNAKEPIT_LOG_LEVEL", "warning"}
      ],
      worker_ttl: {3600, :seconds}
    }
  ],

  # Optional features
  crash_barrier: %{
    enabled: true,
    max_restarts: 10,
    window_seconds: 300
  }
```

Environment-Specific Overrides
```elixir
# config/prod.exs
config :snakepit,
  log_level: :warning,
  pool_max_queue_size: 10000
```

```elixir
# config/dev.exs
config :snakepit,
  log_level: :debug,
  pool_size: 4
```

```elixir
# config/test.exs
config :snakepit,
  pooling_enabled: false
```

Validation
Verify your configuration with the doctor task:
```shell
mix snakepit.doctor
```
At runtime, check pool status:
```elixir
iex> Snakepit.get_stats()
%{
  requests: 15432,
  queued: 5,
  errors: 12,
  queue_timeouts: 3,
  pool_saturated: 0,
  workers: 54,
  available: 49,
  busy: 5
}
```

Related Guides
- Getting Started - Installation and first steps
- Worker Profiles - Process vs Thread profiles
- Production - Performance tuning and deployment checklist