Snakepit Configuration Guide
This guide covers all configuration options for Snakepit, from simple single-pool setups to advanced multi-pool deployments with different worker profiles.
Table of Contents
- Configuration Formats
- Global Options
- Pool Configuration
- Heartbeat Configuration
- Logging Configuration
- Python Runtime Configuration
- Optional Features
- Complete Configuration Example
Configuration Formats
Snakepit supports two configuration formats: legacy (single-pool) and multi-pool (v0.6+).
Simple (Legacy) Configuration
For backward compatibility with v0.5.x and single-pool deployments:
```elixir
# config/config.exs
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  pool_size: 100,
  pool_config: %{
    startup_batch_size: 8,
    startup_batch_delay_ms: 750,
    max_workers: 1000
  }
```

This format creates a single pool named :default with the specified settings.
Multi-Pool Configuration (v0.6+)
For advanced deployments with multiple pools, each with different profiles:
```elixir
# config/config.exs
config :snakepit,
  pools: [
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 100,
      adapter_module: Snakepit.Adapters.GRPCPython
    },
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 16,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"]
    }
  ]
```

This creates two pools: :default for general tasks and :ml_inference for CPU-bound ML workloads.
Global Options
These options apply to all pools or the Snakepit application as a whole.
| Option | Type | Default | Description |
|---|---|---|---|
pooling_enabled | boolean() | false | Enable or disable worker pooling. Set to true for normal operation. |
adapter_module | module() | nil | Default adapter module for pools that do not specify one. |
pool_size | pos_integer() | System.schedulers_online() * 2 | Default pool size. Typically 2x CPU cores. |
capacity_strategy | :pool \| :profile \| :hybrid | :pool | How worker capacity is managed across pools. |
pool_startup_timeout | pos_integer() | 10000 | Maximum time (ms) to wait for a worker to start. |
pool_queue_timeout | pos_integer() | 5000 | Maximum time (ms) a request waits in queue. |
pool_max_queue_size | pos_integer() | 1000 | Maximum queued requests before rejecting new ones. |
grpc_port | pos_integer() | 50051 | Port for the Elixir gRPC server (Python-to-Elixir calls). |
grpc_host | String.t() | "localhost" | Host for gRPC connections. |
graceful_shutdown_timeout_ms | pos_integer() | 6000 | Time (ms) to wait for Python to terminate gracefully before SIGKILL. |
Capacity Strategies
| Strategy | Description |
|---|---|
:pool | Each pool manages its own capacity independently. Default and simplest option. |
:profile | Workers of the same profile share capacity across pools. |
:hybrid | Combination of pool and profile strategies for complex deployments. |
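For orientation, here is a minimal sketch that combines several of the global options above with an explicit capacity strategy. The values simply restate the documented defaults and are not tuning recommendations.

```elixir
# config/config.exs — illustrative values only (these match the documented defaults)
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  # Each pool manages its own capacity (the default strategy)
  capacity_strategy: :pool,
  # Fail queued requests after 5 seconds rather than letting them pile up
  pool_queue_timeout: 5_000,
  pool_max_queue_size: 1_000,
  # Give Python up to 6 seconds to exit cleanly before SIGKILL
  graceful_shutdown_timeout_ms: 6_000
```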
Pool Configuration
Each pool can be configured independently with these options.
Required Fields
| Option | Type | Description |
|---|---|---|
name | atom() | Unique pool identifier. Use :default for the primary pool. |
Profile Selection
| Option | Type | Default | Description |
|---|---|---|---|
worker_profile | :process \| :thread | :process | Worker execution model. See Worker Profiles Guide. |
Common Pool Options
| Option | Type | Default | Description |
|---|---|---|---|
pool_size | pos_integer() | Global setting | Number of workers in this pool. |
adapter_module | module() | Global setting | Adapter module for this pool. |
adapter_args | list(String.t()) | [] | CLI arguments passed to the Python server. |
adapter_env | list({String.t(), String.t()}) | [] | Environment variables for Python processes. |
adapter_spec | String.t() | nil | Python adapter module path (e.g., "myapp.adapters.MyAdapter"). |
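As an illustration of how these options fit together in a pool entry, the sketch below uses a hypothetical :reporting pool and a placeholder adapter path; only the option names are taken from the table above.

```elixir
%{
  name: :reporting,                               # hypothetical pool name
  pool_size: 8,
  adapter_module: Snakepit.Adapters.GRPCPython,
  # CLI arguments forwarded to the Python gRPC server process
  adapter_args: ["--adapter", "myapp.adapters.MyAdapter"],
  # Environment variables set for each Python process
  adapter_env: [{"PYTHONUNBUFFERED", "1"}]
}
```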
Process Profile Options
These options apply when worker_profile: :process:
| Option | Type | Default | Description |
|---|---|---|---|
startup_batch_size | pos_integer() | 8 | Workers started per batch during pool initialization. |
startup_batch_delay_ms | non_neg_integer() | 750 | Delay between startup batches (ms). Reduces system load during startup. |
Thread Profile Options
These options apply when worker_profile: :thread:
| Option | Type | Default | Description |
|---|---|---|---|
threads_per_worker | pos_integer() | 10 | Thread pool size per Python process. Total capacity = pool_size * threads_per_worker. |
thread_safety_checks | boolean() | false | Enable runtime thread safety validation. Useful for development. |
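A sketch of a thread-profile pool for development, assuming the runtime safety checks are worth their overhead there; with these numbers the total capacity is 4 * 16 = 64 concurrent requests.

```elixir
%{
  name: :ml_inference,
  worker_profile: :thread,
  pool_size: 4,                # 4 Python processes
  threads_per_worker: 16,      # 16 threads each => 64 total capacity
  adapter_module: Snakepit.Adapters.GRPCPython,
  # Runtime thread safety validation, useful while developing an adapter
  thread_safety_checks: true
}
```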
Worker Lifecycle Options
| Option | Type | Default | Description |
|---|---|---|---|
worker_ttl | :infinity \| {value, unit} | :infinity | Maximum worker lifetime before recycling. |
worker_max_requests | :infinity \| pos_integer() | :infinity | Maximum requests before recycling a worker. |
TTL Units:
| Unit | Example |
|---|---|
:seconds | {3600, :seconds} - 1 hour |
:minutes | {60, :minutes} - 1 hour |
:hours | {1, :hours} - 1 hour |
Worker recycling helps prevent memory leaks and ensures fresh worker state.
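As a sketch, a pool that recycles workers hourly or after 10,000 requests could be configured like this; the thresholds are illustrative, not recommendations:

```elixir
%{
  name: :default,
  worker_profile: :process,
  pool_size: 50,
  adapter_module: Snakepit.Adapters.GRPCPython,
  # Recycle each worker after 1 hour or 10,000 requests, whichever comes first
  worker_ttl: {1, :hours},
  worker_max_requests: 10_000
}
```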
Heartbeat Configuration
Heartbeats detect unresponsive workers and trigger automatic restarts.
Global Heartbeat Config
```elixir
config :snakepit,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 2000,
    timeout_ms: 10000,
    max_missed_heartbeats: 3,
    initial_delay_ms: 0,
    dependent: true
  }
```

Per-Pool Heartbeat Config
```elixir
%{
  name: :ml_pool,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 10000,
    timeout_ms: 30000,
    max_missed_heartbeats: 2
  }
}
```

Heartbeat Options
| Option | Type | Default | Description |
|---|---|---|---|
enabled | boolean() | true | Enable heartbeat monitoring. |
ping_interval_ms | pos_integer() | 2000 | Interval between heartbeat pings. |
timeout_ms | pos_integer() | 10000 | Maximum time to wait for heartbeat response. |
max_missed_heartbeats | pos_integer() | 3 | Missed heartbeats before declaring worker dead. |
initial_delay_ms | non_neg_integer() | 0 | Delay before first heartbeat ping. |
dependent | boolean() | true | Whether worker terminates if heartbeat monitor dies. |
Tuning Guidelines
- Fast detection: Lower ping_interval_ms and max_missed_heartbeats (see the sketch after this list)
- Reduce overhead: Higher ping_interval_ms for stable workloads
- Long operations: Increase timeout_ms if workers run long computations
- ML workloads: Use ping_interval_ms: 10000 or higher since inference can block
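To illustrate the fast-detection guideline, a per-pool override might look like the sketch below; the pool name and the aggressive 500 ms interval are assumptions suited to short, latency-sensitive calls, not documented defaults.

```elixir
%{
  name: :latency_sensitive,        # hypothetical pool name
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 500,         # ping frequently for quick failure detection
    timeout_ms: 2_000,
    max_missed_heartbeats: 1       # one missed ping marks the worker as dead
  }
}
```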
Logging Configuration
Snakepit uses its own logger for internal operations.
Log Level
```elixir
config :snakepit,
  log_level: :info # :debug | :info | :warning | :error | :none
```

| Level | Description |
|---|---|
:debug | Verbose output including worker lifecycle, gRPC calls, heartbeats |
:info | Normal operation messages |
:warning | Potential issues that do not stop operation |
:error | Errors that affect functionality |
:none | Disable all Snakepit logging |
Log Categories
Fine-grained control over logging categories:
```elixir
config :snakepit,
  log_level: :info,
  log_categories: %{
    pool: :debug,       # Pool operations
    worker: :debug,     # Worker lifecycle
    heartbeat: :info,   # Heartbeat monitoring
    grpc: :warning      # gRPC communication
  }
```

Python-Side Logging
The Python bridge respects the SNAKEPIT_LOG_LEVEL environment variable:
```elixir
%{
  name: :default,
  adapter_env: [{"SNAKEPIT_LOG_LEVEL", "info"}]
}
```

Python Runtime Configuration
Configure how Python interpreters are discovered and managed.
Interpreter Selection
```elixir
config :snakepit,
  python_executable: "/path/to/python3"
```

Or use the environment variable (which takes precedence):

```bash
export SNAKEPIT_PYTHON="/path/to/python3"
```
Runtime Strategy
```elixir
config :snakepit,
  python_runtime: %{
    strategy: :venv, # :system | :venv | :managed
    managed: false,
    version: "3.12"
  }
```

| Strategy | Description |
|---|---|
:system | Use system Python interpreter |
:venv | Use project virtual environment (.venv/bin/python3) |
:managed | Let Snakepit manage Python version (experimental) |
Environment Variables per Pool
```elixir
%{
  name: :ml_pool,
  adapter_env: [
    # Control threading in numerical libraries
    {"OPENBLAS_NUM_THREADS", "1"},
    {"MKL_NUM_THREADS", "1"},
    {"OMP_NUM_THREADS", "1"},
    {"NUMEXPR_NUM_THREADS", "1"},
    # GPU configuration
    {"CUDA_VISIBLE_DEVICES", "0"},
    # Python settings
    {"PYTHONUNBUFFERED", "1"},
    {"SNAKEPIT_LOG_LEVEL", "warning"}
  ]
}
```

Optional Features
Zero-Copy Data Transfer
Enable zero-copy for large binary data:
```elixir
config :snakepit,
  zero_copy: %{
    enabled: true,
    threshold_bytes: 1_048_576 # 1 MB
  }
```

Zero-copy is beneficial for ML workloads with large tensors.
Crash Barrier
Limit restart attempts for frequently crashing workers:
```elixir
config :snakepit,
  crash_barrier: %{
    enabled: true,
    max_restarts: 5,
    window_seconds: 60
  }
```

If a worker restarts more than max_restarts times within window_seconds, it is permanently removed from the pool.
Circuit Breaker
Prevent cascading failures:
```elixir
config :snakepit,
  circuit_breaker: %{
    enabled: true,
    failure_threshold: 5,
    reset_timeout_ms: 30000
  }
```

After failure_threshold consecutive failures, the circuit opens and requests fail fast for reset_timeout_ms.
Complete Configuration Example
Here is a production-ready configuration demonstrating all major options:
```elixir
# config/config.exs
config :snakepit,
  # Global settings
  pooling_enabled: true,
  pool_startup_timeout: 30_000,
  pool_queue_timeout: 10_000,
  pool_max_queue_size: 5000,
  grpc_port: 50051,

  # Logging
  log_level: :info,
  log_categories: %{
    pool: :info,
    worker: :warning,
    heartbeat: :warning,
    grpc: :warning
  },

  # Global heartbeat defaults
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 5000,
    timeout_ms: 15000,
    max_missed_heartbeats: 3
  },

  # Multiple pools
  pools: [
    # Default pool for I/O-bound tasks
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 50,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.GeneralAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "1"},
        {"OMP_NUM_THREADS", "1"}
      ],
      startup_batch_size: 10,
      startup_batch_delay_ms: 500
    },

    # ML inference pool (CPU-bound, thread profile)
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 8, # 32 total capacity
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "8"},
        {"OMP_NUM_THREADS", "8"},
        {"CUDA_VISIBLE_DEVICES", "0"},
        {"PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512"}
      ],
      thread_safety_checks: false,
      worker_ttl: {1800, :seconds},
      worker_max_requests: 10000,
      heartbeat: %{
        enabled: true,
        ping_interval_ms: 10000,
        timeout_ms: 60000,
        max_missed_heartbeats: 2
      }
    },

    # Background processing pool
    %{
      name: :background,
      worker_profile: :process,
      pool_size: 10,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.BackgroundAdapter"],
      adapter_env: [
        {"SNAKEPIT_LOG_LEVEL", "warning"}
      ],
      worker_ttl: {3600, :seconds}
    }
  ],

  # Optional features
  crash_barrier: %{
    enabled: true,
    max_restarts: 10,
    window_seconds: 300
  }
```

Environment-Specific Overrides
```elixir
# config/prod.exs
config :snakepit,
  log_level: :warning,
  pool_max_queue_size: 10000
```

```elixir
# config/dev.exs
config :snakepit,
  log_level: :debug,
  pool_size: 4
```

```elixir
# config/test.exs
config :snakepit,
  pooling_enabled: false
```

Validation
Verify your configuration with the doctor task:
```bash
mix snakepit.doctor
```
At runtime, check pool status:
```elixir
iex> Snakepit.get_stats()
%{
  requests: 15432,
  queued: 5,
  errors: 12,
  queue_timeouts: 3,
  pool_saturated: 0,
  workers: 54,
  available: 49,
  busy: 5
}
```

Related Guides
- Getting Started - Installation and first steps
- Worker Profiles - Process vs Thread profiles
- Production - Performance tuning and deployment checklist