Snakepit Architecture

Overview

Snakepit is a high-performance Python bridge for Elixir that enables seamless execution of Python code from Elixir applications. It uses a pure gRPC-based architecture with stateless Python workers and centralized session management in Elixir.

High-Level Architecture


                          Elixir Application                         

                                                                     
             
      Pool       WorkerSupervisor  Worker.Starter   
    (GenServer)       (DynamicSupervisor)     (Supervisor)    
             
                                                                   
                                                                   
                                
                         SessionStore           GRPCWorker    
           (GenServer)           (GenServer)    
                          + ETS Table            
                                                 
                                                          gRPC      

                                                          

                        Python Worker                               
                                                                    
          
    grpc_server.py   SessionContext       Types &      
   (gRPC Service)        (Cache + Client)    Serialization   
          
                                                                    
                                                                    
                          
    User Adapter       User Tools                         
    (BaseAdapter)         (Custom Code)                       
                          

Key Components

Elixir Side

Pool (lib/snakepit/pool/pool.ex)

  • Purpose: Manages a pool of Python workers for concurrent execution
  • Design: GenServer that maintains available/busy worker sets and a request queue
  • Features:
    • Concurrent worker startup
    • Session affinity (routes session requests to same worker when possible)
    • Automatic request queueing when all workers are busy
    • Non-blocking async execution using Task.Supervisor

WorkerSupervisor (lib/snakepit/pool/worker_supervisor.ex)

  • Purpose: DynamicSupervisor for managing worker lifecycle
  • Design: Starts Worker.Starter processes which in turn manage actual workers
  • Features: Provides clean separation between supervision and worker logic

Worker.Starter (lib/snakepit/pool/worker_starter.ex)

  • Purpose: Implements the "Permanent Wrapper" pattern for automatic worker restarts
  • Design: A permanent supervisor that manages a transient worker
  • Features:
    • Automatic restart of crashed workers without Pool intervention
    • Clean shutdown during application termination
    • Decouples Pool from worker replacement logic

GRPCWorker (lib/snakepit/grpc_worker.ex)

  • Purpose: Manages a single Python worker process
  • Design: GenServer that spawns and communicates with Python via gRPC
  • Features:
    • Port-based process management
    • Health checking
    • Automatic reconnection
    • Request timeout handling
    • Statistics tracking

SessionStore (lib/snakepit/bridge/session_store.ex)

  • Purpose: Centralized session and variable management
  • Design: GenServer backed by ETS table for high-performance concurrent access
  • Features:
    • TTL-based session expiration
    • Type-safe variable storage
    • Atomic operations
    • High-performance cleanup using ETS select_delete
    • Read-concurrency optimization

Python Side

grpc_server.py (priv/python/grpc_server.py)

  • Purpose: Main gRPC service implementation
  • Design: Stateless server that delegates to adapters and manages sessions
  • Features:
    • Unified protocol supporting both simple execution and session-based operations
    • Tool registration and execution
    • Streaming support
    • Comprehensive error handling

SessionContext (priv/python/snakepit_bridge/session_context.py)

  • Purpose: Client-side session state with intelligent caching
  • Design: Thread-safe context manager with TTL-based cache
  • Features:
    • Local variable cache to reduce gRPC round-trips
    • Automatic cache invalidation on TTL expiry
    • Lazy loading of variables
    • Batch operations support

Types & Serialization

  • Purpose: Consistent type system across Elixir and Python
  • Design: Centralized modules for type conversion and validation
  • Features:
    • Support for basic types (integer, float, string, boolean)
    • Complex types (list, map, embedding, tensor)
    • Special value handling (NaN, Infinity)
    • Binary data optimization

Design Principles

1. Stateless Python Workers

Python workers are completely stateless. All persistent state is managed by the Elixir SessionStore. This enables:

  • Easy horizontal scaling
  • Crash resilience
  • Simple worker replacement
  • No state synchronization issues

2. Centralized State Management

The SessionStore in Elixir is the single source of truth for all session state:

  • Variables are stored with type information
  • Sessions have TTL-based expiration
  • All operations are atomic
  • High concurrency via ETS

3. Performance Optimization

  • ETS for Storage: Read-concurrency optimized ETS tables for session data
  • Client-side Caching: Python SessionContext caches variables locally
  • Batch Operations: Support for bulk variable operations
  • Binary Protocol: gRPC with protobuf for efficient serialization

4. Fault Tolerance

  • Supervision Tree: Proper OTP supervision at every level
  • Process Monitoring: Multiple layers of process monitoring
  • Automatic Cleanup: ApplicationCleanup prevents orphaned processes
  • Health Checks: Periodic health monitoring of workers

Protocol Specification

The system uses a unified gRPC protocol defined in priv/proto/snakepit_bridge.proto:

Core Services

  • Execute: Simple command execution
  • ExecuteStream: Streaming command execution
  • Tool Operations: RegisterTool, ListTools, UnregisterTool
  • Session Operations: InitializeSession, CleanupSession
  • Variable Operations: RegisterVariable, GetVariable, SetVariable, etc.
  • Health & Info: Health checks and system information

Message Flow

  1. Client calls Pool with request
  2. Pool assigns available worker (preferring session affinity)
  3. Request forwarded to GRPCWorker
  4. GRPCWorker makes gRPC call to Python
  5. Python executes via adapter/tools
  6. For variables: Python may call back to Elixir SessionStore
  7. Response returned through the chain

Architecture Evolution

As of v0.4.0, Snakepit uses a unified gRPC-only architecture that provides:

  • Stateless Python workers with centralized SessionStore for state management
  • Binary gRPC protocol with protobuf for efficient communication
  • Intelligent routing with session affinity and multi-level caching
  • Native streaming support for real-time progress updates
  • Bidirectional tool execution between Elixir and Python

Binary Serialization

Overview

The architecture includes automatic binary serialization for efficient handling of large numerical data:

  • Threshold-based: Automatically switches to binary encoding for data > 10KB
  • Type-aware: Optimized for tensor and embedding types
  • Transparent: No API changes required - works automatically
  • Protocol: Uses Erlang Term Format (ETF) on Elixir side, Python pickle on Python side

Implementation Details

  1. Detection: Serialization.should_use_binary?/2 checks data size
  2. Encoding:
    • Small data: JSON via encode_as_json/2
    • Large data: Binary via encode_with_binary/2
  3. Transport: Binary data travels in separate protobuf fields
  4. Decoding: Automatic detection of binary format via type URL suffix

Performance Impact

  • 10x faster serialization for large tensors
  • 5x reduction in message size
  • Zero overhead for small data (still uses JSON)

Future Enhancements

The architecture is designed to support future features:

  • Distributed Sessions: SessionStore could be backed by distributed ETS/Mnesia
  • Multi-node Support: Workers could run on different nodes
  • Advanced Caching: Redis-backed caching for large datasets
  • Metrics & Tracing: OpenTelemetry integration end-to-end
  • Tool Marketplace: Dynamic tool loading from external sources
  • Compression: Optional compression for binary data
  • Custom Serializers: Pluggable serialization formats