Snakepit Testing Guide
This guide covers the testing approach for the Snakepit project, including test organization, running tests, and understanding test output.
Test Overview
Snakepit contains comprehensive test coverage for:
- Protocol foundation (gRPC infrastructure)
- Core variable system with SessionStore
- Type system and serialization
- Bridge server implementation
- Worker lifecycle hardening (port persistence, channel reuse, ETS/DETS protections, logging redaction)
- Python integration
Running Tests
Before running any Elixir tests, make sure the local toolchain is bootstrapped and healthy:

```bash
make bootstrap        # installs Mix deps, creates .venv/.venv-py313, regenerates gRPC stubs
mix snakepit.doctor   # verifies python executable, grpc import, health probe, and port availability
```

(`mix snakepit.setup` runs the same bootstrap sequence from inside Mix if `make` is unavailable.)
Basic Test Execution
```bash
# Run all fast tests (excludes :performance, :python_integration, :slow)
mix test

# Run tests with specific tags
mix test --exclude performance   # Default: excludes performance tests
mix test --only performance      # Run only performance tests

# Run Python-backed Elixir tests (requires make bootstrap + mix snakepit.doctor)
mix test --only python_integration

# Run slow integration suites (application restarts, OS process probes, TTL waits)
mix test --include slow
mix test --only slow

# Run specific test files
mix test test/snakepit/bridge/session_store_test.exs
mix test test/snakepit/streaming_regression_test.exs             # gRPC streaming regression
mix test test/unit/grpc/grpc_worker_ephemeral_port_test.exs      # Worker port persistence
mix test test/snakepit/grpc/bridge_server_test.exs               # Channel reuse & parameter validation
mix test test/unit/pool/process_registry_security_test.exs       # DETS/ETS access control
mix test test/unit/logger/redaction_test.exs                     # Log redaction summaries

# Run Python pytest suites only (auto-manages .venv, installs/updates deps, regenerates protos)
./test_python.sh
./test_python.sh -k streaming    # Any args are forwarded to pytest

# The Python requirements include pytest-asyncio, so coroutine-based tests
# (e.g., the heartbeat client) run without extra configuration.
```
Reliability Regression Targets (v0.6.6)
- `test/unit/grpc/grpc_worker_ephemeral_port_test.exs` – verifies workers persist negotiated ports and survive pool shutdown races.
- `test/snakepit/grpc/bridge_server_test.exs` – asserts BridgeServer reuses worker channels and surfaces invalid parameter errors.
- `test/unit/pool/process_registry_security_test.exs` – prevents direct DETS writes and confirms registry APIs enforce visibility.
- `test/unit/logger/redaction_test.exs` – exercises the redaction summaries that keep secrets out of logs.
- `test/unit/bridge/session_store_test.exs` – covers per-session and global quotas plus safe reuse of existing program slots.
Fail-fast Experiment Suites
The following suites exercise the new failure-mode experiments described in AGENTS.md:
- `test/integration/orphan_cleanup_stress_test.exs` – boots the pool with the Python adapter, issues load, crashes the BEAM, and proves DETS + ProcessKiller remove stale workers across restarts.
- `test/performance/worker_crash_storm_test.exs` – continuously kills workers while requests stream through the mock adapter and asserts the pool size, registry stats, and OS processes recover.
- `test/unit/config/startup_fail_fast_test.exs` – verifies `Snakepit.Application` aborts early for missing executables, invalid pool profiles, and gRPC port binding conflicts.
- `test/snakepit/grpc/heartbeat_failfast_test.exs` – covers dependent vs. independent heartbeat monitors using telemetry plus `ProcessRegistry` assertions.
- `test/snakepit/streaming_regression_test.exs` – streaming cancellation regression that ensures workers are checked back into the pool and telemetry marks the request complete.
- `test/unit/pool/pool_queue_management_test.exs` – includes a runtime saturation scenario driven by `Snakepit.TestAdapters.QueueProbeAdapter` to prove that requests which time out in the queue never execute.
- `test/snakepit/multi_pool_execution_test.exs` – multi-pool isolation: broken pools stay quarantined while healthy pools keep serving.
- `test/snakepit/process_killer_test.exs` – spawns fake Python processes to validate that `ProcessKiller.kill_by_run_id/1` only kills matching run IDs.
Test Modes
The test suite runs in different modes based on tags:
- Default: Unit and integration tests (excludes `:performance`, `:python_integration`, and `:slow`; see the `test_helper.exs` sketch below)
- Python Integration: Elixir ↔ Python flows (run with `mix test --only python_integration`)
- Performance: Benchmarks and latency tests (use `--only performance`)
- Slow Integration: Full application restarts, OS process cleanup verification, queue saturation, and TTL-dependent flows. These are tagged `@tag :slow` (or `@moduletag :slow`) and run with `mix test --include slow` when you need the exhaustive coverage.
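The default exclusions are declared when ExUnit boots. A minimal sketch of what `test/test_helper.exs` looks like under these defaults (the actual file may contain additional setup):

```elixir
# test/test_helper.exs (sketch, assuming the tag defaults described above).
# Heavier suites stay excluded unless requested via --include/--only.
ExUnit.start(exclude: [:performance, :python_integration, :slow])
```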
Understanding Test Output
Expected Warnings
Some tests intentionally trigger warnings to verify error handling. These are expected and normal:
gRPC Server Shutdown Warning
```
⏰ gRPC server PID XXXXX did not exit gracefully within 500ms. Forcing SIGKILL.
```

This occurs during test cleanup when the gRPC server is forcefully terminated. It's expected behavior.
Server Configuration
All gRPC server configuration warnings have been resolved in the current implementation.
Test Statistics
A typical successful test run shows:
```
Finished in 1.9 seconds (1.1s async, 0.7s sync)
182 tests, 0 failures
```

Test Organization
```
test/
├── unit/
│   ├── bridge/           # SessionStore quotas, ToolRegistry validation
│   ├── config/           # Mix config normalization helpers
│   ├── grpc/             # Worker port persistence, channel reuse, telemetry
│   ├── logger/           # Redaction utilities
│   ├── mix/              # Diagnose tasks and shell fallbacks
│   ├── pool/             # Supervisor lifecycle, registry hardening
│   └── worker_profile/   # Profile-specific control logic
├── snakepit/
│   ├── bridge/           # End-to-end bridge flows
│   ├── grpc/             # BridgeServer + heartbeat integration
│   ├── integration/      # Cross-language happy-paths
│   ├── pool/             # Session affinity + multipool integration
│   ├── telemetry/        # Metrics + tracing plumbing
│   └── worker_profile/   # Profile behaviour under real pools
├── performance/          # Optional perf benchmarks (:performance tag)
├── support/              # Shared helpers and fixtures
└── test_helper.exs       # Global test configuration
```

Key Test Categories
1. Protocol Tests
Tests the gRPC protocol implementation including:
- Service definitions
- Message serialization
- RPC handlers
2. Type System Tests
Validates the type system including:
- Type validation and constraints
- Serialization/deserialization
- Special value handling (infinity, NaN)
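Special value handling is typically asserted as a round trip through the serializer. A hypothetical sketch, assuming atoms represent the special values and an illustrative `Serialization.encode_value/1` / `decode_value/1` pair (not the actual API):

```elixir
# Hypothetical round-trip test; Serialization and its encode_value/decode_value
# functions are placeholders, and the atom representation is an assumption.
test "round-trips special float values" do
  for value <- [:infinity, :negative_infinity, :nan] do
    assert {:ok, encoded} = Serialization.encode_value(value)
    assert {:ok, ^value} = Serialization.decode_value(encoded)
  end
end
```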
3. SessionStore Tests
Tests the core state management:
- Session lifecycle
- Variable CRUD operations
- Batch operations
- TTL and cleanup
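These tests generally follow a create/read/expire shape. A sketch under assumed function names (`create_session/1`, `put_variable/3`, and `get_variable/2` are illustrative, not the confirmed SessionStore API):

```elixir
# Illustrative SessionStore lifecycle test; the function names and return
# shapes below are assumptions about the SessionStore API.
test "stores and retrieves a session variable" do
  session_id = "session-#{System.unique_integer([:positive])}"

  assert {:ok, _session} = SessionStore.create_session(session_id)
  assert :ok = SessionStore.put_variable(session_id, :temperature, 0.7)
  assert {:ok, 0.7} = SessionStore.get_variable(session_id, :temperature)
end
```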
4. Integration Tests
End-to-end tests covering:
- Python-Elixir communication
- Full request/response cycles
- Error propagation
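These suites carry the `:python_integration` tag so they only run when the Python toolchain is bootstrapped. A minimal sketch (`Snakepit.execute/2` and the `"ping"` command are assumptions about the public API):

```elixir
# Sketch of a Python-backed integration test. Snakepit.execute/2 and the
# "ping" command name are assumed, not confirmed by this guide.
defmodule Snakepit.PythonRoundTripTest do
  use ExUnit.Case, async: false

  @moduletag :python_integration

  test "round-trips a command through a Python worker" do
    assert {:ok, result} = Snakepit.execute("ping", %{})
    assert is_map(result)
  end
end
```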
Writing Tests
Test Patterns
Use descriptive test names:

```elixir
test "handles special float values correctly" do
  # Test implementation
end
```

Group related tests with describe blocks:

```elixir
describe "batch operations" do
  test "get_variables returns all found variables" do
    # Test implementation
  end
end
```

Capture expected logs (requires `import ExUnit.CaptureLog`):

```elixir
{result, logs} =
  with_log(fn ->
    # Code that generates expected warnings
  end)

assert logs =~ "Expected warning message"
```
Performance Tests
Performance tests are tagged and excluded by default:
```elixir
@tag :performance
test "handles 1000 concurrent requests" do
  # Performance test implementation
end
```

Continuous Integration
The test suite is designed to run in CI environments:
- All tests must pass before merging
- Performance tests are run separately
- Test coverage is monitored
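A typical pipeline runs the fast suite on every push and the tagged suites as separate jobs, for example (a sketch built from the commands in this guide; adapt the stages to your CI system):

```bash
# Sketch of a CI job sequence; adjust to your pipeline.
make bootstrap                      # toolchain, venvs, gRPC stubs
mix snakepit.doctor                 # fail fast on a broken environment
mix test                            # fast suite (default exclusions)
mix test --only performance         # separate job/stage
mix test --only python_integration  # separate job/stage
```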
Troubleshooting
Common Issues
Port Already in Use
- The gRPC server uses port 50051
- Ensure no other services are using this port
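To find a conflicting process, standard tooling is enough, for example:

```bash
# Show which process is bound to the gRPC port
lsof -i :50051
```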
Python Dependencies
- Some integration tests require the Python bridge packages
- Create a virtualenv and install deps: `python3 -m venv .venv && .venv/bin/pip install -r priv/python/requirements.txt`
- Export the interpreter for Mix so workers reuse it: `export SNAKEPIT_PYTHON="$PWD/.venv/bin/python3"`
- Run bridge tests with the bundled modules on the path: `PYTHONPATH=priv/python .venv/bin/pytest priv/python/tests -q`
- `make test` wraps these steps; run it when debugging cross-language failures.
Compilation Warnings
- Protocol buffer regeneration may be needed
- Run `mix grpc.gen` to regenerate Elixir bindings
Telemetry Verification Checklist
- `mix test` – exercises the Elixir OpenTelemetry spans/metrics wiring (fails fast if the Python bridge cannot import the OTEL SDK).
- `PYTHONPATH=priv/python .venv/bin/pytest priv/python/tests/test_telemetry.py -q` – validates the Python span helpers and correlation filter.
- `curl http://localhost:9568/metrics` – shows Prometheus metrics after enabling the reporter with `config :snakepit, telemetry_metrics: %{prometheus: %{enabled: true}}` (see the sketch below).
- Set `SNAKEPIT_OTEL_ENDPOINT=http://collector:4318` (or `SNAKEPIT_OTEL_CONSOLE=true`) to watch trace exports when running end-to-end examples.
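A sketch of the corresponding config entry (only the `telemetry_metrics` map shown above is confirmed by this guide; the surrounding boilerplate is standard Mix config):

```elixir
# config/config.exs (sketch): enables the Prometheus reporter referenced above.
import Config

config :snakepit,
  telemetry_metrics: %{prometheus: %{enabled: true}}
```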
Related Documentation
- Main README - Project overview
- Unified gRPC Bridge - Protocol details
- Main README - Implementation status