Telemetry and Observability


SnakeBridge emits :telemetry events for compilation, runtime calls, sessions, and documentation fetches. These events enable logging, metrics collection, and custom monitoring integrations.

Event Categories

Compile-time Events

Emitted during mix compile when generating wrapper modules:

| Event | Measurements | Metadata |
| --- | --- | --- |
| [:snakebridge, :compile, :start] | system_time | library, phase, details |
| [:snakebridge, :compile, :stop] | duration, symbols_generated, files_written | library, phase, details |
| [:snakebridge, :compile, :exception] | duration | library, phase, details |
| [:snakebridge, :compile, :scan, :stop] | duration, files_scanned, symbols_found | library, phase, details |
| [:snakebridge, :compile, :introspect, :start] | system_time | library, phase, details |
| [:snakebridge, :compile, :introspect, :stop] | duration, symbols_introspected, cache_hits | library, phase, details |
| [:snakebridge, :compile, :generate, :stop] | duration, bytes_written, functions_generated, classes_generated | library, phase, details |

Runtime Events

Forwarded from Snakepit via RuntimeForwarder (see below):

| Event | Measurements | Metadata |
| --- | --- | --- |
| [:snakebridge, :runtime, :call, :start] | system_time | library, function, call_type, snakebridge_version |
| [:snakebridge, :runtime, :call, :stop] | duration | library, function, call_type, snakebridge_version |
| [:snakebridge, :runtime, :call, :exception] | duration | library, function, call_type, snakebridge_version |
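
As a rough illustration of consuming the :exception event listed above, the sketch below logs each failed Python call. The module name MyApp.CallFailureLogger and the handler ID are hypothetical, and the code assumes that :telemetry is available and that the RuntimeForwarder has been attached so the events are actually emitted. Metadata values are passed through inspect/1 because their exact types are not specified here.

```elixir
defmodule MyApp.CallFailureLogger do
  # Hypothetical module for illustration; not part of SnakeBridge.
  require Logger

  @event [:snakebridge, :runtime, :call, :exception]

  def attach do
    # Module-qualified capture, so :telemetry does not flag a local function handler.
    :telemetry.attach("call-failure-logger", @event, &__MODULE__.handle_event/4, %{})
  end

  def handle_event(@event, measurements, metadata, _config) do
    # duration arrives in native time units (see the table above).
    duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

    Logger.error(
      "Python call failed after #{duration_ms}ms: " <>
        "library=#{inspect(metadata.library)} " <>
        "function=#{inspect(metadata.function)} " <>
        "call_type=#{inspect(metadata.call_type)}"
    )
  end
end
```

Calling MyApp.CallFailureLogger.attach() once at application startup is enough; :telemetry detaches a handler that raises, so the handler body is kept deliberately simple.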

Session Events

| Event | Measurements | Metadata |
| --- | --- | --- |
| [:snakebridge, :session, :cleanup] | system_time | session_id, source, reason |
| [:snakebridge, :session, :cleanup, :error] | system_time | session_id, source, reason, error |

The source field is either :manual or :owner_down. The reason field carries the exit reason.

The :error event is emitted when best-effort session cleanup fails. The error field contains the failure reason (exception, exit, or throw). This helps identify Python runtime issues or timeout problems during cleanup.

Documentation Events

| Event | Measurements | Metadata |
| --- | --- | --- |
| [:snakebridge, :docs, :fetch] | duration | module, function, source |
| [:snakebridge, :lock, :verify] | duration | result, warnings |

Measurements and Metadata

All timed events include duration in native time units. Convert with:

duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
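
To make the native-unit convention concrete, here is a minimal sketch that uses only the Elixir standard library. DurationSketch is a hypothetical helper, not part of SnakeBridge. It measures a function with System.monotonic_time/0, which returns native units, and converts the result the same way as the line above.

```elixir
defmodule DurationSketch do
  # Hypothetical helper for illustration only.

  # Runs `fun` and returns {result, duration_in_native_units}.
  def timed(fun) when is_function(fun, 0) do
    start = System.monotonic_time()
    result = fun.()
    {result, System.monotonic_time() - start}
  end

  # Converts a native-unit duration to whole milliseconds (rounds down).
  def to_ms(native), do: System.convert_time_unit(native, :native, :millisecond)
end

{_result, native} = DurationSketch.timed(fn -> Process.sleep(20) end)
IO.puts("took #{DurationSketch.to_ms(native)}ms")
```

Because the conversion rounds down to an integer, sub-millisecond durations are reported as 0; convert to :microsecond instead if finer resolution is needed.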

Common metadata fields:

  • library - Target library (:all for a full compile, or the specific library atom for per-library phases)
  • phase - One of :compile, :scan, :introspect, :generate
  • details - Phase-specific map with additional context

RuntimeForwarder

Bridges Snakepit's Python call events into the SnakeBridge namespace:

# In your application startup
SnakeBridge.Telemetry.RuntimeForwarder.attach()

The forwarder listens to [:snakepit, :python, :call, ...] events and re-emits them as [:snakebridge, :runtime, :call, ...] with snakebridge_version added to the metadata.
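
The forwarding pattern itself can be sketched with plain :telemetry calls. The code below is not SnakeBridge's implementation. MyApp.ForwarderSketch, its handler ID, and the way the version string is obtained (Application.spec/2, falling back to "unknown") are assumptions made for illustration only.

```elixir
defmodule MyApp.ForwarderSketch do
  # Hypothetical module showing the general re-emit pattern;
  # the real RuntimeForwarder may differ.

  @suffixes [:start, :stop, :exception]

  def attach do
    events = for suffix <- @suffixes, do: [:snakepit, :python, :call, suffix]

    # Assumption: read the version from the loaded application spec.
    version = to_string(Application.spec(:snakebridge, :vsn) || "unknown")

    :telemetry.attach_many(
      "forwarder-sketch",
      events,
      &__MODULE__.forward/4,
      %{version: version}
    )
  end

  # Re-emits the event under the SnakeBridge prefix, keeping the
  # measurements and adding snakebridge_version to the metadata.
  def forward([:snakepit, :python, :call, suffix], measurements, metadata, config) do
    :telemetry.execute(
      [:snakebridge, :runtime, :call, suffix],
      measurements,
      Map.put(metadata, :snakebridge_version, config.version)
    )
  end
end
```

Handlers run synchronously in the process that emits the original event, so a forwarder of this shape adds a small cost to every Python call.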

Built-in Handlers

Logger Handler

SnakeBridge.Telemetry.Handlers.Logger.attach()

Log levels:

  • :info - Compile success
  • :error - Compile exception
  • :debug - Introspection and generation details

Example output:

[info] SnakeBridge compiled 42 symbols in 1234ms (3 libraries)
[debug] Introspected 15 symbols from numpy in 456ms (Python: 400ms, cache hits: 5)

Metrics Handler (Prometheus)

metrics = SnakeBridge.Telemetry.Handlers.Metrics.metrics()
TelemetryMetricsPrometheus.Core.attach(metrics)

Custom Handler Example

defmodule MyApp.SnakeBridgeHandler do
  @events [
    [:snakebridge, :compile, :stop],
    [:snakebridge, :runtime, :call, :stop]
  ]

  def attach do
    # Use a module-qualified capture; :telemetry logs a notice for local function handlers.
    :telemetry.attach_many("my-handler", @events, &__MODULE__.handle_event/4, %{})
  end

  def handle_event([:snakebridge, :compile, :stop], measurements, metadata, _config) do
    duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    libraries = metadata.details[:libraries] || []

    MyApp.Metrics.record(:snakebridge_compile, %{
      duration_ms: duration_ms,
      symbols: measurements.symbols_generated,
      libraries: length(libraries)
    })
  end

  def handle_event([:snakebridge, :runtime, :call, :stop], measurements, metadata, _config) do
    duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

    MyApp.Metrics.histogram(:python_call_duration, duration_ms, %{
      library: metadata.library,
      function: metadata.function
    })
  end
end

Session Cleanup Error Handler

Monitor cleanup failures to detect Python runtime issues:

defmodule MyApp.CleanupMonitor do
  require Logger

  def attach do
    :telemetry.attach(
      "cleanup-error-handler",
      [:snakebridge, :session, :cleanup, :error],
      &__MODULE__.handle_cleanup_error/4,
      %{}
    )
  end

  def handle_cleanup_error(_event, _measurements, metadata, _config) do
    Logger.warning(
      "Session cleanup failed",
      session_id: metadata.session_id,
      source: metadata.source,
      error: inspect(metadata.error)
    )

    # Alert on repeated failures
    MyApp.Alerting.increment(:session_cleanup_failures)
  end
end

Configuration

Session cleanup events can be logged at a configurable level:

# config/config.exs
config :snakebridge, session_cleanup_log_level: :debug

Metrics Definition Reference

Compilation Metrics

| Metric | Type | Tags |
| --- | --- | --- |
| snakebridge.compile.duration | Distribution | - |
| snakebridge.compile.symbols_generated | Sum | - |
| snakebridge.compile.total | Counter | - |

Scan Metrics

| Metric | Type | Tags |
| --- | --- | --- |
| snakebridge.scan.duration | Distribution | - |
| snakebridge.scan.files_scanned | Sum | - |
| snakebridge.scan.symbols_found | Sum | - |

Introspection Metrics

| Metric | Type | Tags |
| --- | --- | --- |
| snakebridge.introspect.duration | Distribution | library |
| snakebridge.introspect.symbols_introspected | Sum | library |
| snakebridge.introspect.cache_hits | Sum | library |

Generation Metrics

| Metric | Type | Tags |
| --- | --- | --- |
| snakebridge.generate.duration | Distribution | library |
| snakebridge.generate.bytes_written | Sum | library |

Documentation Metrics

| Metric | Type | Tags |
| --- | --- | --- |
| snakebridge.docs.fetch.duration | Distribution | source |
| snakebridge.docs.fetch.total | Counter | source |

Distribution metrics use default buckets: [100, 500, 1000, 5000, 10_000] ms.
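
If different buckets are needed, a metric can be defined by hand with Telemetry.Metrics. The sketch below shows one way the introspection duration metric might be declared. It is not the definition SnakeBridge ships. MyApp.MetricsSketch is a hypothetical module, and passing buckets through reporter_options is the convention used by the Prometheus reporter, which is assumed here.

```elixir
defmodule MyApp.MetricsSketch do
  # Hypothetical module for illustration; requires the :telemetry_metrics package.
  import Telemetry.Metrics, only: [distribution: 2]

  def introspect_duration do
    distribution("snakebridge.introspect.duration",
      # The metric name does not match the event name, so state it explicitly.
      event_name: [:snakebridge, :compile, :introspect, :stop],
      measurement: :duration,
      # Convert native time units to milliseconds before bucketing.
      unit: {:native, :millisecond},
      tags: [:library],
      reporter_options: [buckets: [100, 500, 1000, 5000, 10_000]]
    )
  end
end
```

The bucket list shown repeats the defaults above; replacing it with, for example, finer-grained boundaries only changes how the reporter groups observations, not which events are recorded.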

See Also