SnmpKit.SnmpLib.Dashboard (snmpkit v0.6.3)

Real-time monitoring dashboard and metrics aggregation for SNMP operations.

This module provides a monitoring and visualization system for production SNMP deployments. It is based on patterns proven in large-scale monitoring systems that manage thousands of network devices.

Features

  • Real-Time Metrics: Live updates of performance and health metrics
  • Historical Analytics: Trend analysis and capacity planning data
  • Alert Management: Configurable thresholds and notification routing
  • Performance Insights: Detailed breakdown of operation performance
  • Device Health: Per-device status monitoring and diagnostics
  • Resource Utilization: Pool, memory, and system resource tracking

Metrics Categories

Performance Metrics

  • Request/response times (min, max, average, percentiles)
  • Throughput (operations per second)
  • Error rates and failure classifications
  • Connection pool utilization
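
To make the response-time statistics above concrete, here is a minimal sketch of how min, max, average, and percentiles could be derived from raw samples. The ResponseStats module is hypothetical and not part of SnmpKit; it uses the nearest-rank percentile method, which may differ from what the dashboard computes internally.

```elixir
defmodule ResponseStats do
  @moduledoc "Hypothetical helper for illustration; not part of SnmpKit."

  # Summarizes a non-empty list of response times (milliseconds).
  def summarize(samples) when is_list(samples) and samples != [] do
    sorted = Enum.sort(samples)

    %{
      min: List.first(sorted),
      max: List.last(sorted),
      avg: Enum.sum(sorted) / length(sorted),
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95)
    }
  end

  # Nearest-rank percentile over an already sorted list.
  # rank = ceil(p * n / 100), computed with integer arithmetic.
  defp percentile(sorted, p) do
    rank = div(p * length(sorted) + 99, 100)
    Enum.at(sorted, max(rank - 1, 0))
  end
end

stats = ResponseStats.summarize([120, 95, 310, 150, 101])
IO.inspect(stats)
```

With only five samples the 95th percentile equals the maximum; percentiles become informative once the sample window is larger.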

Health Metrics

  • Device availability and reachability
  • Circuit breaker states
  • Retry counts and backoff status
  • Resource exhaustion indicators

System Metrics

  • Memory usage and garbage collection
  • Process counts and supervision tree health
  • Network socket utilization
  • Queue depths and processing delays

Dashboard Views

Overview Dashboard

Global health and performance summary with key indicators.

Device Dashboard

Per-device detailed metrics and troubleshooting information.

Pool Dashboard

Connection pool health, utilization, and performance metrics.

Alerts Dashboard

Active alerts, acknowledgments, and escalation status.

Usage Patterns

# Start the dashboard server
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(port: 4000)

# Record custom metrics (the value is numeric; context goes in the tags map)
SnmpKit.SnmpLib.Dashboard.record_metric(:custom_operation, 150, %{
  device: "192.168.1.1",
  status: :success
})

# Create custom alert (the second argument is the severity level)
SnmpKit.SnmpLib.Dashboard.create_alert(:high_error_rate, :warning, %{
  device: "192.168.1.100",
  error_rate: 0.15,
  threshold: 0.10
})

# Export metrics for external systems
prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()

Integration with External Systems

  • Prometheus: Native metrics export in Prometheus format
  • Grafana: Pre-built dashboards and alerting rules
  • PagerDuty: Alert escalation and incident management
  • Slack/Teams: Notification integration for team alerting

Summary

Functions

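  • acknowledge_alert(alert_name, identifier): Acknowledges an alert to stop notifications.
  • child_spec(init_arg): Returns a specification to start this module under a supervisor.
  • create_alert(alert_name, level, details \\ %{}): Creates an alert for monitoring and notification systems.
  • export_prometheus(): Exports metrics in Prometheus format for external monitoring.
  • get_active_alerts(filters \\ []): Gets all active alerts with optional filtering.
  • get_device_metrics(device_id): Gets detailed metrics for a specific device.
  • get_metrics_summary(): Gets current performance metrics summary.
  • get_timeseries(metric_name, duration \\ 3_600_000, tags \\ %{}): Gets historical time series data for a metric.
  • record_metric(metric_name, value, tags \\ %{}): Records a metric data point for monitoring and visualization.
  • start_link(opts \\ []): Starts the dashboard server with monitoring and web interface.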

Types

alert_level()

@type alert_level() :: :info | :warning | :critical

dashboard_opts()

@type dashboard_opts() :: [
  port: pos_integer(),
  update_interval: pos_integer(),
  retention_days: pos_integer(),
  prometheus_enabled: boolean(),
  grafana_integration: boolean()
]

metric_name()

@type metric_name() :: atom()

metric_tags()

@type metric_tags() :: map()

metric_value()

@type metric_value() :: number()

Functions

acknowledge_alert(alert_name, identifier)

@spec acknowledge_alert(atom(), any()) :: :ok

Acknowledges an alert to stop notifications.

Examples

:ok = SnmpKit.SnmpLib.Dashboard.acknowledge_alert(:device_unreachable, "192.168.1.1")

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
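
As a sketch of what this means in practice: a module with a child_spec/1 function can be listed in a supervisor's children as a {module, arg} tuple. The example below uses a hypothetical stand-in module (DashboardStandIn) so that it runs without SnmpKit installed. It assumes the dashboard follows the default `use GenServer` behaviour, where the argument is passed through to start_link/1; that is an assumption, not something this page confirms.

```elixir
defmodule DashboardStandIn do
  # Hypothetical stand-in so the sketch runs without SnmpKit installed.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts), do: {:ok, opts}
end

# In an application, the entry would presumably be
# {SnmpKit.SnmpLib.Dashboard, port: 4000} instead of the stand-in.
children = [{DashboardStandIn, port: 4000}]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
IO.inspect(Supervisor.count_children(sup))
```

Starting the dashboard under a supervisor, rather than calling start_link/1 directly, means it is restarted automatically if it crashes.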

create_alert(alert_name, level, details \\ %{})

@spec create_alert(atom(), alert_level(), map()) :: :ok

Creates an alert for monitoring and notification systems.

Parameters

  • alert_name: Unique identifier for the alert type
  • level: Alert severity level (:info, :warning, :critical)
  • details: Alert metadata and context information

Examples

# Create device unreachable alert
SnmpKit.SnmpLib.Dashboard.create_alert(:device_unreachable, :critical, %{
  device: "192.168.1.1",
  last_seen: DateTime.utc_now(),
  consecutive_failures: 5
})

# Create performance degradation warning
SnmpKit.SnmpLib.Dashboard.create_alert(:slow_response, :warning, %{
  device: "192.168.1.1",
  avg_response_time: 5000,
  threshold: 2000
})
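
The level passed to create_alert/3 usually comes from comparing a measured value with a threshold. The following is a minimal sketch of one such rule. The AlertRules module and its critical_factor parameter are hypothetical, and the choice of three times the threshold for :critical is an arbitrary illustration rather than a SnmpKit default.

```elixir
defmodule AlertRules do
  @moduledoc "Hypothetical helper for illustration; not part of SnmpKit."

  # Returns :ok when the value is within the threshold, :warning when it
  # exceeds the threshold, and :critical when it reaches
  # threshold * critical_factor.
  def classify(value, threshold, critical_factor \\ 3)
      when is_number(value) and is_number(threshold) do
    cond do
      value >= threshold * critical_factor -> :critical
      value > threshold -> :warning
      true -> :ok
    end
  end
end

# 5000 ms against a 2000 ms threshold exceeds it but stays below 3x.
IO.inspect(AlertRules.classify(5000, 2000))
```

A result other than :ok could then be passed as the level argument of create_alert/3, with the measured value and threshold placed in the details map.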

export_prometheus()

@spec export_prometheus() :: binary()

Exports metrics in Prometheus format for external monitoring.

Examples

prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()
File.write!("/tmp/snmp_metrics.prom", prometheus_data)
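
For readers unfamiliar with the Prometheus text exposition format, each sample is a line of the form name{label="value",...} value. The sketch below renders one such line. PromLine is a hypothetical helper and not part of SnmpKit; the metric and label names that export_prometheus/0 actually emits are not documented here and may differ.

```elixir
defmodule PromLine do
  @moduledoc "Hypothetical helper for illustration; not part of SnmpKit."

  # Renders one sample in the Prometheus text exposition format.
  def render(name, value, labels \\ %{}) when is_number(value) do
    label_text =
      labels
      |> Enum.sort()
      |> Enum.map_join(",", fn {key, val} ->
        "#{key}=\"#{escape(to_string(val))}\""
      end)

    case label_text do
      "" -> "#{name} #{value}"
      _ -> "#{name}{#{label_text}} #{value}"
    end
  end

  # Label values must escape backslash, double quote, and newline.
  defp escape(text) do
    text
    |> String.replace("\\", "\\\\")
    |> String.replace("\"", "\\\"")
    |> String.replace("\n", "\\n")
  end
end

IO.puts(PromLine.render(:snmp_response_time, 125, %{device: "192.168.1.1", operation: "get"}))
```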

get_active_alerts(filters \\ [])

@spec get_active_alerts(keyword()) :: [map()]

Gets all active alerts with optional filtering.

Examples

all_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts()
critical_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts(level: :critical)

get_device_metrics(device_id)

@spec get_device_metrics(binary()) :: map()

Gets detailed metrics for a specific device.

Examples

device_metrics = SnmpKit.SnmpLib.Dashboard.get_device_metrics("192.168.1.1")
IO.inspect(device_metrics.response_times)

get_metrics_summary()

@spec get_metrics_summary() :: map()

Gets current performance metrics summary.

Returns

A map containing aggregated metrics:

  • total_operations: Total SNMP operations performed
  • success_rate: Share of successful operations (the example below multiplies it by 100, which implies a fraction between 0.0 and 1.0)
  • avg_response_time: Average response time in milliseconds
  • active_devices: Number of devices being monitored
  • pool_utilization: Connection pool usage percentage
  • error_rates: Breakdown of error types and frequencies

Examples

metrics = SnmpKit.SnmpLib.Dashboard.get_metrics_summary()
IO.puts("Success rate: #{metrics.success_rate * 100}%")
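
To illustrate how fields such as total_operations, success_rate, and error_rates relate to individual operations, here is a minimal aggregation sketch. SummarySketch is hypothetical, and the outcome shapes (:ok and {:error, reason}) are assumptions; the dashboard's own aggregation is not shown on this page and may work differently.

```elixir
defmodule SummarySketch do
  @moduledoc "Hypothetical helper for illustration; not part of SnmpKit."

  # Aggregates a list of outcomes, each either :ok or {:error, reason}.
  def summarize(outcomes) when is_list(outcomes) do
    total = length(outcomes)
    errors = for {:error, reason} <- outcomes, do: reason
    successes = total - length(errors)

    %{
      total_operations: total,
      # Fraction between 0.0 and 1.0; 0.0 when nothing was recorded.
      success_rate: if(total == 0, do: 0.0, else: successes / total),
      # Count of failures per error reason.
      error_rates: Enum.frequencies(errors)
    }
  end
end

summary = SummarySketch.summarize([:ok, :ok, {:error, :timeout}, :ok])
IO.inspect(summary)
```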

get_timeseries(metric_name, duration \\ 3_600_000, tags \\ %{})

@spec get_timeseries(metric_name(), pos_integer(), map()) :: [map()]

Gets historical time series data for a metric.

Parameters

  • metric_name: Name of the metric to retrieve
  • duration: Time window in milliseconds (default: 1 hour)
  • tags: Optional tag filters

Examples

# Get last hour of response times
timeseries = SnmpKit.SnmpLib.Dashboard.get_timeseries(:snmp_response_time)

# Get last 24 hours for specific device
device_data = SnmpKit.SnmpLib.Dashboard.get_timeseries(
  :snmp_response_time,
  24 * 60 * 60 * 1000,
  %{device: "192.168.1.1"}
)
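
Because duration is expressed in milliseconds, the Erlang :timer helpers can make the window easier to read than hand-written multiplication. This short sketch only computes the durations; the variable names are illustrative.

```elixir
# :timer.hours/1 and :timer.minutes/1 are Erlang standard library
# functions that return a duration in milliseconds.
last_15_minutes = :timer.minutes(15)
last_day = :timer.hours(24)
last_week = :timer.hours(24 * 7)

IO.inspect({last_15_minutes, last_day, last_week})
```

Any of these values can be passed as the second argument of get_timeseries/3.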

record_metric(metric_name, value, tags \\ %{})

@spec record_metric(metric_name(), metric_value(), metric_tags()) :: :ok

Records a metric data point for monitoring and visualization.

Parameters

  • metric_name: Unique identifier for the metric type
  • value: Numeric value for the metric
  • tags: Optional metadata for filtering and grouping

Examples

# Record response time metric
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_response_time, 125, %{
  device: "192.168.1.1",
  operation: "get",
  community: "public"
})

# Record error count
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_errors, 1, %{
  device: "192.168.1.1",
  error_type: "timeout"
})

# Record pool utilization
SnmpKit.SnmpLib.Dashboard.record_metric(:pool_utilization, 0.75, %{
  pool_name: "main_pool"
})

start_link(opts \\ [])

@spec start_link(dashboard_opts()) :: {:ok, pid()} | {:error, any()}

Starts the dashboard server with monitoring and web interface.

Options

  • port: Web dashboard port (default: 4000)
  • update_interval: Metrics update frequency in milliseconds (default: 5000)
  • retention_days: How long to keep historical data (default: 7)
  • prometheus_enabled: Enable Prometheus metrics endpoint (default: false)
  • grafana_integration: Enable Grafana dashboard integration (default: false)
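
The way caller-supplied options combine with the defaults listed above can be sketched with Keyword.validate!/2 (available since Elixir 1.13). DashboardOptions is a hypothetical helper that copies the documented defaults; it is not how SnmpKit is known to handle options internally.

```elixir
defmodule DashboardOptions do
  @moduledoc "Hypothetical helper for illustration; not part of SnmpKit."

  # Defaults copied from the documented option list.
  @defaults [
    port: 4000,
    update_interval: 5000,
    retention_days: 7,
    prometheus_enabled: false,
    grafana_integration: false
  ]

  # Returns the effective options; raises ArgumentError on unknown keys.
  def resolve(opts \\ []), do: Keyword.validate!(opts, @defaults)
end

effective = DashboardOptions.resolve(port: 8080, retention_days: 14)
IO.inspect(Keyword.fetch!(effective, :port))
IO.inspect(Keyword.fetch!(effective, :update_interval))
```

Rejecting unknown keys catches typos such as prot: 8080 early, instead of silently falling back to the default port.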

Examples

# Start with defaults
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link()

# Start with custom configuration
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(
  port: 8080,
  prometheus_enabled: true,
  retention_days: 14
)