SnmpKit.SnmpLib.Dashboard (snmpkit v1.2.0)
Real-time monitoring dashboard and metrics aggregation for SNMP operations.
This module provides a monitoring and visualization system for production SNMP deployments. It is based on patterns proven in large-scale monitoring systems that manage thousands of network devices.
Features
- Real-Time Metrics: Live updates of performance and health metrics
- Historical Analytics: Trend analysis and capacity planning data
- Alert Management: Configurable thresholds and notification routing
- Performance Insights: Detailed breakdown of operation performance
- Device Health: Per-device status monitoring and diagnostics
- Resource Utilization: Pool, memory, and system resource tracking
Metrics Categories
Performance Metrics
- Request/response times (min, max, average, percentiles)
- Throughput (operations per second)
- Error rates and failure classifications
- Connection pool utilization
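As an illustration of the response-time statistics listed above, the following sketch computes min, max, average, and nearest-rank percentiles from a list of samples. `LatencyStats` is a hypothetical helper written for this example, not part of SnmpKit, and the dashboard may aggregate its data differently.

```elixir
defmodule LatencyStats do
  @moduledoc "Hypothetical helper for illustration; not part of SnmpKit."

  # Nearest-rank percentile over a list that is already sorted ascending.
  def percentile(sorted, p) when sorted != [] and p > 0 and p <= 100 do
    rank = ceil(p / 100 * length(sorted))
    Enum.at(sorted, rank - 1)
  end

  # Summarizes response-time samples (milliseconds assumed).
  def summarize(samples) when samples != [] do
    sorted = Enum.sort(samples)

    %{
      min: List.first(sorted),
      max: List.last(sorted),
      avg: Enum.sum(sorted) / length(sorted),
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95)
    }
  end
end

stats = LatencyStats.summarize([120, 95, 150, 300, 110])
IO.inspect(stats)
```

Nearest-rank is only one of several percentile definitions; interpolating variants give slightly different values on small sample sets.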
Health Metrics
- Device availability and reachability
- Circuit breaker states
- Retry counts and backoff status
- Resource exhaustion indicators
System Metrics
- Memory usage and garbage collection
- Process counts and supervision tree health
- Network socket utilization
- Queue depths and processing delays
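The system metrics above map onto standard BEAM introspection calls. The sketch below gathers a snapshot using only `:erlang` and `Process` functions; `SystemSnapshot` and its map keys are hypothetical names chosen for this example, and the port count is only a rough proxy for socket usage.

```elixir
defmodule SystemSnapshot do
  # Hypothetical helper for illustration; not part of SnmpKit.
  def collect(pid \\ self()) do
    # Returns {number_of_gcs, words_reclaimed, 0} for the whole VM.
    {gc_runs, words_reclaimed, _} = :erlang.statistics(:garbage_collection)

    # Mailbox length of one process, used here as a queue-depth indicator.
    {:message_queue_len, queue_len} = Process.info(pid, :message_queue_len)

    %{
      memory_total_bytes: :erlang.memory(:total),
      memory_process_bytes: :erlang.memory(:processes),
      gc_runs: gc_runs,
      gc_words_reclaimed: words_reclaimed,
      process_count: :erlang.system_info(:process_count),
      port_count: :erlang.system_info(:port_count),
      message_queue_len: queue_len
    }
  end
end

IO.inspect(SystemSnapshot.collect())
```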
Dashboard Views
Overview Dashboard
Global health and performance summary with key indicators.
Device Dashboard
Per-device detailed metrics and troubleshooting information.
Pool Dashboard
Connection pool health, utilization, and performance metrics.
Alerts Dashboard
Active alerts, acknowledgments, and escalation status.
Usage Patterns
# Start the dashboard server
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(port: 4000)
# Record a custom metric (numeric value first, then tags)
SnmpKit.SnmpLib.Dashboard.record_metric(:custom_operation, 150, %{
  device: "192.168.1.1",
  status: :success
})
# Create a custom alert (name, severity level, details)
SnmpKit.SnmpLib.Dashboard.create_alert(:high_error_rate, :warning, %{
  device: "192.168.1.100",
  error_rate: 0.15,
  threshold: 0.10
})
# Export metrics for external systems
prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()
Integration with External Systems
- Prometheus: Native metrics export in Prometheus format
- Grafana: Pre-built dashboards and alerting rules
- PagerDuty: Alert escalation and incident management
- Slack/Teams: Notification integration for team alerting
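For readers unfamiliar with the Prometheus text exposition format mentioned above, here is a minimal sketch of rendering one metric family with labels. `PromText` and the metric name `snmp_response_time_ms` are hypothetical; this is not SnmpKit's exporter, and the actual output of `export_prometheus/0` may differ.

```elixir
defmodule PromText do
  # Hypothetical renderer for illustration; not SnmpKit's implementation.
  # `samples` is a list of {labels_map, numeric_value} tuples.
  def render(name, type, samples) do
    header = "# TYPE #{name} #{type}\n"

    lines =
      Enum.map(samples, fn {labels, value} ->
        "#{name}#{render_labels(labels)} #{value}\n"
      end)

    IO.iodata_to_binary([header | lines])
  end

  defp render_labels(labels) when map_size(labels) == 0, do: ""

  defp render_labels(labels) do
    inner =
      labels
      |> Enum.sort()
      |> Enum.map_join(",", fn {key, value} -> "#{key}=\"#{escape(value)}\"" end)

    "{" <> inner <> "}"
  end

  # Label values must escape backslash, double quote, and line feed.
  defp escape(value) do
    value
    |> to_string()
    |> String.replace("\\", "\\\\")
    |> String.replace("\"", "\\\"")
    |> String.replace("\n", "\\n")
  end
end

IO.write(
  PromText.render("snmp_response_time_ms", "gauge", [
    {%{device: "192.168.1.1", operation: "get"}, 125}
  ])
)
```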
Summary
Functions
- acknowledge_alert: Acknowledges an alert to stop notifications.
- child_spec: Returns a specification to start this module under a supervisor.
- create_alert: Creates an alert for monitoring and notification systems.
- export_prometheus: Exports metrics in Prometheus format for external monitoring.
- get_active_alerts: Gets all active alerts with optional filtering.
- get_device_metrics: Gets detailed metrics for a specific device.
- get_metrics_summary: Gets current performance metrics summary.
- get_timeseries: Gets historical time series data for a metric.
- record_metric: Records a metric data point for monitoring and visualization.
- start_link: Starts the dashboard server with monitoring and web interface.
Types
@type alert_level() :: :info | :warning | :critical
@type dashboard_opts() :: [
  port: pos_integer(),
  update_interval: pos_integer(),
  retention_days: pos_integer(),
  prometheus_enabled: boolean(),
  grafana_integration: boolean()
]
@type metric_name() :: atom()
@type metric_tags() :: map()
@type metric_value() :: number()
Functions
Acknowledges an alert to stop notifications.
Examples
:ok = SnmpKit.SnmpLib.Dashboard.acknowledge_alert(:device_unreachable, "192.168.1.1")
Returns a specification to start this module under a supervisor.
See Supervisor.
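To show what such a child specification looks like, the sketch below uses a stand-in GenServer. `DemoDashboard` is hypothetical, and it is an assumption that the real module behaves like a process defined with `use GenServer`; the wording of the description matches the default that Elixir generates for such modules.

```elixir
defmodule DemoDashboard do
  # Hypothetical stand-in for the dashboard process; not SnmpKit code.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts), do: {:ok, Map.new(opts)}
end

# `use GenServer` generates child_spec/1; supervisors call it for {Module, arg} children.
spec = DemoDashboard.child_spec(port: 4000)
IO.inspect(spec)

# Start the stand-in under a supervisor with the same options.
{:ok, sup} = Supervisor.start_link([{DemoDashboard, port: 4000}], strategy: :one_for_one)
IO.inspect(Supervisor.which_children(sup))
```

With the real module, the child entry would presumably be `{SnmpKit.SnmpLib.Dashboard, port: 4000}` in an application's supervision tree.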
@spec create_alert(atom(), alert_level(), map()) :: :ok
Creates an alert for monitoring and notification systems.
Parameters
- alert_name: Unique identifier for the alert type
- level: Alert severity level (:info, :warning, :critical)
- details: Alert metadata and context information
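One way a caller might choose the `level` argument is to compare a measured value against its threshold. The sketch below is a hypothetical policy, not part of SnmpKit; the 3x multiplier for `:critical` is an arbitrary assumption made for illustration.

```elixir
defmodule AlertPolicy do
  # Hypothetical policy for illustration; not part of SnmpKit.
  @levels [:info, :warning, :critical]

  # True for the three levels listed in alert_level().
  def valid_level?(level), do: level in @levels

  # At or below the threshold is :info, above it is :warning,
  # and at three times the threshold or more is :critical (arbitrary choice).
  def level_for(value, threshold)
      when is_number(value) and is_number(threshold) and threshold > 0 do
    cond do
      value >= threshold * 3 -> :critical
      value > threshold -> :warning
      true -> :info
    end
  end
end

IO.inspect(AlertPolicy.level_for(5000, 2000))
IO.inspect(AlertPolicy.level_for(7000, 2000))
```

The resulting atom could then be passed as the second argument to `create_alert/3`.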
Examples
# Create device unreachable alert
SnmpKit.SnmpLib.Dashboard.create_alert(:device_unreachable, :critical, %{
  device: "192.168.1.1",
  last_seen: DateTime.utc_now(),
  consecutive_failures: 5
})
# Create performance degradation warning
SnmpKit.SnmpLib.Dashboard.create_alert(:slow_response, :warning, %{
  device: "192.168.1.1",
  avg_response_time: 5000,
  threshold: 2000
})
@spec export_prometheus() :: binary()
Exports metrics in Prometheus format for external monitoring.
Examples
prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()
File.write!("/tmp/snmp_metrics.prom", prometheus_data)
Gets all active alerts with optional filtering.
Examples
all_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts()
critical_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts(level: :critical)
Gets detailed metrics for a specific device.
Examples
device_metrics = SnmpKit.SnmpLib.Dashboard.get_device_metrics("192.168.1.1")
IO.inspect(device_metrics.response_times)
@spec get_metrics_summary() :: map()
Gets current performance metrics summary.
Returns
A map containing aggregated metrics:
- total_operations: Total SNMP operations performed
- success_rate: Share of successful operations, apparently as a fraction (the example below multiplies it by 100 to print a percentage)
- avg_response_time: Average response time in milliseconds
- active_devices: Number of devices being monitored
- pool_utilization: Connection pool usage percentage
- error_rates: Breakdown of error types and frequencies
Examples
metrics = SnmpKit.SnmpLib.Dashboard.get_metrics_summary()
IO.puts("Success rate: #{metrics.success_rate * 100}%")
@spec get_timeseries(metric_name(), pos_integer(), map()) :: [map()]
Gets historical time series data for a metric.
Parameters
- metric_name: Name of the metric to retrieve
- duration: Time window in milliseconds (default: 1 hour)
- tags: Optional tag filters
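The interplay of the `duration` window and `tags` filter can be sketched over an in-memory list. `TimeseriesSketch` and the point shape `%{timestamp: ms, value: number, tags: map}` are assumptions made for this example; the shape of the maps actually returned by `get_timeseries` is not documented here.

```elixir
defmodule TimeseriesSketch do
  # Hypothetical in-memory query for illustration; not SnmpKit's storage layer.
  # Each point is assumed to look like %{timestamp: ms, value: number, tags: map}.
  def query(points, now_ms, duration_ms \\ 60 * 60 * 1000, tags \\ %{}) do
    cutoff = now_ms - duration_ms

    points
    |> Enum.filter(fn point ->
      point.timestamp >= cutoff and tags_match?(point.tags, tags)
    end)
    |> Enum.sort_by(& &1.timestamp)
  end

  # A point matches when every requested tag is present with the same value.
  defp tags_match?(point_tags, wanted) do
    Enum.all?(wanted, fn {key, value} -> Map.get(point_tags, key) == value end)
  end
end

now = 10_000_000

points = [
  %{timestamp: now - 30_000, value: 125, tags: %{device: "192.168.1.1"}},
  %{timestamp: now - 2 * 60 * 60 * 1000, value: 900, tags: %{device: "192.168.1.1"}},
  %{timestamp: now - 10_000, value: 80, tags: %{device: "192.168.1.2"}}
]

# Last hour, one device: only the first point qualifies.
IO.inspect(TimeseriesSketch.query(points, now, 60 * 60 * 1000, %{device: "192.168.1.1"}))
```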
Examples
# Get last hour of response times
timeseries = SnmpKit.SnmpLib.Dashboard.get_timeseries(:snmp_response_time)
# Get last 24 hours for specific device
device_data = SnmpKit.SnmpLib.Dashboard.get_timeseries(
  :snmp_response_time,
  24 * 60 * 60 * 1000,
  %{device: "192.168.1.1"}
)
@spec record_metric(metric_name(), metric_value(), metric_tags()) :: :ok
Records a metric data point for monitoring and visualization.
Parameters
- metric_name: Unique identifier for the metric type
- value: Numeric value for the metric
- tags: Optional metadata for filtering and grouping
Examples
# Record response time metric
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_response_time, 125, %{
  device: "192.168.1.1",
  operation: "get",
  community: "public"
})
# Record error count
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_errors, 1, %{
  device: "192.168.1.1",
  error_type: "timeout"
})
# Record pool utilization
SnmpKit.SnmpLib.Dashboard.record_metric(:pool_utilization, 0.75, %{
  pool_name: "main_pool"
})
@spec start_link(dashboard_opts()) :: {:ok, pid()} | {:error, any()}
Starts the dashboard server with monitoring and web interface.
Options
- port: Web dashboard port (default: 4000)
- update_interval: Metrics update frequency in milliseconds (default: 5000)
- retention_days: How long to keep historical data (default: 7)
- prometheus_enabled: Enable Prometheus metrics endpoint (default: false)
- grafana_integration: Enable Grafana dashboard integration (default: false)
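A sketch of how options like these can be merged with their defaults and checked for typos, using `Keyword.validate!/2` (available since Elixir 1.13). `DashboardOptsSketch` is hypothetical and only copies the defaults listed above; how SnmpKit itself validates options is not stated.

```elixir
defmodule DashboardOptsSketch do
  # Hypothetical helper for illustration; not SnmpKit code.
  # Defaults are copied from the documented option list.
  @defaults [
    port: 4000,
    update_interval: 5000,
    retention_days: 7,
    prometheus_enabled: false,
    grafana_integration: false
  ]

  # Fills in missing keys with defaults and raises ArgumentError on unknown keys.
  def normalize(opts), do: opts |> Keyword.validate!(@defaults) |> Map.new()
end

config = DashboardOptsSketch.normalize(port: 8080, retention_days: 14)
IO.inspect(config)
```

Rejecting unknown keys catches misspellings such as `prot: 8080` early instead of silently falling back to the default port.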
Examples
# Start with defaults
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link()
# Start with custom configuration
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(
  port: 8080,
  prometheus_enabled: true,
  retention_days: 14
)