SnmpKit.SnmpLib.Dashboard (snmpkit v0.6.3)
Real-time monitoring dashboard and metrics aggregation for SNMP operations.
This module provides a monitoring and visualization system for production SNMP deployments. It is based on patterns proven in large-scale monitoring systems that manage thousands of network devices.
Features
- Real-Time Metrics: Live updates of performance and health metrics
- Historical Analytics: Trend analysis and capacity planning data
- Alert Management: Configurable thresholds and notification routing
- Performance Insights: Detailed breakdown of operation performance
- Device Health: Per-device status monitoring and diagnostics
- Resource Utilization: Pool, memory, and system resource tracking
Metrics Categories
Performance Metrics
- Request/response times (min, max, average, percentiles)
- Throughput (operations per second)
- Error rates and failure classifications
- Connection pool utilization
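As a rough illustration of the response-time figures listed above (min, max, average, percentiles), the sketch below computes them from a list of samples in milliseconds. `LatencyStats` is a hypothetical helper, not part of snmpkit, and the nearest-rank percentile method is an assumption; the text does not say how the dashboard computes its percentiles.

```elixir
defmodule LatencyStats do
  @moduledoc "Hypothetical helper: summary statistics over response times in milliseconds."

  # Returns min, max, mean and selected percentiles for a non-empty list.
  def summarize(samples) when is_list(samples) and samples != [] do
    sorted = Enum.sort(samples)
    count = length(sorted)

    %{
      min: hd(sorted),
      max: List.last(sorted),
      avg: Enum.sum(sorted) / count,
      p50: percentile(sorted, count, 50),
      p95: percentile(sorted, count, 95),
      p99: percentile(sorted, count, 99)
    }
  end

  # Nearest-rank percentile: the value at 1-based position ceil(p / 100 * n).
  defp percentile(sorted, count, p) do
    rank = max(ceil(p / 100 * count), 1)
    Enum.at(sorted, rank - 1)
  end
end

stats = LatencyStats.summarize([120, 95, 310, 150, 101])
IO.inspect(stats)
```

Sorting once and indexing by rank keeps the helper simple; for large sample sets a streaming estimator would likely be a better fit.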
Health Metrics
- Device availability and reachability
- Circuit breaker states
- Retry counts and backoff status
- Resource exhaustion indicators
System Metrics
- Memory usage and garbage collection
- Process counts and supervision tree health
- Network socket utilization
- Queue depths and processing delays
Dashboard Views
Overview Dashboard
Global health and performance summary with key indicators.
Device Dashboard
Per-device detailed metrics and troubleshooting information.
Pool Dashboard
Connection pool health, utilization, and performance metrics.
Alerts Dashboard
Active alerts, acknowledgments, and escalation status.
Usage Patterns
# Start the dashboard server
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(port: 4000)
# Record custom metrics
SnmpKit.SnmpLib.Dashboard.record_metric(:custom_operation, 150, %{
  device: "192.168.1.1",
  status: :success
})
# Create custom alert
SnmpKit.SnmpLib.Dashboard.create_alert(:high_error_rate, :warning, %{
  device: "192.168.1.100",
  error_rate: 0.15,
  threshold: 0.10
})
# Export metrics for external systems
prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()
Integration with External Systems
- Prometheus: Native metrics export in Prometheus format
- Grafana: Pre-built dashboards and alerting rules
- PagerDuty: Alert escalation and incident management
- Slack/Teams: Notification integration for team alerting
Summary
Functions
- acknowledge_alert: Acknowledges an alert to stop notifications.
- child_spec: Returns a specification to start this module under a supervisor.
- create_alert: Creates an alert for monitoring and notification systems.
- export_prometheus: Exports metrics in Prometheus format for external monitoring.
- get_active_alerts: Gets all active alerts with optional filtering.
- get_device_metrics: Gets detailed metrics for a specific device.
- get_metrics_summary: Gets current performance metrics summary.
- get_timeseries: Gets historical time series data for a metric.
- record_metric: Records a metric data point for monitoring and visualization.
- start_link: Starts the dashboard server with monitoring and web interface.
Types
@type alert_level() :: :info | :warning | :critical
@type dashboard_opts() :: [
  port: pos_integer(),
  update_interval: pos_integer(),
  retention_days: pos_integer(),
  prometheus_enabled: boolean(),
  grafana_integration: boolean()
]
@type metric_name() :: atom()
@type metric_tags() :: map()
@type metric_value() :: number()
Functions
Acknowledges an alert to stop notifications.
Examples
:ok = SnmpKit.SnmpLib.Dashboard.acknowledge_alert(:device_unreachable, "192.168.1.1")
Returns a specification to start this module under a supervisor.
See Supervisor.
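Because the module provides a child specification, it can presumably be started under an application supervisor rather than by calling start_link directly. The sketch below assumes the child spec forwards its argument to start_link as options (the usual behaviour for a GenServer-style module, not confirmed here). `MyApp.Monitoring` is a hypothetical name.

```elixir
defmodule MyApp.Monitoring do
  # Hypothetical supervisor wrapper; only SnmpKit.SnmpLib.Dashboard comes from the library.

  # Child list for a supervisor. The {module, opts} tuple makes the supervisor
  # call SnmpKit.SnmpLib.Dashboard.child_spec(opts) when it starts.
  def children do
    [
      {SnmpKit.SnmpLib.Dashboard, port: 4000, prometheus_enabled: true}
    ]
  end

  # Starting the supervisor requires the snmpkit dependency to be loaded.
  def start_link do
    Supervisor.start_link(children(), strategy: :one_for_one, name: __MODULE__)
  end
end
```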
@spec create_alert(atom(), alert_level(), map()) :: :ok
Creates an alert for monitoring and notification systems.
Parameters
- alert_name: Unique identifier for the alert type
- level: Alert severity level (:info, :warning, :critical)
- details: Alert metadata and context information
Examples
# Create device unreachable alert
SnmpKit.SnmpLib.Dashboard.create_alert(:device_unreachable, :critical, %{
  device: "192.168.1.1",
  last_seen: DateTime.utc_now(),
  consecutive_failures: 5
})
# Create performance degradation warning
SnmpKit.SnmpLib.Dashboard.create_alert(:slow_response, :warning, %{
  device: "192.168.1.1",
  avg_response_time: 5000,
  threshold: 2000
})
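The examples above hard-code the severity. One way to derive it from a measured value and a threshold is sketched below. `AlertRule` is a hypothetical helper, not part of snmpkit, and the "twice the threshold is critical" rule is an arbitrary choice for illustration.

```elixir
defmodule AlertRule do
  # Hypothetical helper: maps an observed value and a threshold onto one of
  # the documented alert levels, or :none when the value is within bounds.
  # The 2x multiplier for :critical is an assumption made for this sketch.
  def level(value, threshold) when is_number(value) and is_number(threshold) do
    cond do
      value >= threshold * 2 -> :critical
      value > threshold -> :warning
      true -> :none
    end
  end
end

error_rate = 0.15
threshold = 0.10

case AlertRule.level(error_rate, threshold) do
  :none ->
    :ok

  level ->
    # With snmpkit loaded, this is where the alert would be created:
    # SnmpKit.SnmpLib.Dashboard.create_alert(:high_error_rate, level, %{
    #   device: "192.168.1.100", error_rate: error_rate, threshold: threshold
    # })
    IO.puts("would create a #{level} alert")
end
```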
@spec export_prometheus() :: binary()
Exports metrics in Prometheus format for external monitoring.
Examples
prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()
File.write!("/tmp/snmp_metrics.prom", prometheus_data)
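The output of export_prometheus is not shown here. For orientation, a sample in the Prometheus text exposition format has the shape `name{label="value",...} value`. The sketch below renders one such line from a metric name, value, and tag map; `PromLine` is a hypothetical formatter and is not how the library builds its export.

```elixir
defmodule PromLine do
  # Hypothetical formatter, not the library's exporter.
  # Renders one sample as: name{label="value",...} value
  def render(name, value, tags \\ %{}) when is_atom(name) and is_number(value) do
    labels =
      tags
      |> Enum.sort()
      |> Enum.map_join(",", fn {key, val} -> "#{key}=\"#{escape(val)}\"" end)

    if labels == "" do
      "#{name} #{value}"
    else
      "#{name}{#{labels}} #{value}"
    end
  end

  # Label values must escape backslash, double quote and newline.
  defp escape(val) do
    val
    |> to_string()
    |> String.replace("\\", "\\\\")
    |> String.replace("\"", "\\\"")
    |> String.replace("\n", "\\n")
  end
end

IO.puts(PromLine.render(:snmp_response_time, 125, %{device: "192.168.1.1", operation: "get"}))
# prints: snmp_response_time{device="192.168.1.1",operation="get"} 125
```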
Gets all active alerts with optional filtering.
Examples
all_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts()
critical_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts(level: :critical)
Gets detailed metrics for a specific device.
Examples
device_metrics = SnmpKit.SnmpLib.Dashboard.get_device_metrics("192.168.1.1")
IO.inspect(device_metrics.response_times)
@spec get_metrics_summary() :: map()
Gets current performance metrics summary.
Returns
A map containing aggregated metrics:
- total_operations: Total SNMP operations performed
- success_rate: Percentage of successful operations
- avg_response_time: Average response time in milliseconds
- active_devices: Number of devices being monitored
- pool_utilization: Connection pool usage percentage
- error_rates: Breakdown of error types and frequencies
Examples
metrics = SnmpKit.SnmpLib.Dashboard.get_metrics_summary()
IO.puts("Success rate: #{metrics.success_rate * 100}%")
@spec get_timeseries(metric_name(), pos_integer(), map()) :: [map()]
Gets historical time series data for a metric.
Parameters
- metric_name: Name of the metric to retrieve
- duration: Time window in milliseconds (default: 1 hour)
- tags: Optional tag filters
Examples
# Get last hour of response times
timeseries = SnmpKit.SnmpLib.Dashboard.get_timeseries(:snmp_response_time)
# Get last 24 hours for specific device
device_data = SnmpKit.SnmpLib.Dashboard.get_timeseries(
  :snmp_response_time,
  24 * 60 * 60 * 1000,
  %{device: "192.168.1.1"}
)
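Since the duration is given in milliseconds, the Erlang `:timer` module can replace the hand-written arithmetic. This is only a readability suggestion; the get_timeseries call is shown as a comment because it needs snmpkit loaded.

```elixir
# :timer.hours/1 and :timer.minutes/1 (Erlang standard library) return milliseconds.
one_day = :timer.hours(24)
fifteen_minutes = :timer.minutes(15)

IO.inspect(one_day == 24 * 60 * 60 * 1000)
# prints: true
IO.inspect(fifteen_minutes)
# prints: 900000

# With snmpkit loaded, the value can be passed as the duration argument:
# SnmpKit.SnmpLib.Dashboard.get_timeseries(:snmp_response_time, one_day, %{device: "192.168.1.1"})
```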
@spec record_metric(metric_name(), metric_value(), metric_tags()) :: :ok
Records a metric data point for monitoring and visualization.
Parameters
- metric_name: Unique identifier for the metric type
- value: Numeric value for the metric
- tags: Optional metadata for filtering and grouping
Examples
# Record response time metric
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_response_time, 125, %{
  device: "192.168.1.1",
  operation: "get",
  community: "public"
})
# Record error count
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_errors, 1, %{
  device: "192.168.1.1",
  error_type: "timeout"
})
# Record pool utilization
SnmpKit.SnmpLib.Dashboard.record_metric(:pool_utilization, 0.75, %{
  pool_name: "main_pool"
})
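The response-time example above uses a literal value. In practice the duration usually has to be measured first. The sketch below wraps an operation with `:timer.tc/1` and passes the elapsed milliseconds to a recorder function with the same shape as record_metric. `TimedOp` is a hypothetical wrapper, and the recorder is injected so the example runs without snmpkit.

```elixir
defmodule TimedOp do
  # Hypothetical wrapper, not part of snmpkit. Runs `fun`, measures elapsed
  # time with :timer.tc/1 (microseconds), and hands the duration in
  # milliseconds to `recorder`, a 3-arity function shaped like record_metric/3.
  def run(metric_name, tags, recorder, fun)
      when is_atom(metric_name) and is_map(tags) and is_function(recorder, 3) and
             is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    recorder.(metric_name, micros / 1000, tags)
    result
  end
end

# Stand-in recorder that prints instead of calling the dashboard.
# With snmpkit loaded, pass &SnmpKit.SnmpLib.Dashboard.record_metric/3 instead.
printer = fn name, value, tags -> IO.inspect({name, value, tags}) end

result =
  TimedOp.run(:snmp_response_time, %{device: "192.168.1.1", operation: "get"}, printer, fn ->
    Process.sleep(10)
    :pretend_snmp_reply
  end)

IO.inspect(result)
```

Returning the wrapped function's result unchanged lets the wrapper sit around existing calls without altering their callers.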
@spec start_link(dashboard_opts()) :: {:ok, pid()} | {:error, any()}
Starts the dashboard server with monitoring and web interface.
Options
- port: Web dashboard port (default: 4000)
- update_interval: Metrics update frequency in milliseconds (default: 5000)
- retention_days: How long to keep historical data (default: 7)
- prometheus_enabled: Enable Prometheus metrics endpoint (default: false)
- grafana_integration: Enable Grafana dashboard integration (default: false)
Examples
# Start with defaults
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link()
# Start with custom configuration
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(
  port: 8080,
  prometheus_enabled: true,
  retention_days: 14
)
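To make the effect of partial options concrete, the sketch below merges caller-supplied options over the documented defaults. `DashboardDefaults` is a hypothetical module that only mirrors the default values listed under Options; it is not the library's own option handling.

```elixir
defmodule DashboardDefaults do
  # Illustration only: mirrors the documented defaults, not the library's code.
  @defaults [
    port: 4000,
    update_interval: 5000,
    retention_days: 7,
    prometheus_enabled: false,
    grafana_integration: false
  ]

  # Caller-supplied options override the defaults key by key.
  def resolve(opts \\ []) when is_list(opts), do: Keyword.merge(@defaults, opts)
end

resolved = DashboardDefaults.resolve(port: 8080, prometheus_enabled: true, retention_days: 14)
IO.inspect(Keyword.get(resolved, :port))
IO.inspect(Keyword.get(resolved, :update_interval))
```

Under these assumptions, the custom configuration above would still use a 5000 ms update interval and leave Grafana integration disabled.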