SnmpKit.SnmpLib.Dashboard (snmpkit v0.6.3)

Real-time monitoring dashboard and metrics aggregation for SNMP operations.

This module provides a monitoring and visualization system for production SNMP deployments. It is based on patterns proven in large-scale monitoring systems that manage thousands of network devices.

Features

Real-Time Metrics: Live updates of performance and health metrics
Historical Analytics: Trend analysis and capacity planning data
Alert Management: Configurable thresholds and notification routing
Performance Insights: Detailed breakdown of operation performance
Device Health: Per-device status monitoring and diagnostics
Resource Utilization: Pool, memory, and system resource tracking

Metrics Categories

Performance Metrics

Request/response times (min, max, average, percentiles)
Throughput (operations per second)
Error rates and failure classifications
Connection pool utilization
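
As a rough illustration of how the response-time figures above (min, max, average, percentiles) can be derived from raw samples, here is a minimal sketch. The `LatencyStats` module is hypothetical and not part of snmpkit, and the nearest-rank percentile method is an assumption; the dashboard may aggregate differently.

```elixir
defmodule LatencyStats do
  @moduledoc "Hypothetical helper for illustration; not part of snmpkit."

  # Summarize a non-empty list of durations in milliseconds.
  def summarize(durations) when is_list(durations) and durations != [] do
    sorted = Enum.sort(durations)
    count = length(sorted)

    %{
      min: List.first(sorted),
      max: List.last(sorted),
      avg: Enum.sum(sorted) / count,
      p50: percentile(sorted, count, 50),
      p95: percentile(sorted, count, 95)
    }
  end

  # Nearest-rank percentile: the smallest sample such that at least p percent
  # of all samples are less than or equal to it. Integer math avoids float rounding.
  defp percentile(sorted, count, p) do
    rank = max(div(p * count + 99, 100), 1)
    Enum.at(sorted, rank - 1)
  end
end

stats = LatencyStats.summarize([120, 95, 300, 110, 105])
```

Each value in the resulting map could then be recorded as its own metric or compared against an alert threshold.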

Health Metrics

Device availability and reachability
Circuit breaker states
Retry counts and backoff status
Resource exhaustion indicators

System Metrics

Memory usage and garbage collection
Process counts and supervision tree health
Network socket utilization
Queue depths and processing delays

Dashboard Views

Overview Dashboard

Global health and performance summary with key indicators.

Device Dashboard

Per-device detailed metrics and troubleshooting information.

Pool Dashboard

Connection pool health, utilization, and performance metrics.

Alerts Dashboard

Active alerts, acknowledgments, and escalation status.

Usage Patterns

# Start the dashboard server
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(port: 4000)

# Record a custom metric (numeric value first, then tags)
SnmpKit.SnmpLib.Dashboard.record_metric(:custom_operation, 150, %{
  device: "192.168.1.1",
  status: :success
})

# Create a custom alert (name, level, details)
SnmpKit.SnmpLib.Dashboard.create_alert(:high_error_rate, :warning, %{
  device: "192.168.1.100",
  error_rate: 0.15,
  threshold: 0.10
})

# Export metrics for external systems
prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()

Integration with External Systems

Prometheus: Native metrics export in Prometheus format
Grafana: Pre-built dashboards and alerting rules
PagerDuty: Alert escalation and incident management
Slack/Teams: Notification integration for team alerting

Summary

Types

alert_level()

dashboard_opts()

metric_name()

metric_tags()

metric_value()

Functions

acknowledge_alert(alert_name, identifier)

Acknowledges an alert to stop notifications.

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

create_alert(alert_name, level, details \\ %{})

Creates an alert for monitoring and notification systems.

export_prometheus()

Exports metrics in Prometheus format for external monitoring.

get_active_alerts(filters \\ [])

Gets all active alerts with optional filtering.

get_device_metrics(device_id)

Gets detailed metrics for a specific device.

get_metrics_summary()

Gets current performance metrics summary.

get_timeseries(metric_name, duration \\ 3_600_000, tags \\ %{})

Gets historical time series data for a metric.

record_metric(metric_name, value, tags \\ %{})

Records a metric data point for monitoring and visualization.

start_link(opts \\ [])

Starts the dashboard server with monitoring and web interface.

Types

alert_level()

@type alert_level() :: :info | :warning | :critical

dashboard_opts()

@type dashboard_opts() :: [
  port: pos_integer(),
  update_interval: pos_integer(),
  retention_days: pos_integer(),
  prometheus_enabled: boolean(),
  grafana_integration: boolean()
]

metric_name()

@type metric_name() :: atom()

metric_tags()

@type metric_tags() :: map()

metric_value()

@type metric_value() :: number()

Functions

acknowledge_alert(alert_name, identifier)

@spec acknowledge_alert(atom(), any()) :: :ok

Acknowledges an alert to stop notifications.

Examples

:ok = SnmpKit.SnmpLib.Dashboard.acknowledge_alert(:device_unreachable, "192.168.1.1")

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
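
A minimal sketch of placing the dashboard in a supervision tree follows. It assumes that `child_spec/1` forwards its argument to `start_link/1`, which is the usual convention but is not confirmed here. `MyApp.Monitoring` is a hypothetical module name.

```elixir
defmodule MyApp.Monitoring do
  # Hypothetical application module; not part of snmpkit.
  #
  # A {module, arg} tuple in a child list makes the supervisor call
  # module.child_spec(arg) to obtain the child specification.
  def children do
    [
      {SnmpKit.SnmpLib.Dashboard, port: 4000, prometheus_enabled: true}
    ]
  end
end

# In an application's start/2 callback (requires snmpkit to be available):
# Supervisor.start_link(MyApp.Monitoring.children(), strategy: :one_for_one)
```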

create_alert(alert_name, level, details \\ %{})

@spec create_alert(atom(), alert_level(), map()) :: :ok

Creates an alert for monitoring and notification systems.

Parameters

alert_name: Unique identifier for the alert type
level: Alert severity level (:info, :warning, :critical)
details: Alert metadata and context information

Examples

# Create device unreachable alert
SnmpKit.SnmpLib.Dashboard.create_alert(:device_unreachable, :critical, %{
  device: "192.168.1.1",
  last_seen: DateTime.utc_now(),
  consecutive_failures: 5
})

# Create performance degradation warning
SnmpKit.SnmpLib.Dashboard.create_alert(:slow_response, :warning, %{
  device: "192.168.1.1",
  avg_response_time: 5000,
  threshold: 2000
})
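
The alert level is typically chosen by comparing a measured value against thresholds before calling `create_alert/3`. The sketch below shows one way to do that. `ErrorRateCheck` and its threshold values are hypothetical and are not snmpkit defaults.

```elixir
defmodule ErrorRateCheck do
  # Hypothetical helper. Thresholds are illustrative only.
  @warning 0.10
  @critical 0.25

  # Map an error rate (a fraction between 0.0 and 1.0) to an alert level,
  # or :ok when the rate is below every threshold.
  def level(rate) when rate >= @critical, do: :critical
  def level(rate) when rate >= @warning, do: :warning
  def level(_rate), do: :ok
end

# Possible use with the dashboard (requires snmpkit to be available):
# case ErrorRateCheck.level(0.15) do
#   :ok -> :ok
#   level -> SnmpKit.SnmpLib.Dashboard.create_alert(:high_error_rate, level, %{error_rate: 0.15})
# end
```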

export_prometheus()

@spec export_prometheus() :: binary()

Exports metrics in Prometheus format for external monitoring.

Examples

prometheus_data = SnmpKit.SnmpLib.Dashboard.export_prometheus()
File.write!("/tmp/snmp_metrics.prom", prometheus_data)
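
For orientation, the Prometheus text exposition format puts one sample per line as `name{label="value",...} number`. The sketch below renders a single such line. `PromLine` is a hypothetical helper, and the exact metric names and labels that `export_prometheus/0` emits are not documented here, so treat the output as an example of the format only.

```elixir
defmodule PromLine do
  # Hypothetical helper; not part of snmpkit.
  # Render one sample in the Prometheus text exposition format.
  def render(name, value, labels \\ %{}) when is_atom(name) and is_number(value) do
    label_text =
      labels
      |> Enum.sort()
      |> Enum.map_join(",", fn {key, val} -> "#{key}=\"#{escape(val)}\"" end)

    case label_text do
      "" -> "#{name} #{value}"
      _ -> "#{name}{#{label_text}} #{value}"
    end
  end

  # Backslash, double quote, and newline must be escaped inside label values.
  defp escape(value) do
    value
    |> to_string()
    |> String.replace("\\", "\\\\")
    |> String.replace("\"", "\\\"")
    |> String.replace("\n", "\\n")
  end
end

line = PromLine.render(:snmp_response_time, 125, %{device: "192.168.1.1", operation: "get"})
```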

get_active_alerts(filters \\ [])

@spec get_active_alerts(keyword()) :: [map()]

Gets all active alerts with optional filtering.

Examples

all_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts()
critical_alerts = SnmpKit.SnmpLib.Dashboard.get_active_alerts(level: :critical)

get_device_metrics(device_id)

@spec get_device_metrics(binary()) :: map()

Gets detailed metrics for a specific device.

Examples

device_metrics = SnmpKit.SnmpLib.Dashboard.get_device_metrics("192.168.1.1")
IO.inspect(device_metrics.response_times)

get_metrics_summary()

@spec get_metrics_summary() :: map()

Gets current performance metrics summary.

Returns

A map containing aggregated metrics:

total_operations: Total SNMP operations performed
success_rate: Percentage of successful operations
avg_response_time: Average response time in milliseconds
active_devices: Number of devices being monitored
pool_utilization: Connection pool usage percentage
error_rates: Breakdown of error types and frequencies

Examples

metrics = SnmpKit.SnmpLib.Dashboard.get_metrics_summary()
IO.puts("Success rate: #{metrics.success_rate * 100}%")

get_timeseries(metric_name, duration \\ 3_600_000, tags \\ %{})

@spec get_timeseries(metric_name(), pos_integer(), map()) :: [map()]

Gets historical time series data for a metric.

Parameters

metric_name: Name of the metric to retrieve
duration: Time window in milliseconds (default: 1 hour)
tags: Optional tag filters

Examples

# Get last hour of response times
timeseries = SnmpKit.SnmpLib.Dashboard.get_timeseries(:snmp_response_time)

# Get last 24 hours for specific device
device_data = SnmpKit.SnmpLib.Dashboard.get_timeseries(
  :snmp_response_time,
  24 * 60 * 60 * 1000,
  %{device: "192.168.1.1"}
)
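
Because `duration` is a plain millisecond count, Erlang's `:timer` helpers can be used instead of hand-written multiplication. A small sketch (the variable names are illustrative):

```elixir
# :timer.hours/1 and :timer.minutes/1 return millisecond integers.
one_hour = :timer.hours(1)
one_day = :timer.hours(24)
fifteen_minutes = :timer.minutes(15)

# Possible use with the dashboard (requires snmpkit to be available):
# SnmpKit.SnmpLib.Dashboard.get_timeseries(:snmp_response_time, one_day, %{device: "192.168.1.1"})
```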

record_metric(metric_name, value, tags \\ %{})

@spec record_metric(metric_name(), metric_value(), metric_tags()) :: :ok

Records a metric data point for monitoring and visualization.

Parameters

metric_name: Unique identifier for the metric type
value: Numeric value for the metric
tags: Optional metadata for filtering and grouping

Examples

# Record response time metric
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_response_time, 125, %{
  device: "192.168.1.1",
  operation: "get",
  community: "public"
})

# Record error count
SnmpKit.SnmpLib.Dashboard.record_metric(:snmp_errors, 1, %{
  device: "192.168.1.1",
  error_type: "timeout"
})

# Record pool utilization
SnmpKit.SnmpLib.Dashboard.record_metric(:pool_utilization, 0.75, %{
  pool_name: "main_pool"
})
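
A common pattern is to time an operation and record its duration in one step. The sketch below does this with `:timer.tc/1`. `TimedOperation` is a hypothetical helper, and the recorder is passed in as a function so the example runs without snmpkit; in real use it could be `&SnmpKit.SnmpLib.Dashboard.record_metric/3`.

```elixir
defmodule TimedOperation do
  # Hypothetical helper; not part of snmpkit.
  # `recorder` is any 3-arity function shaped like record_metric/3.
  def measure(metric_name, tags, recorder, fun)
      when is_function(recorder, 3) and is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    # :timer.tc/1 reports microseconds; convert to whole milliseconds.
    recorder.(metric_name, div(micros, 1000), tags)
    result
  end
end

# Stand-in recorder that sends each call to the current process.
parent = self()

recorder = fn name, value, tags ->
  send(parent, {:metric, name, value, tags})
  :ok
end

result =
  TimedOperation.measure(:snmp_response_time, %{device: "192.168.1.1"}, recorder, fn -> :pong end)
```

Injecting the recorder keeps the timing logic testable in isolation from the dashboard process.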

start_link(opts \\ [])

@spec start_link(dashboard_opts()) :: {:ok, pid()} | {:error, any()}

Starts the dashboard server with monitoring and web interface.

Options

port: Web dashboard port (default: 4000)
update_interval: Metrics update frequency in milliseconds (default: 5000)
retention_days: How long to keep historical data (default: 7)
prometheus_enabled: Enable Prometheus metrics endpoint (default: false)
grafana_integration: Enable Grafana dashboard integration (default: false)

Examples

# Start with defaults
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link()

# Start with custom configuration
{:ok, _pid} = SnmpKit.SnmpLib.Dashboard.start_link(
  port: 8080,
  prometheus_enabled: true,
  retention_days: 14
)
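
To make the interaction between caller options and the documented defaults concrete, here is a sketch using `Keyword.validate!/2` (Elixir 1.13+). `DashboardOptions` is a hypothetical helper; whether `start_link/1` validates its options this way is an assumption, and the defaults are copied from the option list above.

```elixir
defmodule DashboardOptions do
  # Hypothetical helper; not part of snmpkit.
  @defaults [
    port: 4000,
    update_interval: 5000,
    retention_days: 7,
    prometheus_enabled: false,
    grafana_integration: false
  ]

  # Keyword.validate!/2 fills in missing keys from the defaults and raises
  # ArgumentError when an unknown key is given (for example a misspelled option).
  def build(overrides \\ []), do: Keyword.validate!(overrides, @defaults)
end

opts = DashboardOptions.build(port: 8080, retention_days: 14)
```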