SnmpKit.SnmpLib.Monitor (snmpkit v1.2.0)
Performance monitoring and metrics collection for SNMP operations.
This module provides comprehensive monitoring capabilities for SNMP applications, including real-time metrics, performance analytics, and health monitoring. Based on monitoring patterns proven in large-scale network management systems.
Features
- Real-time Metrics: Live performance data collection and analysis
- Historical Analytics: Trend analysis and capacity planning data
- Health Monitoring: Automatic detection of performance degradation
- Alerting: Configurable thresholds and notification system
- Device Profiling: Per-device performance characteristics
- Operation Tracking: Detailed metrics for all SNMP operation types
Metric Categories
Operation Metrics
- Request/response times
- Success/failure rates
- Throughput measurements
- Error classifications
Device Metrics
- Per-device response characteristics
- Availability percentages
- Performance trends
- Health scores
System Metrics
- Connection pool utilization
- Memory usage patterns
- Resource consumption
- Concurrent operation counts
Usage Examples
# Start monitoring system
{:ok, _pid} = SnmpKit.SnmpLib.Monitor.start_link()
# Record SNMP operation
SnmpKit.SnmpLib.Monitor.record_operation(%{
  device: "192.168.1.1",
  operation: :get,
  duration: 245,
  result: :success
})
# Get real-time stats
stats = SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1")
IO.puts("Average response time: " <> to_string(stats.avg_response_time) <> "ms")
# Set up alerting
SnmpKit.SnmpLib.Monitor.set_alert_threshold("192.168.1.1", :response_time, 5000)
Summary
Functions
- child_spec: Returns a specification to start this module under a supervisor.
- export_data: Exports monitoring data for external analysis.
- get_active_alerts: Gets currently active alerts.
- get_device_stats: Gets comprehensive statistics for a specific device.
- get_operation_metrics: Gets performance metrics for a specific operation type.
- get_system_stats: Gets system-wide statistics and performance metrics.
- health_check: Forces a health check of all monitored devices.
- record_operation: Records an SNMP operation for monitoring and analysis.
- remove_alert_threshold: Removes an alert threshold.
- set_alert_threshold: Sets an alert threshold for automated monitoring.
- start_link: Starts the monitoring system.
Types
@type alert_threshold() :: %{
  device_id: device_id(),
  metric: metric_type(),
  threshold: number(),
  condition: :above | :below,
  duration: pos_integer(),
  callback: function() | nil
}
@type device_id() :: binary()
@type device_stats() :: %{
  device_id: device_id(),
  total_operations: non_neg_integer(),
  successful_operations: non_neg_integer(),
  failed_operations: non_neg_integer(),
  avg_response_time: float(),
  p95_response_time: float(),
  p99_response_time: float(),
  error_rate: float(),
  availability: float(),
  health_score: float(),
  last_seen: integer(),
  trend: :improving | :stable | :degrading
}
@type metric_type() :: :response_time | :error_rate | :throughput | :availability
@type operation_metric() :: %{
  device: device_id(),
  operation: operation_type(),
  timestamp: integer(),
  duration: non_neg_integer(),
  result: operation_result(),
  error_type: atom() | nil,
  bytes_sent: non_neg_integer() | nil,
  bytes_received: non_neg_integer() | nil
}
@type operation_result() :: :success | :error | :timeout | :partial
@type operation_type() :: :get | :get_next | :get_bulk | :set | :walk
@type system_stats() :: %{
  total_devices: non_neg_integer(),
  active_devices: non_neg_integer(),
  total_operations: non_neg_integer(),
  operations_per_second: float(),
  average_response_time: float(),
  global_error_rate: float(),
  memory_usage: non_neg_integer(),
  uptime: non_neg_integer()
}
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
Exports monitoring data for external analysis.
Parameters
- format: Export format (:json, :csv, or :prometheus)
- timeframe: Time range for export
JSON Export
JSON export uses Elixir's built-in JSON module (requires Elixir 1.18+).
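The documented example implies that when JSON encoding is unavailable, export_data returns a string starting with "JSON export unavailable" instead of raising. A minimal sketch, assuming that behaviour and assuming every format is returned as a binary, of wrapping the result in a tagged tuple before writing it out. MonitorExample.Export and its functions are hypothetical helpers, not part of SnmpKit:

```elixir
defmodule MonitorExample.Export do
  @moduledoc "Hypothetical helper for handling export output; not part of SnmpKit."

  # Assumption: a failed JSON export is signalled by this string prefix,
  # as suggested by the documented example.
  @unavailable "JSON export unavailable"

  # Wraps the raw export string in a tagged tuple.
  def classify(@unavailable <> _rest), do: {:error, :json_unavailable}
  def classify(data) when is_binary(data), do: {:ok, data}

  # Picks a file name for each documented export format.
  # The names themselves are arbitrary choices for this sketch.
  def filename(:json), do: "snmp_metrics.json"
  def filename(:csv), do: "snmp_metrics.csv"
  def filename(:prometheus), do: "snmp_metrics.prom"
end
```

With this in place, piping the result of SnmpKit.SnmpLib.Monitor.export_data(:json, :last_hour) into MonitorExample.Export.classify/1 would yield either {:ok, json} or {:error, :json_unavailable}, which fits naturally into a `with` or `case` expression.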
Examples
data = SnmpKit.SnmpLib.Monitor.export_data(:json, :last_hour)
case data do
  "JSON export unavailable" <> _ -> IO.puts("JSON not available")
  json -> File.write!("snmp_metrics.json", json)
end
@spec get_active_alerts() :: [map()]
Gets currently active alerts.
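When many devices are monitored, it can help to group the returned alerts by device. The sketch below assumes each alert map carries :device_id and :metric keys (the same fields the documented example reads); MonitorExample.AlertSummary is a hypothetical helper, not part of SnmpKit:

```elixir
defmodule MonitorExample.AlertSummary do
  @moduledoc "Hypothetical helper for summarizing alerts; not part of SnmpKit."

  # Groups alerts by device and keeps only the metric names.
  # Assumption: each alert map has :device_id and :metric keys.
  def metrics_by_device(alerts) when is_list(alerts) do
    Enum.group_by(alerts, & &1.device_id, & &1.metric)
  end
end
```

The list returned by SnmpKit.SnmpLib.Monitor.get_active_alerts() could be passed straight to MonitorExample.AlertSummary.metrics_by_device/1.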
Examples
alerts = SnmpKit.SnmpLib.Monitor.get_active_alerts()
Enum.each(alerts, fn alert ->
  IO.puts("Alert: " <> alert.device_id <> " " <> to_string(alert.metric) <> " " <> to_string(alert.current_value))
end)
@spec get_device_stats(device_id(), atom()) :: device_stats() | {:error, :not_found}
Gets comprehensive statistics for a specific device.
Parameters
- device_id: Device identifier
- timeframe: Optional timeframe (:last_hour, :last_day, :all_time)
Returns
Device statistics map or {:error, :not_found} if device has no recorded operations.
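Because the function can return {:error, :not_found}, callers should match on both shapes before reading fields such as error_rate. A minimal sketch of that pattern, using field names from the device_stats() type; MonitorExample.DeviceReport is a hypothetical helper, not part of SnmpKit:

```elixir
defmodule MonitorExample.DeviceReport do
  @moduledoc "Hypothetical helper for formatting device stats; not part of SnmpKit."

  # Handles the documented error tuple for devices with no recorded operations.
  def summarize({:error, :not_found}), do: "no operations recorded"

  # Formats a few fields from the device_stats() map.
  def summarize(%{device_id: id, error_rate: rate, health_score: score, trend: trend}) do
    "#{id}: error_rate=#{rate} health=#{score} trend=#{trend}"
  end
end
```

For example, the result of SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1", :last_hour) can be passed directly to MonitorExample.DeviceReport.summarize/1 without risking a crash on an unknown device.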
Examples
# Get current device stats
stats = SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1")
IO.puts("Error rate: " <> to_string(stats.error_rate) <> "%")
# Get stats for specific timeframe
stats = SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1", :last_hour)
@spec get_operation_metrics(operation_type(), atom()) :: map()
Gets performance metrics for a specific operation type.
Parameters
- operation: SNMP operation type
- timeframe: Optional timeframe for analysis
Examples
metrics = SnmpKit.SnmpLib.Monitor.get_operation_metrics(:get_bulk)
IO.puts("Average GETBULK time: " <> to_string(metrics.avg_duration) <> "ms")
@spec get_system_stats() :: system_stats()
Gets system-wide statistics and performance metrics.
Returns
Comprehensive system statistics including global performance metrics, device counts, and resource utilization.
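The system_stats() map lends itself to derived figures. As an illustration, the sketch below computes the share of known devices that are currently active, using the total_devices and active_devices fields from the type definition; MonitorExample.SystemReport is a hypothetical helper, not part of SnmpKit:

```elixir
defmodule MonitorExample.SystemReport do
  @moduledoc "Hypothetical helper for derived system figures; not part of SnmpKit."

  # Avoids division by zero when no devices have been seen yet.
  def active_percent(%{total_devices: 0}), do: 0.0

  # Percentage of known devices that are active, rounded to one decimal place.
  def active_percent(%{total_devices: total, active_devices: active}) do
    Float.round(active / total * 100, 1)
  end
end
```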
Examples
stats = SnmpKit.SnmpLib.Monitor.get_system_stats()
IO.puts("Total devices monitored: " <> to_string(stats.total_devices))
IO.puts("Operations per second: " <> to_string(stats.operations_per_second))
@spec health_check() :: :ok
Forces a health check of all monitored devices.
Useful for immediate assessment of system health.
Examples
:ok = SnmpKit.SnmpLib.Monitor.health_check()
@spec record_operation(map()) :: :ok
Records an SNMP operation for monitoring and analysis.
This is the primary interface for feeding operation data into the monitoring system. Call it after every SNMP operation so that the collected metrics are complete.
Parameters
- metric: Operation metric map with required fields
Required Fields
- device: Target device identifier
- operation: Type of SNMP operation
- duration: Operation duration in milliseconds
- result: Operation result status
Optional Fields
- error_type: Specific error classification (if result is :error)
- bytes_sent: Number of bytes sent
- bytes_received: Number of bytes received
- timestamp: Override timestamp (defaults to current time)
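One way to produce these fields is to time the SNMP call and derive the metric map from its outcome. The sketch below is an illustration under stated assumptions: the wrapped function returns {:ok, value} or {:error, reason}, and a :timeout reason is recorded with result: :timeout. MonitorExample.Timed is a hypothetical helper, not part of SnmpKit:

```elixir
defmodule MonitorExample.Timed do
  @moduledoc "Hypothetical timing helper; not part of SnmpKit."

  # Runs `fun`, measures its duration in milliseconds with the monotonic
  # clock, and returns {outcome, metric_map}.
  def measure(device, operation, fun) when is_function(fun, 0) do
    started = System.monotonic_time(:millisecond)
    outcome = fun.()
    duration = System.monotonic_time(:millisecond) - started
    {outcome, build_metric(device, operation, duration, outcome)}
  end

  # Builds a map with the required fields, plus error_type where known.
  # Assumption: outcomes follow the {:ok, _} | {:error, reason} convention.
  def build_metric(device, operation, duration, outcome) do
    base = %{device: device, operation: operation, duration: duration}

    case outcome do
      {:ok, _value} ->
        Map.put(base, :result, :success)

      {:error, :timeout} ->
        Map.merge(base, %{result: :timeout, error_type: :timeout})

      {:error, reason} when is_atom(reason) ->
        Map.merge(base, %{result: :error, error_type: reason})

      _other ->
        Map.put(base, :result, :error)
    end
  end
end
```

The metric half of the returned tuple can then be handed to SnmpKit.SnmpLib.Monitor.record_operation/1, while the outcome half goes back to the caller unchanged.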
Examples
# Basic operation recording
SnmpKit.SnmpLib.Monitor.record_operation(%{
  device: "192.168.1.1",
  operation: :get,
  duration: 245,
  result: :success
})
# Detailed operation recording
SnmpKit.SnmpLib.Monitor.record_operation(%{
  device: "192.168.1.1",
  operation: :get_bulk,
  duration: 1250,
  result: :error,
  error_type: :timeout,
  bytes_sent: 64,
  bytes_received: 0
})
@spec remove_alert_threshold(device_id(), metric_type()) :: :ok
Removes an alert threshold.
Examples
:ok = SnmpKit.SnmpLib.Monitor.remove_alert_threshold("192.168.1.1", :response_time)
@spec set_alert_threshold(device_id(), metric_type(), number(), keyword()) :: :ok
Sets an alert threshold for automated monitoring.
Alerts fire when the specified metric stays above or below the threshold (as selected by the condition option) for the given duration.
Parameters
- device_id: Device to monitor (use ":global" for system-wide alerts)
- metric: Metric type to monitor
- threshold: Threshold value
- opts: Alert configuration options
Options
- condition: :above or :below (default: :above)
- duration: How long the condition must hold, in milliseconds (default: 60000)
- callback: Function to call when alert fires
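The callback is a one-argument function, as the &MyApp.Alerts.device_down/1 capture in the examples indicates. The shape of that argument is not documented here; the sketch below assumes it is an alert map with :device_id, :metric, and :current_value keys (the fields shown in the get_active_alerts example) and falls back to defaults if they are missing. MonitorExample.Alerts is a hypothetical module, not part of SnmpKit:

```elixir
defmodule MonitorExample.Alerts do
  @moduledoc "Hypothetical alert callback module; not part of SnmpKit."
  require Logger

  # Arity-1 function suitable for the :callback option.
  def device_down(alert) when is_map(alert) do
    Logger.warning(format(alert))
    :ok
  end

  # Builds a log line; Map.get/3 keeps this safe if a key is absent.
  # Assumption: the alert carries :device_id, :metric, and :current_value.
  def format(alert) do
    device = Map.get(alert, :device_id, "unknown")
    metric = Map.get(alert, :metric, :unknown)
    value = Map.get(alert, :current_value)
    "SNMP alert on #{device}: #{metric} = #{inspect(value)}"
  end
end
```

Such a function would be registered with callback: &MonitorExample.Alerts.device_down/1 in the options passed to set_alert_threshold.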
Examples
# Alert on high response times
SnmpKit.SnmpLib.Monitor.set_alert_threshold("192.168.1.1", :response_time, 5000)
# Alert on low availability with custom callback
SnmpKit.SnmpLib.Monitor.set_alert_threshold("core-router", :availability, 95.0,
  condition: :below,
  duration: 300_000,
  callback: &MyApp.Alerts.device_down/1
)
Starts the monitoring system.
Options
- retention_period: How long to keep historical data (default: 1 hour)
- bucket_size: Time bucket size for aggregation (default: 1 minute)
- cleanup_interval: How often to clean old data (default: 5 minutes)
- health_check_interval: How often to check device health (default: 1 minute)
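Since the module exposes a child specification, it can presumably be started under an application supervisor rather than by calling start_link directly. The sketch below assumes that the child specification forwards its argument to start_link as options and that all intervals are given in milliseconds (as the comments in the documented example suggest). MonitorExample.Supervision is a hypothetical module, not part of SnmpKit:

```elixir
defmodule MonitorExample.Supervision do
  @moduledoc "Hypothetical supervision helper; not part of SnmpKit."

  # Options expressed with :timer helpers, which return milliseconds.
  def monitor_opts do
    [
      retention_period: :timer.hours(2),
      bucket_size: :timer.seconds(30),
      cleanup_interval: :timer.minutes(5),
      health_check_interval: :timer.minutes(1)
    ]
  end

  # Child list in the {module, arg} form accepted by Supervisor.start_link/2.
  # Assumption: the monitor's child spec passes these options to start_link.
  def children do
    [{SnmpKit.SnmpLib.Monitor, monitor_opts()}]
  end
end
```

In an application module, the list would be passed as Supervisor.start_link(MonitorExample.Supervision.children(), strategy: :one_for_one).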
Examples
{:ok, pid} = SnmpKit.SnmpLib.Monitor.start_link()
{:ok, pid} = SnmpKit.SnmpLib.Monitor.start_link(
  retention_period: 7_200_000,  # 2 hours
  bucket_size: 30_000          # 30 second buckets
)