SnmpKit.SnmpLib.Monitor (snmpkit v0.6.4)
Performance monitoring and metrics collection for SNMP operations.
This module provides comprehensive monitoring capabilities for SNMP applications, including real-time metrics, performance analytics, and health monitoring. Based on monitoring patterns proven in large-scale network management systems.
Features
- Real-time Metrics: Live performance data collection and analysis
- Historical Analytics: Trend analysis and capacity planning data
- Health Monitoring: Automatic detection of performance degradation
- Alerting: Configurable thresholds and notification system
- Device Profiling: Per-device performance characteristics
- Operation Tracking: Detailed metrics for all SNMP operation types
Metric Categories
Operation Metrics
- Request/response times
- Success/failure rates
- Throughput measurements
- Error classifications
Device Metrics
- Per-device response characteristics
- Availability percentages
- Performance trends
- Health scores
System Metrics
- Connection pool utilization
- Memory usage patterns
- Resource consumption
- Concurrent operation counts
Usage Examples
# Start the monitoring system
{:ok, _pid} = SnmpKit.SnmpLib.Monitor.start_link()
# Record an SNMP operation (record_operation/1 takes a map)
SnmpKit.SnmpLib.Monitor.record_operation(%{
  device: "192.168.1.1",
  operation: :get,
  duration: 245,
  result: :success
})
# Get real-time stats for a device
stats = SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1")
IO.puts("Average response time: " <> to_string(stats.avg_response_time) <> "ms")
# Set up alerting
SnmpKit.SnmpLib.Monitor.set_alert_threshold("192.168.1.1", :response_time, 5000)
Summary
Functions
- child_spec: Returns a specification to start this module under a supervisor.
- export_data: Exports monitoring data for external analysis.
- get_active_alerts: Gets currently active alerts.
- get_device_stats: Gets comprehensive statistics for a specific device.
- get_operation_metrics: Gets performance metrics for a specific operation type.
- get_system_stats: Gets system-wide statistics and performance metrics.
- health_check: Forces a health check of all monitored devices.
- record_operation: Records an SNMP operation for monitoring and analysis.
- remove_alert_threshold: Removes an alert threshold.
- set_alert_threshold: Sets an alert threshold for automated monitoring.
- start_link: Starts the monitoring system.
Types
@type alert_threshold() :: %{
  device_id: device_id(),
  metric: metric_type(),
  threshold: number(),
  condition: :above | :below,
  duration: pos_integer(),
  callback: function() | nil
}
@type device_id() :: binary()
@type device_stats() :: %{
  device_id: device_id(),
  total_operations: non_neg_integer(),
  successful_operations: non_neg_integer(),
  failed_operations: non_neg_integer(),
  avg_response_time: float(),
  p95_response_time: float(),
  p99_response_time: float(),
  error_rate: float(),
  availability: float(),
  health_score: float(),
  last_seen: integer(),
  trend: :improving | :stable | :degrading
}
@type metric_type() :: :response_time | :error_rate | :throughput | :availability
@type operation_metric() :: %{
  device: device_id(),
  operation: operation_type(),
  timestamp: integer(),
  duration: non_neg_integer(),
  result: operation_result(),
  error_type: atom() | nil,
  bytes_sent: non_neg_integer() | nil,
  bytes_received: non_neg_integer() | nil
}
@type operation_result() :: :success | :error | :timeout | :partial
@type operation_type() :: :get | :get_next | :get_bulk | :set | :walk
@type system_stats() :: %{
  total_devices: non_neg_integer(),
  active_devices: non_neg_integer(),
  total_operations: non_neg_integer(),
  operations_per_second: float(),
  average_response_time: float(),
  global_error_rate: float(),
  memory_usage: non_neg_integer(),
  uptime: non_neg_integer()
}
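To make the relationship between operation_metric() and device_stats() concrete, here is a minimal sketch that derives a few device_stats() fields from a list of operation_metric()-shaped maps. StatsSketch is a hypothetical module, not part of SnmpKit, and the library's own calculations may differ. It assumes that error_rate is a percentage, that every result other than :success counts as a failure, and that percentiles use the nearest-rank method. Fields whose derivation is not documented (availability, health_score, trend) are left out.

```elixir
defmodule StatsSketch do
  # Hypothetical helper for illustration; not part of SnmpKit.

  # No metrics recorded: mirror the documented {:error, :not_found} shape.
  def summarize([]), do: {:error, :not_found}

  # Summarizes a non-empty list of operation_metric()-shaped maps
  # that all belong to the same device.
  def summarize([%{device: device} | _] = metrics) do
    total = length(metrics)
    # Assumption: only :success counts as a successful operation.
    ok = Enum.count(metrics, &(&1.result == :success))
    durations = metrics |> Enum.map(& &1.duration) |> Enum.sort()

    %{
      device_id: device,
      total_operations: total,
      successful_operations: ok,
      failed_operations: total - ok,
      avg_response_time: Enum.sum(durations) / total,
      p95_response_time: percentile(durations, 95),
      # Assumption: error_rate is expressed as a percentage.
      error_rate: (total - ok) / total * 100
    }
  end

  # Nearest-rank percentile over an ascending list, returned as a float.
  defp percentile(sorted, p) do
    rank = ceil(p / 100 * length(sorted))
    sorted |> Enum.at(max(rank - 1, 0)) |> Kernel./(1)
  end
end
```

Calling StatsSketch.summarize/1 with an empty list returns {:error, :not_found}, matching the shape documented for get_device_stats.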
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
Exports monitoring data for external analysis.
Parameters
- format: Export format (:json, :csv, :prometheus)
- timeframe: Time range for export
JSON Export
JSON export uses Elixir's built-in JSON module (requires Elixir 1.18+).
Examples
data = SnmpKit.SnmpLib.Monitor.export_data(:json, :last_hour)
case data do
  "JSON export unavailable" <> _ -> IO.puts("JSON not available")
  json -> File.write!("snmp_metrics.json", json)
end
@spec get_active_alerts() :: [map()]
Gets currently active alerts.
Examples
alerts = SnmpKit.SnmpLib.Monitor.get_active_alerts()
Enum.each(alerts, fn alert ->
  IO.puts("Alert: " <> alert.device_id <> " " <> to_string(alert.metric) <> " " <> to_string(alert.current_value))
end)
@spec get_device_stats(device_id(), atom()) :: device_stats() | {:error, :not_found}
Gets comprehensive statistics for a specific device.
Parameters
- device_id: Device identifier
- timeframe: Optional timeframe (:last_hour, :last_day, :all_time)
Returns
Device statistics map, or {:error, :not_found} if the device has no recorded operations.
Examples
# Get current device stats
stats = SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1")
IO.puts("Error rate: " <> to_string(stats.error_rate) <> "%")
# Get stats for specific timeframe
stats = SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1", :last_hour)
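Because get_device_stats can return either a statistics map or {:error, :not_found}, callers may want to match on both shapes before reading fields. The following is a small sketch of that pattern. DeviceReport and describe/1 are hypothetical names introduced for illustration, and the function takes the return value as an argument so it runs without the library loaded.

```elixir
defmodule DeviceReport do
  # Hypothetical helper for illustration; not part of SnmpKit.
  # Accepts whatever get_device_stats returned and produces a one-line summary.

  # The device has no recorded operations.
  def describe({:error, :not_found}), do: "no recorded operations"

  # A device_stats() map; only the fields used here are matched.
  def describe(%{device_id: id, error_rate: rate, trend: trend}) do
    "#{id}: error rate #{Float.round(rate, 2)}%, trend #{trend}"
  end
end
```

In an application this would typically be called as DeviceReport.describe(SnmpKit.SnmpLib.Monitor.get_device_stats("192.168.1.1")).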
@spec get_operation_metrics(operation_type(), atom()) :: map()
Gets performance metrics for a specific operation type.
Parameters
- operation: SNMP operation type
- timeframe: Optional timeframe for analysis
Examples
metrics = SnmpKit.SnmpLib.Monitor.get_operation_metrics(:get_bulk)
IO.puts("Average GETBULK time: " <> to_string(metrics.avg_duration) <> "ms")
@spec get_system_stats() :: system_stats()
Gets system-wide statistics and performance metrics.
Returns
Comprehensive system statistics including global performance metrics, device counts, and resource utilization.
Examples
stats = SnmpKit.SnmpLib.Monitor.get_system_stats()
IO.puts("Total devices monitored: " <> to_string(stats.total_devices))
IO.puts("Operations per second: " <> to_string(stats.operations_per_second))
@spec health_check() :: :ok
Forces a health check of all monitored devices.
Useful for immediate assessment of system health.
Examples
:ok = SnmpKit.SnmpLib.Monitor.health_check()
@spec record_operation(map()) :: :ok
Records an SNMP operation for monitoring and analysis.
This is the primary interface for feeding operation data into the monitoring system. Call it after every SNMP operation so that the collected statistics cover all traffic.
Parameters
- metric: Operation metric map with required fields
Required Fields
- device: Target device identifier
- operation: Type of SNMP operation
- duration: Operation duration in milliseconds
- result: Operation result status
Optional Fields
- error_type: Specific error classification (if result is :error)
- bytes_sent: Number of bytes sent
- bytes_received: Number of bytes received
- timestamp: Override timestamp (defaults to current time)
Examples
# Basic operation recording
SnmpKit.SnmpLib.Monitor.record_operation(%{
  device: "192.168.1.1",
  operation: :get,
  duration: 245,
  result: :success
})
# Detailed operation recording
SnmpKit.SnmpLib.Monitor.record_operation(%{
  device: "192.168.1.1",
  operation: :get_bulk,
  duration: 1250,
  result: :error,
  error_type: :timeout,
  bytes_sent: 64,
  bytes_received: 0
})
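Since the metric needs a duration in milliseconds, one way to record every operation consistently is to wrap the SNMP call in a timing helper. The sketch below is an assumption about how an application might do this, not something the library provides. TimedSnmp and timed/4 are hypothetical names, and the recorder is passed in as a function so the example runs without SnmpKit. It also assumes the wrapped call returns {:ok, value} or {:error, reason}.

```elixir
defmodule TimedSnmp do
  # Hypothetical wrapper for illustration; not part of SnmpKit.
  #
  # `recorder` is any 1-arity function that accepts the metric map, for
  # example &SnmpKit.SnmpLib.Monitor.record_operation/1.
  # `fun` is a 0-arity function that performs the SNMP call.
  def timed(device, operation, recorder, fun) do
    started = System.monotonic_time(:millisecond)
    reply = fun.()
    duration = System.monotonic_time(:millisecond) - started

    # Assumption: the wrapped call returns {:ok, _} or {:error, reason}.
    outcome =
      case reply do
        {:ok, _} -> %{result: :success}
        {:error, reason} when is_atom(reason) -> %{result: :error, error_type: reason}
        _ -> %{result: :error}
      end

    recorder.(Map.merge(outcome, %{device: device, operation: operation, duration: duration}))

    # Hand the original reply back to the caller unchanged.
    reply
  end
end
```

The monotonic clock is used for the duration because it is not affected by wall-clock adjustments.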
@spec remove_alert_threshold(device_id(), metric_type()) :: :ok
Removes an alert threshold.
Examples
:ok = SnmpKit.SnmpLib.Monitor.remove_alert_threshold("192.168.1.1", :response_time)
@spec set_alert_threshold(device_id(), metric_type(), number(), keyword()) :: :ok
Sets an alert threshold for automated monitoring.
Alerts fire when the specified metric stays above or below the threshold (as selected by the condition option) for the given duration.
Parameters
- device_id: Device to monitor (use ":global" for system-wide alerts)
- metric: Metric type to monitor
- threshold: Threshold value
- opts: Alert configuration options
Options
- condition: :above or :below (default: :above)
- duration: How long the threshold must be exceeded (default: 60000ms)
- callback: Function to call when alert fires
Examples
# Alert on high response times
SnmpKit.SnmpLib.Monitor.set_alert_threshold("192.168.1.1", :response_time, 5000)
# Alert on low availability with custom callback
SnmpKit.SnmpLib.Monitor.set_alert_threshold("core-router", :availability, 95.0,
  condition: :below,
  duration: 300_000,
  callback: &MyApp.Alerts.device_down/1
)
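The interaction between condition and duration can be illustrated with a small sketch. It assumes that an alert fires only after the metric has violated the threshold continuously for at least duration milliseconds, and that any sample back on the safe side resets the timer. This is one plausible reading of the options above, not a description of the library's internals. AlertSketch and its functions are hypothetical names.

```elixir
defmodule AlertSketch do
  # Hypothetical illustration of threshold semantics; not part of SnmpKit.

  # True when `value` violates `threshold` under the given condition.
  def breached?(value, threshold, :above), do: value > threshold
  def breached?(value, threshold, :below), do: value < threshold

  # `samples` is a list of {timestamp_ms, value} tuples, oldest first.
  # Returns true once the threshold has been violated without interruption
  # for at least `duration` milliseconds.
  def fires?(samples, threshold, condition, duration) do
    samples
    |> Enum.reduce_while(nil, fn {ts, value}, since ->
      cond do
        # A healthy sample resets the start of the violation window.
        not breached?(value, threshold, condition) -> {:cont, nil}
        # First violating sample: remember when the violation started.
        since == nil -> {:cont, ts}
        # Violation has lasted long enough: stop and report.
        ts - since >= duration -> {:halt, :fired}
        # Still violating, but not for long enough yet.
        true -> {:cont, since}
      end
    end)
    |> Kernel.==(:fired)
  end
end
```

Under this reading, a single slow response does not raise a :response_time alert; only a sustained run of slow responses does.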
Starts the monitoring system.
Options
- retention_period: How long to keep historical data (default: 1 hour)
- bucket_size: Time bucket size for aggregation (default: 1 minute)
- cleanup_interval: How often to clean old data (default: 5 minutes)
- health_check_interval: How often to check device health (default: 1 minute)
Examples
# Start with default options
{:ok, pid} = SnmpKit.SnmpLib.Monitor.start_link()
# Or start with custom options (values in milliseconds)
{:ok, pid} = SnmpKit.SnmpLib.Monitor.start_link(
  retention_period: 7_200_000, # 2 hours
  bucket_size: 30_000 # 30 second buckets
)
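To show what bucket_size and retention_period mean in practice, here is a rough sketch of time-bucket aggregation and pruning. How the library actually stores and cleans its data is not documented here, so BucketSketch and its functions are hypothetical and only illustrate the general technique. All times are assumed to be in milliseconds.

```elixir
defmodule BucketSketch do
  # Hypothetical illustration of time bucketing; not part of SnmpKit.

  # Start of the bucket that contains `timestamp` (both in milliseconds).
  def bucket_start(timestamp, bucket_size), do: div(timestamp, bucket_size) * bucket_size

  # Groups metrics by bucket and returns %{bucket_start => average_duration}.
  def aggregate(metrics, bucket_size) do
    metrics
    |> Enum.group_by(&bucket_start(&1.timestamp, bucket_size), & &1.duration)
    |> Map.new(fn {bucket, durations} ->
      {bucket, Enum.sum(durations) / length(durations)}
    end)
  end

  # Keeps only buckets that start at or after `now - retention_period`.
  def prune(buckets, now, retention_period) do
    cutoff = now - retention_period

    buckets
    |> Enum.filter(fn {bucket, _avg} -> bucket >= cutoff end)
    |> Map.new()
  end
end
```

A smaller bucket_size gives finer-grained history at the cost of more buckets to keep within the retention period.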