Alarmist.Ops (alarmist v0.2.2)
Derivative alarm generation operations
Summary
Functions
copy/2: Replicate an alarm status
debounce/2: Set an alarm when the input has been set for a specified duration
hold/2: Keep an alarm set for a guaranteed amount of time
intensity/2: Set an alarm when the input alarm has been set and cleared too many times
logical_and/2: Set an alarm when all of the input alarms are set
logical_not/2: Set an alarm when the input alarm is cleared
logical_or/2: Set an alarm when one or more input alarms get set
Functions
@spec copy(Alarmist.Engine.t(), [Alarmist.alarm_id()]) :: Alarmist.Engine.t()
Replicate an alarm status
This is useful for aliasing alarm names. For example, if one library sets and clears an alarm ID in its own namespace, but another library wants to listen for changes to an alarm ID in its namespace, a copy rule can glue them together.
Example:
defmodule NewAlarm do
  use Alarmist.Definition

  # NewAlarm mirrors the status of OriginalAlarm
  defalarm do
    OriginalAlarm
  end
end
@spec debounce(Alarmist.Engine.t(), [Alarmist.alarm_id(), ...]) :: Alarmist.Engine.t()
Set an alarm when the input has been set for a specified duration
This rule keeps transient alarms from triggering remediation unnecessarily. This is useful when remediation is expensive or service-impacting and the input alarm is somewhat glitchy.
Alarmist already provides some debouncing, since alarms that get set and cleared within one alarm processing pass are ignored. That behavior is unreliable, though, whereas a debounce rule establishes an explicit duration.
An example of when debouncing is useful is delaying remediation of higher-level alarms, like being disconnected from a backend server. There are many reasons that a TCP connection could be interrupted, and client code probably already has some retry logic to reestablish the connection. In this case, it might be good to delay switching to an offline mode for a little while in the hope that the problem will go away on its own.
Example:
defmodule NewAlarm do
  use Alarmist.Definition

  # Set NewAlarm once Alarm1 has stayed set for 1_000 ms
  defalarm do
    debounce(Alarm1, 1_000)
  end
end
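To make the timing concrete, here is a minimal sketch in plain Elixir of how a debounce rule could behave over a timeline. It is only an illustrative model, not Alarmist's implementation: the DebounceSketch module, its run/2 function, and the event format are hypothetical. It assumes the input transitions are time-ordered, start with a set, alternate between set and clear, and that the output clears as soon as the input clears.

```elixir
defmodule DebounceSketch do
  @moduledoc "Illustrative model of a debounce rule (hypothetical, not part of Alarmist)."

  # Takes time-ordered `{time_ms, :set | :clear}` input transitions and
  # returns the transitions of the debounced output alarm.
  def run([{t_set, :set}, {t_clear, :clear} | rest], duration_ms) do
    if t_clear - t_set > duration_ms do
      # The input stayed set long enough: the output is set after the delay
      # and (by assumption) cleared together with the input.
      [{t_set + duration_ms, :set}, {t_clear, :clear} | run(rest, duration_ms)]
    else
      # Transient glitch: the output never sees it.
      run(rest, duration_ms)
    end
  end

  # The input was set and has not been cleared since.
  def run([{t_set, :set}], duration_ms), do: [{t_set + duration_ms, :set}]

  def run([], _duration_ms), do: []
end

events = [{0, :set}, {200, :clear}, {5_000, :set}, {7_000, :clear}]
IO.inspect(DebounceSketch.run(events, 1_000))
```

With a 1_000 ms duration, the 200 ms glitch is dropped, while the input that is set from 5_000 ms to 7_000 ms produces an output that is set at 6_000 ms and cleared at 7_000 ms.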
@spec hold(Alarmist.Engine.t(), [Alarmist.alarm_id(), ...]) :: Alarmist.Engine.t()
Keep an alarm set for a guaranteed amount of time
This sets an alarm for at least timeout milliseconds after it is set. Each time the alarm is set, the timer is restarted.
Hold is useful for time-based remediation, i.e., when handling an alarm means turning something off for a while, since turning that feature back on as soon as the alarm clears would likely just result in the alarm being set again. Managing the timeout period via alarms rather than programmatically lets you manually clear the alarm if you'd like the feature enabled again immediately, for example while debugging.
Example:
defmodule NewAlarm do
  use Alarmist.Definition

  # Keep NewAlarm set for at least 1_000 ms after Alarm1 is set
  defalarm do
    hold(Alarm1, 1_000)
  end
end
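The restart-on-set behavior can be sketched in plain Elixir as a pass over the input alarm's set/clear intervals. This is an illustrative model only, not Alarmist's implementation: HoldSketch, run/2, and the interval format are hypothetical, and the sketch assumes that the output stays set while the input is set and that manual clearing is out of scope.

```elixir
defmodule HoldSketch do
  @moduledoc "Illustrative model of a hold rule (hypothetical, not part of Alarmist)."

  # Takes time-ordered `{set_ms, clear_ms}` intervals of the input alarm and
  # returns the intervals during which the held output alarm is set.
  def run(intervals, timeout_ms) do
    intervals
    # Each set keeps the output set for at least timeout_ms.
    |> Enum.map(fn {set, clear} -> {set, max(clear, set + timeout_ms)} end)
    |> Enum.reduce([], fn
      # The input was set again while the output was still held:
      # the timer restarts, so the current output interval is extended.
      {set, clear}, [{prev_set, prev_clear} | rest] when set <= prev_clear ->
        [{prev_set, max(prev_clear, clear)} | rest]

      interval, acc ->
        [interval | acc]
    end)
    |> Enum.reverse()
  end
end

IO.inspect(HoldSketch.run([{0, 100}, {500, 600}, {5_000, 8_000}], 1_000))
```

Here the second set at 500 ms restarts the timer, so the first output interval runs from 0 to 1_500 ms. The last input interval is already longer than the timeout, so it passes through unchanged.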
@spec intensity(Alarmist.Engine.t(), [Alarmist.alarm_id(), ...]) :: Alarmist.Engine.t()
Set an alarm when the input alarm has been set and cleared too many times
This type of rule catches flapping alarms where it's desirable to take some kind of remediation when they trigger too many times in a row. Intensity is measured as count set/clears in duration milliseconds. This is the same as supervision restart intensity thresholds.
An example of an intensity-based alarm is handling the case when multiple network connections are available, but one that should be good is flaky. This happens if a device has both a cellular and a WiFi connection. Normally the WiFi connection is preferred, but if it keeps going up and down, it may be desirable to raise an alarm. That alarm could disable WiFi for a while. Combine this with hold/2 to manage the duration that WiFi is off.
Example:
defmodule NewAlarm do
  use Alarmist.Definition

  # Intensity threshold: 3 set/clears of Alarm1 in 60_000 ms
  defalarm do
    intensity(Alarm1, 3, 60_000)
  end
end
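The count-in-duration measurement can be illustrated with a sliding window over the times at which the input alarm was set. The sketch below is a simplified model in plain Elixir, not Alarmist's implementation: IntensitySketch and first_trip/3 are hypothetical names, and it assumes the threshold is reached when count sets fall within a window of duration milliseconds (whether the real boundaries are inclusive is not stated here).

```elixir
defmodule IntensitySketch do
  @moduledoc "Illustrative model of an intensity rule (hypothetical, not part of Alarmist)."

  # Takes the time-ordered times (in ms) at which the input alarm was set.
  # Returns the time at which `count` sets first fall within `duration_ms`,
  # or nil if that never happens.
  def first_trip(set_times, count, duration_ms) do
    set_times
    # Sliding windows of `count` consecutive set times.
    |> Enum.chunk_every(count, 1, :discard)
    |> Enum.find_value(fn window ->
      first = List.first(window)
      last = List.last(window)
      if last - first <= duration_ms, do: last
    end)
  end
end

IO.inspect(IntensitySketch.first_trip([0, 10_000, 100_000, 110_000, 120_000], 3, 60_000))
IO.inspect(IntensitySketch.first_trip([0, 70_000, 140_000], 3, 60_000))
```

In the first call, the sets at 100_000, 110_000, and 120_000 ms are the first three that fit inside a 60_000 ms window, so the sketch reports 120_000. In the second call the sets are too far apart and the result is nil.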
@spec logical_and(Alarmist.Engine.t(), [Alarmist.alarm_id()]) :: Alarmist.Engine.t()
Set an alarm when all of the input alarms are set
This is useful when remediation is only warranted once several things have gone wrong. For example, if a device has more than one way of accomplishing a task, there could be a specific remediation when one way stops working. However, if every way is broken, the device could trigger a more significant remediation.
Example:
defmodule NewAlarm do
  use Alarmist.Definition

  # NewAlarm is set only while both Alarm1 and Alarm2 are set
  defalarm do
    Alarm1 and Alarm2
  end
end
@spec logical_not(Alarmist.Engine.t(), [Alarmist.alarm_id()]) :: Alarmist.Engine.t()
Set an alarm when the input alarm is cleared
This is useful for "proof-of-life" alarms where the presence of an alarm is a good thing.
Example:
defmodule NewAlarm do
  use Alarmist.Definition

  # NewAlarm is set while OriginalAlarm is cleared
  defalarm do
    not OriginalAlarm
  end
end
@spec logical_or(Alarmist.Engine.t(), [Alarmist.alarm_id()]) :: Alarmist.Engine.t()
Set an alarm when one or more input alarms get set
This is useful for triggering a generic remediation. An example is setting an alarm that indicates that the device is "unhealthy" and needs to reboot. There are usually many disastrous alarms that, when raised, have no good remediation other than a reboot. This allows a handler to register for a combined alarm so that it's decoupled from the alarms that trigger it.
Example:
defmodule NewAlarm do
  use Alarmist.Definition

  # NewAlarm is set while Alarm1, Alarm2, or both are set
  defalarm do
    Alarm1 or Alarm2
  end
end
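The three logical rules (and, not, or) can be summarized as predicates over the set of alarms that are currently set. The following is a minimal sketch in plain Elixir, not Alarmist's implementation: LogicSketch and its functions are hypothetical and only mirror the descriptions of logical_and, logical_not, and logical_or above.

```elixir
defmodule LogicSketch do
  @moduledoc "Illustrative model of the logical rules (hypothetical, not part of Alarmist)."

  # `set_alarms` is a MapSet of the alarm IDs that are currently set.

  # Output is set when all of the input alarms are set.
  def logical_and(set_alarms, inputs),
    do: Enum.all?(inputs, &MapSet.member?(set_alarms, &1))

  # Output is set when one or more input alarms are set.
  def logical_or(set_alarms, inputs),
    do: Enum.any?(inputs, &MapSet.member?(set_alarms, &1))

  # Output is set when the input alarm is cleared.
  def logical_not(set_alarms, input),
    do: not MapSet.member?(set_alarms, input)
end

# Only Alarm1 is currently set.
set_alarms = MapSet.new([Alarm1])

IO.inspect(LogicSketch.logical_and(set_alarms, [Alarm1, Alarm2]))
IO.inspect(LogicSketch.logical_or(set_alarms, [Alarm1, Alarm2]))
IO.inspect(LogicSketch.logical_not(set_alarms, Alarm2))
```

With only Alarm1 set, the "and" of Alarm1 and Alarm2 is not set, while the "or" of the two and the "not" of Alarm2 are both set.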