Alarmist (alarmist v0.4.0)
Alarm handler and more
Alarmist provides an :alarm_handler implementation that allows you to check
what alarms are currently active and subscribe to alarm status changes.
It also provides a DSL for defining alarms based on other alarms. See
Alarmist.Alarm.
Summary
Types
Alarm information
Alarm description
Alarm identifier
Patterns for alarm subscriptions
Alarm state
Alarm type
Remedy callback with or without options
Callback function for fixing alarms
Options for running the remedy callback
Functions
Add a managed alarm
Add a callback to fix an Alarm ID
Get the current state of an alarm
Extract the alarm type from an alarm ID
Clear knowledge of an alarm's level
Return a list of all active alarm IDs
Return a list of all active alarms
Print alarm status in a nice table
Return all managed alarm IDs
Remove a managed alarm
Remove a remedy callback
Set or change the alarm level for an alarm
Subscribe to alarm status events
Subscribe to alarm status events for all alarms
Unsubscribe the current process from the specified alarm :set and :clear events
Unsubscribe from alarm status events for all alarms
Types
@type alarm() :: {alarm_id(), alarm_description()}
Alarm information
Calls to :alarm_handler.set_alarm/1 pass an alarm identifier and
description as a 2-tuple. Alarmist stores the description of the most recent
call.
:alarm_handler.set_alarm/1 doesn't enforce the use of 2-tuples. Alarmist
normalizes non-2-tuple alarms so that they have empty descriptions.
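The normalization described above can be pictured with a small sketch. This is not Alarmist's implementation: the module name NormalizeSketch, the function normalize/1, and the use of nil as the "empty" description are assumptions made purely for illustration.

```elixir
defmodule NormalizeSketch do
  # Hypothetical helper, not part of Alarmist's API.

  # A 2-tuple is read as {alarm_id, description} and passed through.
  def normalize({alarm_id, description}), do: {alarm_id, description}

  # Anything else is treated as a bare alarm ID. `nil` stands in for the
  # "empty" description here; the value Alarmist really stores may differ.
  def normalize(alarm_id), do: {alarm_id, nil}
end

NormalizeSketch.normalize({MyAlarm, "disk 91% full"})
NormalizeSketch.normalize(MyAlarm)
NormalizeSketch.normalize({NetworkBroken, "eth0", :extra})
```

Under this reading, a 2-tuple alarm ID such as {NetworkBroken, "eth0"} has to be wrapped as {{NetworkBroken, "eth0"}, description} when it is set, otherwise its second element would be taken as the description.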
@type alarm_description() :: any()
Alarm description
This is optional supplemental information about the alarm. It could contain more information about why it was set. Don't use it to differentiate between alarms. Use the alarm ID for that.
@type alarm_id() :: alarm_type() | {alarm_type(), any()} | {alarm_type(), any(), any()} | {alarm_type(), any(), any(), any()}
Alarm identifier
Alarm identifiers are the unique identifiers of each alarm that can be set or cleared.
While SASL alarm identifiers can be anything, Alarmist supplies conventions so that it can interpret them. This typespec follows those conventions, but you may come across code that doesn't. Alarmist may ignore or misinterpret alarm IDs that don't follow them.
@type alarm_pattern() :: alarm_type() | :_ | {alarm_type() | :_, any() | :_} | {alarm_type() | :_, any() | :_, any() | :_}
Patterns for alarm subscriptions
Patterns can be exact matches or use :_ to match any value in a position.
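One way to picture how such a pattern could be compared against an alarm ID is sketched here. PatternSketch and matches?/2 are hypothetical names, not Alarmist functions, and two details are assumptions: a bare :_ is treated as matching any alarm ID, and tuple patterns only match tuples of the same size.

```elixir
defmodule PatternSketch do
  # Hypothetical matcher, not Alarmist's implementation.

  # :_ is a wildcard for whatever value sits in its position.
  def matches?(:_, _value), do: true

  # Tuples of the same size match when every position matches.
  def matches?(pattern, value)
      when is_tuple(pattern) and is_tuple(value) and
             tuple_size(pattern) == tuple_size(value) do
    pattern
    |> Tuple.to_list()
    |> Enum.zip(Tuple.to_list(value))
    |> Enum.all?(fn {p, v} -> matches?(p, v) end)
  end

  # Everything else has to be an exact match.
  def matches?(pattern, value), do: pattern == value
end
```

In this sketch, {NetworkBroken, :_} matches {NetworkBroken, "eth0"} but not {DiskFull, "/dev/sda1"}. With Alarmist itself, a pattern like this is what gets passed to subscribe/1, for example Alarmist.subscribe({NetworkBroken, :_}).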
@type alarm_state() :: :set | :clear | :unknown
Alarm state
Alarms are in the :set state after a call to :alarm_handler.set_alarm/1
and in the :clear state after a call to :alarm_handler.clear_alarm/1.
Redundant calls to :alarm_handler.set_alarm/1 update the alarm description
and redundant calls to :alarm_handler.clear_alarm/1 are ignored.
The :unknown state is used for alarms that are unknown to Alarmist. These
alarms may have typos in the names or they simply may not have been set
or cleared yet.
@type alarm_type() :: atom()
Alarm type
Alarm types are atoms. For Alarmist-managed alarms, they are module names.
@type info_options() :: [ level: Logger.level(), sort: :level | :alarm_id | :duration, ansi_enabled?: boolean() ]
See Alarmist.info/1
@type remedy() :: remedy_fn() | {remedy_fn(), remedy_options()}
Remedy callback with or without options
Callback function for fixing alarms
This may be an MFA or function reference that takes zero or one arguments. If it takes one argument, the alarm ID is passed.
Options for running the remedy callback
- :retry_timeout - time to wait for the alarm to be cleared before calling the callback again (default: :infinity)
- :callback_timeout - time to wait for the callback to run (default: 60 seconds)
@opaque rule()
Functions
@spec add_managed_alarm(alarm_id()) :: :ok
Add a managed alarm
After this call, Alarmist will watch for alarms to be set based on the
supplied module and set or clear the specified alarm ID. The module must
use Alarmist.Alarm.
Calling this function multiple times with the same alarm results in the previous alarm being replaced. Alarm subscribers won't receive redundant events if the rules are the same.
@spec add_remedy(alarm_id(), remedy_fn(), remedy_options()) :: :ok | {:error, atom()}
Add a callback to fix an Alarm ID
This is a simple way of adding a callback function to deal with an alarm
being set. Conceptually it is similar to starting a GenServer, calling
subscribe/1, and running the callback on alarm set messages. It provides a
number of conveniences:
- Supervision is handled for you. If the callback crashes, you'll get a message in the log, but it won't prevent future attempts
- Handles fast toggling of alarm states to prevent the callback runs from queuing or running concurrently
- Can repeatedly call the callback after a retry delay for alarms that aren't clearing
- Times out hung callbacks to allow for future invocations without violating the guarantee that only one callback is run for an alarm ID at any one time.
Only one remedy callback can be registered per alarm ID. If you are running
the remedy on a managed alarm, see Alarmist.Alarm for specifying it there
and the remedy callback will be automatically added when the managed alarm
is.
Options:
- :retry_timeout - time to wait for the alarm to be cleared before calling the callback again (default: :infinity)
- :callback_timeout - time to wait for the callback to run (default: 60 seconds)
Since there can only be one remedy per alarm ID, subsequent calls replace the
existing remedy. If an alarm is already set, the new callback will be called
the next time. This means that a crash and restart of the process that adds
the remedy does not cause the callback to be invoked twice. In fact, if the
callback and options are the same, it will look like a no-op. If you don't
want this behavior, call remove_remedy/1 and then add_remedy/3 to force new
calls to be made.
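A minimal sketch of registering a remedy might look like the following. MyApp.NetworkRemedy, restart_interface/1, install/0, and the NetworkBroken alarm are hypothetical. The sketch also assumes that the options are passed as a keyword list and that timeouts are in milliseconds, neither of which is stated above.

```elixir
defmodule MyApp.NetworkRemedy do
  # Hypothetical remedy module; none of these names are part of Alarmist.
  require Logger

  @alarm_id {NetworkBroken, "eth0"}

  # One-argument remedy: the alarm ID that was set is passed in.
  def restart_interface({NetworkBroken, ifname}) do
    Logger.warning("Restarting #{ifname}")
    # ... bring the interface down and back up here ...
    :ok
  end

  # Registers the remedy. Calling this again with the same callback and
  # options is described above as looking like a no-op.
  def install do
    Alarmist.add_remedy(@alarm_id, &__MODULE__.restart_interface/1,
      # Assumed to be milliseconds; the text only says "time to wait".
      retry_timeout: 30_000,
      callback_timeout: 10_000
    )
  end
end
```

Calling MyApp.NetworkRemedy.install() from an application start callback would fit the replace-on-re-add behavior described above, since a restart of the calling process would not queue a second invocation.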
@spec alarm_state(alarm_id()) :: alarm_state()
Get the current state of an alarm
Alarmist learns about an alarm the first time it is set or cleared. Until then, its state is :unknown.
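As a rough illustration, the calls below walk one alarm through its states. MyApp.DiskAlmostFull is a hypothetical alarm ID, and the sketch assumes the :alarmist application is already running. Going by the alarm_state() type, the three reads would be expected to return :unknown, :set, and :clear in turn, but whether a read immediately after a set or clear already reflects the change is not stated here.

```elixir
# MyApp.DiskAlmostFull is a hypothetical alarm ID used only for this example.
alarm_id = MyApp.DiskAlmostFull

# Not yet set or cleared, so Alarmist has no record of it.
Alarmist.alarm_state(alarm_id)

# Report the alarm through SASL's alarm handler as {alarm_id, description}.
:alarm_handler.set_alarm({alarm_id, "92% used"})
Alarmist.alarm_state(alarm_id)

# Clearing takes only the alarm ID.
:alarm_handler.clear_alarm(alarm_id)
Alarmist.alarm_state(alarm_id)
```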
@spec alarm_type(alarm_id()) :: alarm_type()
Extract the alarm type from an alarm ID
Examples:
iex> Alarmist.alarm_type(MyAlarm)
MyAlarm
iex> Alarmist.alarm_type({NetworkBroken, "eth0"})
NetworkBroken
@spec clear_alarm_level(alarm_id()) :: :ok
Clear knowledge of an alarm's level
If the alarm gets reported after this call, it will be assigned the default
alarm level, :warning.
@spec get_alarm_ids([{:level, Logger.level()}]) :: [alarm_id()]
Return a list of all active alarm IDs
Options:
- :level - filter alarms by severity. Defaults to :info.
@spec get_alarms([{:level, Logger.level()}]) :: [alarm()]
Return a list of all active alarms
This returns {id, description} tuples. Note that Alarmist normalizes
alarms that were not set as 2-tuples so this may not match calls to
:alarm_handler.set_alarm/1.
Options:
- :level - filter alarms by severity. Defaults to :info.
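A short usage sketch, assuming the :alarmist application is running. It lists active alarms with the default filter and then with the filter set to :error. How the :level filter treats alarms above or below the given level is not spelled out here, so no particular output is shown.

```elixir
# All active alarms using the default :info filter.
Alarmist.get_alarms()

# Active alarms with the filter set to :error, printed one per line.
for {alarm_id, description} <- Alarmist.get_alarms(level: :error) do
  IO.puts("#{inspect(alarm_id)}: #{inspect(description)}")
end
```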
@spec info(info_options()) :: :ok
Print alarm status in a nice table
Options:
- :ansi_enabled? - override the default ANSI setting. Defaults to true.
- :level - filter alarms by severity. Defaults to :info.
- :show_cleared? - show cleared alarms. Defaults to false.
Return all managed alarm IDs
@spec remove_managed_alarm(alarm_id()) :: :ok
Remove a managed alarm
@spec remove_remedy(alarm_id()) :: :ok | {:error, :not_found}
Remove a remedy callback
If the callback is currently running, Alarmist brutally kills its worker process.
There's generally no need to remove a remedy callback that's automatically added as part of a managed alarm. Removing the managed alarm removes its remedy.
@spec set_alarm_level(alarm_id(), Logger.level()) :: :ok
Set or change the alarm level for an alarm
The alarm can be either managed or unmanaged. Once set, that alarm will be reported with the specified level.
While this can be used with managed alarms, you should normally pass the
desired level as an option to use Alarmist.Alarm so that it's handled for
you.
It's also possible to set levels for unmanaged alarms in the application configuration:
config :alarmist, alarm_levels: %{MyUnmanagedAlarm => :critical}
NOTE: Changing the alarm level does not change the status of existing alarms since there's no mechanism to go back in time and change reports. Future events will be reported with the new level.
@spec subscribe(alarm_pattern()) :: :ok
Subscribe to alarm status events
Events will be delivered to the calling process as:
%Alarmist.Event{
  id: TheAlarmId,
  state: :set,
  description: nil,
  level: :warning,
  timestamp: -576460712978320952,
  previous_state: :unknown,
  previous_timestamp: -576460751417398083
}
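A subscriber process could be sketched as a small GenServer. MyApp.AlarmLogger is a hypothetical module. Only subscribe/1 and the %Alarmist.Event{} fields are taken from the documentation above, and the sketch assumes events arrive as plain messages handled in handle_info/2.

```elixir
defmodule MyApp.AlarmLogger do
  # Hypothetical subscriber; logs every event matching the given pattern.
  use GenServer
  require Logger

  def start_link(pattern), do: GenServer.start_link(__MODULE__, pattern)

  @impl GenServer
  def init(pattern) do
    # Events are delivered to the process that calls subscribe/1,
    # so subscribe from inside the GenServer itself.
    :ok = Alarmist.subscribe(pattern)
    {:ok, pattern}
  end

  @impl GenServer
  def handle_info(%Alarmist.Event{id: id, state: state, level: level}, pattern) do
    # Log at the alarm's own level.
    Logger.log(level, "Alarm #{inspect(id)} is now #{state}")
    {:noreply, pattern}
  end

  # Ignore unrelated messages.
  def handle_info(_other, pattern), do: {:noreply, pattern}
end
```

Starting it with MyApp.AlarmLogger.start_link({NetworkBroken, :_}) would log events for every NetworkBroken alarm with one extra ID element, given the pattern rules for alarm_pattern().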
@spec subscribe_all() :: :ok
Subscribe to alarm status events for all alarms
See subscribe/1 for the event format.
@spec unsubscribe(alarm_pattern()) :: :ok
Unsubscribe the current process from the specified alarm :set and :clear events
@spec unsubscribe_all() :: :ok
Unsubscribe from alarm status events for all alarms
NOTE: This will only remove subscriptions created via subscribe_all/0, not
subscriptions created for individual alarms via subscribe/1.
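The distinction in the note can be sketched as follows, assuming the :alarmist application is running. {NetworkBroken, "eth0"} is a hypothetical alarm ID, and passing the same pattern to unsubscribe/1 that was given to subscribe/1 is an assumption.

```elixir
# Two independent subscriptions from the same process.
:ok = Alarmist.subscribe_all()
:ok = Alarmist.subscribe({NetworkBroken, "eth0"})

# Removes only the subscription made with subscribe_all/0.
:ok = Alarmist.unsubscribe_all()

# The individual subscription is removed separately.
:ok = Alarmist.unsubscribe({NetworkBroken, "eth0"})
```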