Alarmist (alarmist v0.4.0)

Alarm handler and more

Alarmist provides an :alarm_handler implementation that allows you to check what alarms are currently active and subscribe to alarm status changes.

It also provides a DSL for defining alarms based on other alarms. See Alarmist.Alarm.
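
As a rough sketch of the basic workflow, the snippet below raises an alarm through the standard SASL API and then queries it through Alarmist. `MyApp.DiskAlmostFull` and its description are hypothetical names chosen for illustration, and no return values are shown because they depend on the handler having processed the event.

```elixir
# Hypothetical alarm ID and description; any {id, description} 2-tuple works.
:alarm_handler.set_alarm({MyApp.DiskAlmostFull, "/data is 95% full"})

# Ask Alarmist what it currently knows about the alarm.
Alarmist.alarm_state(MyApp.DiskAlmostFull)

# List the IDs of all active alarms, or the {id, description} tuples.
Alarmist.get_alarm_ids()
Alarmist.get_alarms()

# Clear the alarm again through the SASL API.
:alarm_handler.clear_alarm(MyApp.DiskAlmostFull)
```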

Types

alarm()

@type alarm() :: {alarm_id(), alarm_description()}

Alarm information

Calls to :alarm_handler.set_alarm/1 pass an alarm identifier and description as a 2-tuple. Alarmist stores the description of the most recent call.

:alarm_handler.set_alarm/1 doesn't enforce the use of 2-tuples. Alarmist normalizes non-2-tuple alarms so that they have empty descriptions.

alarm_description()

@type alarm_description() :: any()

Alarm description

This is optional supplemental information about the alarm. It could contain more information about why it was set. Don't use it to differentiate between alarms. Use the alarm ID for that.

alarm_id()

@type alarm_id() ::
  alarm_type()
  | {alarm_type(), any()}
  | {alarm_type(), any(), any()}
  | {alarm_type(), any(), any(), any()}

Alarm identifier

Alarm identifiers are the unique identifiers of each alarm that can be set or cleared.

While SASL alarm identifiers can be anything, Alarmist supplies conventions so that it can interpret them. This typespec follows those conventions, but you may come across code that doesn't. Identifiers that don't follow the conventions may be ignored or misinterpreted.

alarm_pattern()

@type alarm_pattern() ::
  alarm_type()
  | :_
  | {alarm_type() | :_, any() | :_}
  | {alarm_type() | :_, any() | :_, any() | :_}

Patterns for alarm subscriptions

Patterns can be exact matches or use :_ to match any value in a position.
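
To illustrate, here are a few subscriptions that fit the `alarm_pattern()` typespec. `MyApp.DiskAlmostFull`, `NetworkBroken`, and the `"eth0"` parameter are assumed names used only for this example.

```elixir
# Exact match on an alarm ID that is a bare alarm type.
Alarmist.subscribe(MyApp.DiskAlmostFull)

# Exact match on a parameterized alarm ID.
Alarmist.subscribe({NetworkBroken, "eth0"})

# Wildcard in the second position: NetworkBroken for any interface.
Alarmist.subscribe({NetworkBroken, :_})

# Wildcard in the first position: any 2-tuple alarm whose parameter is "eth0".
Alarmist.subscribe({:_, "eth0"})
```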

alarm_state()

@type alarm_state() :: :set | :clear | :unknown

Alarm state

Alarms are in the :set state after a call to :alarm_handler.set_alarm/1 and in the :clear state after a call to :alarm_handler.clear_alarm/1. Redundant calls to :alarm_handler.set_alarm/1 update the alarm description and redundant calls to :alarm_handler.clear_alarm/1 are ignored.

The :unknown state is used for alarms that are unknown to Alarmist. These alarms may have typos in the names or they simply may not have been set or cleared yet.

alarm_type()

@type alarm_type() :: atom()

Alarm type

Alarm types are atoms. For Alarmist-managed alarms, they are module names.

compiled_condition()

@type compiled_condition() :: %{
  rules: [rule()],
  temporaries: [alarm_id()],
  options: map()
}

info_options()

@type info_options() :: [
  level: Logger.level(),
  sort: :level | :alarm_id | :duration,
  ansi_enabled?: boolean()
]

See Alarmist.info/1

remedy()

@type remedy() :: remedy_fn() | {remedy_fn(), remedy_options()}

Remedy callback with or without options

See Alarmist.Alarm.__using__/1

remedy_fn()

@type remedy_fn() :: (-> any()) | (alarm_id() -> any()) | mfa()

Callback function for fixing alarms

This may be an MFA or function reference that takes zero or one arguments. If it takes one argument, the alarm ID is passed.
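
The three shapes allowed by `remedy_fn()` might look like the following. `MyApp.Remedies` and its functions are hypothetical, and the return values are placeholders. How Alarmist invokes an MFA (for example, whether it appends the alarm ID) isn't stated here, so the sketch uses an empty argument list.

```elixir
defmodule MyApp.Remedies do
  # Zero-arity remedy: needs no context about the alarm.
  def restart_network do
    :restarted
  end

  # One-arity remedy: receives the alarm ID that was set.
  def restart_interface({NetworkBroken, ifname}) do
    {:restarted, ifname}
  end
end

# Function reference taking zero arguments.
zero_arity = &MyApp.Remedies.restart_network/0

# Function reference taking one argument (the alarm ID).
one_arity = &MyApp.Remedies.restart_interface/1

# Module, function, arguments tuple.
mfa = {MyApp.Remedies, :restart_network, []}
```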

remedy_options()

@type remedy_options() :: [retry_timeout: timeout(), callback_timeout: timeout()]

Options for running the remedy callback

  • :retry_timeout - time to wait for the alarm to be cleared before calling the callback again (default: :infinity)
  • :callback_timeout - time to wait for the callback to run (default: 60 seconds)

rule()

@opaque rule()

Functions

add_managed_alarm(alarm_id)

@spec add_managed_alarm(alarm_id()) :: :ok

Add a managed alarm

After this call, Alarmist will watch for alarms to be set based on the supplied module and set or clear the specified alarm ID. The module must use Alarmist.Alarm.

Calling this function multiple times with the same alarm replaces the previous alarm. Alarm subscribers won't receive redundant events if the rules are the same.
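
A minimal sketch of the managed alarm lifecycle, assuming a hypothetical module `MyApp.NetworkUnstable` that already uses Alarmist.Alarm to define its rules:

```elixir
# Start managing the alarm. MyApp.NetworkUnstable is an assumed module
# that must `use Alarmist.Alarm`.
:ok = Alarmist.add_managed_alarm(MyApp.NetworkUnstable)

# List every alarm ID that Alarmist currently manages.
Alarmist.managed_alarm_ids()

# Stop managing the alarm when it is no longer needed.
:ok = Alarmist.remove_managed_alarm(MyApp.NetworkUnstable)
```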

add_remedy(alarm_id, callback, options \\ [])

@spec add_remedy(alarm_id(), remedy_fn(), remedy_options()) :: :ok | {:error, atom()}

Add a callback to fix an Alarm ID

This is a simple way of adding a callback function to deal with an alarm being set. Conceptually it is similar to starting a GenServer, calling subscribe/1, and running the callback on alarm set messages. It provides a number of conveniences:

  • Supervises the callback for you. If the callback crashes, you'll get a message in the log, but future attempts still run.
  • Handles fast toggling of alarm states so that callback runs don't queue up or run concurrently.
  • Can call the callback repeatedly, after a retry delay, for alarms that aren't clearing.
  • Times out hung callbacks to allow future invocations without violating the guarantee that only one callback runs for an alarm ID at any one time.

Only one remedy callback can be registered per alarm ID. For a managed alarm, specify the remedy in Alarmist.Alarm instead. The remedy callback is then added automatically when the managed alarm is added.

Options:

  • :retry_timeout - time to wait for the alarm to be cleared before calling the callback again (default: :infinity)
  • :callback_timeout - time to wait for the callback to run (default: 60 seconds)

Since there can only be one remedy per alarm ID, subsequent calls replace the previous remedy. If an alarm is already set, the new callback will be called the next time. This means that crashes and restarts of the process that adds the remedy do not cause the callback to be invoked twice. In fact, if the callback and options are the same, it will look like a no-op. If you don't want this behavior, call remove_remedy/1 and then add_remedy/3 to force new calls to be made.
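
The following sketch registers a remedy with both options and then forces a fresh registration. `NetworkBroken`, `"eth0"`, and `MyApp.Network.restart/1` are assumptions made for the example, and the timeout values are arbitrary.

```elixir
alarm_id = {NetworkBroken, "eth0"}

# One-argument remedy: it receives the alarm ID that was set.
# MyApp.Network.restart/1 is a hypothetical application function.
remedy = fn {NetworkBroken, ifname} -> MyApp.Network.restart(ifname) end

case Alarmist.add_remedy(alarm_id, remedy,
       retry_timeout: :timer.minutes(1),
       callback_timeout: :timer.seconds(10)
     ) do
  :ok -> :ok
  {:error, reason} -> IO.puts("could not add remedy: #{inspect(reason)}")
end

# To force new calls to be made, remove the remedy and add it again.
Alarmist.remove_remedy(alarm_id)
Alarmist.add_remedy(alarm_id, remedy, retry_timeout: :timer.minutes(1))
```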

alarm_state(alarm_id)

@spec alarm_state(alarm_id()) :: alarm_state()

Get the current state of an alarm

Alarmist learns about an alarm when it's first set or cleared. Until then, the alarm's state is :unknown.

alarm_type(alarm_id)

@spec alarm_type(alarm_id()) :: alarm_type()

Extract the alarm type from an alarm ID

Examples:

iex> Alarmist.alarm_type(MyAlarm)
MyAlarm
iex> Alarmist.alarm_type({NetworkBroken, "eth0"})
NetworkBroken

clear_alarm_level(alarm_id)

@spec clear_alarm_level(alarm_id()) :: :ok

Clear knowledge of an alarm's level

If the alarm gets reported after this call, it will be assigned the default alarm level, :warning.

get_alarm_ids(options \\ [])

@spec get_alarm_ids([{:level, Logger.level()}]) :: [alarm_id()]

Return a list of all active alarm IDs

Options:

  • :level - filter alarms by severity. Defaults to :info.

get_alarms(options \\ [])

@spec get_alarms([{:level, Logger.level()}]) :: [alarm()]

Return a list of all active alarms

This returns {id, description} tuples. Note that Alarmist normalizes alarms that were not set as 2-tuples so this may not match calls to :alarm_handler.set_alarm/1.

Options:

  • :level - filter alarms by severity. Defaults to :info.

info(options \\ [])

@spec info(info_options()) :: :ok

Print alarm status in a nice table

Options:

  • :ansi_enabled? - override the default ANSI setting. Defaults to true.
  • :level - filter alarms by severity. Defaults to :info.
  • :show_cleared? - show cleared alarms. Defaults to false.
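
For instance, a call like the one below could be used from a non-ANSI terminal or log capture. It only uses options that appear in both this list and the `info_options()` typespec, and the chosen level is arbitrary.

```elixir
# Print the table for alarms filtered at :warning, without ANSI escape codes.
Alarmist.info(level: :warning, ansi_enabled?: false)
```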

is_alarm_id(id)

(macro)

managed_alarm_ids(timeout \\ 5000)

@spec managed_alarm_ids(timeout()) :: [alarm_id()]

Return all managed alarm IDs

remove_managed_alarm(alarm_id)

@spec remove_managed_alarm(alarm_id()) :: :ok

Remove a managed alarm

remove_remedy(alarm_id)

@spec remove_remedy(alarm_id()) :: :ok | {:error, :not_found}

Remove a remedy callback

If the callback is currently running, Alarmist brutally kills its worker process.

There's generally no need to remove a remedy callback that's automatically added as part of a managed alarm. Removing the managed alarm removes its remedy.

set_alarm_level(alarm_id, level)

@spec set_alarm_level(alarm_id(), Logger.level()) :: :ok

Set or change the alarm level for an alarm

The alarm can be either managed or unmanaged. Once set, that alarm will be reported with the specified level.

While this can be used with managed alarms, you should normally pass the desired level as an option in the use Alarmist.Alarm call so that it's handled for you.

It's also possible to set levels for unmanaged alarms in the application configuration:

config :alarmist, alarm_levels: %{MyUnmanagedAlarm => :critical}

NOTE: Changing the alarm level does not change the status of existing alarms since there's no mechanism to go back in time and change reports. Future events will be reported with the new level.
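
As a sketch of setting a level at runtime instead of in the configuration, using the same `MyUnmanagedAlarm` name as above:

```elixir
# Report future MyUnmanagedAlarm events at the :critical level.
:ok = Alarmist.set_alarm_level(MyUnmanagedAlarm, :critical)

# List active alarm IDs using the :level filter.
Alarmist.get_alarm_ids(level: :error)

# Forget the level again. Later reports use the default, :warning.
:ok = Alarmist.clear_alarm_level(MyUnmanagedAlarm)
```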

subscribe(alarm_pattern)

@spec subscribe(alarm_pattern()) :: :ok

Subscribe to alarm status events

Events will be delivered to the calling process as:

%Alarmist.Event{
  id: TheAlarmId,
  state: :set,
  description: nil,
  level: :warning,
  timestamp: -576460712978320952,
  previous_state: :unknown,
  previous_timestamp: -576460751417398083
}
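
One way to consume these events is from a GenServer, sketched below. `MyApp.AlarmLogger` and the `NetworkBroken` alarm type are hypothetical. The subscription is made in `init/1` because events go to the calling process.

```elixir
defmodule MyApp.AlarmLogger do
  use GenServer
  require Logger

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl GenServer
  def init(_opts) do
    # Subscribe from inside the server so that events arrive in its mailbox.
    :ok = Alarmist.subscribe({NetworkBroken, :_})
    {:ok, %{}}
  end

  @impl GenServer
  def handle_info(%Alarmist.Event{id: id, state: state, previous_state: previous}, data) do
    # Log each transition, for example :unknown -> :set or :set -> :clear.
    Logger.info("alarm #{inspect(id)}: #{previous} -> #{state}")
    {:noreply, data}
  end
end
```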

subscribe_all()

@spec subscribe_all() :: :ok

Subscribe to alarm status events for all alarms

See subscribe/1 for the event format.

unsubscribe(alarm_pattern)

@spec unsubscribe(alarm_pattern()) :: :ok

Unsubscribe the current process from the specified alarm :set and :clear events

unsubscribe_all()

@spec unsubscribe_all() :: :ok

Unsubscribe from alarm status events for all alarms

NOTE: This will only remove subscriptions created via subscribe_all/0, not subscriptions created for individual alarms via subscribe/1.
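
A short sketch of that distinction, with `MyApp.DiskAlmostFull` as an assumed alarm ID:

```elixir
:ok = Alarmist.subscribe_all()
:ok = Alarmist.subscribe(MyApp.DiskAlmostFull)

# Removes only the subscription created by subscribe_all/0.
:ok = Alarmist.unsubscribe_all()

# The individual subscription has to be removed separately.
:ok = Alarmist.unsubscribe(MyApp.DiskAlmostFull)
```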