Lifeline Plugin
🌟 This plugin is available through Oban.Pro
The `Lifeline` plugin records queue activity as heartbeats and periodically rescues orphaned jobs, i.e. jobs that are stuck in the `executing` state because the node was shut down before the job could finish. Without the `Lifeline` plugin, you need to rescue jobs stuck in the `executing` state manually.
`Lifeline` must be included for `Oban.Web` to run properly.
Using and Configuring
To use the `Lifeline` plugin, add the module to your list of Oban plugins in `config.exs`:
```elixir
config :my_app, Oban,
  plugins: [Oban.Pro.Plugins.Lifeline],
  ...
```
No configuration is necessary. By default, the plugin records heartbeats every second, prunes old heartbeats every 5 minutes, and rescues orphaned jobs every minute. If necessary, you can configure any or all of those intervals:
```elixir
plugins: [{
  Oban.Pro.Plugins.Lifeline,
  delete_interval: :timer.minutes(10),
  record_interval: :timer.seconds(10),
  rescue_interval: :timer.minutes(5)
}]
```
This configuration records 10x fewer heartbeats per minute, retains them 2x as long, and attempts rescues 5x less frequently. It optimizes for less database activity at the expense of fidelity and recovery speed.
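The ratios in the previous paragraph come from dividing each configured interval by its default. The arithmetic check below only restates the numbers on this page; the module name `LifelineIntervals` is hypothetical and nothing in it calls Oban.Pro.

```elixir
defmodule LifelineIntervals do
  # Defaults stated on this page: record every 1s, delete every 5 min, rescue every 1 min.
  @defaults %{
    record_interval: :timer.seconds(1),
    delete_interval: :timer.minutes(5),
    rescue_interval: :timer.minutes(1)
  }

  # Values from the custom configuration example above.
  @custom %{
    record_interval: :timer.seconds(10),
    delete_interval: :timer.minutes(10),
    rescue_interval: :timer.minutes(5)
  }

  # Returns heartbeat counts per minute and the relative change of each interval.
  def compare do
    minute = :timer.minutes(1)
    default_per_minute = div(minute, @defaults.record_interval)
    custom_per_minute = div(minute, @custom.record_interval)

    %{
      heartbeats_per_minute: {default_per_minute, custom_per_minute},
      heartbeat_reduction: div(default_per_minute, custom_per_minute),
      retention_factor: div(@custom.delete_interval, @defaults.delete_interval),
      rescue_factor: div(@custom.rescue_interval, @defaults.rescue_interval)
    }
  end
end
```

Calling `LifelineIntervals.compare()` yields 60 versus 6 heartbeats per minute (a 10x reduction), a retention factor of 2, and a rescue factor of 5, matching the description.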
Note that rescuing orphans relies on recent heartbeats. Be sure that `delete_interval` is always longer than `rescue_interval`, or it will look like all executing jobs are orphaned.
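If you compute Lifeline options at runtime, a small guard can catch this misconfiguration before the options reach the plugin. This is a sketch under assumptions: `LifelineConfigCheck` and `validate/1` are hypothetical (not part of Oban.Pro), and the defaults it merges in are copied from the values stated on this page rather than read from the library.

```elixir
defmodule LifelineConfigCheck do
  # Hypothetical helper, not part of Oban.Pro. It enforces the rule that
  # delete_interval must be strictly longer than rescue_interval.
  @defaults [
    delete_interval: :timer.minutes(5),
    record_interval: :timer.seconds(1),
    rescue_interval: :timer.minutes(1)
  ]

  def validate(opts) when is_list(opts) do
    # Fill in any interval the caller did not override with the page's defaults.
    opts = Keyword.merge(@defaults, opts)
    delete_ms = Keyword.fetch!(opts, :delete_interval)
    rescue_ms = Keyword.fetch!(opts, :rescue_interval)

    if delete_ms > rescue_ms do
      {:ok, opts}
    else
      {:error,
       "delete_interval (#{delete_ms} ms) must be longer than rescue_interval (#{rescue_ms} ms)"}
    end
  end
end
```

For example, `{:ok, opts} = LifelineConfigCheck.validate(rescue_interval: :timer.minutes(2))` could precede `plugins: [{Oban.Pro.Plugins.Lifeline, opts}]`. Merging the defaults matters: overriding only `rescue_interval` to `:timer.minutes(5)` fails the check, because it is not shorter than the default five-minute `delete_interval`.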
Rescuing Exhausted Jobs
When a job's `attempt` matches its `max_attempts`, its retries are considered "exhausted". Normally, the `Lifeline` plugin transitions exhausted jobs to the `discarded` state and they won't be retried again. It does this for a couple of reasons:
- To ensure at-most-once semantics. If a long-running job interacted with a non-idempotent service and was shut down while waiting for a reply, you may not want that job to retry.
- To prevent infinitely crashing BEAM nodes. Poorly behaving jobs may crash the node (through NIFs, memory exhaustion, etc.). We don't want to repeatedly rescue and rerun a job that repeatedly crashes the entire node.
Discarding exhausted jobs may not always be desired. Use the `retry_exhausted` option if you'd prefer to retry exhausted jobs when they are rescued, rather than discarding them:
```elixir
plugins: [{Oban.Pro.Plugins.Lifeline, retry_exhausted: true}]
```
During rescues, with `retry_exhausted: true`, a job's `max_attempts` is incremented and it is moved back to the `available` state.
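The state changes described in this section can be summarized as a pure function. This is a conceptual sketch, not Oban.Pro's implementation: `RescueSketch.rescue_job/2` is hypothetical, a plain map stands in for an `Oban.Job` struct, and the branch for orphans that still have attempts left (moving them back to `available` unchanged) is an assumption the text above does not spell out.

```elixir
defmodule RescueSketch do
  # Conceptual model only. `job` is a plain map with :state, :attempt and
  # :max_attempts keys standing in for an Oban.Job struct.
  def rescue_job(job, opts \\ []) do
    retry_exhausted = Keyword.get(opts, :retry_exhausted, false)
    # Retries are exhausted once attempt has reached max_attempts.
    exhausted = job.attempt >= job.max_attempts

    cond do
      exhausted and retry_exhausted ->
        # retry_exhausted: true bumps max_attempts so the job can run again.
        %{job | state: "available", max_attempts: job.max_attempts + 1}

      exhausted ->
        # Default behaviour: exhausted orphans are discarded, never retried.
        %{job | state: "discarded"}

      true ->
        # Assumption: orphans with attempts left simply become available again.
        %{job | state: "available"}
    end
  end
end
```

For instance, an orphan with `attempt: 3, max_attempts: 3` comes back `"discarded"` by default, but `"available"` with `max_attempts: 4` when called with `retry_exhausted: true`.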
Implementation Notes
Some additional notes about heartbeat records and how you can expect `Lifeline` to operate:
- Orphan rescuing is guaranteed to rescue only jobs that belong to dead queue processes or nodes.
- Heartbeat records are written to the `oban_beats` table efficiently, as a single batch.
- Heartbeat records are only retained for five minutes by default. This prevents bloat or exhausting the row limit on free-tier databases.
- Only a single node will delete heartbeats or rescue orphans at any given time, which prevents potential deadlocks and churn.
Instrumenting with Telemetry
The `Lifeline` plugin adds the following metadata to the `[:oban, :plugin, :stop]` event:
- `:action`: one of `:delete`, `:record`, or `:rescue`
- `:deleted_count`: the number of heartbeat records deleted for the `:delete` action
- `:rescued_count`: the number of jobs rescued for the `:rescue` action
- `:recorded_count`: the number of heartbeats inserted for the `:record` action
See the docs on Plugin Events for details.
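To act on this metadata, you can attach a handler to the `[:oban, :plugin, :stop]` event. The module below is a hypothetical sketch: `LifelineLogger` is an illustrative name, it only formats the metadata keys listed above, and it assumes that Oban includes the emitting plugin module under a `:plugin` metadata key. Check the Plugin Events docs for your version before relying on that key.

```elixir
defmodule LifelineLogger do
  # Hypothetical telemetry handler for Lifeline's :stop metadata.
  require Logger

  # Telemetry handler callback: logs a line for Lifeline events, ignores others.
  def handle_event([:oban, :plugin, :stop], _measurements, meta, _config) do
    case describe(meta) do
      nil -> :ok
      message -> Logger.info(message)
    end
  end

  # Assumes the plugin module is present under :plugin in the metadata.
  def describe(%{plugin: Oban.Pro.Plugins.Lifeline, action: :delete, deleted_count: n}),
    do: "Lifeline delete action: deleted_count=#{n}"

  def describe(%{plugin: Oban.Pro.Plugins.Lifeline, action: :record, recorded_count: n}),
    do: "Lifeline record action: recorded_count=#{n}"

  def describe(%{plugin: Oban.Pro.Plugins.Lifeline, action: :rescue, rescued_count: n}),
    do: "Lifeline rescue action: rescued_count=#{n}"

  # Events from other plugins, or without the expected keys, are ignored.
  def describe(_meta), do: nil
end
```

You could attach it with `:telemetry.attach("lifeline-logger", [:oban, :plugin, :stop], &LifelineLogger.handle_event/4, nil)`. Keeping `describe/1` separate from the logging call makes the formatting easy to test without emitting real events.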