Lifeline Plugin

🌟 This plugin is available through Oban.Pro

The Lifeline plugin uses producer records to periodically rescue orphaned jobs, i.e. jobs that are stuck in the executing state because the node was shut down before the job could finish.

Without the Lifeline plugin you may need to manually rescue jobs stuck in the executing state.

Using and Configuring

To use the Lifeline plugin add the module to your list of Oban plugins in config.exs:

config :my_app, Oban,
  plugins: [Oban.Pro.Plugins.Lifeline]
  ...

No configuration is necessary. By default, the plugin deletes outdated producer records and rescues orphaned jobs once every minute. If necessary, you can configure the rescue interval:

plugins: [{Oban.Pro.Plugins.Lifeline, rescue_interval: :timer.minutes(5)}]

Note that rescuing orphans relies on producer records as used by the SmartEngine.

Rescuing Exhausted Jobs

When a job's attempt matches its max_attempts, its retries are considered "exhausted". Normally, the Lifeline plugin transitions exhausted jobs to the discarded state and they won't be retried again. It does this for a couple of reasons:

  1. To ensure at-most-once semantics. If a long-running job interacted with a non-idempotent service and was shut down while waiting for a reply, you may not want that job to retry.
  2. To prevent infinitely crashing BEAM nodes. Poorly behaving jobs may crash the node (through NIFs, memory exhaustion, etc.). We don't want to repeatedly rescue and rerun a job that repeatedly crashes the entire node.

Discarding exhausted jobs may not always be desired. Use the retry_exhausted option if you'd prefer to retry exhausted jobs when they are rescued, rather than discarding them:

plugins: [{Oban.Pro.Plugins.Lifeline, retry_exhausted: true}]

During rescues, with retry_exhausted: true, a job's max_attempts is incremented and it is moved back to the available state.
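The rescue decision described in this section can be sketched as a pure function over plain maps. This is only an illustration, not the plugin's implementation: the RescueSketch module and its functions are hypothetical, and it assumes that orphans with attempts remaining are also moved back to the available state.

```elixir
defmodule RescueSketch do
  @moduledoc """
  Hypothetical illustration only; this is not the Lifeline plugin's code.
  Jobs are plain maps with :state, :attempt, and :max_attempts keys.
  """

  # A job is exhausted once its attempt count has reached max_attempts.
  def exhausted?(%{attempt: attempt, max_attempts: max}), do: attempt >= max

  # Decide what happens to an orphaned job found in the executing state.
  def rescue_job(%{state: "executing"} = job, opts \\ []) do
    retry_exhausted? = Keyword.get(opts, :retry_exhausted, false)

    cond do
      # Attempts remain (assumption: the job simply becomes available again).
      not exhausted?(job) ->
        %{job | state: "available"}

      # Exhausted, but retry_exhausted: true bumps max_attempts and retries.
      retry_exhausted? ->
        %{job | state: "available", max_attempts: job.max_attempts + 1}

      # Exhausted with the default behaviour: discard, never retry.
      true ->
        %{job | state: "discarded"}
    end
  end
end
```

For a job map with attempt: 3 and max_attempts: 3, RescueSketch.rescue_job(job) yields the discarded state, while RescueSketch.rescue_job(job, retry_exhausted: true) yields the available state with max_attempts: 4.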

Implementation Notes

Some additional notes about how Lifeline operates:

  • Orphan rescuing is guaranteed to only rescue jobs that belong to dead queue processes or nodes.

  • Only a single node will rescue orphans at any given time, which prevents potential deadlocks and churn.

Instrumenting with Telemetry

The Lifeline plugin adds the following metadata to the [:oban, :plugin, :stop] event:

  • :action (always :rescue)
  • :deleted_count — the number of producers deleted
  • :rescued_count — the number of jobs rescued

See the docs on Plugin Events for details.
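As a rough sketch, a handler for this event might look like the following. The LifelineLogger module, the handler id, and the log message are hypothetical. The sketch also assumes that the event metadata carries a :plugin key identifying the emitting plugin, as Oban's plugin events generally do, and that the :telemetry package is available.

```elixir
defmodule LifelineLogger do
  # Hypothetical module; only the event name and the :deleted_count and
  # :rescued_count metadata keys are taken from the documentation above.
  require Logger

  @event [:oban, :plugin, :stop]

  # Call once at application start. Requires the :telemetry package.
  def attach do
    :telemetry.attach("lifeline-logger", @event, &__MODULE__.handle_event/4, nil)
  end

  # Assumption: metadata includes a :plugin key naming the emitting plugin.
  def handle_event(@event, _measurements, %{plugin: Oban.Pro.Plugins.Lifeline} = meta, _config) do
    Logger.info(summary(meta))
  end

  # Ignore stop events from every other plugin.
  def handle_event(_event, _measurements, _meta, _config), do: :ok

  # Build a log line from the counts, defaulting to 0 when a key is absent.
  def summary(meta) do
    deleted = Map.get(meta, :deleted_count, 0)
    rescued = Map.get(meta, :rescued_count, 0)
    "Lifeline deleted #{deleted} producers and rescued #{rescued} jobs"
  end
end
```

Calling LifelineLogger.attach() during application startup registers the handler. Keeping summary/1 separate from the logging call makes the formatting easy to test without a running Oban instance.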