Dynamic Pruning Plugin

🌟 This plugin is available through Oban.Pro

The DynamicPruner plugin enhances the default FixedPruner plugin's behaviour by allowing you to customize how long jobs are retained in the jobs table, or how many of them are kept. Where the FixedPruner will only retain jobs for 60 seconds, with the DynamicPruner you can specify either a maximum age or a maximum length and provide custom rules for specific queues, workers and job states.

Using and Configuring

To start using the DynamicPruner add the module to your list of Oban plugins in config.exs:

config :my_app, Oban,
  plugins: [Oban.Pro.Plugins.DynamicPruner]
  ...

Without any additional options the pruner operates in maximum length mode (max_len) and retains a conservative 1,000 completed or discarded jobs. To increase the number of jobs retained you can provide your own mode configuration:

plugins: [{Oban.Pro.Plugins.DynamicPruner, mode: {:max_len, 50_000}}]

Now the pruner will retain the most recent 50,000 jobs instead.

A fixed limit on the number of jobs isn't always ideal. Often you want to retain jobs based on their age instead. For example, if your application needs to ensure that a duplicate job hasn't been enqueued within the past 24 hours, you need to retain jobs for at least 24 hours; a fixed limit simply won't work. For that we can use maximum age (max_age) mode instead:

plugins: [{Oban.Pro.Plugins.DynamicPruner, mode: {:max_age, 60 * 60 * 48}}]

Here we've specified max_age using seconds, where 60 * 60 * 48 is the number of seconds in two days.

Calculating the number of seconds in a period isn't especially readable, particularly when you have numerous max_age declarations in overrides (see below). For clarity you can specify the age's time unit as :second, :minute, :hour, :day or :month. Here is the same 48 hour configuration from above, but specified in terms of days:

plugins: [{Oban.Pro.Plugins.DynamicPruner, mode: {:max_age, {2, :days}}}]

Now you can tell exactly how long jobs should be retained, without reverse calculating how many seconds an expression represents.
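
To make the equivalence between the two notations concrete, here is a minimal sketch of how a {count, unit} pair maps onto seconds. RetentionSketch is a hypothetical module, not part of Oban or Oban.Pro, and the 30 day month is an assumption; the plugin may normalize units differently.

```elixir
defmodule RetentionSketch do
  # Hypothetical helper for illustration only, not part of Oban or Oban.Pro.
  # Assumption: a month is treated as 30 days.
  @seconds_per_unit %{
    "second" => 1,
    "minute" => 60,
    "hour" => 60 * 60,
    "day" => 60 * 60 * 24,
    "month" => 60 * 60 * 24 * 30
  }

  # A bare positive integer is already a number of seconds.
  def to_seconds(seconds) when is_integer(seconds) and seconds > 0, do: seconds

  # Accept singular and plural unit names, e.g. :day and :days.
  def to_seconds({count, unit}) when is_integer(count) and count > 0 and is_atom(unit) do
    key = unit |> Atom.to_string() |> String.trim_trailing("s")

    # Raises KeyError for a unit that isn't in the map above.
    count * Map.fetch!(@seconds_per_unit, key)
  end
end
```

With this sketch, RetentionSketch.to_seconds({2, :days}) and RetentionSketch.to_seconds(60 * 60 * 48) both evaluate to the same number of seconds.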

Providing Overrides

The mode option is indiscriminate when determining which jobs to prune. It pays no attention to which queue they are in, what worker the job is for, or which state they landed in. The DynamicPruner allows you to specify per-queue, per-worker and per-state overrides that fine tune pruning.

We'll start with a simple example of limiting the total number of retained jobs in the events queue:

plugins: [{
  Oban.Pro.Plugins.DynamicPruner,
  mode: {:max_age, {7, :days}},
  queue_overrides: [events: {:max_len, 1_000}]
}]

With this configuration most jobs will be retained for seven days, but we'll only keep the latest 1,000 jobs in the events queue. We can extend this further and override all of our queues (and omit the default mode entirely):

plugins: [{
  Oban.Pro.Plugins.DynamicPruner,
  queue_overrides: [
    default: {:max_age, {6, :hours}},
    analysis: {:max_age, {1, :day}},
    events: {:max_age, {10, :minutes}},
    mailers: {:max_age, {2, :weeks}},
    media: {:max_age, {2, :months}}
  ]
}]

When pruning by queue isn't granular enough you can provide overrides by worker instead:

plugins: [{
  Oban.Pro.Plugins.DynamicPruner,
  worker_overrides: [
    "MyApp.BusyWorker": {:max_age, {1, :day}},
    "MyApp.SecretWorker": {:max_age, {1, :second}},
    "MyApp.HistoricWorker": {:max_age, {1, :month}}
  ]
}]

You can also override by state, which allows you to keep discarded jobs for inspection while quickly purging cancelled or successfully completed jobs:

plugins: [{
  Oban.Pro.Plugins.DynamicPruner,
  state_overrides: [
    cancelled: {:max_age, {1, :hour}},
    completed: {:max_age, {1, :day}},
    discarded: {:max_age, {1, :month}}
  ]
}]

Naturally you can mix and match overrides to finely control job retention:

plugins: [{
  Oban.Pro.Plugins.DynamicPruner,
  mode: {:max_age, {7, :days}},
  queue_overrides: [events: {:max_age, {10, :minutes}}],
  worker_overrides: ["MyApp.SecretWorker": {:max_age, {1, :second}}],
  state_overrides: [discarded: {:max_age, {2, :days}}]
}]

Keeping Up With Inserts

With the default settings the DynamicPruner will only delete 10,000 jobs each time it prunes. The limit exists to prevent connection timeouts and excessive table locks. A busy system can easily insert more than 10,000 jobs per minute during standard operation. If you find that jobs are accumulating despite active pruning you can override the limit.

Here we set the delete limit to 25,000:

plugins: [{
  Oban.Pro.Plugins.DynamicPruner,
  mode: {:max_len, 100_000},
  limit: 25_000
}]

Deleting in PostgreSQL is very fast, and the 10k default is rather conservative. Feel free to increase the limit to a number that your system can handle.
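
As a rough way to reason about whether a limit keeps up, the arithmetic can be sketched as follows. PruneBudgetSketch is a hypothetical module, not part of Oban or Oban.Pro. It assumes one pruning pass per minute and that every inserted job eventually becomes eligible for pruning, which simplifies how a real system behaves.

```elixir
defmodule PruneBudgetSketch do
  # Hypothetical helper for illustration only, not part of Oban or Oban.Pro.
  # Assumptions: one pruning pass per minute, and every inserted job
  # eventually becomes eligible for pruning.

  # Jobs that accumulate each minute after a pass deletes up to `limit` rows.
  def surplus_per_minute(inserts_per_minute, limit) do
    max(inserts_per_minute - limit, 0)
  end

  # Smallest multiple of `step` that covers the insert rate.
  def suggested_limit(inserts_per_minute, step \\ 5_000) do
    ceil(inserts_per_minute / step) * step
  end
end
```

For example, at 18,000 inserts per minute the default limit leaves a surplus of 8,000 jobs per minute, while a limit of 25,000 leaves none.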

Implementation Notes

Some additional notes about pruning in general and nuances of the DynamicPruner plugin:

  • Pruning is best-effort and performed out-of-band. This means that all limits are soft; jobs beyond a specified age may not be pruned immediately after jobs complete.

  • Pruning is only applied to jobs that are completed or discarded (those that have reached the maximum number of retries or have been manually killed). It'll never delete a new job, a scheduled job or a job that will be retried.

  • Only a single node will prune at any given time, which prevents potential deadlocks between transactions.

Instrumenting with Telemetry

The DynamicPruner plugin uses Oban.Telemetry.span/3 to emit standardized plugin events.

event                          metadata
[:oban, :plugin, :start]       :plugin
[:oban, :plugin, :stop]        :plugin, :duration
[:oban, :plugin, :exception]   :plugin, :duration, :kind, :reason, :stacktrace
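
The events above can be consumed with a standard :telemetry handler. The following is a minimal sketch rather than an official integration: PrunerTelemetrySketch and its handler id are hypothetical names, it assumes the :telemetry package is available (Oban depends on it), and it assumes :plugin carries the plugin module and that :duration is reported in native time units. Because spans conventionally report :duration as a measurement, the sketch checks the measurements first and falls back to the metadata.

```elixir
defmodule PrunerTelemetrySketch do
  require Logger

  # Hypothetical handler id; any unique term works.
  @handler_id "dynamic-pruner-logger"
  @plugin Oban.Pro.Plugins.DynamicPruner

  # Call once at application start, e.g. from Application.start/2.
  def attach do
    events = [
      [:oban, :plugin, :stop],
      [:oban, :plugin, :exception]
    ]

    :telemetry.attach_many(@handler_id, events, &__MODULE__.handle_event/4, nil)
  end

  # Only react to events emitted by the DynamicPruner.
  def handle_event([:oban, :plugin, :stop], measurements, %{plugin: @plugin} = meta, _config) do
    Logger.info("DynamicPruner finished in #{inspect(duration_ms(measurements, meta))}ms")
  end

  def handle_event([:oban, :plugin, :exception], measurements, %{plugin: @plugin} = meta, _config) do
    Logger.error(
      "DynamicPruner failed after #{inspect(duration_ms(measurements, meta))}ms: " <>
        inspect(Map.get(meta, :reason))
    )
  end

  # Ignore events from every other plugin.
  def handle_event(_event, _measurements, _meta, _config), do: :ok

  # Assumption: :duration is in native time units; returns nil when absent.
  defp duration_ms(measurements, meta) do
    case Map.get(measurements, :duration, Map.get(meta, :duration)) do
      nil -> nil
      native -> System.convert_time_unit(native, :native, :millisecond)
    end
  end
end
```

Matching on the :plugin value in the function head keeps the handler from logging events for other plugins that share the same event names.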