Scaling Applications

Notifications

Oban uses PubSub notifications for communication between nodes, such as job inserts, pausing queues, resuming queues, and metrics for Web. The default notifier is Oban.Notifiers.Postgres, which sends all messages through the database. Postgres notifications add up at scale because each one requires a separate query.

If you're clustered, switch to an alternative notifier like Oban.Notifiers.PG. That keeps notifications out of the database, reduces total queries, and allows larger messages. As long as you have a functional Distributed Erlang cluster, it's a single-line change to your Oban config.

 config :my_app, Oban,
+  notifier: Oban.Notifiers.PG,

If you're not clustered, consider using Oban.Notifiers.Phoenix to send notifications through an alternative service like Redis.
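
For example, assuming your application already runs a Phoenix.PubSub instance named MyApp.PubSub (the name here is illustrative; see the package docs for the exact options), the change looks roughly like:

 config :my_app, Oban,
+  notifier: {Oban.Notifiers.Phoenix, pubsub: MyApp.PubSub},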

Triggers

Inserting jobs emits a trigger notification to let queues know there are jobs to process immediately, without waiting up to 1s for the next polling interval. Triggers may create many notifications for active queues.

Evaluate whether you need sub-second job dispatch. Without it, jobs may wait up to 1s before running, but that's not a concern for busy queues since they're constantly fetching and dispatching.

Disable triggers in your Oban configuration:

 config :my_app, Oban,
+  insert_trigger: false,

Uniqueness

Frequently, people set uniqueness for jobs that don't really need it. Not you, of course. Before setting uniqueness, work through the following checklist:

  1. Evaluate whether it’s necessary for your workload
  2. Always set a keys option so that uniqueness isn’t based on the full args or meta
  3. Avoid setting a period at all if possible; use period: :infinity instead

If you're still committed to setting uniqueness for your jobs, consider tweaking your configuration as follows:

 use Oban.Worker, unique: [
-   period: {1, :hour},
+   period: :infinity,
+   keys: [:some_key]
 ]
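
As a usage sketch, a hypothetical worker following the checklist above might scope uniqueness to a single args key and leave the period unbounded (the module and the MyApp.Accounts.sync call are illustrative):

defmodule MyApp.Workers.SyncAccount do
  use Oban.Worker,
    queue: :default,
    unique: [period: :infinity, keys: [:account_id]]

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"account_id" => account_id}}) do
    # Hypothetical domain call keyed by account_id
    MyApp.Accounts.sync(account_id)

    :ok
  end
end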

🌟 Pro Uniqueness

Oban Pro uses an alternative mechanism for unique jobs that works with bulk inserts and is designed for speed, correctness, scalability, and simplicity. Uniqueness is enforced at insertion time, making inserts entirely safe between processes and nodes without the load added by multiple queries.

Reindexing

To stop oban_jobs indexes from taking up so much space on disk, use the Oban.Plugins.Reindexer plugin to rebuild indexes periodically. The Postgres transactional model applies to indexes as well as tables, so inserting, updating, and deleting jobs leaves behind bloat that auto-vacuuming won't always fix.

The reindexer rebuilds key indexes on a fixed schedule, concurrently. Concurrent rebuilds are low impact: they don't lock the table, and they free up space while optimizing indexes.

The Oban.Plugins.Reindexer plugin is part of OSS Oban. It runs every day at midnight by default, but it accepts a cron-style schedule and you can tweak it to run less frequently.

 config :my_app, Oban,
   plugins: [
+    {Oban.Plugins.Reindexer, schedule: "@weekly"},
     ...
   ]

Pruning

Ensure you're using the Pruner plugin, and that you prune aggressively. Pruning periodically deletes completed, cancelled, and discarded jobs. Your application and database will benefit from keeping the jobs table small. Aim to retain as few jobs as necessary for uniqueness and historic introspection.

For example, to limit historic jobs to 1 day:

 config :my_app, Oban,
   plugins: [
+    {Oban.Plugins.Pruner, max_age: 60 * 60 * 24},
     ...
   ]

The default autovacuum settings are conservative and may fall behind on active tables. Dead tuples accumulate until the autovacuum process removes them and makes the space reusable.

As with indexes, Postgres' MVCC model only flags rows for deletion; the rows are actually removed when autovacuum runs. Autovacuum can be tuned for the oban_jobs table alone.

The exact scale factor tuning will vary based on total rows, table size, and database load.

Below is an example of a possible scale factor and threshold:

ALTER TABLE oban_jobs SET (
  autovacuum_vacuum_scale_factor = 0,
  autovacuum_vacuum_threshold = 100
);
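
To check whether autovacuum is keeping up before and after tuning, Postgres' statistics view reports dead tuple counts and the last autovacuum run for the table:

SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'oban_jobs';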

🌟 Partitioning

For extreme load (tens of millions of jobs a day), Oban Pro's DynamicPartitioner may help. It manages partitioned tables so that older jobs are removed by dropping whole partitions. Dropping a table is instantaneous and leaves zero bloat, and autovacuuming each smaller partition is faster as well.

Pooling

Oban uses connections from your application Repo’s pool to talk to the database. When that pool is busy, it can starve Oban of connections and you’ll see timeout errors. Likewise, if Oban is extremely busy (as it should be), it can starve your application of connections. A good solution for this is to set up another pool that’s exclusively for Oban’s internal use. The dedicated pool isolates Oban’s queries from the rest of the application.

Start by defining a new ObanRepo:

defmodule MyApp.ObanRepo do
  use Ecto.Repo,
    adapter: Ecto.Adapters.Postgres,
    otp_app: :my_app
end
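
The new repo also needs configuration and a spot in your supervision tree ahead of Oban. A minimal sketch, assuming a conventional MyApp.Application and that the Oban repo reuses your primary database settings (the url shown is a placeholder):

# config/runtime.exs: point the Oban-only repo at the same database
config :my_app, MyApp.ObanRepo,
  url: System.get_env("DATABASE_URL")

# lib/my_app/application.ex: start it alongside the primary repo, before Oban
children = [
  MyApp.Repo,
  MyApp.ObanRepo,
  {Oban, Application.fetch_env!(:my_app, Oban)}
]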

Then switch the configured repo, and use get_dynamic_repo to ensure the same repo is used within a transaction:

 config :my_app, Oban,
-  repo: MyApp.Repo,
+  repo: MyApp.ObanRepo,
+  get_dynamic_repo: fn -> if MyApp.Repo.in_transaction?(), do: MyApp.Repo, else: MyApp.ObanRepo end
   ...

High Concurrency

In a busy system with high concurrency, all of the record-keeping after jobs run causes pool contention, despite the individual queries being very quick. Fetching jobs uses a single query per queue, but acking when a job finishes takes a single connection for each job.

Improve the ratio between executing jobs and available connections by scaling up your Ecto pool_size and minimizing concurrency across all queues.

 config :my_app, Repo,
-  pool_size: 10,
+  pool_size: 50,
   ...

 config :my_app, Oban,
   queues: [
-    events: 200,
+    events: 50,
-    emails: 100,
+    emails: 25,
     ...
   ]

Using a dedicated pool with a fixed number of connections can also help the ratio. It's not necessary for most applications, but a dedicated database can help maintain predictable performance.
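
If you take that route, the ObanRepo defined above is the natural place to point at a separate database and pin a fixed pool size; the env var and pool size below are placeholders:

config :my_app, MyApp.ObanRepo,
  # Hypothetical: a database reserved for Oban, with a fixed pool
  url: System.get_env("OBAN_DATABASE_URL"),
  pool_size: 20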