# Scaling BullMQ Elixir
This guide explains how to scale BullMQ workers in Elixir, comparing the model with Node.js patterns and showing best practices for production deployments.
## Elixir vs Node.js Scaling Model
### Node.js Architecture
In Node.js, each worker runs in a single-threaded event loop. To utilize multiple CPU cores, you need multiple OS processes:
```
Machine (8 cores)
┌─────────────────────────────────────────────────────┐
│  Process 1    Process 2    Process 3    Process 4   │
│ ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│ │ Worker  │  │ Worker  │  │ Worker  │  │ Worker  │  │
│ │ 1 thread│  │ 1 thread│  │ 1 thread│  │ 1 thread│  │
│ └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
│  Process 5    Process 6    Process 7    Process 8   │
│ ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│ │ Worker  │  │ Worker  │  │ Worker  │  │ Worker  │  │
│ └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
└─────────────────────────────────────────────────────┘
8 OS processes = 8 workers
```

- Each worker = 1 OS process (~30-50MB of memory)
- Managed by PM2, the `cluster` module, or container orchestration
- No shared memory between workers
### Elixir Architecture
Elixir runs on the BEAM VM, which automatically uses all CPU cores with lightweight processes:
```
Machine (8 cores)
┌─────────────────────────────────────────────────────┐
│               Single BEAM VM Process                │
│ ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│ │Scheduler│  │Scheduler│  │Scheduler│  │Scheduler│  │
│ │ Core 1  │  │ Core 2  │  │ Core 3  │  │ Core 4  │  │
│ │┌───────┐│  │┌───────┐│  │┌───────┐│  │┌───────┐│  │
│ ││Worker1││  ││Worker3││  ││Worker5││  ││Worker7││  │
│ ││Worker2││  ││Worker4││  ││Worker6││  ││Worker8││  │
│ │└───────┘│  │└───────┘│  │└───────┘│  │└───────┘│  │
│ └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
│ ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│ │Scheduler│  │Scheduler│  │Scheduler│  │Scheduler│  │
│ │ Core 5  │  │ Core 6  │  │ Core 7  │  │ Core 8  │  │
│ │   ...   │  │   ...   │  │   ...   │  │   ...   │  │
│ └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
└─────────────────────────────────────────────────────┘
1 OS process, 8 schedulers, many workers
```

- A single OS process utilizes ALL cores
- Workers are lightweight (~2KB each)
- You can run thousands of workers in one VM
- Schedulers automatically distribute work across cores
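The "~2KB each" claim is easy to check yourself: spawn a batch of idle processes and ask the VM for their footprint. This is plain Elixir, nothing BullMQ-specific:

```elixir
# Spawn 10,000 idle processes and measure one of them.
pids =
  for _ <- 1..10_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

# Each process reports its own heap + stack usage in bytes.
{:memory, bytes} = Process.info(hd(pids), :memory)
IO.puts("One idle process: #{bytes} bytes")

Enum.each(pids, &send(&1, :stop))
```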
### Key Differences
| Aspect | Node.js | Elixir |
|---|---|---|
| Process per core | Required | Not needed |
| Memory per worker | ~30-50MB | ~2KB |
| Max workers/machine | ~CPU cores | Thousands |
| Inter-worker communication | IPC/Redis | Direct message passing |
| Scaling complexity | Higher (process management) | Lower (just add workers) |
## Scaling Strategies
### 1. Vertical Scaling (Single Machine)
In Elixir, you scale vertically by spawning more workers within the same application:
```elixir
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    # Scale based on CPU cores: 2 workers per core
    num_workers = System.schedulers_online() * 2

    workers =
      for i <- 1..num_workers do
        Supervisor.child_spec(
          {BullMQ.Worker,
           queue: "jobs",
           connection: :redis,
           concurrency: 500,
           processor: &MyApp.JobProcessor.process/1},
          id: :"worker_#{i}"
        )
      end

    children = [
      {BullMQ.RedisConnection, name: :redis, host: "localhost"}
      | workers
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

Benchmark results (from our tests):
| Workers | Concurrency | Throughput |
|---|---|---|
| 1 | 500 | ~4,100 j/s |
| 5 | 500 | ~12,400 j/s |
| 10 | 500 | ~16,500 j/s |
### 2. Horizontal Scaling (Multiple Machines)
For horizontal scaling, deploy the same application to multiple machines:
```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│    Machine 1    │   │    Machine 2    │   │    Machine 3    │
│     BEAM VM     │   │     BEAM VM     │   │     BEAM VM     │
│   10 workers    │   │   10 workers    │   │   10 workers    │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
                        ┌──────▼──────┐
                        │    Redis    │
                        └─────────────┘
```

Each machine runs independently. BullMQ's Lua scripts ensure atomic job distribution, so no coordination between machines is needed.
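Since every machine must point at the same Redis instance, it is convenient to read the connection details from the environment. A minimal `config/runtime.exs` sketch; the `REDIS_HOST`/`REDIS_PORT` variable names are assumptions for illustration, not something BullMQ defines:

```elixir
# config/runtime.exs: every machine reads the shared Redis location
# from its environment (hypothetical variable names).
import Config

config :my_app, :redis,
  host: System.get_env("REDIS_HOST", "localhost"),
  port: String.to_integer(System.get_env("REDIS_PORT", "6379"))
```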
### 3. Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bullmq-workers
spec:
  replicas: 5 # 5 pods
  template:
    spec:
      containers:
        - name: worker
          image: myapp:latest
          resources:
            requests:
              cpu: '2'
              memory: '512Mi'
            limits:
              cpu: '4'
              memory: '1Gi'
          env:
            - name: WORKER_COUNT
              value: '8' # 8 workers per pod
            - name: CONCURRENCY
              value: '500'
```
```elixir
# In your application, read the values back:
num_workers = String.to_integer(System.get_env("WORKER_COUNT", "4"))
concurrency = String.to_integer(System.get_env("CONCURRENCY", "500"))
```
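These values can then feed the same child-spec loop shown in the vertical-scaling example; a minimal sketch reusing those worker options:

```elixir
# Sketch: env-derived values plugged into the worker specs from the
# vertical-scaling example above.
workers =
  for i <- 1..num_workers do
    Supervisor.child_spec(
      {BullMQ.Worker,
       queue: "jobs",
       connection: :redis,
       concurrency: concurrency,
       processor: &MyApp.JobProcessor.process/1},
      id: :"worker_#{i}"
    )
  end
```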
## Supervision Tree

BullMQ Elixir uses OTP supervision for fault tolerance:
```
Application Supervisor
├── Registry (for named processes)
├── DynamicSupervisor (WorkerSupervisor)
│   ├── Worker 1
│   ├── Worker 2
│   └── ...
└── DynamicSupervisor (QueueEventsSupervisor)
    └── QueueEvents listeners

Worker (GenServer)
├── LockManager (linked GenServer)
│   └── Single timer for lock renewal
└── Job Task processes
```

### Fault Tolerance
If a Worker crashes:
- Supervisor restarts it automatically
- Active jobs may become stalled (picked up after stalled check interval)
- Other workers continue processing
If LockManager crashes:
- Worker is terminated (linked process)
- Supervisor restarts the Worker
- Worker creates new LockManager on restart
If a Job Task crashes:
- Job is moved to failed (if no retries) or delayed (for retry)
- Worker continues processing other jobs
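How a job signals failure depends on your processor. As a hedged sketch: raising inside the processor crashes only that job's task, so the job can be retried or failed while the worker keeps running. The field names (`job.data`, `job.id`) are assumptions for illustration:

```elixir
defmodule MyApp.JobProcessor do
  # Sketch of a processor whose crashes stay contained to the job task.
  def process(job) do
    case job.data do
      %{"url" => url} ->
        do_work(url)

      _other ->
        # Raising here crashes only this job's task; the worker and its
        # other in-flight jobs keep running.
        raise ArgumentError, "job #{job.id} is missing a url"
    end
  end

  defp do_work(url), do: {:ok, url}
end
```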
## Best Practices
- **Always use supervisors:**

```elixir
# Good - supervised
children = [
  {BullMQ.Worker, queue: "jobs", ...}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Avoid - unsupervised
{:ok, worker} = BullMQ.Worker.start_link(queue: "jobs", ...)
```

- **Use restart strategies appropriately:**
```elixir
# For workers that should always run
Supervisor.child_spec(
  {BullMQ.Worker, opts},
  restart: :permanent # Always restart (the default)
)

# For temporary workers
Supervisor.child_spec(
  {BullMQ.Worker, opts},
  restart: :temporary # Never restart
)
```

- **Set appropriate max_restarts:**
```elixir
Supervisor.start_link(children,
  strategy: :one_for_one,
  max_restarts: 10, # At most 10 restarts...
  max_seconds: 60   # ...within a 60-second window
)
```

## Dynamic Worker Management
For dynamic scaling based on load:
```elixir
defmodule MyApp.WorkerManager do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def scale_up(count \\ 1) do
    GenServer.call(__MODULE__, {:scale_up, count})
  end

  def scale_down(count \\ 1) do
    GenServer.call(__MODULE__, {:scale_down, count})
  end

  def worker_count do
    GenServer.call(__MODULE__, :worker_count)
  end

  @impl true
  def init(opts) do
    {:ok,
     %{
       queue: Keyword.fetch!(opts, :queue),
       connection: Keyword.fetch!(opts, :connection),
       processor: Keyword.fetch!(opts, :processor),
       concurrency: Keyword.get(opts, :concurrency, 500),
       workers: []
     }}
  end

  @impl true
  def handle_call({:scale_up, count}, _from, state) do
    new_workers =
      for _ <- 1..count do
        {:ok, pid} =
          DynamicSupervisor.start_child(
            BullMQ.WorkerSupervisor,
            {BullMQ.Worker,
             queue: state.queue,
             connection: state.connection,
             concurrency: state.concurrency,
             processor: state.processor}
          )

        pid
      end

    {:reply, :ok, %{state | workers: state.workers ++ new_workers}}
  end

  @impl true
  def handle_call({:scale_down, count}, _from, state) do
    {to_stop, to_keep} = Enum.split(state.workers, count)

    Enum.each(to_stop, fn pid ->
      BullMQ.Worker.close(pid)
      DynamicSupervisor.terminate_child(BullMQ.WorkerSupervisor, pid)
    end)

    {:reply, :ok, %{state | workers: to_keep}}
  end

  @impl true
  def handle_call(:worker_count, _from, state) do
    {:reply, length(state.workers), state}
  end
end
```
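Usage is then a matter of calling the public API, for example from a load-monitoring process (this assumes the `BullMQ.WorkerSupervisor` DynamicSupervisor from the tree above is running):

```elixir
# Start the manager (e.g. under your application supervisor), then
# adjust capacity at runtime:
{:ok, _pid} =
  MyApp.WorkerManager.start_link(
    queue: "jobs",
    connection: :redis,
    processor: &MyApp.JobProcessor.process/1
  )

:ok = MyApp.WorkerManager.scale_up(3)
MyApp.WorkerManager.worker_count()
#=> 3
:ok = MyApp.WorkerManager.scale_down(1)
```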
## Optimization Guidelines

### Workers per Machine
Rule of thumb: start with 2× the CPU core count for I/O-bound jobs.

```elixir
num_workers = System.schedulers_online() * 2
```

For CPU-bound jobs, use 1× the CPU core count to avoid context-switching overhead.
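If one deployment serves both kinds of queues, the multiplier can be chosen per workload; a small hypothetical helper:

```elixir
# Hypothetical sizing helper; the workload atoms are illustrative.
defmodule MyApp.Sizing do
  def worker_count(:io_bound), do: System.schedulers_online() * 2
  def worker_count(:cpu_bound), do: System.schedulers_online()
end
```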
### Concurrency per Worker
Sweet spot: 200-500 concurrent jobs per worker
Above 500, you hit diminishing returns due to sequential job fetching from Redis.
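In practice that means scaling out fetch loops rather than inflating a single worker's concurrency; a sketch reusing the worker options from earlier:

```elixir
# Prefer this: four independent fetch loops, 2,000 jobs in flight total...
workers =
  for i <- 1..4 do
    Supervisor.child_spec(
      {BullMQ.Worker,
       queue: "jobs",
       connection: :redis,
       concurrency: 500,
       processor: &MyApp.JobProcessor.process/1},
      id: :"worker_#{i}"
    )
  end

# ...over a single worker with concurrency: 2_000, which bottlenecks
# on one sequential fetch loop against Redis.
```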
### Total Capacity Formula
```
Throughput ≈ num_workers × ~4,000 j/s                    (for instant jobs)
Throughput ≈ num_workers × concurrency / avg_job_time    (for real jobs)
```

Example: 10 workers × 500 concurrency with 10 ms jobs:

- Theoretical max: 10 × 500 / 0.01 = 500,000 j/s
- Actual (with Redis overhead): ~40,000-50,000 j/s
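The same estimate as a quick calculation:

```elixir
# Back-of-the-envelope capacity estimate for the example above.
num_workers = 10
concurrency = 500
avg_job_time_s = 0.010

num_workers * concurrency / avg_job_time_s
#=> 5.0e5 (500,000 jobs/s theoretical, before Redis overhead)
```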
### Memory Considerations
Each worker + LockManager uses minimal memory (~100KB overhead + job data). The main memory consumers are:
- Job data in flight
- Task processes for concurrent jobs
Estimate: `base_memory + (concurrency × avg_job_memory)`
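As a worked sketch, with a per-job memory figure that is an assumption you should measure against your own payloads:

```elixir
# Hypothetical numbers: ~100 KB base overhead per worker plus an
# assumed 10 KB of in-flight job data per concurrent job.
base_memory_kb = 100
concurrency = 500
avg_job_memory_kb = 10

base_memory_kb + concurrency * avg_job_memory_kb
#=> 5100 (KB, roughly 5 MB per fully loaded worker)
```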
## Monitoring
Use Telemetry events for monitoring worker health:
```elixir
:telemetry.attach_many(
  "worker-monitor",
  [
    [:bullmq, :job, :completed],
    [:bullmq, :job, :failed],
    [:bullmq, :worker, :stalled]
  ],
  fn event, _measurements, _metadata, _config ->
    # Send to your metrics system; the event name is a list of atoms,
    # so join it into a dotted metric name first.
    StatsD.increment("bullmq.#{Enum.join(event, ".")}")
  end,
  nil
)
```

## Summary
- Start simple: One BEAM VM per machine with multiple workers
- Scale workers first: Add more workers before adding machines
- Use supervisors: Always run workers under supervision
- Monitor and tune: Use telemetry to find optimal worker/concurrency settings
- Scale horizontally last: Add machines when single machine is saturated
The key advantage of Elixir is that you get multi-core utilization "for free": no process managers, no cluster modules, just more workers spawned in the same application.