---
name: lockstep
description: |
  Use this skill when the user wants to find or verify concurrency bugs in
  BEAM code (Elixir, Erlang, Gleam) using Lockstep, the controlled-scheduling
  test framework. Triggers include: "test for races", "verify this is
  concurrency-safe", "find the schedule that causes this flaky test", "why
  does this fail occasionally", or any mention of GenServer/ETS/atomics race
  conditions.
---
# Lockstep — controlled concurrency testing for the BEAM
Lockstep runs an ExUnit test body many times with different
message-passing schedules. When it finds a bug, the schedule is
deterministic and replayable — same seed + same iterations
always produce the same trace.
## When to use Lockstep
| Situation | Use Lockstep? |
|---|---|
| "This test fails 1 in 50 runs in CI" | Yes — Lockstep gives you the schedule |
| "I'm worried about a TOCTTOU race" | Yes — POS strategy is good at these |
| "Two GenServers race over shared state" | Yes |
| "Is this single-pure-function correct?" | No — use property tests |
| "I have a logic bug" | No — use careful code review |
| "Test under network partition" | Yes — Lockstep.Cluster.partition/3 |
Lockstep's strength: schedule-dependent bugs where standard
testing finds them rarely or not at all. Lockstep gives you a
reproducible counterexample with a seed — re-running with the
same seed always reproduces.
## Adding to a project
```elixir
# mix.exs
defp deps do
  [{:lockstep, "~> 0.1.0", only: :test}]
end
```

## Writing your first Lockstep test
The simplest pattern: take an existing ExUnit test, rewrite its body to use
the Lockstep wrappers, and declare it with `ctest` inside a module that does
`use Lockstep.Test`.
```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test

  defmodule Counter do
    use Lockstep.GenServer  # NOTE: Lockstep.GenServer, not GenServer

    def start_link, do: Lockstep.GenServer.start_link(__MODULE__, 0)
    def value(pid), do: Lockstep.GenServer.call(pid, :value)
    def add(pid, n), do: Lockstep.GenServer.call(pid, {:add, n})
    def set(pid, n), do: Lockstep.GenServer.call(pid, {:set, n})

    def init(state), do: {:ok, state}
    def handle_call(:value, _, n), do: {:reply, n, n}
    def handle_call({:add, n}, _, total), do: {:reply, :ok, total + n}
    def handle_call({:set, n}, _, _total), do: {:reply, :ok, n}
  end

  ctest "two clients adding 1 each end at 2" do
    {:ok, pid} = Counter.start_link()
    parent = self()

    for _ <- 1..2 do
      Lockstep.spawn(fn ->
        # The "buggy" RMW: read the value, then write value + 1.
        # If both read 0 before either writes, both write 1, total = 1.
        v = Counter.value(pid)
        Counter.set(pid, v + 1)
        Lockstep.send(parent, :done)
      end)
    end

    for _ <- 1..2, do: Lockstep.recv_first(fn :done -> true; _ -> false end)

    final = Counter.value(pid)
    if final != 2, do: raise "lost update; counter is #{final}"
  end
end
```

Run with:
```shell
mix test path/to/race_test.exs
```
If Lockstep finds a bug, you'll see:
```
** (Lockstep.BugFound)
Lockstep found a concurrency bug on iteration 4.
seed: 1
strategy: :pct
trace path: traces/<test-name>-iter4-seed1.lockstep

Schedule:
  step 1   hello  P0(root)
  step 2   spawn  P0(root) -> P1
  ...
  step 14  exit   P0(root) reason={...}   <-- FAILED HERE

Replay with:
  mix lockstep.replay --trace traces/<test-name>-iter4-seed1.lockstep
```

## Strategy choice
- `:pct` (default) — Probabilistic Concurrency Testing. Best for coarse-grained interleaving exploration.
- `:pos` — Partial Order Sampling. Best for tight read-modify-write races on shared atomics/ETS.
- `:fair_pct` — PCT then random; protects against starvation in spin loops.
- `:random` — Pure random scheduling. Baseline.
```elixir
ctest "race", strategy: :pos, iterations: 1000 do
  # ...
end
```

## OTP wrappers — drop-in replacements
| OTP module | Lockstep equivalent |
|---|---|
| `GenServer` | `Lockstep.GenServer` |
| `:gen_statem` | `Lockstep.GenStatem` |
| `Agent`, `Task`, `Task.Supervisor` | `Lockstep.{Agent,Task,Task.Supervisor}` |
| `Registry`, `Supervisor` | `Lockstep.{Registry,Supervisor}` |
| `send/2`, `spawn/1`, `Process.send_after/3` | `Lockstep.{send,spawn,send_after}` |
| `:ets.{insert,lookup,update_counter}` | `Lockstep.ETS.*` |
| `:atomics.*`, `:persistent_term.*` | `Lockstep.{Atomics,PersistentTerm}.*` |
The semantics of every wrapper are identical to the underlying OTP function's — Lockstep just inserts a sync point so the strategy can interleave between operations.
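For instance, swapping a raw `:ets` write/read for the wrapped calls looks like this (a sketch based on the table above; arguments and return values are unchanged):

```elixir
table = :ets.new(:cache, [:set, :public])

# Plain OTP: Lockstep's scheduler cannot pause between these two calls.
:ets.insert(table, {:k, 1})
[{:k, 1}] = :ets.lookup(table, :k)

# Wrapped: each call is a sync point the strategy may interleave around.
Lockstep.ETS.insert(table, {:k, 1})
[{:k, 1}] = Lockstep.ETS.lookup(table, :k)
```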
## Replay + shrink
When Lockstep finds a bug, it writes a .lockstep trace file. You
can:
```shell
# Re-execute the exact failing schedule (deterministic)
mix lockstep.replay --trace traces/<bug>.lockstep

# Minimize the trace to the smallest reproducing schedule
mix lockstep.shrink --trace traces/<bug>.lockstep
```
Replay lets you attach a debugger / add IO.inspect and step
through the race. Shrinking turns a 5000-step trace into 12 steps.
## Multi-node testing (Lockstep.Cluster)
For testing distributed systems:
```elixir
ctest "partition + heal" do
  [a, b, c] = Lockstep.Cluster.start_nodes([:a, :b, :c])
  Lockstep.Cluster.run(a, fn -> MyService.start_link() end)
  Lockstep.Cluster.run(b, fn -> MyService.start_link() end)
  Lockstep.Cluster.run(c, fn -> MyService.start_link() end)

  Lockstep.Cluster.partition([a, b], [c], mode: :defer)
  # ... do work in each partition ...
  Lockstep.Cluster.heal()
  # Verify convergence
end
```

Also: Lockstep.Cluster.stop_node/1 and start_node/1 for
crash/recovery scenarios.
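A crash/recovery sketch built from those calls (MyService is a stand-in, and passing the node handle back to start_node/1 is an assumption):

```elixir
ctest "service recovers after a node crash" do
  [a, b] = Lockstep.Cluster.start_nodes([:a, :b])
  Lockstep.Cluster.run(a, fn -> MyService.start_link() end)
  Lockstep.Cluster.run(b, fn -> MyService.start_link() end)

  Lockstep.Cluster.stop_node(b)    # hard-stop one replica
  # ... do work against node a while b is down ...
  Lockstep.Cluster.start_node(b)   # bring it back

  # Verify node b catches up with node a
end
```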
## What Lockstep does NOT do
- It doesn't find logic bugs visible by reading source. Use code review.
- It doesn't replace property-based testing. They're complementary.
- It doesn't simulate disk fsync, network packet drops at the byte-level, or OS-level kill -9 (use chaos engineering for those).
- It doesn't model wall-clock-tight latency requirements.
## Common patterns to recognize
These are the bug shapes Lockstep finds well:
### TOCTTOU (read-then-act)
```elixir
def try_acquire({ref, limit}) do
  current = :atomics.get(ref, 1)   # time of check
  if current < limit do
    :atomics.add(ref, 1, 1)        # time of use; another caller can squeeze in between
    :ok
  else
    {:error, :limit_exceeded}
  end
end
```

→ test 4+ concurrent callers; under POS, found at iteration ~1.
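One race-free rewrite, sketched with OTP's `:atomics` directly: `:atomics.add_get/3` collapses check and use into a single atomic step (the `Gate` module name is illustrative):

```elixir
defmodule Gate do
  def new(limit), do: {:atomics.new(1, signed: true), limit}

  def try_acquire({ref, limit}) do
    # add_get/3 is one atomic read-modify-write: no window between check and use.
    if :atomics.add_get(ref, 1, 1) <= limit do
      :ok
    else
      :atomics.sub(ref, 1, 1)  # roll back the optimistic reservation
      {:error, :limit_exceeded}
    end
  end
end
```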
### Lost update on read-modify-write
```elixir
v = Counter.value(pid)
Counter.set(pid, v + 1)
```

→ test multiple concurrent processes; under POS/PCT, found at iteration 1-3.
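The standard fix is to make the increment a single message, so the read and the write happen atomically inside the server (the Counter from the first example already exposes add/2 for exactly this):

```elixir
# Buggy: another process can run between these two calls.
v = Counter.value(pid)
Counter.set(pid, v + 1)

# Fixed: one call, so the read-modify-write happens inside one handle_call.
Counter.add(pid, 1)
```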
### Message-ordering race
A NeighborReply arrives at the same mailbox as a connection_lost signal. Whichever is processed first determines outcome.
→ Lockstep.send + Process.monitor + handle_info patterns.
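A minimal shape of that race, sketched as a Lockstep.GenServer (the Relay module and its message names are illustrative, not part of the Lockstep API):

```elixir
defmodule Relay do
  use Lockstep.GenServer

  def init(_), do: {:ok, :waiting}

  # Whichever message Lockstep delivers first decides the final state;
  # under normal scheduling one order dominates and the other stays hidden.
  def handle_info({:neighbor_reply, peer}, :waiting), do: {:noreply, {:connected, peer}}
  def handle_info(:connection_lost, _state), do: {:noreply, :disconnected}
  def handle_info(_other, state), do: {:noreply, state}
end
```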
### Linearizability violations
Use Lockstep.Checker.Linearizable:
```elixir
ctest "registry is linearizable" do
  history = run_workload(...)
  assert :ok = Lockstep.Checker.Linearizable.check(history, model)
end
```

## Reading traces
Trace output uses pid aliases for readability:
```
P0(root) = #PID<0.123.0>
P1       = #PID<0.124.0>
P2       = #PID<0.125.0>

Schedule:
  step 1  hello  P0(root)
  step 2  spawn  P0(root) -> P1
  step 3  send   P0(root) -> P1 {:hello}
  step 4  recv   P1 {:hello}
  step 5  exit   P1 reason={:exception, :error, ...}   <-- FAILED HERE
```

The <-- FAILED HERE marker shows where the assertion or
invariant fired. Read the trace bottom-up to understand causality.
### Causal slice
Lockstep automatically slices traces to show only events causally
related to the failure. Set LOCKSTEP_NO_CAUSAL_SLICE=1 to disable.
For long traces, the slice is ~5-20% of the original.
## LLM-explained counterexamples
If you have an Anthropic API key:
```shell
export ANTHROPIC_API_KEY=sk-ant-...
mix test   # Failures will be explained in plain English
```
Set LOCKSTEP_LLM_OFF=1 to disable.
## Documentation
- Overview + tutorials: README.md
- Methodology: METHODOLOGY.md — playbook for testing real Hex packages
- Bug-finding case studies: docs/design/BUG_FINDINGS.md
- Design history + strategy: docs/design/
## Programmatic API
For non-ExUnit callers:
```elixir
Lockstep.Runner.run(
  fn -> my_test_body() end,
  iterations: 1000,
  strategy: :pos,
  max_steps: 5_000,
  seed: 1,
  iter_timeout: 30_000,
  suite: "my_suite"
)
```

Returns :ok or raises Lockstep.BugFound with iteration, seed,
strategy, and trace path.
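Usage sketch (the exact field names on Lockstep.BugFound, such as iteration, seed, and trace_path, are assumptions based on the description above):

```elixir
try do
  :ok =
    Lockstep.Runner.run(fn -> my_test_body() end,
      iterations: 500,
      strategy: :pct,
      seed: 1
    )
rescue
  bug in Lockstep.BugFound ->
    # Replay the exact failing schedule from the trace file it wrote.
    IO.puts("bug at iteration #{bug.iteration}, seed #{bug.seed}; " <>
            "replay: mix lockstep.replay --trace #{bug.trace_path}")
end
```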
## When to escalate
If a bug reproduces with seed: N but not with other seeds, the seed matters —
file the bug with that exact seed. Maintainers can verify with the same seed,
and the whole fix cycle stays deterministic.
If a test ALWAYS produces a "bug" at iteration 1 but you don't believe it's real, the scenario is likely too aggressive: it forces the race deterministically instead of testing whether the race can happen. Restructure the test toward more natural usage.