Functional retry with backoff for Elixir.
Resiliency.BackoffRetry provides a simple retry/2 function that executes a function
and retries on failure using composable, stream-based backoff strategies.
Zero macros, zero processes, injectable sleep for fast tests.
When to use
- Calling an external HTTP API that occasionally returns transient 5xx errors or connection timeouts — retry with exponential backoff to give the service time to recover.
- Writing to a database that may temporarily reject connections under load — retry with a time budget so the caller does not block indefinitely.
- Consuming messages from a queue where processing occasionally fails due to upstream flakiness — retry a bounded number of times before dead-lettering.
- Performing DNS lookups or certificate refreshes at startup where a brief network blip should not crash the application.
How it works
retry/2 executes the given zero-arity function and inspects its return value.
Any {:ok, _} or bare value is treated as success and returned immediately.
Any {:error, _}, raised exception, caught exit, or caught throw is treated as
failure. On failure the optional :retry_if predicate is consulted — if it
returns false, the error is returned at once.
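The classification described above can be sketched in plain Elixir. This is an illustration only, not the library's source: the module name ClassifySketch, the :success/:failure tags, and the way exits and throws are wrapped are all assumptions made for the example.

```elixir
defmodule ClassifySketch do
  # Hypothetical helper, not part of Resiliency.BackoffRetry.
  # Runs a zero-arity function and tags the outcome as success or failure.
  def classify(fun) when is_function(fun, 0) do
    try do
      case fun.() do
        {:ok, _} = ok -> {:success, ok}
        {:error, _} = error -> {:failure, error}
        bare -> {:success, bare}
      end
    rescue
      exception -> {:failure, {:error, exception}}
    catch
      # Assumption: the wrapping shape for exits and throws is illustrative.
      :exit, reason -> {:failure, {:error, {:exit, reason}}}
      :throw, value -> {:failure, {:error, {:throw, value}}}
    end
  end

  # The predicate is consulted only for failures.
  def retry?({:failure, error}, retry_if), do: retry_if.(error)
  def retry?({:success, _}, _retry_if), do: false
end
```

With this sketch, ClassifySketch.classify(fn -> 42 end) is tagged as a success, while a function that raises, exits, or throws is tagged as a failure and can then be passed to retry?/2.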
When a retry is warranted, the next delay is pulled from a pre-built list of
delay values. That list is produced by taking max_attempts - 1 elements from
an infinite Stream generated by Resiliency.BackoffRetry.Backoff (exponential,
linear, or constant), each capped at :max_delay. Before sleeping, the
optional :on_retry callback fires, then the configured :sleep_fn is called
with the delay in milliseconds.
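As a rough illustration of how such a delay list could be built from an infinite stream, the sketch below uses only the standard Stream and Enum modules. DelaySketch is a hypothetical module, and the doubling factor is an assumption; the actual growth rule of the library's exponential strategy is not stated here.

```elixir
defmodule DelaySketch do
  # Hypothetical helper, not the library's Backoff module.
  # Assumes the exponential strategy doubles the delay on each step.
  def exponential(base_delay) do
    Stream.iterate(base_delay, &(&1 * 2))
  end

  # Caps each delay at max_delay and keeps one delay per possible retry.
  def delays(stream, max_delay, max_attempts) do
    stream
    |> Stream.map(&min(&1, max_delay))
    |> Enum.take(max_attempts - 1)
  end
end

DelaySketch.exponential(100) |> DelaySketch.delays(5_000, 8)
# => [100, 200, 400, 800, 1600, 3200, 5000]
```

Because the stream is lazy, only max_attempts - 1 values are ever computed, which matches the pre-built list described above.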
A time :budget may be specified. Before each sleep, the engine checks whether
the remaining budget can absorb the upcoming delay. If not, retries stop and the
last error is returned. This provides a hard ceiling on total wall-clock time
independent of the number of attempts.

When :reraise is true and the original failure was a rescued exception, the
exception is re-raised with its original stacktrace once retries are exhausted.
This is useful for letting crash reporters capture the real origin.
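The budget check can be pictured as a small comparison made before each sleep. The sketch below is an assumption about the shape of that check, not the library's code: BudgetSketch and can_sleep?/4 are hypothetical names, and whether the real comparison is inclusive is not stated in these docs.

```elixir
defmodule BudgetSketch do
  # Hypothetical helper, not part of Resiliency.BackoffRetry.
  # Decides whether the remaining budget can absorb the next delay.
  # started_at and now are monotonic timestamps in milliseconds.
  def can_sleep?(:infinity, _started_at, _now, _delay), do: true

  def can_sleep?(budget, started_at, now, delay) do
    elapsed = now - started_at
    # Assumption: a delay that exactly fits the remaining budget is allowed.
    budget - elapsed >= delay
  end
end

started_at = System.monotonic_time(:millisecond)

BudgetSketch.can_sleep?(1_000, started_at, started_at + 700, 200)
# => true, 300 ms remain and the delay needs 200 ms

BudgetSketch.can_sleep?(1_000, started_at, started_at + 900, 200)
# => false, only 100 ms remain
```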
Algorithm Complexity
| Function | Time | Space |
|---|---|---|
| retry/2 | O(n), where n = max_attempts; each attempt adds O(1) overhead beyond the user function | O(n); the pre-built delay list holds at most n - 1 elements |
| abort/1 | O(1) | O(1) |
Quick start
# Retry with defaults (3 attempts, exponential backoff)
{:ok, body} = Resiliency.BackoffRetry.retry(fn -> fetch(url) end)

# With options (Logger.warning/1 is a macro, so Logger must be required first)
require Logger

{:ok, body} =
  Resiliency.BackoffRetry.retry(fn -> fetch(url) end,
    backoff: :exponential,
    max_attempts: 5,
    retry_if: fn
      {:error, :timeout} -> true
      {:error, :econnrefused} -> true
      _ -> false
    end,
    on_retry: fn attempt, delay, error ->
      Logger.warning("Attempt #{attempt} failed: #{inspect(error)}, retrying in #{delay} ms")
    end
  )
Options
- :backoff -- :exponential (default), :linear, :constant, or any Enumerable of ms
- :base_delay -- initial delay in ms (default: 100)
- :max_delay -- cap per-retry delay in ms (default: 5_000)
- :max_attempts -- total attempts including first (default: 3)
- :budget -- total time budget in ms (default: :infinity)
- :retry_if -- fn {:error, reason} -> boolean end (default: retries all errors)
- :on_retry -- callback fn attempt, delay, error -> any, invoked before each sleep
- :sleep_fn -- sleep function, defaults to Process.sleep/1
- :reraise -- true to re-raise rescued exceptions with original stacktrace when retries are exhausted (default: false)
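To show why an injectable :sleep_fn keeps tests fast, here is a deliberately simplified retry loop. It is a sketch under stated assumptions rather than the library's implementation: LoopSketch is a hypothetical module, it takes a ready-made list of delays, and it only understands {:ok, _} and {:error, _} results.

```elixir
defmodule LoopSketch do
  # Hypothetical, simplified retry loop; not the library's implementation.
  # Only handles {:ok, _} / {:error, _} results and a fixed list of delays.
  def run(fun, delays, opts \\ []) do
    sleep_fn = Keyword.get(opts, :sleep_fn, &Process.sleep/1)
    on_retry = Keyword.get(opts, :on_retry, fn _attempt, _delay, _error -> :ok end)
    attempt(fun, delays, 1, sleep_fn, on_retry)
  end

  defp attempt(fun, delays, n, sleep_fn, on_retry) do
    case {fun.(), delays} do
      {{:ok, _} = ok, _} ->
        ok

      {{:error, _} = error, []} ->
        # No delays left: attempts are exhausted.
        error

      {{:error, _} = error, [delay | rest]} ->
        on_retry.(n, delay, error)
        sleep_fn.(delay)
        attempt(fun, rest, n + 1, sleep_fn, on_retry)
    end
  end
end

# In a test, record the requested delays instead of sleeping.
test_pid = self()
sleep_fn = fn ms -> send(test_pid, {:slept, ms}) end

LoopSketch.run(fn -> {:error, :timeout} end, [100, 200], sleep_fn: sleep_fn)
# Returns {:error, :timeout} after three attempts; the mailbox now holds
# {:slept, 100} and {:slept, 200}, and no real time was spent sleeping.
```

The same idea applies to the real retry/2: passing a :sleep_fn that records or ignores the delay lets a test assert on the backoff schedule without waiting for it.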
Telemetry
All events are emitted in the caller's process. See Resiliency.Telemetry for the
complete event catalogue.
[:resiliency, :retry, :start]
Emitted before the first attempt.
Measurements
| Key | Type | Description |
|---|---|---|
| system_time | integer | System.system_time() at emission time |
Metadata
| Key | Type | Description |
|---|---|---|
| max_attempts | integer | Configured maximum number of attempts |
[:resiliency, :retry, :stop]
Emitted after the operation completes — either success or exhausted retries (without re-raise).
Measurements
| Key | Type | Description |
|---|---|---|
| duration | integer | Elapsed native time units (System.monotonic_time/0 delta) |
Metadata
| Key | Type | Description |
|---|---|---|
| max_attempts | integer | Configured maximum number of attempts |
| attempts | integer | Actual number of attempts made |
| result | :ok \| :error | :ok on success, :error on failure |
[:resiliency, :retry, :exception]
Emitted instead of :stop when reraise: true and a rescued exception exhausts all retries.
Measurements
| Key | Type | Description |
|---|---|---|
| duration | integer | Elapsed native time units |
Metadata
| Key | Type | Description |
|---|---|---|
| max_attempts | integer | Configured maximum number of attempts |
| attempts | integer | Actual number of attempts made |
| kind | :error | Always :error (rescued exception) |
| reason | Exception.t() | The exception struct |
| stacktrace | list | Original exception stacktrace |
[:resiliency, :retry, :retry]
Emitted before each retry sleep, after a failed attempt that will be retried.
Measurements
| Key | Type | Description |
|---|---|---|
| delay | integer | Sleep duration in milliseconds before next attempt |
Metadata
| Key | Type | Description |
|---|---|---|
| attempt | integer | The attempt number that just failed (1-based) |
| error | term | The error that triggered the retry ({:error, reason} form) |
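A handler for these events might look like the sketch below. The event names and the measurement and metadata keys are taken from the tables in this section; the module name RetryTelemetrySketch and the message formats are hypothetical. The functions return strings so the sketch runs without any dependency, whereas a real handler would more likely log or record a metric.

```elixir
defmodule RetryTelemetrySketch do
  # Hypothetical handler module; event names and keys follow the tables above.
  def events do
    [
      [:resiliency, :retry, :retry],
      [:resiliency, :retry, :stop]
    ]
  end

  # Uses the 4-arity handler shape (event, measurements, metadata, config).
  def handle_event([:resiliency, :retry, :retry], %{delay: delay}, %{attempt: attempt}, _config) do
    "attempt #{attempt} failed, sleeping #{delay} ms"
  end

  def handle_event([:resiliency, :retry, :stop], %{duration: duration}, metadata, _config) do
    # duration is in native time units, so convert before reporting.
    ms = System.convert_time_unit(duration, :native, :millisecond)
    "finished with #{metadata.result} after #{metadata.attempts} attempt(s) in #{ms} ms"
  end
end
```

Assuming the telemetry package is available in the project, such a module could be attached with :telemetry.attach_many("retry-sketch", RetryTelemetrySketch.events(), &RetryTelemetrySketch.handle_event/4, nil). Since events are emitted in the caller's process, a slow handler delays the retry loop itself.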
Summary
Functions
abort/1 -- Creates an %Abort{} struct to signal immediate retry termination.
retry/2 -- Executes fun and retries on failure with configurable backoff.
Types
@type option() ::
  {:backoff, :exponential | :linear | :constant | Enumerable.t()}
  | {:base_delay, non_neg_integer()}
  | {:max_delay, non_neg_integer()}
  | {:max_attempts, pos_integer()}
  | {:budget, :infinity | non_neg_integer()}
  | {:retry_if, (any() -> boolean())}
  | {:on_retry, (pos_integer(), non_neg_integer(), any() -> any()) | nil}
  | {:sleep_fn, (non_neg_integer() -> any())}
  | {:reraise, boolean()}
Functions
@spec abort(any()) :: Resiliency.BackoffRetry.Abort.t()
Creates an %Abort{} struct to signal immediate retry termination.
Parameters
- reason -- any term describing why the retry should be aborted.
Returns
A Resiliency.BackoffRetry.Abort.t() struct wrapping the given reason.
Example
Resiliency.BackoffRetry.retry(fn ->
  case api_call() do
    {:error, :not_found} -> {:error, Resiliency.BackoffRetry.abort(:not_found)}
    other -> other
  end
end)
retry/2
Executes fun and retries on failure with configurable backoff.
See the module documentation for available options.
With reraise: true, re-raises rescued exceptions with the original
stacktrace when retries are exhausted instead of returning {:error, exception}.
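The re-raise behaviour relies on standard Elixir features: __STACKTRACE__ inside a rescue clause and Kernel.reraise/2. The sketch below shows one way the two could be combined; ReraiseSketch, capture/1, and finish/2 are hypothetical names, and the library's internals may differ.

```elixir
defmodule ReraiseSketch do
  # Hypothetical helper, not part of Resiliency.BackoffRetry.
  # Captures an exception together with the stacktrace of the original raise.
  def capture(fun) do
    {:ok, fun.()}
  rescue
    exception -> {:raised, exception, __STACKTRACE__}
  end

  # With reraise? set to true, the stored stacktrace is reused so the
  # origin of the failure stays visible to crash reporters.
  def finish({:ok, value}, _reraise?), do: {:ok, value}
  def finish({:raised, exception, _stacktrace}, false), do: {:error, exception}
  def finish({:raised, exception, stacktrace}, true), do: reraise(exception, stacktrace)
end
```

In this sketch, finish(captured, false) returns {:error, exception}, while finish(captured, true) raises the same exception with the stacktrace recorded at the original failure site rather than at the point of re-raising.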
Parameters
- fun -- a zero-arity function to execute. Must return {:ok, value}, {:error, reason}, or a bare value (see result normalization in the module docs).
- opts -- keyword list of options. Defaults to [].
  - :backoff -- backoff strategy: :exponential, :linear, :constant, or any Enumerable of ms. Defaults to :exponential.
  - :base_delay -- initial delay in milliseconds. Defaults to 100.
  - :max_delay -- cap per-retry delay in milliseconds. Defaults to 5_000.
  - :max_attempts -- total attempts including the first. Defaults to 3.
  - :budget -- total time budget in milliseconds. Defaults to :infinity.
  - :retry_if -- fn {:error, reason} -> boolean end predicate controlling whether to retry. Defaults to retrying all errors.
  - :on_retry -- fn attempt, delay, error -> any callback invoked before each sleep. Defaults to nil.
  - :sleep_fn -- function used to sleep between retries. Defaults to Process.sleep/1.
  - :reraise -- when true, re-raises rescued exceptions with the original stacktrace when retries are exhausted. Defaults to false.
Returns
{:ok, value} on success, or {:error, reason} when all retries are exhausted (or the retry is aborted). When reraise: true, rescued exceptions are re-raised instead of being returned as errors.