Resiliency.BackoffRetry (Resiliency v0.6.0)

Functional retry with backoff for Elixir.

Resiliency.BackoffRetry provides a simple retry/2 function that executes a function and retries on failure using composable, stream-based backoff strategies. Zero macros, zero processes, injectable sleep for fast tests.

When to use

  • Calling an external HTTP API that occasionally returns transient 5xx errors or connection timeouts — retry with exponential backoff to give the service time to recover.
  • Writing to a database that may temporarily reject connections under load — retry with a time budget so the caller does not block indefinitely.
  • Consuming messages from a queue where processing occasionally fails due to upstream flakiness — retry a bounded number of times before dead-lettering.
  • Performing DNS lookups or certificate refreshes at startup where a brief network blip should not crash the application.

How it works

retry/2 executes the given zero-arity function and inspects its return value. Any {:ok, _} or bare value is treated as success and returned immediately. Any {:error, _}, raised exception, caught exit, or caught throw is treated as failure. On failure the optional :retry_if predicate is consulted — if it returns false, the error is returned at once.
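The classification described above can be sketched as a small standalone module. This is an illustration, not the library's source: `OutcomeSketch` is a hypothetical name, and both the wrapping of bare values as `{:ok, value}` and the `{:exit, reason}` / `{:throw, value}` shapes are assumptions made for the example.

```elixir
defmodule OutcomeSketch do
  # Hypothetical helper: runs `fun` once and normalizes the outcome
  # to {:ok, value} or {:error, reason}.
  def run(fun) when is_function(fun, 0) do
    try do
      case fun.() do
        {:ok, _} = ok -> ok
        {:error, _} = error -> error
        # Assumption: a bare value counts as success and is wrapped.
        value -> {:ok, value}
      end
    rescue
      # A raised exception becomes {:error, exception}.
      exception -> {:error, exception}
    catch
      # Assumption: exits and throws are tagged so callers can tell them apart.
      :exit, reason -> {:error, {:exit, reason}}
      :throw, value -> {:error, {:throw, value}}
    end
  end

  # Consults an optional predicate; with no predicate every error is retried.
  def retry?(_error, nil), do: true
  def retry?(error, retry_if) when is_function(retry_if, 1), do: retry_if.(error)
end
```

A predicate such as `fn {:error, :timeout} -> true; _ -> false end` passed to `retry?/2` then decides whether a given failure is worth another attempt.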

When a retry is warranted, the next delay is pulled from a pre-built list of delay values. That list is produced by taking max_attempts - 1 elements from an infinite Stream generated by Resiliency.BackoffRetry.Backoff (exponential, linear, or constant), each capped at :max_delay. Before sleeping, the optional :on_retry callback fires, then the configured :sleep_fn is called with the delay in milliseconds.
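To make the delay list concrete, here is a minimal sketch of capping an infinite stream and taking `max_attempts - 1` elements from it. `DelaySketch` is a hypothetical module, and the doubling rule is an assumption about what an exponential strategy looks like; the streams produced by Resiliency.BackoffRetry.Backoff may differ (for example in multiplier or jitter).

```elixir
defmodule DelaySketch do
  # Hypothetical exponential stream: base, 2 * base, 4 * base, ...
  def exponential(base_delay), do: Stream.iterate(base_delay, &(&1 * 2))

  # Caps every delay at max_delay, then keeps one delay per possible retry.
  def delays(stream, max_delay, max_attempts) do
    stream
    |> Stream.map(&min(&1, max_delay))
    |> Enum.take(max_attempts - 1)
  end
end

DelaySketch.exponential(100) |> DelaySketch.delays(5_000, 8)
# => [100, 200, 400, 800, 1600, 3200, 5000]
```

With `max_attempts: 1` the list is empty, so the first failure is also the last.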

A time :budget may be specified. Before each sleep, the engine checks whether the remaining budget can absorb the upcoming delay. If not, retries stop and the last error is returned. This bounds total wall-clock time independently of the number of attempts, although an attempt that is already running is not interrupted.

When :reraise is true and the original failure was a rescued exception, the exception is re-raised with its original stacktrace once retries are exhausted. This lets crash reporters capture the real origin of the failure.
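The budget check can be pictured as a single comparison made before each sleep. The sketch below is an assumption about the shape of that check, not the library's code: `BudgetSketch` is a hypothetical module, timestamps are taken to be milliseconds (for example from `System.monotonic_time(:millisecond)`), and treating a delay that exactly fills the budget as acceptable is a choice made for the example.

```elixir
defmodule BudgetSketch do
  # An infinite budget never stops a retry.
  def fits?(:infinity, _started_at, _now, _delay), do: true

  # Otherwise the time already spent plus the upcoming sleep must fit the budget.
  def fits?(budget, started_at, now, delay) do
    elapsed = now - started_at
    elapsed + delay <= budget
  end
end

# 700 ms used, 200 ms sleep requested, 1_000 ms budget: the retry goes ahead.
BudgetSketch.fits?(1_000, 0, 700, 200)
# 900 ms used, 200 ms sleep requested: retries stop and the last error is returned.
BudgetSketch.fits?(1_000, 0, 900, 200)
```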

Algorithm Complexity

| Function | Time | Space |
| --- | --- | --- |
| retry/2 | O(n) where n = max_attempts; each attempt is O(1) overhead beyond the user function | O(n); the pre-built delay list holds at most n - 1 elements |
| abort/1 | O(1) | O(1) |

Quick start

# Retry with defaults (3 attempts, exponential backoff)
{:ok, body} = Resiliency.BackoffRetry.retry(fn -> fetch(url) end)

# With options
{:ok, body} = Resiliency.BackoffRetry.retry(fn -> fetch(url) end,
  backoff: :exponential,
  max_attempts: 5,
  retry_if: fn
    {:error, :timeout} -> true
    {:error, :econnrefused} -> true
    _ -> false
  end,
  on_retry: fn attempt, _delay, error ->
    Logger.warning("Attempt #{attempt} failed: #{inspect(error)}")
  end
)

Options

  • :backoff -- :exponential (default), :linear, :constant, or any Enumerable of ms
  • :base_delay -- initial delay in ms (default: 100)
  • :max_delay -- cap per-retry delay in ms (default: 5_000)
  • :max_attempts -- total attempts including first (default: 3)
  • :budget -- total time budget in ms (default: :infinity)
  • :retry_if -- fn {:error, reason} -> boolean end (default: retries all errors)
  • :on_retry -- fn attempt, delay, error -> any callback invoked before sleep
  • :sleep_fn -- sleep function, defaults to Process.sleep/1
  • :reraise -- true to re-raise rescued exceptions with original stacktrace when retries are exhausted (default: false)
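The :sleep_fn option is what keeps tests fast: the sleep call is an ordinary function argument, so a test can pass one that records the delay instead of blocking. The loop below is a simplified standalone sketch of that idea, not the library's implementation; `LoopSketch` and its `retry/3` signature are hypothetical.

```elixir
defmodule LoopSketch do
  # Retries `fun` until it succeeds or the delay list runs out,
  # calling `sleep_fn` with each delay in between attempts.
  def retry(fun, delays, sleep_fn \\ &Process.sleep/1) do
    case {fun.(), delays} do
      # Failure with no delays left: give up and return the last error.
      {{:error, _} = error, []} ->
        error

      # Failure with delays left: sleep, then try again.
      {{:error, _}, [delay | rest]} ->
        sleep_fn.(delay)
        retry(fun, rest, sleep_fn)

      # Anything else is treated as success.
      {result, _} ->
        result
    end
  end
end

# In a test, record the requested delays instead of sleeping.
parent = self()
LoopSketch.retry(fn -> {:error, :flaky} end, [100, 200], fn ms -> send(parent, {:slept, ms}) end)
```

The same pattern applies to the real option: passing `sleep_fn: fn _ms -> :ok end` to retry/2 should let a test exercise every retry without waiting.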

Telemetry

All events are emitted in the caller's process. See Resiliency.Telemetry for the complete event catalogue.

[:resiliency, :retry, :start]

Emitted before the first attempt.

Measurements

| Key | Type | Description |
| --- | --- | --- |
| system_time | integer | System.system_time() at emission time |

Metadata

| Key | Type | Description |
| --- | --- | --- |
| max_attempts | integer | Configured maximum number of attempts |

[:resiliency, :retry, :stop]

Emitted after the operation completes — either success or exhausted retries (without re-raise).

Measurements

| Key | Type | Description |
| --- | --- | --- |
| duration | integer | Elapsed native time units (System.monotonic_time/0 delta) |

Metadata

| Key | Type | Description |
| --- | --- | --- |
| max_attempts | integer | Configured maximum number of attempts |
| attempts | integer | Actual number of attempts made |
| result | :ok \| :error | :ok on success, :error on failure |

[:resiliency, :retry, :exception]

Emitted instead of :stop when reraise: true and a rescued exception exhausts all retries.

Measurements

| Key | Type | Description |
| --- | --- | --- |
| duration | integer | Elapsed native time units |

Metadata

| Key | Type | Description |
| --- | --- | --- |
| max_attempts | integer | Configured maximum number of attempts |
| attempts | integer | Actual number of attempts made |
| kind | :error | Always :error (rescued exception) |
| reason | Exception.t() | The exception struct |
| stacktrace | list | Original exception stacktrace |

[:resiliency, :retry, :retry]

Emitted before each retry sleep, after a failed attempt that will be retried.

Measurements

| Key | Type | Description |
| --- | --- | --- |
| delay | integer | Sleep duration in milliseconds before next attempt |

Metadata

| Key | Type | Description |
| --- | --- | --- |
| attempt | integer | The attempt number that just failed (1-based) |
| error | term | The error that triggered the retry ({:error, reason} form) |
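A handler for these events might look like the sketch below. The event names and the measurement and metadata keys are taken from the tables above; the module name `RetryTelemetrySketch`, the handler id, and the log wording are hypothetical. The handler is attached with `:telemetry.attach_many/4`, guarded here so the snippet also loads where the :telemetry dependency is not available.

```elixir
defmodule RetryTelemetrySketch do
  @events [
    [:resiliency, :retry, :start],
    [:resiliency, :retry, :stop],
    [:resiliency, :retry, :exception],
    [:resiliency, :retry, :retry]
  ]

  # Attaches the handler when the :telemetry library is loaded.
  def attach do
    if Code.ensure_loaded?(:telemetry) do
      args = ["retry-telemetry-sketch", @events, &__MODULE__.handle_event/4, nil]
      apply(:telemetry, :attach_many, args)
    else
      {:error, :telemetry_not_available}
    end
  end

  # Telemetry handlers receive the event name, measurements, metadata and config.
  def handle_event(event, measurements, metadata, _config) do
    case describe(event, measurements, metadata) do
      nil -> :ok
      line -> IO.puts(line)
    end
  end

  # Builds a log line for the events of interest; other events are ignored.
  def describe([:resiliency, :retry, :retry], %{delay: delay}, %{attempt: attempt, error: error}) do
    "attempt #{attempt} failed with #{inspect(error)}; sleeping #{delay} ms"
  end

  def describe([:resiliency, :retry, :stop], %{duration: duration}, meta) do
    # Durations arrive in native time units and need converting for display.
    ms = System.convert_time_unit(duration, :native, :millisecond)
    "retry finished: #{meta.result} after #{meta.attempts}/#{meta.max_attempts} attempts in #{ms} ms"
  end

  def describe(_event, _measurements, _metadata), do: nil
end
```

Because events are emitted in the caller's process, a slow handler delays the retry loop itself, so handlers are best kept cheap.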

Summary

Functions

abort(reason)

Creates an %Abort{} struct to signal immediate retry termination.

retry(fun, opts \\ [])

Executes fun and retries on failure with configurable backoff.

Types

option()

@type option() ::
  {:backoff, :exponential | :linear | :constant | Enumerable.t()}
  | {:base_delay, non_neg_integer()}
  | {:max_delay, non_neg_integer()}
  | {:max_attempts, pos_integer()}
  | {:budget, :infinity | non_neg_integer()}
  | {:retry_if, (any() -> boolean())}
  | {:on_retry, (pos_integer(), non_neg_integer(), any() -> any()) | nil}
  | {:sleep_fn, (non_neg_integer() -> any())}
  | {:reraise, boolean()}

Functions

abort(reason)

@spec abort(any()) :: Resiliency.BackoffRetry.Abort.t()

Creates an %Abort{} struct to signal immediate retry termination.

Parameters

  • reason -- any term describing why the retry should be aborted.

Returns

A Resiliency.BackoffRetry.Abort.t() struct wrapping the given reason.

Example

Resiliency.BackoffRetry.retry(fn ->
  case api_call() do
    {:error, :not_found} -> {:error, Resiliency.BackoffRetry.abort(:not_found)}
    other -> other
  end
end)

retry(fun, opts \\ [])

@spec retry((-> any()), [option()]) :: {:ok, any()} | {:error, any()}

Executes fun and retries on failure with configurable backoff.

See the module documentation for available options.

With reraise: true, re-raises rescued exceptions with the original stacktrace when retries are exhausted instead of returning {:error, exception}.
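Re-raising with the original stacktrace relies on capturing `__STACKTRACE__` at the point of rescue and handing it to `reraise/2` later. The sketch below shows that mechanism in isolation; `ReraiseSketch` and its function names are hypothetical and it is not the library's code.

```elixir
defmodule ReraiseSketch do
  # Runs `fun` once; a rescued exception is kept together with its stacktrace.
  def attempt(fun) do
    {:ok, fun.()}
  rescue
    exception -> {:raised, exception, __STACKTRACE__}
  end

  # Decides what the caller sees once no retries remain.
  def finish({:ok, value}, _reraise?), do: {:ok, value}

  # reraise: true puts the exception back with the stacktrace captured earlier.
  def finish({:raised, exception, stacktrace}, true), do: reraise(exception, stacktrace)

  # reraise: false returns the exception as an ordinary error tuple.
  def finish({:raised, exception, _stacktrace}, false), do: {:error, exception}
end
```

With the stacktrace preserved, an error tracker reports the line inside the user function that raised rather than the line in the retry loop that re-raised.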

Parameters

  • fun -- a zero-arity function to execute. Must return {:ok, value}, {:error, reason}, or a bare value (see result normalization in the module docs).
  • opts -- keyword list of options. Defaults to [].
    • :backoff -- backoff strategy: :exponential, :linear, :constant, or any Enumerable of ms. Defaults to :exponential.
    • :base_delay -- initial delay in milliseconds. Defaults to 100.
    • :max_delay -- cap per-retry delay in milliseconds. Defaults to 5_000.
    • :max_attempts -- total attempts including the first. Defaults to 3.
    • :budget -- total time budget in milliseconds. Defaults to :infinity.
    • :retry_if -- fn {:error, reason} -> boolean end predicate controlling whether to retry. Defaults to retrying all errors.
    • :on_retry -- fn attempt, delay, error -> any callback invoked before each sleep. Defaults to nil.
    • :sleep_fn -- function used to sleep between retries. Defaults to Process.sleep/1.
    • :reraise -- when true, re-raises rescued exceptions with the original stacktrace when retries are exhausted. Defaults to false.

Returns

{:ok, value} on success, or {:error, reason} when all retries are exhausted (or the retry is aborted). When reraise: true, rescued exceptions are re-raised instead of being returned as errors.