# `Resiliency.BackoffRetry`
[🔗](https://github.com/yoavgeva/resiliency/blob/v0.6.0/lib/resiliency/backoff_retry.ex#L1)

Functional retry with backoff for Elixir.

`Resiliency.BackoffRetry` provides `retry/2`, which runs a zero-arity function
and retries it on failure using composable, stream-based backoff strategies.
Zero macros, zero processes, injectable sleep for fast tests.

## When to use

  * Calling an external HTTP API that occasionally returns transient 5xx errors
    or connection timeouts — retry with exponential backoff to give the service
    time to recover.
  * Writing to a database that may temporarily reject connections under load —
    retry with a time budget so the caller does not block indefinitely.
  * Consuming messages from a queue where processing occasionally fails due to
    upstream flakiness — retry a bounded number of times before dead-lettering.
  * Performing DNS lookups or certificate refreshes at startup where a brief
    network blip should not crash the application.

## How it works

`retry/2` executes the given zero-arity function and inspects its return value.
Any `{:ok, _}` or bare value is treated as success and returned immediately.
Any `{:error, _}`, raised exception, caught exit, or caught throw is treated as
failure. On failure the optional `:retry_if` predicate is consulted — if it
returns `false`, the error is returned at once.
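
The classification described above can be pictured with a small standalone sketch. `ClassifySketch` is a hypothetical module, not the library's code. The `{:error, exception}` shape follows the `retry/2` docs, while wrapping bare values in `{:ok, value}` and the `{:exit, _}` / `{:throw, _}` shapes are assumptions made for illustration.

```elixir
defmodule ClassifySketch do
  # Hypothetical helper, not part of Resiliency.BackoffRetry.
  # Runs a zero-arity function and tags the outcome as success or failure.
  def classify(fun) when is_function(fun, 0) do
    case fun.() do
      {:ok, _} = ok -> {:success, ok}
      {:error, _} = error -> {:failure, error}
      # Assumption: a bare value is wrapped in {:ok, value}.
      other -> {:success, {:ok, other}}
    end
  rescue
    exception -> {:failure, {:error, exception}}
  catch
    # Assumption: the tuple shapes for exits and throws are illustrative only.
    :exit, reason -> {:failure, {:error, {:exit, reason}}}
    :throw, value -> {:failure, {:error, {:throw, value}}}
  end
end
```

Only outcomes tagged `:failure` would go on to the `:retry_if` check; a success is returned without consulting it.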

When a retry is warranted, the next delay is pulled from a pre-built list of
delay values. That list is produced by taking `max_attempts - 1` elements from
an infinite `Stream` generated by `Resiliency.BackoffRetry.Backoff` (exponential,
linear, or constant), each capped at `:max_delay`. Before sleeping, the
optional `:on_retry` callback fires, then the configured `:sleep_fn` is called
with the delay in milliseconds.
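
As a rough illustration of how such a delay list could be built, the sketch below uses only the standard `Stream` and `Enum` modules. `DelaySketch` is hypothetical and is not `Resiliency.BackoffRetry.Backoff`; the doubling factor and the linear step are assumptions, and the real strategies may differ (for example by adding jitter).

```elixir
defmodule DelaySketch do
  # Hypothetical helper, not the library's Backoff module.
  # Builds an infinite stream of delays in ms for a named strategy.
  # Assumption: exponential doubles each step, linear adds `base` each step.
  def stream(:exponential, base), do: Stream.iterate(base, &(&1 * 2))
  def stream(:linear, base), do: Stream.iterate(base, &(&1 + base))
  def stream(:constant, base), do: Stream.repeatedly(fn -> base end)

  # Caps each delay at max_delay, then keeps one delay per possible retry.
  def delays(strategy, base, max_delay, max_attempts) do
    strategy
    |> stream(base)
    |> Stream.map(&min(&1, max_delay))
    |> Enum.take(max_attempts - 1)
  end
end

DelaySketch.delays(:exponential, 100, 5_000, 5)
# => [100, 200, 400, 800]
```

Because the stream is lazy, only `max_attempts - 1` values are ever computed, which matches the space bound given in the complexity table.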

A time `:budget` may be specified. Before each sleep, the engine checks whether
the remaining budget can absorb the upcoming delay. If not, retries stop and the
last error is returned. This caps the time spent waiting between attempts,
independent of the number of attempts. Because the check runs before a sleep, an
attempt that is already running is presumably not interrupted.

When `:reraise` is `true` and the original failure was a rescued exception, the
exception is re-raised with its original stacktrace once retries are exhausted,
which lets crash reporters capture the real origin.
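
The budget check described above might look like the following sketch. `BudgetSketch` and `can_sleep?/3` are hypothetical names, and whether the real comparison is inclusive (`<=`) is an assumption.

```elixir
defmodule BudgetSketch do
  # Hypothetical helper, not part of the library.
  # Returns true when sleeping for `delay` ms still fits in the budget.
  def can_sleep?(:infinity, _elapsed, _delay), do: true
  # Assumption: a delay that exactly uses up the budget is still allowed.
  def can_sleep?(budget, elapsed, delay), do: elapsed + delay <= budget
end

started = System.monotonic_time(:millisecond)
# ... an attempt runs and fails here ...
elapsed = System.monotonic_time(:millisecond) - started
BudgetSketch.can_sleep?(1_000, elapsed, 400)
```

Using a monotonic clock rather than wall-clock time keeps the elapsed measurement unaffected by system clock adjustments.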

## Algorithm complexity

| Function | Time | Space |
|---|---|---|
| `retry/2` | O(n) where n = `max_attempts` — each attempt is O(1) overhead beyond the user function | O(n) — the pre-built delay list holds at most n - 1 elements |
| `abort/1` | O(1) | O(1) |

## Quick start

    # Retry with defaults (3 attempts, exponential backoff)
    {:ok, body} = Resiliency.BackoffRetry.retry(fn -> fetch(url) end)

    # With options
    {:ok, body} = Resiliency.BackoffRetry.retry(fn -> fetch(url) end,
      backoff: :exponential,
      max_attempts: 5,
      retry_if: fn
        {:error, :timeout} -> true
        {:error, :econnrefused} -> true
        _ -> false
      end,
      on_retry: fn attempt, _delay, error ->
        Logger.warning("Attempt #{attempt} failed: #{inspect(error)}")
      end
    )

## Options

  * `:backoff` — `:exponential` (default), `:linear`, `:constant`, or any `Enumerable` of ms
  * `:base_delay` — initial delay in ms (default: `100`)
  * `:max_delay` — cap per-retry delay in ms (default: `5_000`)
  * `:max_attempts` — total attempts including first (default: `3`)
  * `:budget` — total time budget in ms (default: `:infinity`)
  * `:retry_if` — `fn {:error, reason} -> boolean end` (default: retries all errors)
  * `:on_retry` — `fn attempt, delay, error -> any` callback before sleep
  * `:sleep_fn` — sleep function, defaults to `Process.sleep/1`
  * `:reraise` — `true` to re-raise rescued exceptions with original stacktrace when retries are exhausted (default: `false`)
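
To make the function-valued options more concrete, here is a sketch of an options list suited to a test. The names `transient?` and `fake_sleep` are hypothetical; the option keys come from the list above.

```elixir
# Hypothetical predicate: retry only errors that look transient.
transient? = fn
  {:error, reason} when reason in [:timeout, :econnrefused] -> true
  _other -> false
end

# Reports each requested delay to the test process instead of sleeping,
# so a test finishes instantly and can assert on the delays.
test_pid = self()
fake_sleep = fn ms -> send(test_pid, {:slept, ms}) end

opts = [
  backoff: :constant,
  base_delay: 50,
  max_attempts: 4,
  retry_if: transient?,
  sleep_fn: fake_sleep
]
```

A list like this would be passed as the second argument to `retry/2`. A test could then use `assert_received {:slept, 50}` to check that a retry was scheduled.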

## Telemetry

All events are emitted in the caller's process. See `Resiliency.Telemetry` for the
complete event catalogue.

### `[:resiliency, :retry, :start]`

Emitted before the first attempt.

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| `system_time` | `integer` | `System.system_time()` at emission time |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `max_attempts` | `integer` | Configured maximum number of attempts |

### `[:resiliency, :retry, :stop]`

Emitted after the operation completes — either success or exhausted retries (without re-raise).

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| `duration` | `integer` | Elapsed native time units (`System.monotonic_time/0` delta) |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `max_attempts` | `integer` | Configured maximum number of attempts |
| `attempts` | `integer` | Actual number of attempts made |
| `result` | `:ok \| :error` | `:ok` on success, `:error` on failure |

### `[:resiliency, :retry, :exception]`

Emitted instead of `:stop` when `reraise: true` and a rescued exception exhausts all retries.

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| `duration` | `integer` | Elapsed native time units |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `max_attempts` | `integer` | Configured maximum number of attempts |
| `attempts` | `integer` | Actual number of attempts made |
| `kind` | `:error` | Always `:error` (rescued exception) |
| `reason` | `Exception.t()` | The exception struct |
| `stacktrace` | `list` | Original exception stacktrace |

### `[:resiliency, :retry, :retry]`

Emitted before each retry sleep, after a failed attempt that will be retried.

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| `delay` | `integer` | Sleep duration in milliseconds before next attempt |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `attempt` | `integer` | The attempt number that just failed (1-based) |
| `error` | `term` | The error that triggered the retry (`{:error, reason}` form) |
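
A handler for these events could be sketched as follows. `RetryTelemetrySketch` is a hypothetical module; only the event names and the measurement and metadata keys are taken from the tables above. The functions return strings so they can be checked in isolation, whereas a real handler would more likely log or record a metric.

```elixir
defmodule RetryTelemetrySketch do
  # Hypothetical handler module; event names and keys follow the tables above.
  @events [
    [:resiliency, :retry, :start],
    [:resiliency, :retry, :stop],
    [:resiliency, :retry, :exception],
    [:resiliency, :retry, :retry]
  ]

  def events, do: @events

  # Telemetry handlers receive event name, measurements, metadata, and config.
  def handle_event([:resiliency, :retry, :retry], %{delay: delay}, %{attempt: attempt}, _config) do
    "attempt #{attempt} failed, sleeping #{delay}ms"
  end

  def handle_event([:resiliency, :retry, :stop], %{duration: duration}, meta, _config) do
    # Durations are in native time units, so convert before displaying.
    ms = System.convert_time_unit(duration, :native, :millisecond)
    "finished with #{meta.result} after #{meta.attempts} attempt(s) in #{ms}ms"
  end

  # Ignore events this sketch does not format.
  def handle_event(_event, _measurements, _metadata, _config), do: nil
end

# Attaching requires the :telemetry dependency:
# :telemetry.attach_many("retry-sketch", RetryTelemetrySketch.events(),
#   &RetryTelemetrySketch.handle_event/4, nil)
```

Since events are emitted in the caller's process, a slow handler delays the retry loop itself, so handlers are best kept cheap.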

# `option`

```elixir
@type option() ::
  {:backoff, :exponential | :linear | :constant | Enumerable.t()}
  | {:base_delay, non_neg_integer()}
  | {:max_delay, non_neg_integer()}
  | {:max_attempts, pos_integer()}
  | {:budget, :infinity | non_neg_integer()}
  | {:retry_if, (any() -> boolean())}
  | {:on_retry, (pos_integer(), non_neg_integer(), any() -> any()) | nil}
  | {:sleep_fn, (non_neg_integer() -> any())}
  | {:reraise, boolean()}
```

# `abort`

```elixir
@spec abort(any()) :: Resiliency.BackoffRetry.Abort.t()
```

Creates an `%Abort{}` struct to signal immediate retry termination.

## Parameters

* `reason` -- any term describing why the retry should be aborted.

## Returns

A `Resiliency.BackoffRetry.Abort.t()` struct wrapping the given reason.

## Example

    Resiliency.BackoffRetry.retry(fn ->
      case api_call() do
        {:error, :not_found} -> {:error, Resiliency.BackoffRetry.abort(:not_found)}
        other -> other
      end
    end)
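
One way to picture how a retry loop might treat an abort-wrapped error is the sketch below. `AbortSketch` is a hypothetical stand-in for the library's struct: the `reason` field name and the unwrapping to `{:error, reason}` are assumptions, not confirmed behaviour.

```elixir
defmodule AbortSketch do
  # Hypothetical stand-in for the library's Abort struct.
  # Assumption: the wrapped term lives in a `reason` field.
  defstruct [:reason]

  # Decides what a retry loop might do with a failed result.
  # An abort-wrapped error stops immediately and is unwrapped for the caller.
  def next_step({:error, %__MODULE__{reason: reason}}), do: {:halt, {:error, reason}}
  # Any other error remains a candidate for another attempt.
  def next_step({:error, _reason}), do: :retry
end
```

The wrapper lets the function being retried mark an error as permanent without the caller having to encode that rule in `:retry_if`.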

# `retry`

```elixir
@spec retry((-> any()), [option()]) :: {:ok, any()} | {:error, any()}
```

Executes `fun` and retries on failure with configurable backoff.

See the module documentation for available options.

With `reraise: true`, re-raises rescued exceptions with the original
stacktrace when retries are exhausted instead of returning `{:error, exception}`.

## Parameters

* `fun` -- a zero-arity function to execute. Must return `{:ok, value}`, `{:error, reason}`, or a bare value (see "How it works" in the module docs for how results are classified).
* `opts` -- keyword list of options. Defaults to `[]`.
  * `:backoff` -- backoff strategy: `:exponential`, `:linear`, `:constant`, or any `Enumerable` of ms. Defaults to `:exponential`.
  * `:base_delay` -- initial delay in milliseconds. Defaults to `100`.
  * `:max_delay` -- cap per-retry delay in milliseconds. Defaults to `5_000`.
  * `:max_attempts` -- total attempts including the first. Defaults to `3`.
  * `:budget` -- total time budget in milliseconds. Defaults to `:infinity`.
  * `:retry_if` -- `fn {:error, reason} -> boolean end` predicate controlling whether to retry. Defaults to retrying all errors.
  * `:on_retry` -- `fn attempt, delay, error -> any` callback invoked before each sleep. Defaults to `nil`.
  * `:sleep_fn` -- function used to sleep between retries. Defaults to `Process.sleep/1`.
  * `:reraise` -- when `true`, re-raises rescued exceptions with the original stacktrace when retries are exhausted. Defaults to `false`.

## Returns

`{:ok, value}` on success, or `{:error, reason}` when all retries are exhausted (or the retry is aborted). When `reraise: true`, rescued exceptions are re-raised instead of being returned as errors.

