# `Aerospike.RetryPolicy`
[🔗](https://github.com/luisgabrielroldan/aerospike_driver/blob/v0.3.1/lib/aerospike/retry_policy.ex#L1)

Retry configuration and error classification for the command path.

The retry driver consumes a `t:t/0` value per command and decides, based on
the classification helpers below, whether to re-dispatch an attempt against
the next replica (on a rebalance-class error), re-dispatch against a fresh
pool worker (on a transport-class error), or return the error verbatim (on
anything else).

## Writer discipline

The retry policy is cluster-scoped, not per-node, and is established
once at `Aerospike.start_link/1` time. The Tender computes the
effective policy and publishes it to the `:meta` ETS table under the key
`:retry_opts`; the command path reads it lock-free via `load/1`. Runtime
mutation still stays behind the single-writer boundary that governs every
other published `:meta` entry.

Per-command overrides (`:timeout`, `:max_retries`,
`:sleep_between_retries_ms`, `:replica_policy`) may be passed through
`Aerospike.get/3`'s `opts` and are merged on top of the cluster default
by `merge/2`.

## Classification

One canonical classifier drives the retry loop and the pool-side
failure accounting. It returns:

  * `:bucket` — one of `:ok`, `:rebalance`, `:transport`,
    `:routing_refusal`, `:server_fatal`
  * `:retry_classification` — the retry telemetry label or `nil`
  * `:close_connection?` — whether the current worker should be
    discarded after the outcome
  * `:node_failure?` — whether the outcome should increment the
    node's `:failed` counter

The buckets stay disjoint:

  * **rebalance** — the server replied with a result code that says
    "this partition is not mine right now" (currently
    `:partition_unavailable`). The retry driver re-picks on a
    different replica and asynchronously asks the Tender for a fresh
    partition map.

  * **transport** — the command did not reach a server that answered
    cleanly: `:network_error`, `:timeout`, `:connection_error`
    (socket), `:pool_timeout`, `:invalid_node` (pool checkout), and
    `:circuit_open` (circuit-breaker refusal). These are not ownership signals;
    the retry driver re-dispatches without asking for a map refresh.

  * **routing_refusal** — the router refused to select a replica
    (`:cluster_not_ready`, `:no_master`). The driver returns the atom
    verbatim; no retry.

  * **server_fatal** — everything else: server logical errors
    (`:key_not_found`, `:generation_error`, …) and client-local fatal
    errors like `:parse_error`. The driver returns these verbatim.

# `bucket`

```elixir
@type bucket() :: :ok | :rebalance | :transport | :routing_refusal | :server_fatal
```

High-level outcome bucket used by retry and pool-failure logic.

Buckets are intentionally disjoint so the retry driver can decide whether
to retry, refresh cluster state, close a socket, or return the error as-is.

# `classification`

```elixir
@type classification() :: %{
  bucket: bucket(),
  retry_classification: retry_classification(),
  close_connection?: boolean(),
  node_failure?: boolean()
}
```

Complete retry classification for one command outcome.

# `option`

```elixir
@type option() ::
  {:max_retries, non_neg_integer()}
  | {:sleep_between_retries_ms, non_neg_integer()}
  | {:replica_policy, replica_policy()}
  | {atom(), term()}
```

Retry option accepted by `from_opts/1` and `merge/2`.

  * `:max_retries` — retries after the initial attempt. `0` disables
    retry.
  * `:sleep_between_retries_ms` — fixed delay between attempts.
  * `:replica_policy` — `:master` or `:sequence`.
  * any other atom key — accepted and ignored.

Unknown keys are ignored so retry options can be merged from broader
command/startup option lists.

# `options`

```elixir
@type options() :: [option()]
```

Keyword list of retry options.

# `replica_policy`

```elixir
@type replica_policy() :: :master | :sequence
```

Replica selection policy used when retrying read commands.

# `retry_classification`

```elixir
@type retry_classification() :: :rebalance | :transport | :circuit_open | nil
```

Telemetry retry label derived from a command outcome.

`nil` means the outcome is not retryable.

# `t`

```elixir
@type t() :: %{
  max_retries: non_neg_integer(),
  sleep_between_retries_ms: non_neg_integer(),
  replica_policy: replica_policy()
}
```

Effective retry policy for one command.

  * `:max_retries` — number of retries **after** the initial attempt
    (so a `:max_retries` of `2` means up to 3 attempts total). Must be
    a non-negative integer. `0` disables retry entirely.
  * `:sleep_between_retries_ms` — fixed delay between attempts; no
    jitter or exponential backoff.
  * `:replica_policy` — `:master` dispatches every attempt against the
    master replica (transport failures retry the same node); `:sequence`
    walks the replica list via `rem(attempt, length(replicas))` on each
    retry.

# `classify`

```elixir
@spec classify(term()) :: classification()
```

Classifies one command outcome into retry buckets and the
metadata the retry and pool layers consume.

# `defaults`

```elixir
@spec defaults() :: t()
```

Returns the default retry policy. Used by the Tender at init.

# `from_opts`

```elixir
@spec from_opts(options()) :: t()
```

Builds an effective retry policy by overlaying the keyword `opts` on
top of `defaults/0`.

Intended for the Tender's init path: validate the caller's start opts
once and store the resulting map in `:meta`. Unknown keys are ignored
so the retry policy can live alongside future policy knobs without a
config migration.

# `load`

```elixir
@spec load(atom()) :: t()
```

Reads the cluster-default retry policy from the `:meta` ETS table.

Falls back to `defaults/0` when the slot is absent so readers never
crash against a Tender that was started without the retry plumbing
(a cluster-state-only test harness, for example, that skips the
retry-opts init).

# `merge`

```elixir
@spec merge(t(), options()) :: t()
```

Overlays per-command `opts` on top of `base`. Only the three retry
fields are recognised; other keys are ignored.

# `node_failure?`

```elixir
@spec node_failure?(term()) :: boolean()
```

Returns `true` when `term` should increment the node's `:failed`
counter.

# `put`

```elixir
@spec put(atom(), t()) :: true
```

Writes `policy` to `meta_tab` under the ETS key used by `load/1`.

Runtime publication flows through the cluster-state writer; table creation
also uses this helper once to seed the default row before the tend-cycle
worker starts.

# `rebalance?`

```elixir
@spec rebalance?(term()) :: boolean()
```

Returns `true` when `term` is an error the retry driver should treat
as a cluster-rebalance signal. Accepts either a bare `%Aerospike.Error{}`
or the `{:error, _}` tuple form the command path produces; delegates to
the canonical classifier above.

# `retry_classification`

```elixir
@spec retry_classification(term()) :: retry_classification()
```

Returns the retry telemetry label for `term`, or `nil` when the
outcome is fatal / non-retryable.

# `transport?`

```elixir
@spec transport?(term()) :: boolean()
```

Returns `true` when `term` is an error the retry driver should treat
as a transport-class failure (re-dispatch without re-routing logic
beyond the replica walk).

Examples of transport-class codes: `:network_error`, `:timeout`,
`:connection_error`, `:pool_timeout`, `:invalid_node`, `:circuit_open`.

---

*Consult [api-reference.md](api-reference.md) for complete listing*
