# `SuperCache.Cluster.Bootstrap`
[🔗](https://github.com/ohhi-vn/super_cache/blob/main/lib/cluster/cluster_bootstrap.ex#L1)

Cluster-aware startup and shutdown for SuperCache.

## Start options

All options accepted by `SuperCache.Bootstrap.start!/1` are valid here,
plus the following cluster-specific keys:

| Option               | Type                     | Default        | Description                              |
|----------------------|--------------------------|----------------|------------------------------------------|
| `:cluster`           | atom                     | `:distributed` | `:local` or `:distributed`               |
| `:replication_factor`| integer                  | `2`            | Total copies (primary + replicas)        |
| `:replication_mode`  | atom                     | `:async`       | `:async`, `:sync`, or `:strong` (3PC)    |
| `:num_partition`     | integer                  | scheduler count| Number of ETS partitions                 |
| `:table_type`        | atom                     | `:set`         | ETS table type                           |
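The options above can be combined in a single `start!/1` call. A minimal sketch; the option values shown are illustrative, not recommendations:

```elixir
# Illustrative cluster start-up. :key_pos and :partition_pos are base
# options accepted by SuperCache.Bootstrap.start!/1.
SuperCache.Cluster.Bootstrap.start!(
  key_pos: 0,
  partition_pos: 0,
  cluster: :distributed,      # :local or :distributed
  replication_factor: 2,      # total copies: primary + one replica
  replication_mode: :async,   # lowest latency, eventual consistency
  num_partition: 8,           # defaults to the scheduler count when omitted
  table_type: :set            # ETS table type
)
```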

## Node-source options (forwarded to `NodeMonitor`)

| Option         | Type                     | Default | Description                                    |
|----------------|--------------------------|---------|------------------------------------------------|
| `:nodes`       | `[node()]`               | —       | Static peer list evaluated once at start-up.   |
| `:nodes_mfa`   | `{module, atom, [term]}` | —       | Called at init and on every `:refresh_ms` tick.|
| `:refresh_ms`  | pos_integer              | `5_000` | `:nodes_mfa` re-evaluation interval; ignored when `:nodes_mfa` is absent.|

When neither `:nodes` nor `:nodes_mfa` is supplied, `NodeMonitor` falls
back to watching **all** Erlang-connected nodes (legacy behaviour).
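The two node sources can be sketched as follows; `MyDiscovery.peers/0` is a hypothetical discovery function returning `[node()]`, not part of SuperCache:

```elixir
# Option 1: static peer list, evaluated once at start-up.
SuperCache.Cluster.Bootstrap.start!(nodes: [:"a@host", :"b@host"])

# Option 2: dynamic discovery. The MFA is called at init and then
# re-evaluated every :refresh_ms milliseconds by NodeMonitor.
SuperCache.Cluster.Bootstrap.start!(
  nodes_mfa: {MyDiscovery, :peers, []},
  refresh_ms: 10_000
)
```

Supplying neither key falls back to watching all Erlang-connected nodes, as noted above.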

## Replication modes

- **`:async`** — fire-and-forget. Lowest latency; eventual consistency.
- **`:sync`**  — synchronous delivery to all replicas before returning.
  One extra RTT per write.
- **`:strong`** — three-phase commit via
  `SuperCache.Cluster.ThreePhaseCommit`.  Guarantees that either all
  replicas apply a write or none do.  Three extra RTTs per write.
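The mode is chosen at start-up; a sketch of opting into the strongest guarantee, with illustrative values:

```elixir
# Strong consistency: every write either reaches all three copies or
# none of them, at the cost of three extra round trips per write.
SuperCache.Cluster.Bootstrap.start!(
  replication_mode: :strong,
  replication_factor: 3
)
```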

## Start sequence

1. Validate options.
2. Write all options to `SuperCache.Config`.
3. Reconfigure `NodeMonitor` with the node-source opts (`:nodes`,
   `:nodes_mfa`, `:refresh_ms`).  This is done early so `Manager` sees
   the correct managed set when it health-checks peers in later steps.
4. If other nodes are already live, verify that every structural config
   key on this node matches the cluster.  Raises `ArgumentError` on any
   mismatch.  **No ETS tables have been created at this point**, so a
   rejection leaves the node in a completely clean state and `start!/1`
   can be retried with corrected opts without hitting "table already exists".
5. Start `Partition` and `Storage` subsystems.
6. Start the `Buffer` write-buffer pool.
7. If `:replication_mode` is `:strong`, run crash-recovery via
   `ThreePhaseCommit.recover/0` to resolve in-doubt transactions left
   over from a previous crash.
8. If other nodes are already live, request a full sync so this node
   receives a consistent snapshot of each partition.
9. Mark `:started` in config.

## Stop sequence

1. Stop the `Buffer` (flushes pending lazy writes).
2. Stop `Storage` (deletes ETS tables).
3. Stop `Partition` (clears partition registry).
4. Mark `:started` as `false`.

## Config verification

When a node joins a running cluster, `start!/1` calls `verify_cluster_config!/1`
which performs a pairwise comparison of every structural config key against all
live peers via `:erpc`.  The keys checked are:

`[:key_pos, :partition_pos, :num_partition, :table_type, :replication_factor, :replication_mode]`

`:started`, `:cluster`, and `:table_prefix` are intentionally excluded:
`:started` is a liveness flag that will differ during bootstrap; `:cluster`
is always `:distributed` in this module; `:table_prefix` must already match
for ETS tables to be addressable, so a mismatch would have caused an earlier crash.

Any mismatch raises `ArgumentError` listing every divergent key with both
the local and remote values so the operator can identify the problem
immediately rather than observing silent data inconsistency later.
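The per-peer check can be sketched as below. This is an assumption about the shape of the logic, not the module's exact implementation; only `export_config/0` (documented below) and the checked key list come from the source:

```elixir
defmodule ConfigCheckSketch do
  # Hypothetical sketch of one pairwise comparison performed by
  # verify_cluster_config!/1 against a single live peer.
  def verify_against(peer, local_config) do
    remote = :erpc.call(peer, SuperCache.Cluster.Bootstrap, :export_config, [])

    # Collect every structural key whose local and remote values differ.
    mismatches =
      for {key, local_val} <- local_config,
          Map.get(remote, key) != local_val,
          do: {key, [local: local_val, remote: Map.get(remote, key)]}

    if mismatches != [] do
      raise ArgumentError,
            "structural config mismatch with #{inspect(peer)}: #{inspect(mismatches)}"
    end

    :ok
  end
end
```

Raising with every divergent key at once, rather than the first one found, is what lets the operator fix all mismatched options in a single retry.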

# `export_config`

```elixir
@spec export_config() :: map()
```

Return the structural config of this node as a map.

Called via `:erpc` from a joining node during config verification.
Returns only the keys in `@config_keys` — never liveness flags.

## Example

    SuperCache.Cluster.Bootstrap.export_config()
    # => %{key_pos: 0, partition_pos: 0, num_partition: 8,
    #       table_type: :set, replication_factor: 2, replication_mode: :async}

# `fetch_partition_map`

```elixir
@spec fetch_partition_map(pos_integer()) :: [{non_neg_integer(), {node(), [node()]}}]
```

Return the full partition map for this node as a list of
`{partition_idx, {primary, replicas}}` pairs.

Called via `:erpc` from test helpers on the test node to read the
partition assignment of a remote peer without crossing the no-lambda
boundary.  `num` must match `SuperCache.Config.get_config(:num_partition)`
on the calling node; callers should read that value locally and pass it
as an argument so the comparison is always against the same reference.

## Example

    SuperCache.Cluster.Bootstrap.fetch_partition_map(8)
    # => [{0, {:"a@host", [:"b@host"]}}, ...]

# `running?`

```elixir
@spec running?() :: boolean()
```

Returns `true` when this node is running in distributed mode and has
completed start-up.

Called remotely by `Manager.node_running?/1` via `:erpc`.
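A remote liveness probe from another node might look like this; the node name and the 5-second timeout are illustrative choices, not defaults from the source:

```elixir
# Ask a peer whether it has completed cluster start-up.
:erpc.call(:"b@host", SuperCache.Cluster.Bootstrap, :running?, [], 5_000)
# => true or false
```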

# `start!`

```elixir
@spec start!(keyword()) :: :ok
```

Start SuperCache in cluster mode with the given options.

Raises `ArgumentError` for invalid options or when the node's structural
config does not match an already-running cluster.

# `stop`

```elixir
@spec stop() :: :ok
```

Stop SuperCache and release all ETS resources.
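A typical lifecycle, as a sketch with illustrative options:

```elixir
# Start in cluster mode, use the cache, then release all ETS resources.
:ok = SuperCache.Cluster.Bootstrap.start!(replication_mode: :sync)
# ... reads and writes ...
:ok = SuperCache.Cluster.Bootstrap.stop()
```

Per the stop sequence above, `stop/0` flushes the `Buffer` before tearing down `Storage`, so pending lazy writes are not lost on an orderly shutdown.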

---

*Consult [api-reference.md](api-reference.md) for a complete listing.*
