SuperCache.Cluster.Bootstrap (SuperCache v1.3.0)


Cluster-aware startup and shutdown for SuperCache.

Start options

All options accepted by SuperCache.Bootstrap.start!/1 are valid here, plus the following cluster-specific keys:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `:cluster` | atom | `:distributed` | `:local` or `:distributed` |
| `:replication_factor` | integer | `2` | Total copies (primary + replicas) |
| `:replication_mode` | atom | `:async` | `:async`, `:sync`, or `:strong` (3PC) |
| `:num_partition` | integer | scheduler count | Number of ETS partitions |
| `:table_type` | atom | `:set` | ETS table type |
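A minimal start-up sketch using the cluster-specific keys above; the values shown here are simply the documented defaults made explicit, so omitting any of them is equivalent.

```elixir
# Sketch: start a distributed node with the defaults spelled out.
SuperCache.Cluster.Bootstrap.start!(
  cluster: :distributed,
  replication_factor: 2,          # primary + one replica
  replication_mode: :async,       # lowest latency, eventual consistency
  num_partition: System.schedulers_online(),
  table_type: :set
)
```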

Node-source options (forwarded to NodeMonitor)

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `:nodes` | `[node()]` | — | Static peer list evaluated once at start-up. |
| `:nodes_mfa` | `{module, atom, [term]}` | — | Called at init and on every `:refresh_ms` tick. |
| `:refresh_ms` | `pos_integer` | `5_000` | MFA re-evaluation interval (ignored unless `:nodes_mfa` is given). |

When neither :nodes nor :nodes_mfa is supplied, NodeMonitor falls back to watching all Erlang-connected nodes (legacy behaviour).
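A hypothetical sketch of a dynamic node source. `MyDiscovery.list_nodes/1` is an assumed helper (not part of SuperCache) that returns the current peer list, e.g. from DNS or a service registry; NodeMonitor re-invokes it every `:refresh_ms` milliseconds.

```elixir
# Hypothetical: MyDiscovery.list_nodes/1 returns [node()] for the given
# service name. NodeMonitor calls it at init and every 10 seconds.
SuperCache.Cluster.Bootstrap.start!(
  nodes_mfa: {MyDiscovery, :list_nodes, ["supercache.internal"]},
  refresh_ms: 10_000
)
```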

Replication modes

  • :async — fire-and-forget. Lowest latency; eventual consistency.
  • :sync — synchronous delivery to all replicas before returning. One extra RTT per write.
  • :strong — three-phase commit via SuperCache.Cluster.ThreePhaseCommit. Guarantees that either all replicas apply a write or none do. Three extra RTTs per write.
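For write paths where atomicity across replicas matters more than latency, `:strong` can be selected at start-up; a sketch, with all other options left at their defaults:

```elixir
# Sketch: opt into three-phase commit. Each write now costs three extra
# round trips, but either all replicas apply it or none do.
SuperCache.Cluster.Bootstrap.start!(replication_mode: :strong)
```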

Start sequence

  1. Validate options.
  2. Write all options to SuperCache.Config.
  3. Reconfigure NodeMonitor with the node-source opts (:nodes, :nodes_mfa, :refresh_ms). This is done early so Manager sees the correct managed set when it health-checks peers in later steps.
  4. If other nodes are already live, verify that every structural config key on this node matches the cluster. Raises ArgumentError on any mismatch. No ETS tables have been created at this point, so a rejection leaves the node in a completely clean state and start!/1 can be retried with corrected opts without hitting "table already exists".
  5. Start Partition and Storage subsystems.
  6. Start the Buffer write-buffer pool.
  7. If :replication_mode is :strong, run crash-recovery via ThreePhaseCommit.recover/0 to resolve in-doubt transactions left over from a previous crash.
  8. If other nodes are already live, request a full sync so this node receives a consistent snapshot of each partition.
  9. Mark :started in config.
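Because config verification (step 4) runs before any ETS tables are created, a rejected `start!/1` leaves the node clean and can simply be retried; a sketch of that retry pattern, assuming the running cluster uses `replication_factor: 2`:

```elixir
# Sketch: a mismatched start raises before any tables exist, so the
# call can be retried with corrected opts.
try do
  SuperCache.Cluster.Bootstrap.start!(replication_factor: 3)
rescue
  e in ArgumentError ->
    # e.message lists every divergent key with local and remote values.
    IO.puts("config rejected: #{e.message}")
    SuperCache.Cluster.Bootstrap.start!(replication_factor: 2)
end
```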

Stop sequence

  1. Stop the Buffer (flushes pending lazy writes).
  2. Stop Storage (deletes ETS tables).
  3. Stop Partition (clears partition registry).
  4. Mark :started as false.

Config verification

When a node joins a running cluster, start!/1 calls verify_cluster_config!/1 which performs a pairwise comparison of every structural config key against all live peers via :erpc. The keys checked are:

[:key_pos, :partition_pos, :num_partition, :table_type, :replication_factor, :replication_mode]

:started, :cluster, and :table_prefix are intentionally excluded: :started is a liveness flag that will differ during bootstrap; :cluster is always :distributed in this module; :table_prefix must already match for ETS tables to be addressable so a mismatch would cause an earlier crash.

Any mismatch raises ArgumentError listing every divergent key with both the local and remote values so the operator can identify the problem immediately rather than observing silent data inconsistency later.
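The pairwise comparison described above can be sketched as follows. This is an illustrative reconstruction, not the library's actual implementation; it assumes only `export_config/0` (documented below) being callable on each live peer via `:erpc`.

```elixir
# Illustrative sketch of verify_cluster_config!/1's comparison loop.
local = SuperCache.Cluster.Bootstrap.export_config()

for peer <- Node.list() do
  remote = :erpc.call(peer, SuperCache.Cluster.Bootstrap, :export_config, [])

  diverged =
    for {key, local_val} <- local,
        Map.get(remote, key) != local_val,
        do: {key, [local: local_val, remote: Map.get(remote, key)]}

  unless diverged == [] do
    raise ArgumentError,
          "structural config mismatch with #{peer}: #{inspect(diverged)}"
  end
end
```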

Summary

Functions

export_config()
Return the structural config of this node as a map.

fetch_partition_map(num)
Return the full partition map for this node as a list of {partition_idx, {primary, replicas}} pairs.

running?()
Returns true when this node is running in distributed mode and has completed start-up.

start!(opts \\ [])
Start SuperCache in cluster mode with the given options.

stop()
Stop SuperCache and release all ETS resources.

Functions

export_config()

@spec export_config() :: map()

Return the structural config of this node as a map.

Called via :erpc from a joining node during config verification. Returns only the keys in @config_keys — never liveness flags.

Example

SuperCache.Cluster.Bootstrap.export_config()
# => %{key_pos: 0, partition_pos: 0, num_partition: 8,
#       table_type: :set, replication_factor: 2, replication_mode: :async}

fetch_partition_map(num)

@spec fetch_partition_map(pos_integer()) :: [{non_neg_integer(), {node(), [node()]}}]

Return the full partition map for this node as a list of {partition_idx, {primary, replicas}} pairs.

Called via :erpc from test helpers on the test node to read the partition assignment of a remote peer without crossing the no-lambda boundary. num must match SuperCache.Config.get_config(:num_partition) on the calling node; callers should read that value locally and pass it as an argument so the comparison is always against the same reference.

Example

SuperCache.Cluster.Bootstrap.fetch_partition_map(8)
# => [{0, {:"a@host", [:"b@host"]}}, ...]
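The recommended calling pattern from the note above, sketched with a hypothetical peer name: read `:num_partition` locally and pass it through, so both sides compare against the same reference value.

```elixir
# Sketch: :"b@host" is a placeholder peer. The local :num_partition is
# read once and forwarded so the remote map is sized consistently.
peer = :"b@host"
num = SuperCache.Config.get_config(:num_partition)
:erpc.call(peer, SuperCache.Cluster.Bootstrap, :fetch_partition_map, [num])
```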

running?()

@spec running?() :: boolean()

Returns true when this node is running in distributed mode and has completed start-up.

Called remotely by Manager.node_running?/1 via :erpc.
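A sketch of that remote liveness check, mirroring what Manager.node_running?/1 does; `:"b@host"` is a placeholder peer name.

```elixir
# Sketch: ask a peer whether it has completed distributed start-up.
:erpc.call(:"b@host", SuperCache.Cluster.Bootstrap, :running?, [])
```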

start!(opts \\ [key_pos: 0, partition_pos: 0, cluster: :distributed, replication_factor: 2, replication_mode: :async, table_type: :set])

@spec start!(keyword()) :: :ok

Start SuperCache in cluster mode with the given options.

Raises ArgumentError for invalid options or when the node's structural config does not match an already-running cluster.

stop()

@spec stop() :: :ok

Stop SuperCache and release all ETS resources.