# `Lockstep.Cluster`
[🔗](https://github.com/b-erdem/lockstep/blob/v0.1.0/lib/lockstep/cluster.ex#L1)

Multi-node simulation under Lockstep's controller.

A "node" here is a logical grouping of managed processes inside a
single BEAM. Cross-node `Lockstep.send/2` is just a controller-routed
send that respects the active partition state.

This is a simulated cluster — there's no real BEAM distribution
(no `:net_kernel`, no TCP, no `epmd`). The trade-off vs running real
multi-node Erlang:

  * **Faster** — thousands of iterations per minute vs. minutes per
    iteration with real clustering
  * **Reproducible** — same seed produces byte-identical traces
    including network events
  * **Catches user-code distributed bugs** — CRDT non-convergence,
    cross-node monitor races, registry split-brain, leader election
    thrashing
  * **Doesn't catch BEAM-distribution-layer bugs** — `:net_kernel`,
    `:gen_tcp` issues are out of scope

## Quick example

    Lockstep.Runner.run(fn ->
      [n1, n2, n3] = Lockstep.Cluster.start_nodes([:n1, :n2, :n3])

      # Spawn a process on each node.
      for node <- [n1, n2, n3] do
        Lockstep.Cluster.spawn(node, fn ->
          # ... do per-node work ...
        end)
      end

      # Partition n1 from the rest.
      Lockstep.Cluster.partition([n1], [n2, n3])

      # ... do work that should split-brain ...

      Lockstep.Cluster.heal()

      # ... verify convergence ...
    end)

## Partition modes

  * `:drop` (default) — cross-partition messages are silently lost.
    Matches real BEAM-distribution behavior when TCP closes during
    a partition.
  * `:defer` — messages are queued and delivered when the partition
    heals, in original send order. Useful for stress-testing
    convergence-on-eventual-delivery code paths.
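For instance, a sketch of exercising `:defer` mode, following the `Lockstep.Runner.run/1` shape from the quick example above. The `mode: :defer` option name is an assumption inferred from the `keyword()` argument in the `partition/3` spec:

```elixir
Lockstep.Runner.run(fn ->
  [n1, n2] = Lockstep.Cluster.start_nodes([:n1, :n2])

  # Queue cross-partition messages instead of dropping them.
  # NOTE: `mode: :defer` is an assumed option name; see partition/3.
  Lockstep.Cluster.partition([n1], [n2], mode: :defer)

  # ... sends from n1 to n2 are queued, not lost ...

  # Healing flushes the queued messages in original send order.
  Lockstep.Cluster.heal()
end)
```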

# `heal`

```elixir
@spec heal() :: :ok
```

Heal all active partitions. For partitions created in `:defer` mode,
all queued messages are delivered to their destinations in original
send order before the heal completes.


# `list`

```elixir
@spec list() :: [atom()]
```

List all registered cluster node atoms (always includes `:nonode@nohost`).

# `list_up`

```elixir
@spec list_up() :: [atom()]
```

List currently-up nodes (excludes the default `:nonode@nohost`).

# `monitor_node`

```elixir
@spec monitor_node(atom()) :: :ok
```

Monitor `node`. When the node stops, the calling process receives
`{:nodedown, node}`. Mirrors `Node.monitor/2`.

Repeated calls register the caller multiple times. This matches BEAM
semantics: each duplicate registration produces a duplicate
`{:nodedown, node}` message.
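A sketch of observing a node going down, assuming the function passed to `Lockstep.Runner.run/1` runs as a Lockstep-managed process and can therefore call `monitor_node/1` and block in `receive`:

```elixir
Lockstep.Runner.run(fn ->
  [n1] = Lockstep.Cluster.start_nodes([:n1])

  Lockstep.Cluster.monitor_node(n1)
  Lockstep.Cluster.stop_node(n1)

  # The nodedown notification arrives as a plain message.
  receive do
    {:nodedown, ^n1} -> :ok
  end
end)
```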

# `node`

```elixir
@spec node(pid()) :: atom()
```

The node `pid` belongs to. Defaults to `self()`.
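For example, checking where a spawned process lives. Since `spawn/2` tags the new process as belonging to the given node, `node/1` should report that tag:

```elixir
Lockstep.Runner.run(fn ->
  [n1] = Lockstep.Cluster.start_nodes([:n1])
  pid = Lockstep.Cluster.spawn(n1, fn -> :ok end)

  # Reports the node the pid was spawned on.
  Lockstep.Cluster.node(pid)  # => :n1

  # With no argument, defaults to self().
  Lockstep.Cluster.node(self())
end)
```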

# `partition`

```elixir
@spec partition([atom()], [atom()], keyword()) :: :ok
```

Add a partition between two groups of nodes. While active,
cross-group sends are dropped (`:drop`, default) or queued for
later delivery (`:defer`).

    Lockstep.Cluster.partition([:n1], [:n2, :n3])
    # ... work happens with n1 isolated ...
    Lockstep.Cluster.heal()

# `run`

```elixir
@spec run(atom(), (-> any())) :: any()
```

Run `fun` on `node` and return its result. Synchronously spawns
the function on the target node and waits for it to send back its
return value.

Useful for setup or assertions:

    result = Lockstep.Cluster.run(:n1, fn ->
      Phoenix.Tracker.list(MyTracker, "topic")
    end)

    assert length(result) == 3

# `set_ticktime`

```elixir
@spec set_ticktime(non_neg_integer()) :: :ok
```

Set the virtual `net_ticktime` (default 15000ms). Determines how
long after a partition before cross-partition monitors fire
`{:DOWN, ref, :process, pid, :noconnection}`.

This is virtual time: the wait advances Lockstep's clock when
no process is otherwise ready, not wall-clock seconds.
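A sketch of a cross-partition monitor firing after the virtual ticktime elapses. This assumes ordinary `Process.monitor/1` works inside Lockstep-managed processes, which the `:noconnection` `:DOWN` description above implies:

```elixir
Lockstep.Runner.run(fn ->
  [n1, n2] = Lockstep.Cluster.start_nodes([:n1, :n2])

  # A long-lived process on n2 for n1 to monitor.
  remote = Lockstep.Cluster.spawn(n2, fn -> Process.sleep(:infinity) end)

  # Shorten the virtual ticktime so the partition is detected sooner.
  Lockstep.Cluster.set_ticktime(5_000)

  Lockstep.Cluster.spawn(n1, fn ->
    ref = Process.monitor(remote)

    # After ~5_000ms of virtual time under the partition below,
    # the monitor fires with :noconnection.
    receive do
      {:DOWN, ^ref, :process, ^remote, :noconnection} -> :ok
    end
  end)

  Lockstep.Cluster.partition([n1], [n2])
end)
```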

# `spawn`

```elixir
@spec spawn(atom(), (-> any())) :: pid()
```

Spawn a process on `node`. Returns its pid. Like `Lockstep.spawn/1`
but tags the new process as belonging to `node`.

# `spawn_link`

```elixir
@spec spawn_link(atom(), (-> any())) :: pid()
```

Spawn-link a process on `node`. Same as `Lockstep.spawn_link/1`
with explicit node tagging.

# `start_node`

```elixir
@spec start_node(atom()) :: :ok
```

Bring a previously-stopped node back online with fresh state.
Pids that lived on the previous incarnation are gone permanently;
new spawns on this node start from scratch.
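A sketch of a stop/restart cycle. Since Lockstep processes live in a single BEAM, the old pid is assumed to be a real (now dead) pid; `rebuild_state/0` is a hypothetical user function standing in for whatever re-initialization the node needs:

```elixir
Lockstep.Runner.run(fn ->
  [n1] = Lockstep.Cluster.start_nodes([:n1])

  old = Lockstep.Cluster.spawn(n1, fn -> Process.sleep(:infinity) end)

  Lockstep.Cluster.stop_node(n1)
  Lockstep.Cluster.start_node(n1)

  # The previous incarnation's pid is gone permanently.
  false = Process.alive?(old)

  # New spawns start from scratch; state must be rebuilt.
  Lockstep.Cluster.spawn(n1, fn -> rebuild_state() end)
end)
```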

# `start_nodes`

```elixir
@spec start_nodes([atom()]) :: [atom()]
```

Register a list of notional cluster nodes. Returns the same list
for convenience. Idempotent — re-registering an existing node is a
no-op. The default node `:nonode@nohost` is always present even if
not explicitly registered.

# `stop_node`

```elixir
@spec stop_node(atom()) :: :ok
```

Stop a node. All of its processes are killed (as if by
`Process.exit(pid, :kill)`), monitors of pids on that node fire
`:DOWN` with reason `:nodedown`, and any `monitor_node/1` watchers
receive `{:nodedown, node}`.

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
