distribute/cluster

Types

pub type ClusterHealth {
  ClusterHealth(
    self_node: String,
    is_distributed: Bool,
    connected_nodes: List(String),
    connected_count: Int,
    reachable_nodes: List(String),
    unreachable_nodes: List(String),
  )
}

Constructors

  • ClusterHealth(
      self_node: String,
      is_distributed: Bool,
      connected_nodes: List(String),
      connected_count: Int,
      reachable_nodes: List(String),
      unreachable_nodes: List(String),
    )

Errors from connect/1.

net_kernel:connect_node/1 returns only true | false | ignored. It cannot distinguish “node does not exist” from “unreachable”. A NodeNotFound variant would be a lie at this layer, so it is not exposed; both cases collapse to ConnectFailed.

pub type ConnectError {
  ConnectFailed
  ConnectIgnored
  InvalidNodeFormat(String)
  ConnectAtomBudgetExceeded
}

Constructors

  • ConnectFailed

    Local distribution is up and a connection attempt was made, but the peer refused, did not answer, or does not exist. Returned when net_kernel:connect_node/1 returns false.

  • ConnectIgnored

    The local node is not running distribution (net_kernel not started), so connect_node declined to even try. Returned when net_kernel:connect_node/1 returns ignored.

  • InvalidNodeFormat(String)

    The supplied name failed format validation (missing @, disallowed charset, length). Carries a human-readable reason.

  • ConnectAtomBudgetExceeded

    Connecting would create a fresh node atom, but the configured max_distribution_atoms budget is exhausted. The call is refused before binary_to_atom/2 runs, so the VM atom table is left untouched.

pub type StartError {
  InvalidNodeName(String)
  InvalidCookieFormat(String)
  AlreadyStarted
  NetworkError(String)
  StartFailed(String)
  StartAtomBudgetExceeded
}

Constructors

  • InvalidNodeName(String)

    Node name failed format validation: must be <name>@<host> with charset [a-zA-Z0-9_-]+@[a-zA-Z0-9._-]+ and 1..255 bytes.

  • InvalidCookieFormat(String)

    Cookie failed format validation: charset [a-zA-Z0-9_-]+, 1..255 bytes.

  • AlreadyStarted

    net_kernel:start/1 reported the node was already running.

  • NetworkError(String)

    net_kernel:start/1 failed with a network-related reason (network, eaddrinuse, econnrefused).

  • StartFailed(String)

    net_kernel:start/1 failed with another reason.

  • StartAtomBudgetExceeded

    The configured max_distribution_atoms budget has been exhausted. Creating the node-name or cookie atom would exceed the cap. Either raise max_distribution_atoms or stop accepting fresh node names from the upstream caller.
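
The InvalidNodeName rule above can be mirrored in a caller-side pre-check, for example to reject bad input in a UI before it reaches start_node. The following is a minimal sketch using only gleam_stdlib; looks_like_node_name and all_allowed are hypothetical helpers, not part of this module, and the library's own validation remains authoritative (it may apply the rules in a different order or with extra checks).

```gleam
import gleam/list
import gleam/string

// Characters allowed in the <name> part; the <host> part also allows ".".
const name_chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-"

// True when `part` is non-empty and every grapheme is in `allowed`.
fn all_allowed(part: String, allowed: String) -> Bool {
  part != ""
  && list.all(string.to_graphemes(part), fn(g) { string.contains(allowed, g) })
}

/// Hypothetical pre-check: <name>@<host>, documented charset, 1..255 bytes.
pub fn looks_like_node_name(name: String) -> Bool {
  let size = string.byte_size(name)
  case string.split_once(name, "@") {
    // A second "@" ends up in `host` and fails the charset check.
    Ok(#(short, host)) ->
      size >= 1
      && size <= 255
      && all_allowed(short, name_chars)
      && all_allowed(host, name_chars <> ".")
    Error(Nil) -> False
  }
}
```

A name that passes this check can still be rejected by start_node, so callers should keep handling InvalidNodeName.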

Values

pub fn connect(node: String) -> Result(Nil, ConnectError)

Connect to a remote node. Returns Ok(Nil) on success.

Atom-table guardrail

Each call with a previously-unseen node name interns one atom in the BEAM atom table (atoms are never garbage collected, and the table is capped at 1 048 576 entries by default). To prevent a caller, malicious or buggy, from exhausting the table by looping over millions of valid-looking names, every fresh atom creation is counted against config.max_distribution_atoms.

The check is atomic (atomics:add_get/3) and lock-free. Once the budget is reached, this function returns Error(ConnectAtomBudgetExceeded) before binary_to_atom/2 is called: the VM atom table cannot be exhausted through this path.

Default budget: 10 000 fresh atoms over the process lifetime. That is 10x a generous cluster size and about two orders of magnitude below the VM cap. Tune via config.configure(... max_distribution_atoms:).
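
Each ConnectError variant calls for a different reaction from the caller. A minimal sketch of one possible policy follows; NextStep, next_step and join are hypothetical names introduced for illustration and are not part of the library.

```gleam
import distribute/cluster

/// Hypothetical caller-side policy; not part of distribute/cluster.
pub type NextStep {
  Connected
  RetryLater
  StartDistributionFirst
  RejectInput(String)
  StopAcceptingNewNames
}

/// Map the documented connect/1 outcomes to an action.
pub fn next_step(result: Result(Nil, cluster.ConnectError)) -> NextStep {
  case result {
    Ok(Nil) -> Connected
    // Peer missing or unreachable: the two cases cannot be told apart.
    Error(cluster.ConnectFailed) -> RetryLater
    // Local distribution is down; start_node has to succeed first.
    Error(cluster.ConnectIgnored) -> StartDistributionFirst
    // Malformed name: retrying the same input will not help.
    Error(cluster.InvalidNodeFormat(reason)) -> RejectInput(reason)
    // Fresh-atom budget spent: no new node names will be accepted.
    Error(cluster.ConnectAtomBudgetExceeded) -> StopAcceptingNewNames
  }
}

pub fn join(peer: String) -> NextStep {
  next_step(cluster.connect(peer))
}
```

Keeping the mapping in a pure function makes the policy testable without a running cluster.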

pub fn connect_error_to_string(err: ConnectError) -> String

pub fn connected_count() -> Int

Number of currently connected nodes.

pub fn has_peers() -> Bool

Whether this node has at least one connected peer.

This is a topology check, not a health check: a single-node deployment is operationally fine and will return False here.

pub fn health() -> ClusterHealth

Perform a cluster health check, pinging each known node in parallel.

See also: has_peers/0 (boolean topology shortcut), is_healthy/0 (compatibility alias), is_distributed/0, ping/1 (single node).

net_adm:ping/1 is a synchronous network call with an implicit BEAM distribution timeout of several seconds. Pinging N nodes sequentially would block the caller for up to N * timeout_per_ping (e.g. 50 nodes at roughly 7 s each during a partition = ~6 minutes). Instead, pings fan out with bounded parallelism and results are collected under a single 8 s deadline, so worst-case wall clock is bounded by the deadline, not by cluster size.

Output ordering is deterministic: reachable_nodes and unreachable_nodes are projected in the same order as connected_nodes, regardless of worker reply timing.
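
As an illustration of reading the returned record, the sketch below turns a ClusterHealth value into a one-line status string. The summarize function is a hypothetical helper; only the field names come from the type definition above.

```gleam
import distribute/cluster
import gleam/int
import gleam/io
import gleam/list
import gleam/string

/// Hypothetical helper: one-line status built from the documented fields.
pub fn summarize(h: cluster.ClusterHealth) -> String {
  let connected = int.to_string(h.connected_count)
  case h.is_distributed, h.unreachable_nodes {
    False, _ -> h.self_node <> ": distribution not started"
    True, [] -> h.self_node <> ": " <> connected <> " connected, all reachable"
    True, down ->
      h.self_node
      <> ": "
      <> connected
      <> " connected, "
      <> int.to_string(list.length(down))
      <> " unreachable ("
      <> string.join(down, ", ")
      <> ")"
  }
}

pub fn main() {
  // health() may take up to the collection deadline to return.
  io.println(summarize(cluster.health()))
}
```

Because unreachable_nodes follows the order of connected_nodes, the output is stable across calls as long as membership is unchanged.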

pub fn is_distributed() -> Bool

Whether this node is running BEAM distribution.

Backed by erlang:is_alive/0, which is the authoritative signal. It returns true iff net_kernel has been started. Previous versions compared the string form of the node name against "nonode@nohost", which would lie if the runtime ever changed that placeholder.

pub fn is_healthy() -> Bool

Deprecated alias for has_peers/0, kept for compatibility with direct distribute/cluster imports from pre-facade code.

pub fn nodes() -> List(String)

List all currently connected nodes.

pub fn ping(node: String) -> Bool

Ping a remote node. Returns True if it responds.

Subject to the same config.max_distribution_atoms guardrail as connect: once the fresh-atom budget is exhausted, ping returns False without touching the VM atom table. Because the result is a Bool, a False caused by budget exhaustion is indistinguishable from an unreachable node.
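
Since ping returns a bare Bool, callers that want a more specific message can check is_distributed first. A minimal sketch, where probe is a hypothetical helper and the message wording is an assumption:

```gleam
import distribute/cluster

/// Hypothetical helper: describe the outcome of pinging `node`.
pub fn probe(node: String) -> String {
  case cluster.is_distributed() {
    // Without local distribution no ping can succeed, so skip the call.
    False -> "local distribution is not running"
    True ->
      case cluster.ping(node) {
        True -> node <> " is reachable"
        // False covers both an unreachable peer and a spent atom budget.
        False -> node <> " did not answer (unreachable or atom budget spent)"
      }
  }
}
```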

pub fn self_node() -> String

Get the current node’s name.

pub fn start_error_to_string(err: StartError) -> String

pub fn start_monitor() -> Result(
  process.Subject(cluster_monitor.Message),
  actor.StartError,
)

Start the cluster monitor actor. It listens for Erlang node events and broadcasts them to all Gleam subscribers.

pub fn start_node(
  name: String,
  cookie: String,
) -> Result(Nil, StartError)

Start a distributed BEAM node.

name must contain @ (e.g. "myapp@127.0.0.1"). Cookie length and charset are enforced byte-wise by the FFI: any failure surfaces as InvalidCookieFormat.

Atom-budget exhaustion: before returning, the FFI emits AtomBudgetExhausted(<offending input>, AtomBudgetOnStartNode), where the offending input is the name or cookie that triggered the refusal. This function does not re-emit it, because the public unit constructor StartAtomBudgetExceeded cannot carry that attribution.

Blocking and OS-level dependencies

This call can block. It delegates to net_kernel:start/1, which talks to epmd (Erlang Port Mapper Daemon) and resolves the host portion of name against the OS resolver. If epmd is not running, if DNS is misconfigured, or if the network goes down a moment before the call, the BEAM may hang on a libc resolver timeout for tens of seconds. There is no Gleam-side timeout the library can interpose.

Mitigations callers can apply:

  • Run epmd -daemon before the process boots, and treat its absence as a fatal startup condition rather than something start_node should recover from.
  • Use IP literals (myapp@127.0.0.1) when the deployment allows, bypassing DNS entirely.
  • In container deployments, ensure /etc/hosts resolves the chosen host before start_node is called.

If you cannot accept a potentially long boot wait, supervise the boot itself: spawn a process that calls start_node, monitor it, and treat a deadline miss as a startup failure. The library does not bake a timeout in because the right value is deployment-specific (a 2 s timeout is generous for IP literals but a hair-trigger for DNS-backed names).
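
The supervised-boot idea can be sketched as follows. This is one possible shape, not the library's method: it uses a reply subject with a receive deadline rather than an explicit monitor, and it assumes gleam_erlang 1.x, where process.spawn_unlinked is available (older releases expose process.start instead). BootError and start_node_within are hypothetical names.

```gleam
import distribute/cluster
import gleam/erlang/process

/// Hypothetical error type for the wrapper below.
pub type BootError {
  BootTimedOut
  BootFailed(cluster.StartError)
}

/// Run start_node in a worker and wait at most `deadline_ms` for its result.
pub fn start_node_within(
  name: String,
  cookie: String,
  deadline_ms: Int,
) -> Result(Nil, BootError) {
  let reply = process.new_subject()
  // Unlinked, so a crash in the worker does not take the caller down.
  let _worker =
    process.spawn_unlinked(fn() {
      process.send(reply, cluster.start_node(name, cookie))
    })
  case process.receive(reply, deadline_ms) {
    Ok(Ok(Nil)) -> Ok(Nil)
    Ok(Error(err)) -> Error(BootFailed(err))
    // Deadline missed (or the worker died): report a startup failure.
    Error(Nil) -> Error(BootTimedOut)
  }
}
```

The worker is not stopped on timeout, so net_kernel:start/1 may still complete later. This sketch assumes the caller treats BootTimedOut as fatal and halts rather than retrying in the same VM.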

pub fn subscribe(
  monitor: process.Subject(cluster_monitor.Message),
  listener: process.Subject(cluster_monitor.ClusterEvent),
) -> Nil

Subscribe a subject to cluster events (NodeUp/NodeDown).

pub fn unsubscribe(
  monitor: process.Subject(cluster_monitor.Message),
  listener: process.Subject(cluster_monitor.ClusterEvent),
) -> Nil

Unsubscribe from cluster events.
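
A minimal sketch of the monitor lifecycle, assuming only the signatures shown above: start the monitor, subscribe a fresh subject, wait for one event, then unsubscribe. watch_once is a hypothetical helper, and the event is deliberately not pattern matched because the ClusterEvent constructor payloads are not documented on this page.

```gleam
import distribute/cluster
import gleam/erlang/process
import gleam/io

/// Hypothetical helper: wait up to `wait_ms` for a single cluster event.
pub fn watch_once(wait_ms: Int) -> Nil {
  case cluster.start_monitor() {
    Error(_) -> io.println("could not start the cluster monitor")
    Ok(monitor) -> {
      // The subject's message type is inferred from subscribe/2.
      let events = process.new_subject()
      cluster.subscribe(monitor, events)
      case process.receive(events, wait_ms) {
        Ok(_event) -> io.println("cluster membership changed")
        Error(Nil) -> io.println("no node events before the deadline")
      }
      cluster.unsubscribe(monitor, events)
    }
  }
}
```

In a long-running service the receive would normally live inside an actor or selector loop; a single blocking receive keeps the example short.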
