View Source ProcessHub.Strategy.PartitionTolerance.DynamicQuorum (ProcessHub v0.2.0-alpha)

The dynamic quorum strategy provides a dynamic way to handle network partitions in the ProcessHub cluster.

This strategy is suitable for clusters that do not know the number of nodes in the cluster beforehand.

The dynamic quorum strategy uses a cache to store the quorum log. The quorum log is a list of tuples containing the node name and the timestamp when the connection to the node was lost. This log is used to calculate the quorum size, which is measured in percentage. If the calculated quorum size is less than the quorum_size defined in the configuration, the ProcessHub will be considered to be in a network partition. In such a case, the ProcessHub will terminate its distributed supervisor process, ProcessHub.DistributedSupervisor, and all its children. Also, the local event queue will be locked by increasing the priority level.

The quorum size is calculated by the following formula:

quorum_size = (connected_nodes / (connected_nodes + down_nodes)) * 100

# The `connected_nodes` are the number of nodes that are currently connected to the `ProcessHub` cluster.
# The `down_nodes` are the number of nodes that are listed under the quorum log.

The threshold_time is used to determine how long a node should be listed in the quorum log before it is removed from the quorum log. This means if we have a threshold_time of 10 seconds, and we lose one node every 10 seconds until we have lost all nodes, the system won't be considered to be in a network partition, regardless of whether the lost nodes are lost due to network partition or not. If we lose all those nodes in less than 10 seconds, the system will be considered to be in a network partition.

Scaling down

When scaling down the cluster, make sure to scale down one node at a time and wait for the threshold_time before scaling down the next node.

Do not scale down too many nodes at once because the system may think that it is in a network partition and terminate the ProcessHub.DistributedSupervisor process and all its children.

Summary

Types

t()

Dynamic quorum strategy configuration.

Types

@type t() :: %ProcessHub.Strategy.PartitionTolerance.DynamicQuorum{
  quorum_size: non_neg_integer(),
  threshold_time: non_neg_integer()
}

Dynamic quorum strategy configuration.

  • quorum_size - The quorum size is measured in percentage. This is the required quorum size. For example, 50 means that if we have 10 nodes in the cluster, the system will be considered to be in a network partition if we lose 6 (60% of the cluster) nodes or more within the threshold_time period.
  • threshold_time - The threshold time is measured in seconds. This is how long a node should be listed in the quorum log before it is removed from the quorum log.