View Source ProcessHub.Strategy.PartitionTolerance.DynamicQuorum (ProcessHub v0.2.0-alpha)
The dynamic quorum strategy provides a dynamic way to handle network partitions in the ProcessHub
cluster.
This strategy is suitable for clusters that do not know the number of nodes in the cluster beforehand.
The dynamic quorum strategy uses a cache to store the quorum log. The quorum log is a list of tuples
containing the node name and the timestamp when the connection to the node was lost.
This log is used to calculate the quorum size, which is measured in percentage.
If the calculated quorum size is less than the quorum_size
defined in the configuration,
the ProcessHub
will be considered to be in a network partition.
In such a case, the ProcessHub
will terminate its distributed supervisor process,
ProcessHub.DistributedSupervisor
, and all its children. Also, the local event queue will be
locked by increasing the priority level.
The quorum size is calculated by the following formula:
quorum_size = (connected_nodes / (connected_nodes + down_nodes)) * 100
# The `connected_nodes` are the number of nodes that are currently connected to the `ProcessHub` cluster.
# The `down_nodes` are the number of nodes that are listed under the quorum log.
The threshold_time
is used to determine how long a node should be listed in the quorum log
before it is removed from the quorum log. This means if we have a threshold_time
of 10 seconds,
and we lose one node every 10 seconds until we have lost all nodes, the system won't be
considered to be in a network partition, regardless of whether the lost nodes are lost due to network
partition or not. If we lose all those nodes in less than 10 seconds, the system will be considered
to be in a network partition.
Scaling down
When scaling down the cluster, make sure to scale down one node at a time and wait for the
threshold_time
before scaling down the next node.Do not scale down too many nodes at once because the system may think that it is in a network partition and terminate the
ProcessHub.DistributedSupervisor
process and all its children.
Summary
Types
@type t() :: %ProcessHub.Strategy.PartitionTolerance.DynamicQuorum{ quorum_size: non_neg_integer(), threshold_time: non_neg_integer() }
Dynamic quorum strategy configuration.
quorum_size
- The quorum size is measured in percentage. This is the required quorum size. For example,50
means that if we have 10 nodes in the cluster, the system will be considered to be in a network partition if we lose 6 (60% of the cluster) nodes or more within thethreshold_time
period.threshold_time
- The threshold time is measured in seconds. This is how long a node should be listed in the quorum log before it is removed from the quorum log.