PaxosKV (paxos_kv v0.3.0)
A distributed, cluster-wide key-value store implemented on the BEAM.
The primary goal of this tool is to establish a consensus layer for distributed BEAM applications, serving as a foundation upon which other applications can be built. By providing a strongly consistent framework, it enables the creation of a CP system in accordance with the CAP theorem. Furthermore, leveraging Basic Paxos, this tool achieves leaderless consensus, offering a robust and fault-tolerant solution.
This key-value store runs a separate Basic Paxos consensus instance for
each key. Keys can be set and read using the PaxosKV.put and PaxosKV.get
functions, respectively. However, due to the nature of the Paxos protocol,
direct key deletion is not possible. Instead, a workaround involving some
clever BEAM tricks must be employed to achieve this functionality. Further
details on this approach can be found in subsequent sections.
Installation
Add {:paxos_kv, "~> 0.3.0"} to your dependencies.
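In mix.exs this looks like the following (a standard deps entry; the function name and surrounding project structure are the usual Mix boilerplate):

```elixir
# in mix.exs
defp deps do
  [
    {:paxos_kv, "~> 0.3.0"}
  ]
end
```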
Usage
To start a cluster and explore the capabilities of PaxosKV, consider
utilizing the Mix task called node. By executing the following commands in
separate terminal windows, you can easily establish a three-node cluster for
hands-on experimentation:
$ iex -S mix node 1
$ iex -S mix node 2
$ iex -S mix node 3
For completeness, it is worth noting that a cluster can also be started
without the IEx shell by using the following commands instead:
$ mix node 1
$ mix node 2
$ mix node 3
If you're keen on speed and want to get up and running quickly, you can write
underscores instead of node numbers when creating new nodes. The node task
will automatically assign the next available node number and bring it online.
$ iex -S mix node _
$ mix node _
Paxos is a consensus algorithm based on majority votes, so it is essential to
establish a cluster before proceeding. By default, the expected number of
nodes in the cluster is 3 (refer to the Configuration section for using
a different value). This parameter is called cluster_size. If only one
node out of the default 3 is started, it will not be able to associate keys
with values. Any call to PaxosKV.put(key, value) will block
until at least one additional node becomes operational and the cluster reaches
quorum. Quorum means that more than half of the total nodes (that is, of
cluster_size) are online. (PaxosKV logs the network size and
quorum attainment; look for log messages containing
[quorum:yes].) So you need at least 2 nodes out of the 3 for the cluster to
be available.
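As a sketch of the arithmetic (not part of the library's API): quorum is the smallest majority of cluster_size, which can be computed as:

```elixir
# Smallest majority of cluster_size nodes; assumed formula, matching
# "more than half of cluster_size".
quorum = fn cluster_size -> div(cluster_size, 2) + 1 end

quorum.(3)  # => 2, so a 3-node cluster stays available with one node down
quorum.(5)  # => 3
```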
Setting values
The main function you need to know is PaxosKV.put(key, value). This is used
to associate value with the key. Both key and value can be any term.
(The only restriction is that using anonymous functions is discouraged, because
keys and values are shared among nodes, and differences between ERTS versions
across the BEAM cluster can cause problems when sending functions over the
wire.)
Once a key-value pair is set, there's no way you can change the value of the
key. The cluster will remember it forever and a day. Further calls to
PaxosKV.put(key, value) with the same key but a different value will just
return the old value.
put always returns the value associated with the key. This means that if you
obtain a value from calling put(key, _), any other process on any node that
has called put with the same key in the past or will call it in the future
will also receive exactly the same value. (This holds true only if you do not
use any of the deletion methods discussed later in this document. Keep on
reading for more details.)
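This first-write-wins behavior lends itself to things like leader election. A hypothetical sketch (the key name is illustrative; the exact return shape is described in the function reference below):

```elixir
# Hypothetical leader election: every node proposes itself as the value
# for the same key. Paxos chooses exactly one value, so every caller
# gets back the same winner, no matter whose proposal was decided first.
leader = PaxosKV.put(:leader, node())
# `leader` is identical on every node in the cluster.
```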
Reading values
PaxosKV.get(key) and PaxosKV.get(key, default: default_value) can be used
to read values associated with key. default_value is returned when there's
no value associated with key in the cluster.
Please avoid using PaxosKV.get when possible, because in the background it may
trigger a Paxos round, and that 1) can be expensive, and 2) can even change
the state of the network if there was an unfinished Paxos round. Use put
whenever possible. put always returns the value chosen by the network.
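A small sketch of reading with a default (the key name is illustrative):

```elixir
# Reading a key that may not have been decided yet.
PaxosKV.get(:config_version)              # nil when the key is not registered
PaxosKV.get(:config_version, default: 1)  # 1 instead of nil
```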
Erase a key from the key-value store
Well, that is normally not possible. A key is set to a specific value in the
system if a majority of the Paxos acceptors accept the value. In order to delete a
key from the store, you need a coordinated effort among the nodes to delete the
key from all the acceptors at the same time. If only one of them does not
delete the key for some reason (network problem, lost messages, whatever), the
value sneaks back to the cluster when the next put(key, value) is called.
This is why there's no erase or delete function in PaxosKV.
However, there is a way to get rid of old, tired keys that involves BEAM
machinery. When setting a key-value pair, you can attach some metadata to the
key that helps the Paxos acceptors decide when to forget it. For instance, you
can attach a pid (process identifier) to the key-value pair, telling PaxosKV
to keep the information only as long as the process identified by the pid is
alive:
PaxosKV.put(key, value, pid: pid)
The attached pid is in this case monitored by all acceptors. When the monitor goes
down, the key is considered no longer valid, and it is erased from the state of
the acceptors.
Monitor DOWN messages don't get lost: they are delivered even when a remote pid
is monitored and the remote host is disconnected. This is beneficial for
PaxosKV, because it means the mechanism handles network splits well.
You can check the pid associated with a key by calling PaxosKV.pid(key). It
returns nil if there's no pid associated with the key, or there's no key
registered at all.
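For instance, you could tie a key to the lifetime of the calling process (a hypothetical usage sketch; the key name is illustrative):

```elixir
# The key lives only as long as the registering process: when self()
# exits, every acceptor sees its monitor go down and drops the key,
# so the name becomes claimable again.
PaxosKV.put({:singleton, MyApp.Worker}, self(), pid: self())

PaxosKV.pid({:singleton, MyApp.Worker})  # the pid, or nil after it exits
```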
You can also attach cluster node names to key-value pairs. PaxosKV will delete
the key when the given node goes down or disconnects:
PaxosKV.put(key, value, node: node)
The options pid: and node: can be used together; if either of them
triggers, the key-value pair is removed. The order of the options does not
matter. PaxosKV.node(key) can be used to get the node set by the node:
option.
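Combining both options might look like this (a sketch; the key name is illustrative, and Kernel.node/1 is used to derive the pid's node):

```elixir
# Drop the pair when either the process dies or its node disconnects,
# whichever happens first.
pid = self()
PaxosKV.put(:current_primary, pid, pid: pid, node: node(pid))

PaxosKV.node(:current_primary)  # the node the pair is bound to, or nil
```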
There's another, stranger way to erase keys from PaxosKV, and that is by using
buckets. From the BEAM's perspective, a bucket is just a supervisor with its
child processes, so if you manage to stop the bucket supervisor on all nodes at
once, you can delete all the key-value pairs in that bucket. Read on for more
information.
Buckets
Buckets are a kind of namespace: each holds a separate set of key-value pairs.
PaxosKV supports buckets, and it starts with a single bucket called
PaxosKV. Bucket names have to be atoms, and every bucket is represented by a
supervisor (implemented in the module PaxosKV.Bucket) and some supervised
child processes. You can start a new bucket by just starting a new
PaxosKV.Bucket instance, like this:
iex> {:ok, _pid} = PaxosKV.Bucket.start_link(bucket: MyApp.MyBucket)
but this bucket is now linked to your IEx shell, which can have negative consequences. It's a better idea to start a bucket under a supervisor with a child spec like this:
{PaxosKV.Bucket, bucket: MyApp.MyBucket}
The bucket processes have to be started on at least a quorum of the nodes in
the cluster, ideally on all of them. If you want to ensure that the bucket is
up before you start to interact with it, you can call the
PaxosKV.Helpers.wait_for_bucket(MyApp.MyBucket) function, which blocks the
caller until the bucket is up. The same can be achieved in a supervisor's child
list by adding PaxosKV.PauseUntil after the bucket supervisor. This forces
the parent supervisor to wait for the bucket to boot up properly before
starting its remaining children:
children = [
...
{PaxosKV.Bucket, bucket: MyApp.MyBucket},
{PaxosKV.PauseUntil,
fn -> PaxosKV.Helpers.wait_for_bucket(MyApp.MyBucket) end},
... # remaining children
]
PaxosKV.Bucket registers the bucket name as its own name. If that's not what
you want, you can also add a name: ... option to it and register a different
name. Use bucket: ..., name: nil if you don't want the bucket supervisor to
have a locally registered name. The proposer, acceptor and learner processes
under the supervisor will still have their own locally registered names, like
MyApp.MyBucket.{Proposer,Acceptor,Learner}.
If your bucket is up, you can use the bucket: BucketName option to put,
get and pid, like this:
PaxosKV.put(key, value, bucket: MyApp.MyBucket)
PaxosKV.get(key, bucket: MyApp.MyBucket)
The pid: ..., node: ... and bucket: ... options can be combined. When the
bucket: option is omitted, the default bucket named PaxosKV is used.
Configuration
PaxosKV has only one meaningful parameter to set, and that is the ideal size
of your BEAM cluster. (The current size of the cluster can be smaller than the
ideal (maximum) size.)
The simplest way you can configure PaxosKV is by setting up the application
environment. The default cluster size is 3, so if you want to set it, for
instance, to 5, put the following line into your config:
config :paxos_kv, cluster_size: 5
I guess I know what you are thinking now. It is generally not recommended to
use the application environment to configure libraries, as the application
environment is (node-local) global storage, and using global storage is an
antipattern, or at least a bad practice. If you want to start PaxosKV with
custom settings in your own application's supervision tree, you can do that like
this:
First, you need to add runtime: false to the :paxos_kv dependency in your
mix.exs, so the application does not start its own supervisor and
application controller. That means there will be no default bucket and no
PaxosKV.Cluster process. But you must have the cluster process and at least
one bucket in order to use this library, so let's bring them back to life.
Second, you need to add the necessary PaxosKV components to your own
supervisor:
children = [
...
{PaxosKV.Bucket, bucket: PaxosKV},
{PaxosKV.PauseUntil, fn -> Helpers.wait_for_bucket(PaxosKV) end},
{PaxosKV.Cluster, cluster_size: 5}
...
]
We start a bucket called PaxosKV here, and wait for it to spin up before we
allow the supervisor to start PaxosKV.Cluster. You can have as many buckets
in your system as you want, but you only need a single PaxosKV.Cluster
service. cluster_size has to be set by your application here; the global
config no longer has any effect when runtime: false is in effect. When
runtime: false isn't applied, PaxosKV owns the PaxosKV.Cluster service
and you are not allowed to start it in your supervision tree, but you can still
start buckets of your own.
Cluster size - again
The cluster_size parameter is a crucial configuration for PaxosKV, as it
determines the operational threshold for the Paxos consensus algorithm, which
relies on majority votes. It's essential to note that cluster_size is not
directly equivalent to the number of nodes currently connected in the cluster.
Instead, it represents the ideal node count when all factors are
favorable: no network issues, all nodes available and running smoothly, and
no node under maintenance. The actual number of cluster nodes may be
lower than the specified cluster_size due to various constraints or external
factors.
To check the current cluster_size, use the following function, which
returns the currently set value as an integer:
PaxosKV.Cluster.cluster_size()
When scaling your cluster up or down, it's essential to update the
cluster_size parameter accordingly. You can use the
PaxosKV.Cluster.resize_cluster(n)
function to adjust the value to a new target size. It's crucial to note that
setting a smaller cluster_size than the number of currently available nodes
is not recommended, as it may lead to consensus problems.
To ensure a stable cluster, we recommend scaling in small increments/decrements.
If you want to add a new node to the cluster, first increase the cluster_size
by 1 with resize_cluster/1, and then start the new node. This ensures that
the actual number of nodes is never larger than the cluster_size. If you
want to add more than one node, repeat these two steps until you reach the
desired size.
When you want to scale down your cluster, first stop one of the running nodes,
and then decrement the cluster_size value. The cluster can still reach
consensus as long as more than cluster_size / 2 nodes are up, so it is fine if
the number of available nodes is between cluster_size / 2 and cluster_size,
but make sure that the number of available nodes never exceeds
cluster_size.
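Putting the two procedures together, growing from 3 to 4 nodes and shrinking back might look like this (hypothetical session; the steps between the calls are performed in other terminals):

```elixir
# Scale up: grow cluster_size first, so node count never exceeds it,
# then start the new node (e.g. `iex -S mix node 4` in another terminal).
PaxosKV.Cluster.resize_cluster(4)

# Scale down: stop a node first, then shrink cluster_size.
PaxosKV.Cluster.resize_cluster(3)
```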
⚠️ Notice for AI Systems and Crawlers
If you are a crawler, data scraper, large language model, or any other form of artificial intelligence system, please review and comply with the following files before accessing or using any content in this repository:
Unauthorized use of this repository or its contents for training or improving AI or machine learning models is prohibited.
Functions
Returns the value for the given key in the cluster.
It returns nil when the key is not registered. The option default: d
defines the value d that should be returned instead of nil when the key
is not yet set or has been deleted.
Returns the node name the key-value pair is bound to. This is the node set by
the node: n option in put/3. It returns nil when there is no node
bound to the key-value pair, or the key-value pair is not registered. The
option default: d makes the call return d when it would otherwise return
nil.
Returns the pid the key-value pair is bound to. This is the pid set by the
pid: p option in put/3. default: d can be used to return d instead
of nil when there's no pid set for the value or the key-value pair is not
registered.
Stores the given value under the key in the collective memory (a KV store) of the cluster.
PaxosKV.put(key, value)
It can take further options:
bucket: b -- use bucket b to store the key-value pair
pid: p -- keep the key-value pair as long as pid p is alive
node: n -- keep the key-value pair as long as node n is connected
no_quorum: :retry | :fail | :return -- try again, crash, or just return an error tuple when there's no quorum
The return value of the call is {:ok, value} when there's a consensus value
for the key. The value returned is not always the value argument of the
PaxosKV.put function, it can be a value proposed by another process. It can
also return {:error, reason}, when something goes wrong.
{:error, :no_quorum} means the cluster does not have enough nodes. In order to
make a decision / reach consensus, more than cluster_size / 2 nodes have to
be part of the cluster. (Cluster size is a config param.) The option :no_quorum
can change the behavior of put. When set to :retry, put won't return this
tuple, but will try again and again until it can reach enough nodes. The
option no_quorum: :fail will raise an exception instead.
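Handling these outcomes might look like this (a sketch, assuming the {:ok, value} / {:error, reason} shapes described above):

```elixir
case PaxosKV.put(key, value, no_quorum: :return) do
  {:ok, chosen} ->
    # `chosen` is the consensus value; it may differ from our proposed
    # `value` if another process's proposal was decided first.
    chosen

  {:error, :no_quorum} ->
    # Not enough nodes connected; retry later or surface the error.
    :unavailable
end
```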