Nebulex.Adapters.Replicated (Nebulex v2.2.0)

Built-in adapter for replicated cache topology.

Overall features

  • Replicated cache topology.
  • Configurable primary storage adapter.
  • Cache-level locking when deleting all entries or adding new nodes.
  • Key-level (or entry-level) locking for key-based write-like operations.
  • Support for transactions via Erlang global name registration facility.
  • Stats support that relies on the primary storage adapter.

Replicated Cache Topology

A replicated cache is a clustered, fault-tolerant cache where data is fully replicated to every member in the cluster. This cache offers the fastest read performance with linear performance scalability for reads, but poor scalability for writes (as writes must be processed by every member in the cluster). Because data is replicated to all servers, adding servers does not increase aggregate cache capacity.

There are several challenges to building a reliably replicated cache. The first is how to get it to scale and perform well. Updates to the cache have to be sent to all cluster nodes, and all cluster nodes have to end up with the same data, even if multiple updates to the same piece of data occur at the same time. Also, if a cluster node requests a lock, ideally it should not have to get all cluster nodes to agree on the lock or at least do it in a very efficient way (:global is used here), otherwise it will scale extremely poorly; yet in the case of a cluster node failure, all of the data and lock information must be kept safely.

The best part of a replicated cache is its access speed. Since the data is replicated to each cluster node, it is available for use without any waiting. This is referred to as "zero latency access," and is perfect for situations in which an application requires the highest possible speed in its data access.

However, there are some limitations:

  • Cost Per Update - Updating a replicated cache requires pushing the new version of the data to all other cluster members, which will limit scalability if there is a high frequency of updates per member.

  • Cost Per Entry - The data is replicated to every cluster member, so Memory Heap space is used on each member, which will impact performance for large caches.

Based on "Distributed Caching Essential Lessons" by Cameron Purdy.

Usage

When used, the Cache expects the :otp_app and :adapter as options. The :otp_app should point to an OTP application that has the cache configuration. For example:

defmodule MyApp.ReplicatedCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated
end

Optionally, you can configure the desired primary storage adapter with the option :primary_storage_adapter; defaults to Nebulex.Adapters.Local.

defmodule MyApp.ReplicatedCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated,
    primary_storage_adapter: Nebulex.Adapters.Local
end

The configuration for the cache must be in your application environment, usually defined in your config/config.exs:

config :my_app, MyApp.ReplicatedCache,
  primary: [
    gc_interval: 3_600_000,
    backend: :shards
  ]

If your application was generated with a supervisor (by passing --sup to mix new) you will have a lib/my_app/application.ex file containing the application start callback that defines and starts your supervisor. You just need to edit the start/2 function to start the cache under your application's supervision tree:

def start(_type, _args) do
  children = [
    {MyApp.ReplicatedCache, []},
    ...
  ]

  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end

See Nebulex.Cache for more information.

Options

This adapter supports the following options and all of them can be given via the cache configuration:

  • :primary - The options that will be passed to the adapter associated with the local primary storage. These options will depend on the local adapter to use.

  • :task_supervisor_opts - Start-time options passed to Task.Supervisor.start_link/1 when the adapter is initialized.

Shared options

Almost all of the cache functions outlined in Nebulex.Cache module accept the following options:

  • :timeout - The time-out value in milliseconds for the command that will be executed. For commands executed on remote nodes, this adapter uses Task.await/2 internally to receive the results, so this option tells the adapter how long to wait for them. If the timeout is exceeded, the task is shut down, but the calling process does not exit; the result associated with that task is simply skipped in the reduce phase.
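For instance, the :timeout option can be passed per command; the sketch below is illustrative (the keys, values, and timeout figures are made up, not defaults):

```elixir
# Wait up to 10 seconds for remote nodes to acknowledge the write;
# stragglers past the timeout are skipped in the reduce phase.
MyApp.ReplicatedCache.put("foo", "bar", timeout: 10_000)

# Read-like commands accept the same option.
MyApp.ReplicatedCache.get("foo", timeout: 5_000)
```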

Telemetry events

This adapter emits all of the recommended Telemetry events, which are documented in the Nebulex.Cache module (see the "Adapter-specific events" section).

Since the replicated adapter delegates to the configured primary storage adapter (a local cache adapter), that adapter may emit Telemetry events as well. Therefore, there will be events emitted by the replicated adapter as well as by the primary storage adapter. For example, for the cache MyApp.ReplicatedCache defined earlier, these would be the emitted events:

  • [:my_app, :replicated_cache, :command, :start]
  • [:my_app, :replicated_cache, :primary, :command, :start]
  • [:my_app, :replicated_cache, :command, :stop]
  • [:my_app, :replicated_cache, :primary, :command, :stop]
  • [:my_app, :replicated_cache, :command, :exception]
  • [:my_app, :replicated_cache, :primary, :command, :exception]

As you may notice, the telemetry prefix by default for the replicated cache is [:my_app, :replicated_cache], and the prefix for its primary storage [:my_app, :replicated_cache, :primary].
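As a sketch, both prefixes can be observed by attaching a single handler with :telemetry.attach_many/4. The handler module below and its logging behavior are illustrative, not part of Nebulex; the metadata keys assume the command-event metadata documented by Nebulex (e.g. :function_name):

```elixir
defmodule MyApp.CacheTelemetryHandler do
  # Illustrative handler: logs each finished cache command.
  require Logger

  def handle_event(event, measurements, metadata, _config) do
    Logger.debug(
      "cache event #{inspect(event)}: " <>
        "duration=#{inspect(measurements[:duration])} " <>
        "function=#{inspect(metadata[:function_name])}"
    )
  end
end

# Attach one handler to both the replicated cache's events and the
# primary storage adapter's events.
:telemetry.attach_many(
  "my-app-cache-handler",
  [
    [:my_app, :replicated_cache, :command, :stop],
    [:my_app, :replicated_cache, :primary, :command, :stop]
  ],
  &MyApp.CacheTelemetryHandler.handle_event/4,
  nil
)
```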

See also the Telemetry guide for more information and examples.

Stats

This adapter relies on the primary storage adapter for stats support. Therefore, it is important to ensure that the underlying primary storage adapter supports stats; otherwise, you may get unexpected errors.
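Assuming the cache is configured with stats: true and the primary adapter implements stats (Nebulex.Adapters.Local does), statistics can be retrieved with the standard Nebulex.Cache stats callback. This is a sketch; the measurement keys shown are the ones documented for Nebulex.Stats, not output from a real run:

```elixir
# Requires `stats: true` in the cache configuration and a primary
# adapter that supports stats.
%Nebulex.Stats{measurements: measurements} = MyApp.ReplicatedCache.stats()

# measurements is a map with counters such as :hits, :misses,
# :writes, and :evictions, accumulated by the primary adapter.
measurements[:hits]
```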

Extended API

This adapter provides some additional convenience functions to the Nebulex.Cache API.

Retrieving the primary storage or local cache module:

MyCache.__primary__()

Retrieving the cluster nodes associated with the given cache name:

MyCache.nodes()

Joining the cache to the cluster:

MyCache.join_cluster()

Leaving the cluster (removes the cache from the cluster):

MyCache.leave_cluster()

Adapter-specific telemetry events

This adapter exposes the following Telemetry events:

  • telemetry_prefix ++ [:replication] - Dispatched by the adapter when a replication error occurs during a write-like operation carried out under the hood.

    • Measurements: %{rpc_errors: non_neg_integer}
    • Metadata:
      %{
        adapter_meta: %{optional(atom) => term},
        rpc_errors: [{node, error :: term}]
      }
  • telemetry_prefix ++ [:bootstrap] - Dispatched by the adapter at start time when there are errors while syncing up with the cluster nodes.

    • Measurements:
      %{
        failed_nodes: non_neg_integer,
        remote_errors: non_neg_integer
      }
    • Metadata:
      %{
        adapter_meta: %{optional(atom) => term},
        failed_nodes: [node],
        remote_errors: [term]
      }

Caveats of replicated adapter

As explained at the beginning, a replicated topology brings not only advantages (mostly for reads) but also some limitations and challenges.

This adapter uses global locks (via :global) for all operations that modify or alter the cache in some way, to ensure as much consistency as possible across all members of the cluster. These locks may be per key or for the entire cache, depending on the operation taking place. For that reason, it is very important to be aware of the operations that can potentially lead to performance and scalability issues, so that you can make better use of the replicated adapter. The following are the operations and aspects you should pay attention to:

  • Starting and joining a new replicated node to the cluster is the most expensive action, because all write-like operations across all members of the cluster are blocked until the new node completes the synchronization process, which involves copying cached data from one of the existing cluster nodes into the new node; this could be very expensive depending on the number of cache entries. For that reason, adding new nodes is considered an expensive operation that should happen only from time to time.

  • Deleting all entries. When the Nebulex.Cache.delete_all/2 action is executed, as in the previous case, all write-like operations in all members of the cluster are blocked until the deletion completes (this implies deleting all cached data from all cluster nodes). Therefore, deleting all entries from the cache is also considered an expensive operation that should happen only from time to time.

  • Write-like operations based on a key only block operations related to that key across all members of the cluster. This is not as critical as the previous two cases, but it is something to keep in mind anyway, because if there is a key in high demand in terms of writes, it could also become a bottleneck.
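For illustration, the same key-level locking applies to explicit transactions. The sketch below assumes the Nebulex v2 transaction API with its :keys option; the account keys and balance logic are made up:

```elixir
# Lock only the :alice and :bob keys cluster-wide while transferring
# balance between them; writes to other keys are not blocked.
MyApp.ReplicatedCache.transaction(
  [keys: [:alice, :bob]],
  fn ->
    alice = MyApp.ReplicatedCache.get(:alice)
    bob = MyApp.ReplicatedCache.get(:bob)

    MyApp.ReplicatedCache.put(:alice, %{alice | balance: alice.balance - 100})
    MyApp.ReplicatedCache.put(:bob, %{bob | balance: bob.balance + 100})
  end
)
```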

Summing up, the replicated cache topology, along with this adapter, should be used mainly when reads clearly dominate over writes (e.g., 80% reads and 20% writes, or fewer writes). Besides, operations like deleting all entries from the cache or adding new nodes must be executed only once in a while to avoid performance issues, since they are very expensive.

Summary

Functions

with_dynamic_cache(map, action, args)

Helper function to use a dynamic cache for the internal primary cache storage when needed.