Nebulex.Adapters.Local (Nebulex v2.6.4)

Adapter module for Local Generational Cache; inspired by epocxy.

Generational caching using an ets table (or multiple ones when used with :shards) for each generation of cached data. Accesses hit the newer generation first, and migrate from the older generation to the newer generation when retrieved from the stale table. When a new generation is started, the oldest one is deleted. This is a form of mass garbage collection which avoids using timers and expiration of individual cached elements.

This implementation of the generational cache uses only two generations (which is more than enough), also referred to as the newer and the older generation.
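To make the lifecycle concrete, here is a rough sketch (not taken from the original docs) assuming a cache named MyCache that uses this adapter and is started without :gc_interval, so new generations are only created explicitly via the extended new_generation/1 function described later in this page:

MyCache.put("foo", "bar")

# served from the newer generation
MyCache.get("foo")
#=> "bar"

# push a new generation: the previous newer generation becomes the older one
MyCache.new_generation()

# a hit on the older generation migrates the entry back to the newer one
MyCache.get("foo")
#=> "bar"

# two more pushes without touching "foo": the generation holding it
# eventually becomes the oldest and is deleted, so the entry is evicted
MyCache.new_generation()
MyCache.new_generation()
MyCache.get("foo")
#=> nil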

Overall features

  • Configurable backend (ets or :shards).
  • Expiration – A status based on the TTL (Time To Live) option. To maintain cache performance, expired entries may not be removed or evicted immediately; they are expired or evicted on demand, when the key is read.
  • Eviction – Generational Garbage Collection.
  • Sharding – For intensive workloads, the Cache may also be partitioned (by using :shards backend and specifying the :partitions option).
  • Support for transactions via the Erlang global name registration facility.
  • Support for stats.

Options

This adapter supports the following options and all of them can be given via the cache configuration:

  • :backend - Defines the backend or storage to be used for the adapter. Supported backends are: :ets and :shards. Defaults to :ets.

  • :read_concurrency - (boolean) Since this adapter uses ETS tables internally, this option is used when a new table is created; see :ets.new/2. Defaults to true.

  • :write_concurrency - (boolean) Since this adapter uses ETS tables internally, this option is used when a new table is created; see :ets.new/2. Defaults to true.

  • :compressed - (boolean) This option is used when a new ETS table is created and it defines whether or not the :compressed option is included; see :ets.new/2. Defaults to false.

  • :backend_type - This option defines the type of ETS table to be used (defaults to :set). However, it is highly recommended to keep the default value, since some commands are not supported (an unexpected exception may be raised) for types like :bag or :duplicate_bag. Please see the ETS docs for more information.

  • :partitions - If it is set, an integer > 0 is expected, otherwise, it defaults to System.schedulers_online(). This option is only available for :shards backend.

  • :gc_interval - If it is set, an integer > 0 is expected defining the interval time in milliseconds for garbage collection to run, delete the oldest generation, and create a new one. If this option is not set, garbage collection is never executed, so new generations must be created explicitly, e.g.: MyCache.new_generation(opts).

  • :max_size - If it is set, an integer > 0 is expected defining the max number of cached entries (cache limit). If it is not set (nil), the check to release memory is not performed (the default).

  • :allocated_memory - If it is set, an integer > 0 is expected defining the max size in bytes allocated for a cache generation. When this option is set and the configured value is reached, a new cache generation is created and the oldest one is deleted, forcing the release of memory space. If it is not set (nil), the cleanup check to release memory is not performed (the default).

  • :gc_cleanup_min_timeout - An integer > 0 defining the min timeout in milliseconds for triggering the next cleanup and memory check. This will be the timeout to use when either the max size or max allocated memory is reached. Defaults to 10_000 (10 seconds).

  • :gc_cleanup_max_timeout - An integer > 0 defining the max timeout in milliseconds for triggering the next cleanup and memory check. This is the timeout used when the cache starts and there are few entries or the consumed memory is near to 0. Defaults to 600_000 (10 minutes).

  • :gc_flush_delay - If it is set, an integer > 0 is expected defining the delay in milliseconds before objects from the oldest generation are flushed. Defaults to 10_000 (10 seconds).

Usage

Nebulex.Cache is the wrapper around the cache. We can define a local cache as follows:

defmodule MyApp.LocalCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Local
end

The configuration for the cache must be in your application environment, usually defined in your config/config.exs:

config :my_app, MyApp.LocalCache,
  gc_interval: :timer.hours(12),
  max_size: 1_000_000,
  allocated_memory: 2_000_000_000,
  gc_cleanup_min_timeout: :timer.seconds(10),
  gc_cleanup_max_timeout: :timer.minutes(10)

For intensive workloads, the Cache may also be partitioned by using :shards as the cache backend (backend: :shards) and configuring the desired number of partitions via the :partitions option, which defaults to System.schedulers_online().

config :my_app, MyApp.LocalCache,
  gc_interval: :timer.hours(12),
  max_size: 1_000_000,
  allocated_memory: 2_000_000_000,
  gc_cleanup_min_timeout: :timer.seconds(10),
  gc_cleanup_max_timeout: :timer.minutes(10),
  backend: :shards,
  partitions: System.schedulers_online() * 2

If your application was generated with a supervisor (by passing --sup to mix new) you will have a lib/my_app/application.ex file containing the application start callback that defines and starts your supervisor. You just need to edit the start/2 function to start the cache as a child of your application's supervision tree:

def start(_type, _args) do
  children = [
    {MyApp.LocalCache, []},
    # ... other children ...
  ]

  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end

See Nebulex.Cache for more information.

Eviction configuration

This section helps you understand a bit better how the different configuration options work and gives you an idea of what values to set, especially if it is your first time using Nebulex.

:ttl option

The :ttl option sets the expiration time for a key; it does not work as an eviction mechanism. Since the local adapter implements a generational cache, the options that control the eviction process are: :gc_interval, :gc_cleanup_min_timeout, :gc_cleanup_max_timeout, :max_size and :allocated_memory. The :ttl is evaluated on demand: when a key is retrieved and found to be expired at that moment, it is removed from the cache. Hence, it cannot be used as an eviction method; it is rather meant to keep integrity and consistency in the cache. For this reason, it is highly recommended to always configure the eviction options mentioned above.
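For illustration, a minimal sketch of setting a per-entry TTL (assuming a cache named MyCache; put/3 with the :ttl option and ttl/1 belong to the regular Nebulex.Cache API):

# the entry expires 1 hour after being written
MyCache.put("session", "some value", ttl: :timer.hours(1))

# remaining TTL for the key; expiration itself is checked on demand,
# when the key is read
MyCache.ttl("session")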

Caveats when using :ttl option:

  • When using the :ttl option, ensure it is less than :gc_interval; otherwise, there may be a situation where the key is evicted before its :ttl is due (e.g., because the garbage collector ran before the key was fetched again).
  • Assuming you have :gc_interval set to 2 hrs, you put a new key with :ttl set to 1 hr, and 1 minute later the GC runs: that key will be moved to the older generation, but it can still be retrieved. On the other hand, if the key is never fetched before the next GC cycle (fetching it would have moved it back to the newer generation), then, since the key stays in the oldest generation, it will be evicted from the cache when that generation is deleted, so it won't be retrievable anymore.

Garbage collection or eviction options

This adapter implements a generational cache, which means its main eviction mechanism is pushing a new cache generation and removing the oldest one. In this way, we ensure only the most frequently used keys are always available in the newer generation, while the least frequently used ones are evicted when the garbage collector runs. The garbage collector is triggered under these conditions:

  • When the time interval defined by :gc_interval elapses. This makes the garbage-collector process run, creating a new generation and deleting the oldest one.
  • When the "cleanup" timeout expires, the limits :max_size and :allocated_memory are checked; if either of them has been reached, the garbage collector runs (a new generation is created and the oldest one is deleted). The cleanup timeout is controlled by :gc_cleanup_min_timeout and :gc_cleanup_max_timeout and works with an inverse linear backoff, which means the timeout is inversely proportional to the memory growth; the bigger the cache size, the shorter the cleanup timeout will be.

First-time configuration

To configure the cache with accurate and/or good values, it is important to know several things in advance, for example the average size of an entry (so we can calculate a good value for the max size and/or allocated memory) and how intensive the load will be in terms of reads and writes. The problem is that most of these aspects are unknown when the app is new or we are using the cache for the first time. Therefore, the following recommendations will help you configure the cache for the first time (a sample configuration that puts them together follows the list):

  • When configuring :gc_interval, think about how often the least frequently used entries should be evicted, or what the desired retention period for the cached entries is. For example, if :gc_interval is set to 1 hr, only those entries that are retrieved periodically within a 2 hr period are kept in the cache; gc_interval * 2, 2 being the number of generations. Entries not accessed within that window are always evicted by the GC (the oldest generation is always deleted). If it is your first time using Nebulex, perhaps you can start with gc_interval: :timer.hours(12) (12 hrs), so the max retention period for the keys will be 1 day; but ensure you also set either :max_size or :allocated_memory.
  • It is highly recommended to set either :max_size or :allocated_memory to ensure the oldest generation is deleted (least frequently used keys are evicted) when one of these limits is reached and also to avoid running out of memory. For example, for the :allocated_memory we can set 25% of the total memory, and for the :max_size something between 100_000 and 1_000_000.
  • For :gc_cleanup_min_timeout we can set 10_000, which means when the cache is reaching the size or memory limit, the polling period for the cleanup process will be 10 seconds. And for :gc_cleanup_max_timeout we can set 600_000, which means when the cache is almost empty the polling period will be close to 10 minutes.
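Putting these recommendations together, a hypothetical starting configuration might look like the following (the numbers are illustrative; for example, the allocated memory value assumes 2 GB is roughly 25% of the host's total memory):

config :my_app, MyApp.LocalCache,
  # max retention period of ~1 day (gc_interval * 2 generations)
  gc_interval: :timer.hours(12),
  # cap on the number of cached entries
  max_size: 1_000_000,
  # ~25% of total memory on an 8 GB host (adjust to your environment)
  allocated_memory: 2_000_000_000,
  # poll every 10 seconds when the cache is near its limits
  gc_cleanup_min_timeout: :timer.seconds(10),
  # poll roughly every 10 minutes when the cache is almost empty
  gc_cleanup_max_timeout: :timer.minutes(10)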

Stats

This adapter does support stats by using the default implementation provided by Nebulex.Adapter.Stats. The adapter also uses Nebulex.Telemetry.StatsHandler to aggregate the stats and keep them updated. Therefore, it requires the Telemetry events emitted by the adapter (the :telemetry option should not be set to false, so the Telemetry events can be dispatched); otherwise, stats won't work properly.
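As a quick sketch (assuming stats are enabled for the cache via the :stats option, which is part of the general Nebulex.Cache configuration rather than this adapter):

config :my_app, MyApp.LocalCache,
  stats: true,
  gc_interval: :timer.hours(12)

# later, at runtime:
MyApp.LocalCache.stats()
# returns a %Nebulex.Stats{} struct (or nil if stats are disabled)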

Queryable API

Since this adapter is implemented on top of ETS tables, the query must be a valid match spec given by :ets.match_spec(). However, there are some predefined and/or shorthand queries you can use. See the section "Predefined queries" below for more information.

Internally, an entry is represented by the tuple {:entry, key, value, touched, ttl}, which means the match pattern within the :ets.match_spec() must be something like: {:entry, :"$1", :"$2", :"$3", :"$4"}. In order to make query building easier, you can use Ex2ms library.

Predefined queries

  • nil - All keys are returned.

  • :unexpired - All unexpired keys/entries.

  • :expired - All expired keys/entries.

  • {:in, [term]} - Only the keys in the given key list ([term]) are returned. This predefined query is only supported for Nebulex.Cache.delete_all/2. This is the recommended way of doing bulk delete of keys.

Examples

# built-in queries
MyCache.all()
MyCache.all(:unexpired)
MyCache.all(:expired)
MyCache.all({:in, ["foo", "bar"]})
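
# bulk delete of a specific list of keys (a sketch; per the predefined
# queries above, {:in, [term]} is intended for delete_all/2)
MyCache.delete_all({:in, ["foo", "bar"]})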

# using a custom match spec (all values > 10)
spec = [{{:_, :"$1", :"$2", :_, :_}, [{:>, :"$2", 10}], [{{:"$1", :"$2"}}]}]
MyCache.all(spec)

# using Ex2ms
import Ex2ms

spec =
  fun do
    {_, key, value, _, _} when value > 10 -> {key, value}
  end

MyCache.all(spec)

The :return option applies only to the built-in queries (nil | :unexpired | :expired); if you are using a custom :ets.match_spec(), the return value depends on the match spec itself.

The same applies to the stream function.
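For instance, a small sketch (stream/2 is lazy, so the result has to be consumed with Enum or Stream functions):

MyCache.stream(:unexpired) |> Enum.to_list()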

Extended API (convenience functions)

This adapter provides some additional convenience functions to the Nebulex.Cache API.

Creating new generations:

MyCache.new_generation()
MyCache.new_generation(reset_timer: false)

Retrieving the current generations:

MyCache.generations()

Retrieving the newer generation:

MyCache.newer_generation()

Summary

Functions

entry(args \\ []) (macro)

entry(record, args) (macro)

take_(adapter_meta, key)