Nebulex.Adapters.DiskLFU (Nebulex.Adapters.DiskLFU v3.0.0)

Nebulex.Adapters.DiskLFU is a persistent LFU (Least Frequently Used) cache adapter for Nebulex, designed to provide an SSD-backed cache with disk persistence, TTL support, and LFU-based eviction.

This adapter is ideal for workloads that require:

  • High-capacity caching without exhausting memory.
  • File-based persistence with cache recovery after restarts.
  • Concurrency-safe operations for both reads and writes.
  • Customizable eviction strategies.

For example, imagine an application that downloads large files from S3 to process them. These files are reusable across different operations or requests. In such cases, it can be significantly more efficient to cache the files locally—ideally on an SSD—rather than repeatedly fetching them from S3. Using Nebulex.Adapters.DiskLFU, these files can be stored and accessed from the local file system with LFU eviction and TTL handling, reducing latency and cloud egress costs.

See the Architecture document for more information.

Features

  • LFU Eviction - Least Frequently Used eviction when disk capacity is exceeded.
  • TTL Support - Per-entry time-to-live with lazy and proactive cleanup.
  • Proactive Eviction - Automatic periodic cleanup of expired entries via :eviction_timeout.
  • Manual Cleanup - Direct API for explicit expired entry removal with delete_all(query: :expired).
  • Concurrent Access - Safe read/write operations with atomic guarantees per key.
  • Persistent - Survives application restarts with fast recovery from disk.

Usage

Define your cache module:

defmodule MyApp.Cache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.DiskLFU
end

Configure your cache in config/config.exs:

config :my_app, MyApp.Cache,
  root_path: "/var/cache",
  max_bytes: 10_000_000,               # 10MB capacity
  eviction_timeout: :timer.minutes(5)  # Clean expired entries every 5 minutes

Add the cache to your application supervision tree:

def start(_type, _args) do
  children = [
    {MyApp.Cache, []},
    # ... other children
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end

Then use it in your application:

# Write a value with TTL
MyApp.Cache.put(:key, "value", ttl: :timer.hours(1))

# Read a value
MyApp.Cache.get(:key)

# Delete expired entries manually
MyApp.Cache.delete_all(query: :expired)
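
The S3 scenario from the introduction can be approached with a read-through helper. The following is a minimal sketch, not part of the adapter: `MyApp.FileCache`, `fetch_or_load/4`, and the `loader` function are hypothetical names, and the sketch assumes the cache module exposes `fetch/2` returning `{:ok, value}` or `{:error, reason}` and `put/3` returning `:ok` on success.

```elixir
defmodule MyApp.FileCache do
  @moduledoc "Hypothetical read-through helper; not shipped with the adapter."

  # `cache` is any module exposing `fetch/2` and `put/3` (e.g. MyApp.Cache).
  # `loader` is a zero-arity function returning the binary to cache,
  # for example a function that downloads an object from S3.
  def fetch_or_load(cache, key, loader, opts \\ []) when is_function(loader, 0) do
    case cache.fetch(key, []) do
      {:ok, binary} ->
        # Cache hit: the bytes are served from local disk.
        {:ok, binary}

      {:error, _reason} ->
        # Treated as a miss in this sketch. Real code may want to
        # distinguish a missing key from other errors.
        binary = loader.()

        with :ok <- cache.put(key, binary, opts) do
          {:ok, binary}
        end
    end
  end
end
```

A call such as `MyApp.FileCache.fetch_or_load(MyApp.Cache, "reports/big.bin", fn -> download() end, ttl: :timer.hours(1))` would then hit the remote store only on a miss, where `download/0` stands in for whatever client the application uses.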

Startup options

The following options are available for the adapter at startup:

  • :root_path (String.t/0) - Required. The root directory where cache files are stored.

    Cache files (.cache and .meta files) are created as direct children of this directory. This directory will be created if it does not exist.

  • :max_bytes - The maximum cache size in bytes. When exceeded, LFU eviction is triggered to remove the least frequently used entries until the size falls below this limit.

    When nil (default), the cache has no size-based eviction limit. Note that TTL-based expiration still applies if entries have :ttl set.

    The default value is nil.

  • :eviction_victim_limit (pos_integer/0) - The maximum number of entries to evict in a single operation.

    When the size limit is exceeded, eviction selects up to this many entries (victims) to delete based on the LFU strategy.

    The default value is 100.

  • :eviction_victim_sample_size (pos_integer/0) - The number of candidate entries to sample when selecting victims.

    A larger sample size provides better eviction decisions (more accurate LFU) but requires more scanning. The sampled entries are then sorted by frequency and age, and the worst :eviction_victim_limit entries are removed.

    The default value is 1000.

  • :metadata_persistence_timeout - The interval in milliseconds at which to persist metadata to disk.

    Metadata is updated in memory for performance, then periodically written to disk at this interval to minimize I/O overhead. When nil, metadata is not persisted periodically, but is still persisted during graceful shutdown to ensure consistency on restart.

    The default value is 60000.

  • :eviction_timeout - The interval in milliseconds to evict expired entries periodically.

    When set, a background timer triggers automatic cleanup of all expired entries at the specified interval. When nil (default), expired entries are removed lazily (on access) or manually via delete_all(query: :expired).

    Example:

    config :my_app, MyApp.Cache,
      root_path: "/tmp/my_cache",
      eviction_timeout: :timer.minutes(5)

    The default value is nil.
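
To illustrate how :eviction_victim_sample_size and :eviction_victim_limit interact, here is a minimal sketch of sampled LFU victim selection. It is illustrative only and is not the adapter's implementation: the `LfuSketch` module and the `:access_count` and `:last_access` fields are assumptions made for the example.

```elixir
defmodule LfuSketch do
  @moduledoc "Illustrative only; not the adapter's implementation."

  # Each entry is a map with hypothetical fields:
  #   :key          - the cache key
  #   :access_count - how often the entry was read (frequency)
  #   :last_access  - integer timestamp of the last read (age)
  def select_victims(entries, sample_size, victim_limit) do
    entries
    # Sample candidates instead of scanning every entry.
    |> Enum.take_random(sample_size)
    # Least frequently used first; ties are broken by the oldest access.
    |> Enum.sort_by(fn entry -> {entry.access_count, entry.last_access} end)
    # Keep at most `victim_limit` victims.
    |> Enum.take(victim_limit)
    |> Enum.map(& &1.key)
  end
end
```

A larger sample makes it more likely that the truly coldest entries are among the candidates, at the cost of sorting more entries per eviction pass.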

Shared runtime options

The following options are available for all operations:

  • :retries - The maximum number of times to retry an operation when blocked by locks.

    Operations retry when another process holds a lock on the same key or resource. Set to a non-negative integer for a maximum retry count, or :infinity (default) to retry indefinitely until the lock is released.

    The default value is :infinity.

Read options

The following options are available for the read operations (e.g., fetch, get, take):

  • :return - The value to return from read operations (fetch, get, take).

    Supported values:

    • :binary (default) - Returns the cache value as a binary.
    • :metadata - Returns the entry metadata instead of the value.
    • :symlink - Returns a temporary read-only symlink to the cached file. Only supported for fetch and get operations. The symlink is automatically cleaned up after use. Useful for passing large files to external tools without loading them into memory. Do not modify the file.
    • A function/2 - A callback receiving (binary, metadata) that transforms and returns a custom value.

    The default value is :binary.

Write options

The following options are available for the write operations (e.g., put, put_new, replace, put_all, put_new_all):

  • :metadata (map/0) - Custom metadata to attach to the cached entry.

    This data is stored alongside the value and can be retrieved using the return: :metadata option on read operations.

    The default value is %{}.
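
The read, write, and shared options above can be combined as in the following sketch. `MyApp.Reports` and its functions are hypothetical names; the cache module is passed as an argument (defaulting to MyApp.Cache from the Usage section) so the option handling can be exercised without a running cache. The exact return shapes of `put/3` and `fetch/2` are not asserted here.

```elixir
defmodule MyApp.Reports do
  @moduledoc "Hypothetical wrapper showing per-operation options."

  # Store a binary with custom metadata and a one-hour TTL.
  def store(key, csv, cache \\ MyApp.Cache) when is_binary(csv) do
    cache.put(key, csv, metadata: %{content_type: "text/csv"}, ttl: :timer.hours(1))
  end

  # Ask for the entry metadata instead of the value.
  def metadata(key, cache \\ MyApp.Cache) do
    cache.fetch(key, return: :metadata)
  end

  # Derive a custom value with a 2-arity callback, and give up after
  # three retries instead of waiting indefinitely on a lock.
  def size(key, cache \\ MyApp.Cache) do
    cache.fetch(key, return: &summarize/2, retries: 3)
  end

  # Receives the cached binary and the entry metadata.
  def summarize(binary, _meta), do: byte_size(binary)
end
```

Passing a named function such as `&summarize/2` keeps the transformation testable on its own; an anonymous `fn binary, meta -> ... end` works the same way.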

Adapter-specific telemetry events

This adapter exposes the following Telemetry events grouped by category:

Eviction Events

  • telemetry_prefix ++ [:eviction, :start] - Dispatched when eviction begins.

    • Measurements: %{system_time: non_neg_integer()}

    • Metadata:

      %{
        stored_bytes: non_neg_integer(),
        max_bytes: non_neg_integer(),
        victim_sample_size: non_neg_integer(),
        victim_limit: non_neg_integer()
      }
  • telemetry_prefix ++ [:eviction, :stop] - Dispatched when eviction completes.

    • Measurements: %{duration: non_neg_integer()}

    • Metadata:

      %{
        stored_bytes: non_neg_integer(),
        max_bytes: non_neg_integer(),
        victim_sample_size: non_neg_integer(),
        victim_limit: non_neg_integer(),
        result: term()
      }
  • telemetry_prefix ++ [:eviction, :exception] - Dispatched when eviction fails.

    • Measurements: %{duration: non_neg_integer()}

    • Metadata:

      %{
        stored_bytes: non_neg_integer(),
        max_bytes: non_neg_integer(),
        victim_sample_size: non_neg_integer(),
        victim_limit: non_neg_integer(),
        kind: :error | :exit | :throw,
        reason: term(),
        stacktrace: [term()]
      }

Expired Entry Eviction Events

  • telemetry_prefix ++ [:evict_expired_entries, :start] - Dispatched when the periodic background timer begins evicting expired entries.

    • Measurements: %{system_time: non_neg_integer()}

    • Metadata:

      %{
        store_pid: pid()
      }
  • telemetry_prefix ++ [:evict_expired_entries, :stop] - Dispatched when expired entry eviction completes.

    • Measurements: %{duration: non_neg_integer()}

    • Metadata:

      %{
        store_pid: pid(),
        count: non_neg_integer()
      }
  • telemetry_prefix ++ [:evict_expired_entries, :exception] - Dispatched when expired entry eviction fails.

    • Measurements: %{duration: non_neg_integer()}

    • Metadata:

      %{
        store_pid: pid(),
        kind: :error | :exit | :throw,
        reason: term(),
        stacktrace: [term()]
      }

Metadata Persistence Events

  • telemetry_prefix ++ [:persist_meta, :start] - Dispatched when metadata persistence begins.

    • Measurements: %{system_time: non_neg_integer()}

    • Metadata:

      %{
        store_pid: pid()
      }
  • telemetry_prefix ++ [:persist_meta, :stop] - Dispatched when metadata persistence completes.

    • Measurements: %{duration: non_neg_integer()}

    • Metadata:

      %{
        store_pid: pid(),
        count: non_neg_integer()
      }
  • telemetry_prefix ++ [:persist_meta, :exception] - Dispatched when metadata persistence fails.

    • Measurements: %{duration: non_neg_integer()}

    • Metadata:

      %{
        store_pid: pid(),
        kind: :error | :exit | :throw,
        reason: term(),
        stacktrace: [term()]
      }

Metadata Loading Events

  • telemetry_prefix ++ [:load_metadata, :error] - Dispatched when metadata loading fails for a file.

    • Measurements: %{system_time: non_neg_integer()}

    • Metadata:

      %{
        filename: String.t(),
        reason: term()
      }
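
The events above can be consumed with a regular :telemetry handler. The sketch below is one possible setup, not something shipped with the adapter: `MyApp.CacheTelemetry` is a hypothetical module, the `[:nebulex, :cache]` prefix is an assumed default (pass your cache's own telemetry prefix if it differs), and durations are assumed to be in native time units, as is conventional for :telemetry spans.

```elixir
defmodule MyApp.CacheTelemetry do
  @moduledoc "Hypothetical handler module; not shipped with the adapter."
  require Logger

  # Assumed default prefix; pass your cache's telemetry prefix if it differs.
  def attach(prefix \\ [:nebulex, :cache]) do
    events = [
      prefix ++ [:eviction, :stop],
      prefix ++ [:evict_expired_entries, :stop],
      prefix ++ [:load_metadata, :error]
    ]

    :telemetry.attach_many("my-app-disk-lfu", events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(event, measurements, metadata, _config) do
    Logger.info(describe(event, measurements, metadata))
  end

  # Builds the log line; kept pure so it can be checked without a running cache.
  def describe(event, measurements, metadata) do
    case Enum.take(event, -2) do
      [:eviction, :stop] ->
        "LFU eviction took #{to_ms(measurements.duration)} ms, " <>
          "#{metadata.stored_bytes} bytes reported as stored"

      [:evict_expired_entries, :stop] ->
        "removed #{metadata.count} expired entries in #{to_ms(measurements.duration)} ms"

      [:load_metadata, :error] ->
        "failed to load metadata from #{metadata.filename}: #{inspect(metadata.reason)}"

      _other ->
        "unhandled event #{inspect(event)}"
    end
  end

  # Assumes durations are expressed in native time units.
  defp to_ms(duration), do: System.convert_time_unit(duration, :native, :millisecond)
end
```

Calling `MyApp.CacheTelemetry.attach()` once during application start, before the cache is added to the supervision tree, is usually enough to capture these events.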

Queryable API

This adapter supports the Nebulex.Adapter.Queryable behaviour with a limited query interface. The following query options are available for queryable operations like delete_all/2, count_all/2, and get_all/2:

Query Option: Match All Keys

Delete, count, or retrieve all keys in the cache:

MyApp.Cache.delete_all()
MyApp.Cache.count_all()
MyApp.Cache.get_all()

Query Option: Match Specific Keys

Delete, count, or retrieve specific keys using the in operator:

MyApp.Cache.delete_all(in: [:key1, :key2, :key3])
MyApp.Cache.count_all(in: [:key1, :key2])
MyApp.Cache.get_all(in: [:key1, :key2, :key3])

Query Option: Match Expired Entries

Delete expired entries using query: :expired:

MyApp.Cache.delete_all(query: :expired)

This query matches all entries whose expiration time (expires_at) is less than or equal to the current time. It is supported only by delete_all/2 and is useful for proactive cleanup of stale entries, either manually through the API or automatically by setting the :eviction_timeout option at startup.

For more information about automatic eviction, see the Architecture guide.

Limitations and Considerations

Unsupported Operations

  • incr/3 and decr/3 are not supported.
  • put_new/3, replace/3, and put_new_all/2 are not supported with their usual semantics. Currently they behave as plain put operations, without the existence checks their names imply. Support is planned for a future release.
  • The Nebulex.Adapters.Common.Info behaviour is not implemented. Support for cache introspection and statistics is planned for a future release.
  • The Nebulex.Adapter.Observable behaviour is not implemented. Support for cache entry events is planned for a future release.

Query Operation Limitations

  • count_all/2 supports counting all keys or a given list of keys, but is not atomic: errors are skipped, so the count may be inaccurate.
  • delete_all/2 supports deleting all keys, a given list of keys, or expired entries (query: :expired), but is not atomic: errors are skipped, so deletion may be incomplete.
  • get_all/2 supports retrieving all keys or a given list of keys only.
  • stream/2 supports streaming all keys or a given list of keys only.

Performance Characteristics

  • Write and delete operations (put, put_all, delete, take) are blocking and atomic per key. This ensures consistency and prevents race conditions or write conflicts.
  • Read operations may block briefly when the requested key has expired and must first be removed from the cache.

Types

return_fn()

@type return_fn() :: (binary(), Nebulex.Adapters.DiskLFU.Meta.t() -> any())

The return function for the fetch operation.