Developer Guide for SuperCache


Setup

Clone the repo, then fetch dependencies:

mix deps.get

To use Tidewave (live reload during development):

mix tidewave

Then open http://localhost:4000/tidewave in your browser.

Using as Local Dependency

In your project's mix.exs:

defp deps do
  base_dir = "/your/base/path"

  [
    {:super_cache, path: Path.join(base_dir, "super_cache")}
    
    # or use a git repo; add override: true if you need to override it in sub-deps.
    # {:super_cache, git: "https://github.com/your_account/super_cache", override: true}
  ]
end

Architecture Overview

SuperCache is built on a layered architecture designed for high performance and horizontal scalability. The codebase contains 34 modules organized into 7 layers:


                    Application Layer
  SuperCache | KeyValue | Queue | Stack | Struct
                          |
                          v
                      Routing Layer
  Partition Router (local) | Cluster Router (distributed)
  Cluster.DistributedStore (shared helpers)
                          |
                          v
                    Replication Layer
  Replicator (async/sync) | WAL (strong) | ThreePhaseCommit
                          |
                          v
                      Storage Layer
  Storage (ETS wrapper) | EtsHolder (table lifecycle)
  Partition (hashing) | Partition.Holder (registry)

                 Cluster Infrastructure
  Manager | NodeMonitor | HealthMonitor | Metrics | Stats
  TxnRegistry | Router

                 Buffer System (lazy_put)
  Buffer (scheduler-affine) | Internal.Queue | Internal.Stream

Complete Module Reference

1. Core API Modules (lib/api/)

| Module | Path | Responsibility |
| --- | --- | --- |
| SuperCache | lib/api/super_cache.ex | Main public API for tuple storage. Handles local/distributed modes transparently. Provides put/get/delete/scan operations with bang and safe variants. |
| SuperCache.KeyValue | lib/api/key_value.ex | Key-value namespace API with batch operations. Multiple independent namespaces coexist using different kv_name values. |
| SuperCache.Queue | lib/api/queue.ex | Named FIFO queues backed by ETS partitions. Supports add/out/peak/count/get_all operations. |
| SuperCache.Stack | lib/api/stack.ex | Named LIFO stacks backed by ETS partitions. Supports push/pop/count/get_all operations. |
| SuperCache.Struct | lib/api/struct.ex | Struct storage with automatic key extraction. Call init/2 once per struct type before using. |

2. Application & Bootstrap (lib/)

| Module | Path | Responsibility |
| --- | --- | --- |
| SuperCache.Application | lib/application.ex | OTP Application callback. Starts the core supervision tree. Auto-starts the cache if config :super_cache, auto_start: true. Connects to cluster peers on startup. |
| SuperCache.Bootstrap | lib/bootstrap.ex | Unified startup/shutdown for both :local and :distributed modes. Validates options, resolves defaults, starts components in dependency order. |
| SuperCache.Config | lib/app/config.ex | GenServer-backed configuration store with :persistent_term optimization for hot-path keys (:cluster, :key_pos, :partition_pos, :num_partition, :table_type, :table_prefix). |
| SuperCache.Sup | lib/app/sup.ex | Dynamic supervisor for user-spawned workers and dynamically allocated resources. |
| SuperCache.Log | lib/debug_log.ex | Zero-cost conditional logging macro. Debug output is suppressed by default; expands to :ok when debug_log: false. |
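For reference, a minimal config sketch wiring up the two application-level keys mentioned above (:auto_start and :debug_log); any other keys your project needs would go in the same block:

```elixir
# config/config.exs -- a minimal sketch using the keys described above
import Config

config :super_cache,
  auto_start: true,  # start the cache with the application
  debug_log: false   # keep SuperCache.Log.debug/1 compiled out
```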

3. Buffer System (lib/buffer/)

| Module | Path | Responsibility |
| --- | --- | --- |
| SuperCache.Buffer | lib/buffer/buffer.ex | Manages per-scheduler write buffers for lazy_put/1. One buffer process per online scheduler, with scheduler affinity via :erlang.system_info(:scheduler_id). |
| SuperCache.Internal.Queue | lib/buffer/queue.ex | Internal concurrent queue for buffer streams. Supports multiple producers/consumers with graceful shutdown. |
| SuperCache.Internal.Stream | lib/buffer/stream.ex | Bridges the internal queue with the caching layer. Creates a Stream that pulls items and pushes them into the cache. |

4. Partition System (lib/partition/)

| Module | Path | Responsibility |
| --- | --- | --- |
| SuperCache.Partition | lib/partition/partition_api.ex | Handles partition hashing, resolution, and lifecycle. Uses :erlang.phash2/2 for hashing. Provides get_partition/1, get_partition_order/1, and partition enumeration. |
| SuperCache.Partition.Holder | lib/partition/partition_holder.ex | GenServer-backed registry mapping partition indices to ETS table atoms. Uses a :protected ETS table for lock-free reads. |
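The hashing scheme can be illustrated with plain :erlang.phash2/2. This is a sketch of the mechanism only, not SuperCache's actual code:

```elixir
# Map a key to one of `num_partition` buckets.
# :erlang.phash2/2 is deterministic, so the same key always lands
# on the same partition index on every node.
num_partition = 4
key = {:user, 42}

partition = :erlang.phash2(key, num_partition)
IO.puts("key #{inspect(key)} -> partition #{partition}")

# Determinism check: hashing the same key twice yields the same index,
# and the index is always in 0..num_partition-1.
^partition = :erlang.phash2(key, num_partition)
true = partition in 0..(num_partition - 1)
```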

5. Storage Layer (lib/storage/)

| Module | Path | Responsibility |
| --- | --- | --- |
| SuperCache.Storage | lib/storage/storage_api.ex | Thin ETS wrapper providing read/write/delete primitives. All tables are created with :write_concurrency and :read_concurrency. Supports put, get, get_by_match, get_by_match_object, scan, take, delete, delete_match, update_counter, update_element. |
| SuperCache.EtsHolder | lib/storage/ets_holder.ex | GenServer that owns the lifecycle of all ETS tables. Tables are automatically deleted on shutdown. |
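The concurrency flags named above look like this on a raw ETS table. This is a generic sketch; the table name and exact option list beyond the two named flags are assumptions:

```elixir
# Create a table with the concurrency hints mentioned in the table above.
table =
  :ets.new(:demo_partition, [
    :set,
    :public,
    read_concurrency: true,
    write_concurrency: true
  ])

# Batch insert: one call for many objects, as the storage layer does.
:ets.insert(table, [{{:user, 1}, "Alice"}, {{:user, 2}, "Bob"}])

# Point read.
[{_key, "Alice"}] = :ets.lookup(table, {:user, 1})

# Atomic counter update (position 2 of the {:hits, n} tuple, increment by 1).
:ets.insert(table, {:hits, 0})
1 = :ets.update_counter(table, :hits, {2, 1})
```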

6. Cluster System (lib/cluster/)

| Module | Path | Responsibility |
| --- | --- | --- |
| SuperCache.Cluster.Bootstrap | lib/cluster/cluster_bootstrap.ex | Distributed mode bootstrap. Handles node connection, config verification across peers, partition map building, and component initialization. |
| SuperCache.Cluster.Manager | lib/cluster/manager.ex | Cluster membership and partition → primary/replica mapping. Stores the partition map in :persistent_term for zero-cost reads. Partition assignment: sorted node list rotated by partition index. |
| SuperCache.Cluster.NodeMonitor | lib/cluster/node_monitor.ex | Monitors declared nodes and notifies Manager when they join/leave. Supports three sources: static :nodes, dynamic :nodes_mfa, or legacy all-node watching. |
| SuperCache.Cluster.HealthMonitor | lib/cluster/health_monitor.ex | Continuous health checking: periodic probes of connectivity (RTT), replication lag, partition balance, and error rates. Emits :telemetry events. |
| SuperCache.Cluster.Router | lib/cluster/router.ex | Distributed request router. Routes reads/writes to the correct nodes. Handles read-your-writes consistency, quorum reads with early termination, and primary routing. |
| SuperCache.Cluster.Replicator | lib/cluster/replicator.ex | Replication engine with three modes: :async (fire-and-forget via a Task.Supervisor pool), :sync (adaptive quorum), :strong (WAL-based). Handles bulk partition transfers. |
| SuperCache.Cluster.WAL | lib/cluster/wal.ex | Write-Ahead Log for strong consistency. Replaces heavy 3PC with ~200µs latency. Write to local ETS → append to WAL → async replicate → return on majority ack. |
| SuperCache.Cluster.ThreePhaseCommit | lib/cluster/three_phase_commit.ex | Legacy three-phase commit protocol (PREPARE → PRE_COMMIT → COMMIT). Replaced by WAL but still available for backwards compatibility. |
| SuperCache.Cluster.TxnRegistry | lib/cluster/tnx_registry.ex | In-memory transaction log for the 3PC protocol. Uses a :public ETS table for lock-free reads. Tracks transaction states: :prepared → :pre_committed → :committed/:aborted. |
| SuperCache.Cluster.Metrics | lib/cluster/metrics.ex | Low-overhead counter and latency sample store. Uses a :public ETS table with atomic update_counter/3. Ring buffer for latency samples (max 256). |
| SuperCache.Cluster.Stats | lib/cluster/stats.ex | Generates cluster overview, partition maps, 3PC metrics, and API call statistics. Provides pretty-printing for console output. |
| SuperCache.Cluster.DistributedStore | lib/cluster/distributed_store.ex | Shared routing helpers used by all distributed high-level stores (KeyValue, Queue, Stack, Struct). Provides route_put, route_delete, local_get, local_match, etc. |
| SuperCache.Cluster.DistributedHelpers | lib/cluster/distributed_helpers.ex | Shared distributed read/write helpers extracted from KeyValue, Queue, Stack, and Struct. Provides distributed?/0, apply_write/3, route_write/4, route_read/5, has_partition?/1, read_primary/4, read_quorum/4. Eliminates ~400 lines of code duplication across API modules. |
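The Manager's assignment rule ("sorted node list rotated by partition index") can be sketched as follows. The module and function names here are illustrative, and the exact rotation/slicing details are assumptions about the real implementation:

```elixir
defmodule AssignmentSketch do
  # Pick `factor` replicas for a partition by rotating the sorted
  # node list by the partition index (the first node is the primary).
  def replicas(nodes, partition, factor) do
    sorted = Enum.sort(nodes)

    sorted
    |> Stream.cycle()
    |> Stream.drop(rem(partition, length(sorted)))
    |> Enum.take(factor)
  end
end

nodes = [:"node2@127.0.0.1", :"node1@127.0.0.1"]

AssignmentSketch.replicas(nodes, 0, 2)
#=> [:"node1@127.0.0.1", :"node2@127.0.0.1"]
AssignmentSketch.replicas(nodes, 1, 2)
#=> [:"node2@127.0.0.1", :"node1@127.0.0.1"]
```

Because every node sorts the same membership list, each node computes the same partition map independently.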

7. Compatibility Shims (lib/distributed/)

All modules in lib/distributed/ are deprecated backwards-compatibility shims that delegate to the unified modules:

| Shim Module | Delegates To |
| --- | --- |
| SuperCache.Distributed | SuperCache |
| SuperCache.Distributed.KeyValue | SuperCache.KeyValue |
| SuperCache.Distributed.Queue | SuperCache.Queue |
| SuperCache.Distributed.Stack | SuperCache.Stack |
| SuperCache.Distributed.Struct | SuperCache.Struct |

Performance Optimizations

The codebase includes several performance optimizations:

  1. Compile-time log elimination — SuperCache.Log.debug/1 expands to :ok when debug_log: false
  2. Partition resolution inlining — @compile {:inline, get_partition: 1} eliminates function call overhead
  3. Config.distributed?/0 inlining — @compile {:inline, distributed?: 0} for zero-cost cluster-mode checks on the hot path (called by every API operation)
  4. Batch ETS operations — :ets.insert/2 with lists instead of per-item calls
  5. Async replication worker pool — Task.Supervisor eliminates per-operation spawn/1 overhead
  6. Adaptive quorum writes — Sync mode returns on majority ack, not all replicas
  7. Quorum read early termination — Stops waiting once majority is reached; kills remaining tasks immediately
  8. WAL-based strong consistency — Replaces 3PC with fast local write + async replication + majority ack (~200µs vs ~1500µs)
  9. Atomic WAL sequence generation — :ets.update_counter/4 for race-condition-free sequence numbers
  10. Persistent-term config — Hot-path config keys served from :persistent_term for O(1) access
  11. Scheduler-affine buffers — lazy_put/1 routes to the buffer on the same scheduler via :erlang.system_info(:scheduler_id)
  12. Protected ETS tables — Partition.Holder uses :protected ETS for lock-free reads
  13. Spin-wait optimization — :erlang.yield/0 instead of :timer.sleep/1 in Queue/Stack lock contention loops
  14. Shared distributed helpers — the DistributedHelpers module eliminates code duplication across KeyValue, Queue, Stack, and Struct
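Optimization 1 relies on a standard Elixir technique: a macro that expands to a literal :ok when logging is disabled, so disabled call sites cost nothing at runtime. A minimal, self-contained sketch follows; the real SuperCache.Log reads the :debug_log config, while here the flag is a hard-coded stand-in:

```elixir
defmodule LogSketch do
  # Stand-in for the :debug_log config value resolved at compile time.
  @debug false

  defmacro debug(msg) do
    if @debug do
      quote do: IO.puts("[debug] " <> unquote(msg))
    else
      # Disabled: the call site compiles down to the literal :ok,
      # so no message is built and no I/O happens.
      quote do: :ok
    end
  end
end

defmodule LogDemo do
  require LogSketch
  def run, do: LogSketch.debug("expensive message")
end

LogDemo.run()
#=> :ok
```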

Usage Guide in Plain IEx (Single Cluster, Two Nodes)

Start Cluster

Terminal 1:

iex --name node1@127.0.0.1 --cookie need_to_change_this -S mix

Terminal 2:

iex --name node2@127.0.0.1 --cookie need_to_change_this -S mix

Bootstrap and Test

# --- node1@127.0.0.1 ---

# Connect to peer
Node.connect(:"node2@127.0.0.1")

# Start cache (if :auto_start is unset or false)
SuperCache.Cluster.Bootstrap.start!(
  key_pos: 0, partition_pos: 0,
  cluster: :distributed,
  replication_mode: :strong,  # Try :async, :sync, or :strong
  replication_factor: 2,
  num_partition: 4
)

# Verify partition map is built
SuperCache.Cluster.Manager.live_nodes()
#=> [:"node1@127.0.0.1", :"node2@127.0.0.1"]

# Check cluster stats
SuperCache.cluster_stats()

# Write data
SuperCache.put!({:user, 1, "Alice"})
SuperCache.put!({:user, 2, "Bob"})

# Check record counts per partition
SuperCache.stats()
# --- node2@127.0.0.1 ---

# Verify node1 is joined
Node.list()
#=> [:"node1@127.0.0.1"]

# Start cache (must match node1 config)
SuperCache.Cluster.Bootstrap.start!(
  key_pos: 0, partition_pos: 0,
  cluster: :distributed,
  replication_mode: :strong,
  replication_factor: 2,
  num_partition: 4
)

# Read replicated data
SuperCache.get!({:user, 1})
#=> [{:user, 1, "Alice"}]

# Read with different consistency levels
SuperCache.get!({:user, 1}, read_mode: :local)    # Fast, may be stale
SuperCache.get!({:user, 1}, read_mode: :primary)  # Consistent with primary
SuperCache.get!({:user, 1}, read_mode: :quorum)   # Majority agreement
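The quorum read path described above (return on majority agreement, kill the stragglers) can be sketched with plain Tasks. This illustrates the technique only; module and function names are invented and this is not SuperCache's internal code:

```elixir
defmodule QuorumSketch do
  # Fan out one task per replica, return after `need` replies,
  # then brutally kill whatever is still running.
  def read(replica_funs, need) do
    tasks = Enum.map(replica_funs, &Task.async/1)
    {replies, stragglers} = collect(tasks, need, [])
    Enum.each(stragglers, &Task.shutdown(&1, :brutal_kill))
    replies
  end

  defp collect(tasks, 0, acc), do: {Enum.reverse(acc), tasks}

  defp collect(tasks, need, acc) do
    receive do
      {ref, reply} when is_reference(ref) ->
        # Consume the task reply, drop its monitor, keep waiting.
        Process.demonitor(ref, [:flush])
        collect(Enum.reject(tasks, &(&1.ref == ref)), need - 1, [reply | acc])
    end
  end
end

# Two fast replicas and one slow one; with majority = 2 the call
# returns after ~30ms instead of waiting 5 seconds for the straggler.
replies =
  QuorumSketch.read(
    [
      fn -> Process.sleep(10); {:ok, :v} end,
      fn -> Process.sleep(30); {:ok, :v} end,
      fn -> Process.sleep(5_000); {:ok, :v} end
    ],
    2
  )

2 = length(replies)
```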

Testing

Run All Tests

# All tests (includes cluster tests)
mix test

# Unit tests only — no distribution needed
mix test --exclude cluster

# Specific test file
mix test test/kv_test.exs

# Specific test with line number
mix test test/kv_test.exs:42

# With warnings as errors
mix test --warnings-as-errors

# Limit failures
mix test --max-failures 3

Cluster Tests

# Run only cluster tests
mix test test/cluster/

# Run with specific seed for reproducibility
mix test test/cluster/ --seed 12345

# Run via alias (sets up proper VM flags)
mix test.cluster

Note: Cluster tests can be flaky due to timing and shared node config. Use --exclude cluster for fast local development.

Test Structure

| Directory | Tests | Description |
| --- | --- | --- |
| test/ | 9 files | Core single-node tests (ETS, KV, Queue, Stack, Struct, Partition, Storage) |
| test/cluster/ | 9 files | Cluster integration tests (bootstrap, node failure, health monitor, 3PC, RYW) |
| test/distributed/ | 6 files | Distributed API tests (batch writes, KV, main API, Queue, Stack, Struct) |
| test/support/ | 1 file | ClusterCase, the shared ExUnit case template for cluster tests |

Testing WAL Consistency

# Start with strong mode
SuperCache.Cluster.Bootstrap.start!(
  key_pos: 0, partition_pos: 0,
  cluster: :distributed,
  replication_mode: :strong,
  replication_factor: 2,
  num_partition: 4
)

# Write data
SuperCache.put!({:wal_test, 1, "data"})

# Check WAL stats
SuperCache.Cluster.WAL.stats()
#=> %{pending: 0, acks_pending: 0}

# Test recovery (simulate restart)
SuperCache.Cluster.WAL.recover()

Testing Replication Modes

# Test async mode (fire-and-forget)
SuperCache.Cluster.Bootstrap.start!(
  key_pos: 0, partition_pos: 0,
  cluster: :distributed,
  replication_mode: :async,
  replication_factor: 2,
  num_partition: 4
)

# Test sync mode (adaptive quorum)
SuperCache.Cluster.Bootstrap.start!(
  key_pos: 0, partition_pos: 0,
  cluster: :distributed,
  replication_mode: :sync,
  replication_factor: 2,
  num_partition: 4
)

Benchmark & Profiling

Built-in Benchmark Tools

Scripts are located in tools/:

# Performance benchmark
mix run tools/performance.exs

# Profiling with fprof
mix run tools/profile.exs

Quick Benchmarks in IEx

# Start cache
SuperCache.start!(key_pos: 0, partition_pos: 1, num_partition: 4)

# Benchmark put!
n = 100_000
{time_us, _} = :timer.tc(fn ->
  Enum.each(1..n, fn i -> SuperCache.put!({:user, i, "data"}) end)
end)
ops_per_sec = n / (time_us / 1_000_000)
IO.puts("put! #{n} ops in #{time_us}µs = #{round(ops_per_sec)} ops/sec")

# Benchmark get!
{time_us, _} = :timer.tc(fn ->
  Enum.each(1..n, fn i -> SuperCache.get!({:user, i}) end)
end)
ops_per_sec = n / (time_us / 1_000_000)
IO.puts("get! #{n} ops in #{time_us}µs = #{round(ops_per_sec)} ops/sec")

# Benchmark batch operations
alias SuperCache.KeyValue
{time_us, _} = :timer.tc(fn ->
  KeyValue.add_batch("bench", Enum.map(1..10_000, fn i -> {i, "val"} end))
end)
ops_per_sec = 10_000 / (time_us / 1_000_000)
IO.puts("KeyValue.add_batch 10k in #{time_us}µs = #{round(ops_per_sec)} ops/sec")

SuperCache.stop()

Expected Performance (Local Mode, 4 partitions)

| Operation | Throughput | Notes |
| --- | --- | --- |
| put! | ~1.2M ops/sec | ~33% overhead vs raw ETS |
| get! | ~2.1M ops/sec | Near raw ETS speed |
| KeyValue.add_batch (10k) | ~1.1M ops/sec | Single ETS insert |

Distributed Latency (Typical LAN)

| Operation | Async | Sync (Quorum) | Strong (WAL) |
| --- | --- | --- | --- |
| Write | ~50-100µs | ~100-300µs | ~200µs |
| Read (local) | ~10µs | ~10µs | ~10µs |
| Read (quorum) | ~100-200µs | ~100-200µs | ~100-200µs |

Profiling with :fprof

# Start profiling
:fprof.apply(SuperCache, :put!, [{:user, 1, "Alice"}])
:fprof.profile()
:fprof.analyse()

Memory Profiling

# Check ETS table sizes
:ets.all()
|> Enum.map(fn table -> {table, :ets.info(table, :size)} end)
|> Enum.sort_by(fn {_, size} -> size end, :desc)
|> Enum.take(10)

Debugging

Enable Debug Logging

Compile-time (zero overhead when disabled):

# config/config.exs
config :super_cache, debug_log: true

Runtime:

SuperCache.Log.enable(true)
SuperCache.Log.enable(false)

Inspect Internal State

# Check partition configuration
SuperCache.Config.get_config(:num_partition)
SuperCache.Config.get_config(:key_pos)
SuperCache.Config.get_config(:partition_pos)

# Check cluster state
SuperCache.Cluster.Manager.live_nodes()
SuperCache.Cluster.Manager.get_replicas(0)

# Check WAL state
SuperCache.Cluster.WAL.stats()

# Check health metrics
SuperCache.Cluster.HealthMonitor.cluster_health()
SuperCache.Cluster.HealthMonitor.partition_balance()

# Check cache statistics
SuperCache.stats()
SuperCache.cluster_stats()

# Check API metrics
SuperCache.Cluster.Stats.api()
SuperCache.Cluster.Stats.partitions()

Common Issues

"tuple size is lower than key_pos" — Ensure tuples have enough elements for the configured key_pos.

"Partition count mismatch" — All nodes must have the same num_partition value.

"Replication lag increasing" — Check network connectivity, verify no GC pauses, use HealthMonitor.cluster_health().

"Quorum reads timing out" — Ensure majority of nodes are reachable, check :erpc connectivity.

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Code Style

  • Run formatter: mix format
  • Check for warnings: mix compile --warnings-as-errors
  • Run tests: mix test --exclude cluster

Adding New Features

  1. Add tests first (TDD approach)
  2. Implement the feature
  3. Update documentation in README.md and relevant guides
  4. Update @moduledoc and @doc strings
  5. Run full test suite: mix test

Module Design Guidelines

  • Public API functions — Use @doc with examples. Provide both bang (!) and safe variants for operations that can fail.
  • erpc entry points — Use @doc false for functions called across nodes via :erpc. These must be public but are not part of the user-facing API.
  • Local implementations — Prefix with do_local_* to avoid collision with @doc false erpc entry points.
  • Distributed routing — Use Cluster.Router for routing, Cluster.Replicator for replication, and Cluster.DistributedStore for shared helpers.
  • Configuration — Store in Config GenServer. Use :persistent_term for hot-path keys.
  • ETS tables — Always use EtsHolder for lifecycle management. Tables are auto-deleted on shutdown.
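The bang/safe convention from the first guideline follows the standard Elixir pattern: the safe variant returns tagged tuples, and the bang variant unwraps or raises. A generic sketch (the module and function names here are illustrative, not SuperCache functions):

```elixir
defmodule VariantSketch do
  # Safe variant: returns {:ok, value} or {:error, reason}.
  def fetch(store, key) do
    case Map.fetch(store, key) do
      {:ok, value} -> {:ok, value}
      :error -> {:error, :not_found}
    end
  end

  # Bang variant: returns the bare value or raises.
  def fetch!(store, key) do
    case fetch(store, key) do
      {:ok, value} -> value
      {:error, reason} -> raise ArgumentError, "fetch failed: #{inspect(reason)}"
    end
  end
end

store = %{{:user, 1} => "Alice"}

{:ok, "Alice"} = VariantSketch.fetch(store, {:user, 1})
{:error, :not_found} = VariantSketch.fetch(store, {:user, 2})
"Alice" = VariantSketch.fetch!(store, {:user, 1})
```

Implementing the bang variant on top of the safe one keeps the error handling in a single place.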