Clone repo
Setup
mix deps.get
For using Tidewave (live reload during development):
mix tidewave
Go to http://localhost:4000/tidewave
Using as Local Dependency
In your project's mix.exs:
defp deps do
base_dir = "/your/base/path"
[
{:super_cache, path: Path.join(base_dir, "super_cache")}
# or using git repo, using override: true if you want to override in sub deps.
# {:super_cache, git: "https://github.com/your_account/super_cache", override: true}
]
endArchitecture Overview
SuperCache is built on a layered architecture designed for high performance and horizontal scalability. The codebase contains 34 modules organized into 7 layers:
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
│ SuperCache │ KeyValue │ Queue │ Stack │ Struct │
└─────────────────────────┬───────────────────────────────────┘
│
┌─────────────────────────▼───────────────────────────────────┐
│ Routing Layer │
│ Partition Router (local) │ Cluster Router (distributed) │
│ Cluster.DistributedStore (shared helpers) │
└─────────────────────────┬───────────────────────────────────┘
│
┌─────────────────────────▼───────────────────────────────────┐
│ Replication Layer │
│ Replicator (async/sync) │ WAL (strong) │ ThreePhaseCommit │
└─────────────────────────┬───────────────────────────────────┘
│
┌─────────────────────────▼───────────────────────────────────┐
│ Storage Layer │
│ Storage (ETS wrapper) │ EtsHolder (table lifecycle) │
│ Partition (hashing) │ Partition.Holder (registry) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Cluster Infrastructure │
│ Manager │ NodeMonitor │ HealthMonitor │ Metrics │ Stats │
│ TxnRegistry │ Router │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Buffer System (lazy_put) │
│ Buffer (scheduler-affine) → Internal.Queue → Internal.Stream│
└─────────────────────────────────────────────────────────────┘Complete Module Reference
1. Core API Modules (lib/api/)
| Module | Path | Responsibility |
|---|---|---|
SuperCache | lib/api/super_cache.ex | Main public API for tuple storage. Handles local/distributed modes transparently. Provides put/get/delete/scan operations with bang and safe variants. |
SuperCache.KeyValue | lib/api/key_value.ex | Key-value namespace API with batch operations. Multiple independent namespaces coexist using different kv_name values. |
SuperCache.Queue | lib/api/queue.ex | Named FIFO queues backed by ETS partitions. Supports add/out/peak/count/get_all operations. |
SuperCache.Stack | lib/api/stack.ex | Named LIFO stacks backed by ETS partitions. Supports push/pop/count/get_all operations. |
SuperCache.Struct | lib/api/struct.ex | Struct storage with automatic key extraction. Call init/2 once per struct type before using. |
2. Application & Bootstrap (lib/)
| Module | Path | Responsibility |
|---|---|---|
SuperCache.Application | lib/application.ex | OTP Application callback. Starts the core supervision tree. Auto-starts cache if config :super_cache, auto_start: true. Connects to cluster peers on startup. |
SuperCache.Bootstrap | lib/bootstrap.ex | Unified startup/shutdown for both :local and :distributed modes. Validates options, resolves defaults, starts components in dependency order. |
SuperCache.Config | lib/app/config.ex | GenServer-backed configuration store with :persistent_term optimization for hot-path keys (:cluster, :key_pos, :partition_pos, :num_partition, :table_type, :table_prefix). |
SuperCache.Sup | lib/app/sup.ex | Dynamic supervisor for user-spawned workers and dynamically allocated resources. |
SuperCache.Log | lib/debug_log.ex | Zero-cost conditional logging macro. Debug output suppressed by default. Expands to :ok when debug_log: false. |
3. Buffer System (lib/buffer/)
| Module | Path | Responsibility |
|---|---|---|
SuperCache.Buffer | lib/buffer/buffer.ex | Manages per-scheduler write buffers for lazy_put/1. One buffer process per online scheduler with scheduler affinity via :erlang.system_info(:scheduler_id). |
SuperCache.Internal.Queue | lib/buffer/queue.ex | Internal concurrent queue for buffer streams. Supports multiple producers/consumers with graceful shutdown. |
SuperCache.Internal.Stream | lib/buffer/stream.ex | Bridges internal queue with caching layer. Creates a Stream that pulls items and pushes them into cache. |
4. Partition System (lib/partition/)
| Module | Path | Responsibility |
|---|---|---|
SuperCache.Partition | lib/partition/partition_api.ex | Handles partition hashing, resolution, and lifecycle. Uses :erlang.phash2/2 for hashing. Provides get_partition/1, get_partition_order/1, and partition enumeration. |
SuperCache.Partition.Holder | lib/partition/partition_holder.ex | GenServer-backed registry mapping partition indices to ETS table atoms. Uses :protected ETS table for lock-free reads. |
5. Storage Layer (lib/storage/)
| Module | Path | Responsibility |
|---|---|---|
SuperCache.Storage | lib/storage/storage_api.ex | Thin ETS wrapper providing read/write/delete primitives. All tables created with :write_concurrency and :read_concurrency. Supports put, get, get_by_match, get_by_match_object, scan, take, delete, delete_match, update_counter, update_element. |
SuperCache.EtsHolder | lib/storage/ets_holder.ex | GenServer that owns the lifecycle of all ETS tables. Tables are automatically deleted on shutdown. |
6. Cluster System (lib/cluster/)
| Module | Path | Responsibility |
|---|---|---|
SuperCache.Cluster.Bootstrap | lib/cluster/cluster_bootstrap.ex | Distributed mode bootstrap. Handles node connection, config verification across peers, partition map building, and component initialization. |
SuperCache.Cluster.Manager | lib/cluster/manager.ex | Cluster membership and partition → primary/replica mapping. Stores partition map in :persistent_term for zero-cost reads. Partition assignment: sorted node list rotated by partition index. |
SuperCache.Cluster.NodeMonitor | lib/cluster/node_monitor.ex | Monitors declared nodes and notifies Manager when they join/leave. Supports three sources: static :nodes, dynamic :nodes_mfa, or legacy all-node watching. |
SuperCache.Cluster.HealthMonitor | lib/cluster/health_monitor.ex | Continuous health checking via periodic checks of connectivity (RTT), replication lag (probe-based), partition balance, and error rates. Emits :telemetry events. |
SuperCache.Cluster.Router | lib/cluster/router.ex | Distributed request router. Routes reads/writes to correct nodes. Handles read-your-writes consistency, quorum reads with early termination, and primary routing. |
SuperCache.Cluster.Replicator | lib/cluster/replicator.ex | Replication engine with three modes: :async (fire-and-forget via Task.Supervisor pool), :sync (adaptive quorum), :strong (WAL-based). Handles bulk partition transfers. |
SuperCache.Cluster.WAL | lib/cluster/wal.ex | Write-Ahead Log for strong consistency. Replaces heavy 3PC with ~200µs latency. Write to local ETS → append to WAL → async replicate → return on majority ack. |
SuperCache.Cluster.ThreePhaseCommit | lib/cluster/three_phase_commit.ex | Legacy three-phase commit protocol (PREPARE → PRE_COMMIT → COMMIT). Replaced by WAL but still available for backwards compatibility. |
SuperCache.Cluster.TxnRegistry | lib/cluster/tnx_registry.ex | In-memory transaction log for 3PC protocol. Uses :public ETS table for lock-free reads. Tracks transaction states: :prepared → :pre_committed → :committed/:aborted. |
SuperCache.Cluster.Metrics | lib/cluster/metrics.ex | Low-overhead counter and latency sample store. Uses :public ETS table with atomic update_counter/3. Ring buffer for latency samples (max 256). |
SuperCache.Cluster.Stats | lib/cluster/stats.ex | Generates cluster overview, partition maps, 3PC metrics, and API call statistics. Provides pretty-printing for console output. |
SuperCache.Cluster.DistributedStore | lib/cluster/distributed_store.ex | Shared routing helpers used by all distributed high-level stores (KeyValue, Queue, Stack, Struct). Provides route_put, route_delete, local_get, local_match, etc. |
SuperCache.Cluster.DistributedHelpers | lib/cluster/distributed_helpers.ex | Shared distributed read/write helpers extracted from KeyValue, Queue, Stack, and Struct. Provides distributed?/0, apply_write/3, route_write/4, route_read/5, has_partition?/1, read_primary/4, read_quorum/4. Eliminates ~400 lines of code duplication across API modules. |
7. Compatibility Shims (lib/distributed/)
All modules in lib/distributed/ are deprecated backwards-compatibility shims that delegate to the unified modules:
Performance Optimizations
The codebase includes several performance optimizations:
- Compile-time log elimination —
SuperCache.Log.debug/1expands to:okwhendebug_log: false - Partition resolution inlining —
@compile {:inline, get_partition: 1}eliminates function call overhead - Config.distributed?/0 inlining —
@compile {:inline, distributed?: 0}for zero-cost cluster-mode checks on the hot path (called by every API operation) - Batch ETS operations —
:ets.insert/2with lists instead of per-item calls - Async replication worker pool —
Task.Supervisoreliminates per-operationspawn/1overhead - Adaptive quorum writes — Sync mode returns on majority ack, not all replicas
- Quorum read early termination — Stops waiting once majority is reached; kills remaining tasks immediately
- WAL-based strong consistency — Replaces 3PC with fast local write + async replication + majority ack (~200µs vs ~1500µs)
- Atomic WAL sequence generation —
:ets.update_counter/4for race-condition-free sequence numbers - Persistent-term config — Hot-path config keys served from
:persistent_termfor O(1) access - Scheduler-affine buffers —
lazy_put/1routes to buffer on same scheduler via:erlang.system_info(:scheduler_id) - Protected ETS tables — Partition.Holder uses
:protectedETS for lock-free reads - Spin-wait optimization —
:erlang.yield/0instead of:timer.sleep/1in Queue/Stack lock contention loops - Shared distributed helpers —
DistributedHelpersmodule eliminates code duplication across KeyValue, Queue, Stack, and Struct
Usage Guide in Plain IEx (Single Cluster, Two Nodes)
Start Cluster
Terminal 1:
iex --name node1@127.0.0.1 --cookie need_to_change_this -S mix
Terminal 2:
iex --name node2@127.0.0.1 --cookie need_to_change_this -S mix
Bootstrap and Test
# --- node1@127.0.0.1 ---
# Connect to peer
Node.connect(:"node2@127.0.0.1")
# Start cache (if :auto_start is missed or false)
SuperCache.Cluster.Bootstrap.start!(
key_pos: 0, partition_pos: 0,
cluster: :distributed,
replication_mode: :strong, # Try :async, :sync, or :strong
replication_factor: 2,
num_partition: 4
)
# Verify partition map is built
SuperCache.Cluster.Manager.live_nodes()
#=> [:"node1@127.0.0.1", :"node2@127.0.0.1"]
# Check cluster stats
SuperCache.cluster_stats()
# Write data
SuperCache.put!({:user, 1, "Alice"})
SuperCache.put!({:user, 2, "Bob"})
# Check record counts per partition
SuperCache.stats()# --- node2@127.0.0.1 ---
# Verify node1 is joined
Node.list()
#=> [:"node1@127.0.0.1"]
# Start cache (must match node1 config)
SuperCache.Cluster.Bootstrap.start!(
key_pos: 0, partition_pos: 0,
cluster: :distributed,
replication_mode: :strong,
replication_factor: 2,
num_partition: 4
)
# Read replicated data
SuperCache.get!({:user, 1})
#=> [{:user, 1, "Alice"}]
# Read with different consistency levels
SuperCache.get!({:user, 1}, read_mode: :local) # Fast, may be stale
SuperCache.get!({:user, 1}, read_mode: :primary) # Consistent with primary
SuperCache.get!({:user, 1}, read_mode: :quorum) # Majority agreementTesting
Run All Tests
# All tests (includes cluster tests)
mix test
# Unit tests only — no distribution needed
mix test --exclude cluster
# Specific test file
mix test test/kv_test.exs
# Specific test with line number
mix test test/kv_test.exs:42
# With warnings as errors
mix test --warnings-as-errors
# Limit failures
mix test --max-failures 3
Cluster Tests
# Run only cluster tests
mix test test/cluster/
# Run with specific seed for reproducibility
mix test test/cluster/ --seed 12345
# Run via alias (sets up proper VM flags)
mix test.cluster
Note: Cluster tests can be flaky due to timing and shared node config. Use
--exclude clusterfor fast local development.
Test Structure
| Directory | Tests | Description |
|---|---|---|
test/ | 9 files | Core single-node tests (ETS, KV, Queue, Stack, Struct, Partition, Storage) |
test/cluster/ | 9 files | Cluster integration tests (bootstrap, node failure, health monitor, 3PC, RYW) |
test/distributed/ | 6 files | Distributed API tests (batch writes, KV, main API, Queue, Stack, Struct) |
test/support/ | 1 file | ClusterCase - Shared ExUnit case template for cluster tests |
Testing WAL Consistency
# Start with strong mode
SuperCache.Cluster.Bootstrap.start!(
key_pos: 0, partition_pos: 0,
cluster: :distributed,
replication_mode: :strong,
replication_factor: 2,
num_partition: 4
)
# Write data
SuperCache.put!({:wal_test, 1, "data"})
# Check WAL stats
SuperCache.Cluster.WAL.stats()
#=> %{pending: 0, acks_pending: 0}
# Test recovery (simulate restart)
SuperCache.Cluster.WAL.recover()Testing Replication Modes
# Test async mode (fire-and-forget)
SuperCache.Cluster.Bootstrap.start!(
key_pos: 0, partition_pos: 0,
cluster: :distributed,
replication_mode: :async,
replication_factor: 2,
num_partition: 4
)
# Test sync mode (adaptive quorum)
SuperCache.Cluster.Bootstrap.start!(
key_pos: 0, partition_pos: 0,
cluster: :distributed,
replication_mode: :sync,
replication_factor: 2,
num_partition: 4
)Benchmark & Profiling
Built-in Benchmark Tools
Scripts are located in tools/:
# Performance benchmark
mix run tools/performance.exs
# Profiling with fprof
mix run tools/profile.exs
Quick Benchmarks in IEx
# Start cache
SuperCache.start!(key_pos: 0, partition_pos: 1, num_partition: 4)
# Benchmark put!
n = 100_000
{time_us, _} = :timer.tc(fn ->
Enum.each(1..n, fn i -> SuperCache.put!({:user, i, "data"}) end)
end)
ops_per_sec = n / (time_us / 1_000_000)
IO.puts("put! #{n} ops in #{time_us}µs = #{round(ops_per_sec)} ops/sec")
# Benchmark get!
{time_us, _} = :timer.tc(fn ->
Enum.each(1..n, fn i -> SuperCache.get!({:user, i}) end)
end)
ops_per_sec = n / (time_us / 1_000_000)
IO.puts("get! #{n} ops in #{time_us}µs = #{round(ops_per_sec)} ops/sec")
# Benchmark batch operations
alias SuperCache.KeyValue
{time_us, _} = :timer.tc(fn ->
KeyValue.add_batch("bench", Enum.map(1..10_000, fn i -> {i, "val"} end))
end)
ops_per_sec = 10_000 / (time_us / 1_000_000)
IO.puts("KeyValue.add_batch 10k in #{time_us}µs = #{round(ops_per_sec)} ops/sec")
SuperCache.stop()Expected Performance (Local Mode, 4 partitions)
| Operation | Throughput | Notes |
|---|---|---|
put! | ~1.2M ops/sec | ~33% overhead vs raw ETS |
get! | ~2.1M ops/sec | Near raw ETS speed |
KeyValue.add_batch (10k) | ~1.1M ops/sec | Single ETS insert |
Distributed Latency (Typical LAN)
| Operation | Async | Sync (Quorum) | Strong (WAL) |
|---|---|---|---|
| Write | ~50-100µs | ~100-300µs | ~200µs |
| Read (local) | ~10µs | ~10µs | ~10µs |
| Read (quorum) | ~100-200µs | ~100-200µs | ~100-200µs |
Profiling with :fprof
# Start profiling
:fprof.apply(SuperCache, :put!, [{:user, 1, "Alice"}])
:fprof.profile()
:fprof.analyse()Memory Profiling
# Check ETS table sizes
:ets.all()
|> Enum.map(fn table -> {table, :ets.info(table, :size)} end)
|> Enum.sort_by(fn {_, size} -> size end, :desc)
|> Enum.take(10)Debugging
Enable Debug Logging
Compile-time (zero overhead when disabled):
# config/config.exs
config :super_cache, debug_log: trueRuntime:
SuperCache.Log.enable(true)
SuperCache.Log.enable(false)Inspect Internal State
# Check partition configuration
SuperCache.Config.get_config(:num_partition)
SuperCache.Config.get_config(:key_pos)
SuperCache.Config.get_config(:partition_pos)
# Check cluster state
SuperCache.Cluster.Manager.live_nodes()
SuperCache.Cluster.Manager.get_replicas(0)
# Check WAL state
SuperCache.Cluster.WAL.stats()
# Check health metrics
SuperCache.Cluster.HealthMonitor.cluster_health()
SuperCache.Cluster.HealthMonitor.partition_balance()
# Check cache statistics
SuperCache.stats()
SuperCache.cluster_stats()
# Check API metrics
SuperCache.Cluster.Stats.api()
SuperCache.Cluster.Stats.partitions()Common Issues
"tuple size is lower than key_pos" — Ensure tuples have enough elements for the configured key_pos.
"Partition count mismatch" — All nodes must have the same num_partition value.
"Replication lag increasing" — Check network connectivity, verify no GC pauses, use HealthMonitor.cluster_health().
"Quorum reads timing out" — Ensure majority of nodes are reachable, check :erpc connectivity.
Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Code Style
- Run formatter:
mix format - Check for warnings:
mix compile --warnings-as-errors - Run tests:
mix test --exclude cluster
Adding New Features
- Add tests first (TDD approach)
- Implement the feature
- Update documentation in
README.mdand relevant guides - Update
@moduledocand@docstrings - Run full test suite:
mix test
Module Design Guidelines
- Public API functions — Use
@docwith examples. Provide both bang (!) and safe variants for operations that can fail. - erpc entry points — Use
@doc falsefor functions called across nodes via:erpc. These must be public but are not part of the user-facing API. - Local implementations — Prefix with
do_local_*to avoid collision with@doc falseerpc entry points. - Distributed routing — Use
Cluster.Routerfor routing,Cluster.Replicatorfor replication, andCluster.DistributedStorefor shared helpers. - Configuration — Store in
ConfigGenServer. Use:persistent_termfor hot-path keys. - ETS tables — Always use
EtsHolderfor lifecycle management. Tables are auto-deleted on shutdown.