SuperCache.Cluster.Stats (SuperCache v1.3.0)


Observability utilities for a running SuperCache cluster.

Available reports

Function               What it shows
cluster/0              Full cluster overview: nodes, partitions, ETS sizes
partitions/0           Per-partition primary/replica assignment + ETS count
primary_partitions/0   Partitions owned (primary) by this node
replica_partitions/0   Partitions this node replicates but does not own
node_partitions/1      Partition ownership for any specific node
three_phase_commit/0   Counters for the 3PC coordinator (this node)
api/0                  Per-operation call counters + latency percentiles
print/1                Pretty-print any stats map to the console

Quick start

alias SuperCache.Cluster.Stats

Stats.cluster()        |> Stats.print()
Stats.partitions()     |> Stats.print()
Stats.three_phase_commit() |> Stats.print()
Stats.api()            |> Stats.print()

Telemetry integration

All counters are stored in a dedicated ETS table owned by SuperCache.Cluster.Metrics and are updated via SuperCache.Cluster.Stats.record/2. You can attach your own :telemetry handler or periodically call Stats.api/0 to ship metrics to an external system.
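The periodic-polling approach can be sketched as a small GenServer. This is a minimal illustration, not part of SuperCache: the StatsPoller module and its :fetch, :ship, and :interval_ms options are hypothetical names chosen for this example, and the shipping callback stands in for whatever client your external system provides.

```elixir
defmodule StatsPoller do
  # Hypothetical helper, not part of SuperCache: calls a stats function on a
  # fixed interval and hands each snapshot to a shipping callback.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      fetch: Keyword.fetch!(opts, :fetch),
      ship: Keyword.fetch!(opts, :ship),
      interval_ms: Keyword.get(opts, :interval_ms, 10_000)
    }

    schedule(state.interval_ms)
    {:ok, state}
  end

  @impl true
  def handle_info(:poll, state) do
    # Take a snapshot and pass it to the caller-supplied shipping function.
    state.ship.(state.fetch.())
    schedule(state.interval_ms)
    {:noreply, state}
  end

  defp schedule(interval_ms), do: Process.send_after(self(), :poll, interval_ms)
end

# With SuperCache loaded, the poller could be started like this:
# StatsPoller.start_link(
#   fetch: &SuperCache.Cluster.Stats.api/0,
#   ship: &IO.inspect/1,
#   interval_ms: 30_000
# )
```

Passing the fetch and ship functions as options keeps the poller independent of any particular metrics backend and makes it easy to exercise with stub functions.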

Summary

Functions

api()

Return per-operation call counters and latency statistics for this node.

cluster()

Return a full cluster overview map.

node_partitions(target_node)

Return the partition ownership summary for target_node.

partitions()

Return one map per partition describing ownership and local ETS size.

primary_partitions()

Return only the partitions for which this node is the primary.

print(stats)

Pretty-print any stats map or list returned by this module to stdout.

record(key, map)

Record a completed API call.

record_tpc(arg1, opts)

Record a completed 3PC transaction.

replica_partitions()

Return only the partitions for which this node is a replica.

three_phase_commit()

Return three-phase commit counters for this node.

Functions

api()

@spec api() :: map()

Return per-operation call counters and latency statistics for this node.

Each entry in the returned map is keyed by operation name and contains:

  • :calls — total number of calls
  • :errors — calls that returned {:error, _}
  • :avg_us — rolling average latency in microseconds
  • :p99_us — 99th-percentile latency (sampled window)

Operations tracked: :put, :get_local, :get_primary, :get_quorum, :delete, :delete_all, :delete_match, :replicate_async, :replicate_sync, :replicate_strong.

Example

iex> SuperCache.Cluster.Stats.api()
%{
  put:              %{calls: 5_000, errors: 2, avg_us: 210, p99_us: 890},
  get_local:        %{calls: 18_000, errors: 0, avg_us: 12,  p99_us: 45},
  get_primary:      %{calls: 300,   errors: 0, avg_us: 540, p99_us: 1_200},
  get_quorum:       %{calls: 50,    errors: 0, avg_us: 620, p99_us: 1_400},
  delete:           %{calls: 800,   errors: 0, avg_us: 190, p99_us: 750},
  delete_all:       %{calls: 3,     errors: 0, avg_us: 4_200, p99_us: 6_000},
  delete_match:     %{calls: 120,   errors: 0, avg_us: 310, p99_us: 900},
  replicate_async:  %{calls: 5_000, errors: 5, avg_us: 95,  p99_us: 400},
  replicate_sync:   %{calls: 0,     errors: 0, avg_us: 0,   p99_us: 0},
  replicate_strong: %{calls: 0,     errors: 0, avg_us: 0,   p99_us: 0}
}
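As an illustration of consuming this map, the sketch below derives per-operation error rates and flags operations whose p99 latency exceeds a threshold. The ApiStatsReport module and its function names are hypothetical; the code relies only on the :calls, :errors, and :p99_us fields documented above, and the sample data is a subset of the example output.

```elixir
defmodule ApiStatsReport do
  # Hypothetical helper operating on the map shape documented for api/0.

  # Returns [{operation, error_rate}] sorted by descending error rate.
  # Operations with zero calls get a rate of 0.0 to avoid dividing by zero.
  def error_rates(api_stats) do
    api_stats
    |> Enum.map(fn {op, %{calls: calls, errors: errors}} ->
      rate = if calls > 0, do: errors / calls, else: 0.0
      {op, rate}
    end)
    |> Enum.sort_by(fn {_op, rate} -> rate end, :desc)
  end

  # Returns the operations whose p99 latency exceeds `threshold_us`.
  def slow_operations(api_stats, threshold_us) do
    for {op, %{p99_us: p99}} <- api_stats, p99 > threshold_us, do: op
  end
end

# Sample data copied from the example above (subset of operations).
sample = %{
  put: %{calls: 5_000, errors: 2, avg_us: 210, p99_us: 890},
  get_local: %{calls: 18_000, errors: 0, avg_us: 12, p99_us: 45},
  replicate_async: %{calls: 5_000, errors: 5, avg_us: 95, p99_us: 400},
  replicate_sync: %{calls: 0, errors: 0, avg_us: 0, p99_us: 0}
}

IO.inspect(ApiStatsReport.error_rates(sample), label: "error rates")
IO.inspect(ApiStatsReport.slow_operations(sample, 500), label: "p99 > 500us")
```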

cluster()

@spec cluster() :: map()

Return a full cluster overview map.

Fields:

  • :nodes — list of live nodes
  • :node_count — number of live nodes
  • :replication_factor — configured replication factor
  • :replication_mode — :async | :sync | :strong
  • :num_partitions — total partition count
  • :partitions — list of per-partition maps (see partitions/0)
  • :total_records — sum of ETS record counts across all local partitions

Example

iex> SuperCache.Cluster.Stats.cluster()
%{
  nodes: [:"primary@127.0.0.1", :"replica@127.0.0.1"],
  node_count: 2,
  replication_factor: 2,
  replication_mode: :async,
  num_partitions: 8,
  total_records: 1_042,
  partitions: [...]
}
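One possible use of this map is a basic sanity check. The sketch below is hypothetical (ClusterCheck is not part of SuperCache) and rests on an assumption not stated in this documentation: that a cluster with fewer live nodes than the configured replication factor cannot place every replica on a distinct node, which is worth a warning.

```elixir
defmodule ClusterCheck do
  # Hypothetical health check over the map shape documented for cluster/0.
  # Returns :ok or {:warn, reasons}.
  def check(%{nodes: nodes, node_count: count, replication_factor: rf}) do
    reasons =
      []
      |> add_if(count != length(nodes), :node_count_mismatch)
      # Assumption: fewer live nodes than the replication factor is a problem.
      |> add_if(count < rf, :fewer_nodes_than_replication_factor)

    if reasons == [], do: :ok, else: {:warn, Enum.reverse(reasons)}
  end

  defp add_if(reasons, true, reason), do: [reason | reasons]
  defp add_if(reasons, false, _reason), do: reasons
end

# Sample shaped like the example above; the :partitions list is omitted
# because the check does not read it.
overview = %{
  nodes: [:"primary@127.0.0.1", :"replica@127.0.0.1"],
  node_count: 2,
  replication_factor: 2,
  replication_mode: :async,
  num_partitions: 8,
  total_records: 1_042
}

IO.inspect(ClusterCheck.check(overview))
```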

node_partitions(target_node)

@spec node_partitions(node()) :: map()

Return the partition ownership summary for target_node.

Returns a map with:

  • :node — the queried node
  • :primary_count — number of partitions it owns as primary
  • :replica_count — number of partitions it replicates
  • :primary_partitions — list of partition indices where it is primary
  • :replica_partitions — list of partition indices where it is replica

Example

iex> SuperCache.Cluster.Stats.node_partitions(:"replica@127.0.0.1")
%{
  node: :"replica@127.0.0.1",
  primary_count: 4,
  replica_count: 4,
  primary_partitions: [1, 3, 5, 7],
  replica_partitions: [0, 2, 4, 6]
}
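Collecting this summary for every node gives a rough view of how evenly primaries are distributed. The sketch below is illustrative only: PartitionBalance and primary_spread/1 are hypothetical names, and the second sample entry is made-up data in the documented shape.

```elixir
defmodule PartitionBalance do
  # Hypothetical helper over a list of maps shaped like node_partitions/1 output.
  # Returns the difference between the largest and smallest :primary_count.
  def primary_spread([]), do: 0

  def primary_spread(summaries) do
    {min, max} = summaries |> Enum.map(& &1.primary_count) |> Enum.min_max()
    max - min
  end
end

# With SuperCache loaded, the summaries could be gathered like this:
# summaries =
#   for n <- SuperCache.Cluster.Stats.cluster().nodes,
#       do: SuperCache.Cluster.Stats.node_partitions(n)

# Illustrative data: the first entry mirrors the example above,
# the second is invented for the sketch.
summaries = [
  %{node: :"replica@127.0.0.1", primary_count: 4, replica_count: 4,
    primary_partitions: [1, 3, 5, 7], replica_partitions: [0, 2, 4, 6]},
  %{node: :"primary@127.0.0.1", primary_count: 4, replica_count: 4,
    primary_partitions: [0, 2, 4, 6], replica_partitions: [1, 3, 5, 7]}
]

IO.inspect(PartitionBalance.primary_spread(summaries), label: "primary spread")
```

A spread of zero means every node owns the same number of primaries; what counts as an acceptable spread depends on your deployment.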

partitions()

@spec partitions() :: [map()]

Return one map per partition describing ownership and local ETS size.

Each map contains:

  • :idx — partition index
  • :table — ETS table atom
  • :primary — node that owns this partition
  • :replicas — list of replica nodes
  • :role — :primary | :replica | :none (this node's role)
  • :local_record_count — number of records in the local ETS table

Example

iex> SuperCache.Cluster.Stats.partitions()
[
  %{idx: 0, table: :"SuperCache.Storage.Ets_0", primary: :"a@host",
    replicas: [:"b@host"], role: :primary, local_record_count: 128},
  ...
]
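Because each entry carries both :role and :local_record_count, the list can be aggregated to show how much local data this node holds in each role. A minimal sketch, with PartitionSummary as a hypothetical module name and the second sample entry invented for illustration:

```elixir
defmodule PartitionSummary do
  # Hypothetical helper over a list shaped like partitions/0 output.
  # Returns %{role => total local record count}.
  def records_by_role(partitions) do
    Enum.reduce(partitions, %{}, fn %{role: role, local_record_count: n}, acc ->
      Map.update(acc, role, n, &(&1 + n))
    end)
  end
end

# Illustrative data: the first entry mirrors the example above,
# the second is invented for the sketch.
parts = [
  %{idx: 0, table: :"SuperCache.Storage.Ets_0", primary: :"a@host",
    replicas: [:"b@host"], role: :primary, local_record_count: 128},
  %{idx: 1, table: :"SuperCache.Storage.Ets_1", primary: :"b@host",
    replicas: [:"a@host"], role: :replica, local_record_count: 95}
]

IO.inspect(PartitionSummary.records_by_role(parts))
```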

primary_partitions()

@spec primary_partitions() :: [map()]

Return only the partitions for which this node is the primary.

Example

iex> SuperCache.Cluster.Stats.primary_partitions()
[
  %{idx: 0, primary: :"a@host", replicas: [:"b@host"],
    role: :primary, local_record_count: 128},
  ...
]

print(stats)

@spec print(map() | [map()]) :: :ok

Pretty-print any stats map or list returned by this module to stdout.

Handles the nested structures returned by cluster/0, partitions/0, three_phase_commit/0, and api/0.

Example

SuperCache.Cluster.Stats.cluster() |> SuperCache.Cluster.Stats.print()

# ╔══════════════════════════════════════╗
# ║       SuperCache Cluster Stats       ║
# ╚══════════════════════════════════════╝
# nodes            : [a@host, b@host]
# node_count       : 2
# replication_mode : async
# ...

record(key, map)

@spec record(term(), map()) :: :ok

Record a completed API call.

Called by SuperCache.Cluster.Router and SuperCache.Cluster.Replicator after each operation.

Stats.record({:api, :put}, %{latency_us: 210, error: false})
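For code paths outside the router and replicator, the metrics map could be built by timing the operation yourself. The sketch below is an assumption-laden illustration: Timed.measure/1 is a hypothetical helper, and it treats any {:error, _} result as an error, mirroring the :errors definition given for api/0. It uses Erlang's :timer.tc/1, which returns the elapsed time in microseconds.

```elixir
defmodule Timed do
  # Hypothetical wrapper: runs `fun`, measures its latency with :timer.tc/1,
  # and builds a metrics map in the shape shown in the record/2 example.
  def measure(fun) when is_function(fun, 0) do
    {latency_us, result} = :timer.tc(fun)
    error? = match?({:error, _}, result)
    {result, %{latency_us: latency_us, error: error?}}
  end
end

{result, metrics} = Timed.measure(fn -> {:ok, :stored} end)
IO.inspect({result, metrics})

# With SuperCache loaded, the metrics map could then be passed on:
# SuperCache.Cluster.Stats.record({:api, :put}, metrics)
```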

record_tpc(arg1, opts)

@spec record_tpc(
  atom(),
  keyword()
) :: :ok

Record a completed 3PC transaction.

Stats.record_tpc(:committed, latency_us: 1_840)
Stats.record_tpc(:aborted, phase: :prepare)

replica_partitions()

@spec replica_partitions() :: [map()]

Return only the partitions for which this node is a replica.

Example

iex> SuperCache.Cluster.Stats.replica_partitions()
[
  %{idx: 3, primary: :"b@host", replicas: [:"a@host"],
    role: :replica, local_record_count: 95},
  ...
]

three_phase_commit()

@spec three_phase_commit() :: map()

Return three-phase commit counters for this node.

Counters:

  • :in_flight — number of transactions currently in the TxnRegistry
  • :committed — total successfully committed transactions
  • :aborted — total aborted transactions (any phase)
  • :prepare_failures — transactions that failed at PREPARE phase
  • :pre_commit_failures — transactions that failed at PRE_COMMIT phase
  • :commit_failures — transactions that failed at COMMIT phase
  • :recovered_committed — in-doubt :pre_committed entries resolved on startup
  • :recovered_aborted — in-doubt :prepared entries rolled back on startup
  • :avg_commit_latency_us — rolling average commit latency in microseconds
  • :p99_commit_latency_us — 99th-percentile commit latency (sampled window)

Example

iex> SuperCache.Cluster.Stats.three_phase_commit()
%{
  in_flight: 0,
  committed: 1_200,
  aborted: 3,
  prepare_failures: 1,
  pre_commit_failures: 1,
  commit_failures: 1,
  recovered_committed: 0,
  recovered_aborted: 0,
  avg_commit_latency_us: 1_840,
  p99_commit_latency_us: 4_210
}
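These counters lend themselves to simple derived metrics. The sketch below computes an abort rate and groups the per-phase failure counters; TpcReport is a hypothetical module, and the abort-rate definition (aborted divided by committed plus aborted) is a choice made for this example rather than something SuperCache defines.

```elixir
defmodule TpcReport do
  # Hypothetical helper over the map shape documented for three_phase_commit/0.

  # Fraction of finished transactions that aborted; 0.0 when none finished.
  def abort_rate(%{committed: c, aborted: a}) when c + a > 0, do: a / (c + a)
  def abort_rate(%{committed: _, aborted: _}), do: 0.0

  # Per-phase failure counts as a keyword list, in protocol order.
  def failures_by_phase(stats) do
    [
      prepare: stats.prepare_failures,
      pre_commit: stats.pre_commit_failures,
      commit: stats.commit_failures
    ]
  end
end

# Sample data copied from the example above.
tpc = %{
  in_flight: 0,
  committed: 1_200,
  aborted: 3,
  prepare_failures: 1,
  pre_commit_failures: 1,
  commit_failures: 1,
  recovered_committed: 0,
  recovered_aborted: 0,
  avg_commit_latency_us: 1_840,
  p99_commit_latency_us: 4_210
}

IO.inspect(TpcReport.abort_rate(tpc), label: "abort rate")
IO.inspect(TpcReport.failures_by_phase(tpc), label: "failures by phase")
```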