UltraLogLog (UltraLogLog v0.1.0)

Copy Markdown View Source

Space-efficient approximate distinct counting on the BEAM.

Implementation of:

Otmar Ertl. UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting. PVLDB 17(7), 2024. https://www.vldb.org/pvldb/vol17/p1655-ertl.pdf

UltraLogLog (ULL) is a 2024 successor to HyperLogLog. It keeps every practical property HLL is famous for — constant memory, constant-time inserts, commutative/idempotent/associative merge — and adds 24–28% less memory at the same accuracy, depending on which estimator you pick. The trade is one extra register bit per slot (8-bit ULL registers vs 6-bit HLL ones), recovered many times over by a tighter information density per register.

Quick start

iex> ull = UltraLogLog.new(precision: 12)
iex> ull = UltraLogLog.add(ull, "session-abc")
iex> ull = UltraLogLog.add(ull, "session-xyz")
iex> {:ok, count} = UltraLogLog.cardinality(ull)
iex> abs(count - 2.0) < 0.1
true

Merging two sketches is value-level and associative — no coordinator, no quorum, no consensus:

iex> a = UltraLogLog.new(precision: 12) |> UltraLogLog.add("x")
iex> b = UltraLogLog.new(precision: 12) |> UltraLogLog.add("y")
iex> {:ok, count} = UltraLogLog.cardinality(UltraLogLog.merge(a, b))
iex> abs(count - 2.0) < 0.1
true

Estimators

Pick one of three estimators via the :estimator option to cardinality/2. All three are paper-faithful translations of the algorithms in Ertl 2024, cross-validated against the Hash4j Java reference (v0.17.0) to floating-point precision.

EstimatorStorage factorRel. std. errorUse when
:fgra (default)4.8950.782/√mDefault. Single-pass, no iteration.
:mle4.6310.761/√mTightest bound; secant solver runs ~5 iterations per query.
:martingale3.4660.658/√mSingle-stream sketch never merged; returns {:error, :invalidated_by_merge} after merge/2.

The martingale stderr derives from the memory-variance product MVP ≈ 5·ln(2) ≈ 3.466 in the paper (§3.7); the 0.658 figure is √(MVP / 8).

iex> ull = UltraLogLog.new(precision: 12) |> UltraLogLog.add("k")
iex> {:ok, _} = UltraLogLog.cardinality(ull, estimator: :mle)

See UltraLogLog.Estimator.FGRA, UltraLogLog.Estimator.MLE, and UltraLogLog.Estimator.Martingale for the per-estimator details.

Precision → memory → error

Precision p allocates 2^p 8-bit registers — state size is exactly 2^p bytes:

p=10   1 KB,  ~3.1% relative error
p=12   4 KB,  ~1.2% relative error  (default)
p=14  16 KB,  ~0.6% relative error
p=16  64 KB,  ~0.3% relative error

Pick the smallest p whose error fits your use case. p=12 is a sensible default; bump to 14 if you need sub-percent accuracy.

Mergeability and CRDTs

merge/2 is element-wise on registers under the UltraLogLog partial order. The operation is commutative, associative, and idempotent — i.e. ULL sketches form a CRDT under merge. This is what makes distributed cardinality estimation trivial: shards independently maintain their own sketches, you merge on demand, and the answer is exactly as if every insert had hit a single sketch.

Merging invalidates the martingale estimator (its incremental update history is lost). FGRA and MLE remain valid on merged sketches.

Serialization

to_binary/1 and from_binary/1 round-trip the sketch as a versioned compact binary; cardinality is preserved exactly:

iex> ull = UltraLogLog.new(precision: 12)
iex> ull = Enum.reduce(1..100, ull, &UltraLogLog.add(&2, &1))
iex> {:ok, before} = UltraLogLog.cardinality(ull)
iex> {:ok, restored} = UltraLogLog.from_binary(UltraLogLog.to_binary(ull))
iex> {:ok, after_} = UltraLogLog.cardinality(restored)
iex> before == after_
true

Deserialization invalidates the martingale field — its incremental history isn't part of the wire format. A cardinality(restored, estimator: :martingale) call on a reloaded sketch therefore returns {:error, :invalidated_by_merge}. FGRA and MLE round-trip cleanly.

Hashing

add/2 accepts any term (hashed internally via UltraLogLog.Hash) or a pre-computed non-negative integer treated as a 64-bit hash. The v0.1 internal hash is a :erlang.phash2/2 derivation suitable for well-distributed inputs; production deployments with adversarial or skewed keyspaces should pass pre-computed hashes from a quality 64-bit function (xxhash3, wyhash). A native xxhash3 NIF is planned for v0.2.

Status

  • v0.1 ships the immutable sketch, all three estimators, merge, serialize, and downsize (full implementation in v0.2). Bit-exact validated against Hash4j v0.17.0.
  • v0.2 will add a lock-free :atomics-backed insert path, a native hash function, and benchmarks.
  • v0.3 will add PartitionSupervisor-sharded cluster-wide merge.

See CHANGELOG.md for the full release history and README.md for the empirical-validation summary.

Summary

Functions

Insert a value into the sketch.

Estimate the number of distinct elements added.

Reduce the precision of a sketch losslessly (in the ULL sense — error grows but no information is fabricated).

Deserialize a sketch from its binary form.

Merge a list of sketches.

Merge two sketches. Both must have the same precision.

Create a new empty UltraLogLog sketch.

Serialize the sketch to a compact binary form.

Types

estimator()

@type estimator() :: :fgra | :mle | :martingale

precision()

@type precision() :: 3..26

t()

@type t() :: %UltraLogLog{
  m: pos_integer(),
  martingale: nil | {float(), float()},
  precision: precision(),
  registers: binary()
}

Functions

add(ull, hash)

@spec add(t(), term() | non_neg_integer()) :: t()

Insert a value into the sketch.

Accepts any term (will be hashed) or a pre-computed 64-bit unsigned hash.

cardinality(ull, opts \\ [])

@spec cardinality(
  t(),
  keyword()
) :: {:ok, float()} | {:error, term()}

Estimate the number of distinct elements added.

downsize(ull, target_p)

@spec downsize(t(), precision()) :: t()

Reduce the precision of a sketch losslessly (in the ULL sense — error grows but no information is fabricated).

from_binary(arg1)

@spec from_binary(binary()) :: {:ok, t()} | {:error, term()}

Deserialize a sketch from its binary form.

merge(list)

@spec merge([t(), ...]) :: t()

Merge a list of sketches.

merge(a, b)

@spec merge(t(), t()) :: t()

Merge two sketches. Both must have the same precision.

Merge is element-wise on registers under the UltraLogLog partial order (see UltraLogLog.Encoding.merge_registers/2). Result is commutative, associative, and idempotent — i.e. a CRDT.

Note: merging invalidates the martingale estimator on the result.

new(opts \\ [])

@spec new(keyword()) :: t()

Create a new empty UltraLogLog sketch.

Options

  • :precision3..26, default 12. State size is 2^precision bytes.

to_binary(ultra_log_log)

@spec to_binary(t()) :: binary()

Serialize the sketch to a compact binary form.

Format (v1):

<<"ULL1", precision::8, registers::binary>>