UltraLogLog.Estimator.Martingale (UltraLogLog v0.1.0)

Copy Markdown View Source

Martingale (Historic Inverse Probability) estimator for UltraLogLog.

Asymptotic storage factor 5·ln(2) ≈ 3.466 (Ertl 2024 §3.7 line 845, via paper eq. 26 with b=2, d=2, q=6 → MVP ≈ 3.4657), relative standard error ≈ 0.658/√m (= √(MVP/8m)). This is the lowest variance estimator ULL admits — but at a critical cost:

It is only valid for sketches that have never been merged.

The estimate is maintained incrementally on every successful insert (Algorithm 2 in the paper). Merging two sketches loses the per-element update history needed to keep the martingale unbiased, so any UltraLogLog.merge/2 call invalidates the running martingale state. After invalidation, UltraLogLog.cardinality/2 with estimator: :martingale returns {:error, :invalidated_by_merge}.

Algorithm 2 (paper p. 1664)

n̂_martingale  n̂_martingale + 1/μ         update estimate
μ  μ  h(r) + h(r')                        state-change prob

with μ := Σᵢ h(rᵢ) initially equal to 1 (every register is at the smallest possible state, h(0) = 1/m, summed over m registers). The increment of 1/μ runs before the μ update — the next insert sees the post-update μ.

h(r) — per-register state-change probability (paper eq. 25)

Special cases on the four smallest paper-r values:

h(0)  = 1/m
h(4)  = 1/(2m)
h(8)  = 3/(4m)
h(10) = 1/(4m)

Intermediate range for r = 4u + ⟨l₁l₂⟩₂, 3 ≤ u < w:

h(r) = (7  2·l  4·l) / (2^u · m)

Saturated range for r = 4w + ⟨l₁l₂⟩₂:

h(r) = (3  l  2·l) / (2^(w1) · m)

These four cases collapse into one branch-free integer expression (h_scaled/2 below) that mirrors Hash4j v0.17.0's UltraLogLog.getScaledRegisterChangeProbability and exploits integer-truncation >>> p as a free divide-by-2 for the saturated range. Algebraically verified against the paper for every special case in the moduledoc tests.

State shape

The martingale field of %UltraLogLog{} carries a tuple {estimate :: float(), mu :: float()} for active sketches and nil for merge-invalidated ones. Initial value at UltraLogLog.new/1 is {0.0, 1.0} — the empty sketch has cardinality 0 and every register is fully changeable.

Summary

Types

Active martingale state: {running_estimate, current_μ}.

Functions

Update the running estimate after a single register transition.

Return the current martingale estimate.

Types

state()

@type state() :: {float(), float()}

Active martingale state: {running_estimate, current_μ}.

Functions

delta(arg, old_reg, new_reg, p)

@spec delta(state(), 0..255, 0..255, UltraLogLog.precision()) :: state()

Update the running estimate after a single register transition.

Called from UltraLogLog.add/2 only when an insert actually changes a register byte (no-op inserts do not call this — see paper Algorithm 2's r < r' precondition).

Returns the new {estimate, μ} pair.

estimate(ultra_log_log)

@spec estimate(UltraLogLog.t()) :: {:ok, float()} | {:error, :invalidated_by_merge}

Return the current martingale estimate.

Returns {:ok, value} for active (un-merged) sketches and {:error, :invalidated_by_merge} for sketches whose martingale state was nullified by UltraLogLog.merge/2.