# `UltraLogLog.Estimator.MLE`
[🔗](https://github.com/thatsme/ultra_log_log/blob/v0.1.0/lib/ultra_log_log/estimator/mle.ex#L1)

Maximum-likelihood cardinality estimator for UltraLogLog.

Asymptotic storage factor `8 · ln(2)/ζ(2, 5/4) ≈ 4.631`, relative
standard error `√(ln(2)/ζ(2, 5/4)) ≈ 0.761/√m`. This estimator gives
the full ~28% space reduction vs HLL (compared to FGRA's ~24%) at the
cost of one iterative root-find per call.
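
Both constants can be sanity-checked numerically. A standalone sketch (not library code) that approximates `ζ(2, 5/4)` by a partial sum plus an integral tail:

```elixir
# ζ(2, 5/4) = Σ_{k≥0} 1/(k + 5/4)², approximated by summing the first
# k_max terms and closing with the integral tail 1/(k_max + 5/4).
k_max = 1_000_000

zeta =
  Enum.reduce(0..(k_max - 1), 0.0, fn k, acc ->
    acc + 1.0 / :math.pow(k + 1.25, 2)
  end) + 1.0 / (k_max + 1.25)

storage_factor = 8.0 * :math.log(2) / zeta        # ≈ 4.631
rse_coefficient = :math.sqrt(:math.log(2) / zeta) # ≈ 0.761

IO.puts("zeta(2, 5/4)    ≈ #{zeta}")
IO.puts("storage factor  ≈ #{storage_factor}")
IO.puts("rel. std. error ≈ #{rse_coefficient} / sqrt(m)")
```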

## What is being maximized

The log-likelihood under the Poisson model (Ertl 2024 §3.1, line 474):

    ln L = -(n/m)·α + Σ_{u=1}^{w-1} β_u · ln(1 − e^(−n/(m·2^u)))

where `α` and `β_u` are closed-form weighted sums of register-value
counts `c_j` (paper lines 480–492). Setting `d/dn ln L = 0` gives the
ML equation, which has the same shape as the corresponding HLL one —
so the paper (line 503) reuses the secant solver from Ertl 2017
(arXiv:1702.01284, Algorithm 8).

After reparameterization that solver maximizes

    e^(−x·a) · ∏_k (1 − e^(−x/2^k))^{b[k]}

i.e. finds the root of

    g(x) := −Σb[k] + a·x + Σ_{k} b[k] · h(x/2^k)

with `h(x) := 1 − x/(e^x − 1)`. `h` is concave and monotonically
increasing on (0, ∞), so `g` is monotonically increasing and has a
unique root for any non-degenerate sketch.
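
To make the shape of `g` concrete, here is a standalone sketch with made-up `a` and `b[]` (illustrative values, not derived from a real register snapshot, and not library code) using the direct transcendental form of `h`. Because `g` is monotonically increasing, plain bisection already pins down the unique root:

```elixir
# h(x) = 1 − x/(eˣ − 1), in its direct transcendental form.
h = fn x -> 1.0 - x / (:math.exp(x) - 1.0) end

# g(x) = −Σ b[k] + a·x + Σ_k b[k] · h(x/2^k)
g = fn x, a, b ->
  b
  |> Enum.with_index()
  |> Enum.reduce(a * x - Enum.sum(b), fn {bk, k}, acc ->
    acc + bk * h.(x / :math.pow(2, k))
  end)
end

# Illustrative inputs (NOT from a real sketch).
a = 0.01
b = [3.0, 2.0, 1.0]

# g is increasing: g(lo) < 0 < g(hi) brackets the unique root.
{lo, hi} =
  Enum.reduce(1..100, {1.0e-6, 500.0}, fn _, {lo, hi} ->
    mid = 0.5 * (lo + hi)
    if g.(mid, a, b) < 0.0, do: {mid, hi}, else: {lo, mid}
  end)

root = 0.5 * (lo + hi)
IO.puts("root ≈ #{root}, g(root) ≈ #{g.(root, a, b)}")
```

The production solver replaces bisection with the secant scheme described in the algorithm steps (faster convergence, same unique root).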

## Algorithm shape (paper-faithful, mirroring Hash4j v0.17.0)

1. **Map registers → (a, b[])**: each register contributes to `a` and
   to up to three `b[k]` slots. The mapping (`contribute/3` below)
   ports `UltraLogLog.MaximumLikelihoodEstimator.contribute` and
   follows the paper's `α`, `β_u` definitions.

2. **Initial guess**: closed-form Jensen lower bound on the root
   (derivation in `DistinctCountUtil.java` lines 83–104). Provably
   `≤ root`, so the secant method converges monotonically from below.
   We deliberately avoid using `FGRA.estimate/1` as the initial guess
   even though the brief originally proposed it — Jensen's bound is
   coherent with the solver's state (no inter-estimator coupling) and
   monotonicity is a stronger guarantee.

3. **Secant iteration**: at each step, evaluate `g(x)` and use the
   secant ratio `(g − Σb) / (g_prev − g)` to update the step size.
   Convergence threshold scales with precision: stop when
   `|Δx|/x ≤ 0.001 · √v_ML / √m`. Hash4j-tight tolerance — output
   matches `MAXIMUM_LIKELIHOOD_ESTIMATOR` to ~1e-12 relative.

4. **Inner-loop `h(x)` evaluation**: avoid `:math.exp` and `:math.log`
   by computing `h(2·x')` for small `x' ∈ [0, 0.25]` via a degree-6
   Taylor polynomial (`x' − x'²/3 + x'⁴/45 − x'⁶/472.5`, the
   Bernoulli expansion of `1 − 2x'/(e^(2x') − 1)`) and walking up by
   the doubling recurrence
   `h(2z) = (z/2 + h(z)(1−h(z))) / (z/2 + (1−h(z)))` to the target
   `x/2^k`. No transcendental evaluations inside the loop.

5. **Bias correction**: divide the secant output by
   `1 + 0.481.../m` (paper eq. 11; the second-order delta-method
   correction).
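
Step 4 can be sketched in isolation as follows (module and function names are illustrative, not the library's): the polynomial handles the reduced argument, repeated application of the doubling recurrence climbs back up to the target, and a cross-check against the direct transcendental form confirms the identity:

```elixir
defmodule HSketch do
  # Evaluates h(x) = 1 − x/(eˣ − 1) without calling :math.exp.

  # Degree-6 Taylor polynomial for h(2x'), valid for x' ∈ [0, 0.25]:
  # x' − x'²/3 + x'⁴/45 − x'⁶/472.5
  defp h_small(xp) do
    xp2 = xp * xp
    xp - xp2 / 3.0 + xp2 * xp2 / 45.0 - xp2 * xp2 * xp2 / 472.5
  end

  # Reduced case: x ≤ 0.5 means x' = x/2 ∈ [0, 0.25].
  def h(x) when x <= 0.5, do: h_small(x / 2.0)

  # Otherwise recurse on the half-argument and apply the doubling
  # recurrence h(2z) = (z/2 + h(z)(1 − h(z))) / (z/2 + (1 − h(z))).
  def h(x) do
    z = x / 2.0
    hz = h(z)
    (z / 2.0 + hz * (1.0 - hz)) / (z / 2.0 + (1.0 - hz))
  end
end

# Cross-check against the direct transcendental form.
for x <- [0.05, 0.7, 3.1, 12.0] do
  direct = 1.0 - x / (:math.exp(x) - 1.0)
  IO.puts("x=#{x}: poly+doubling=#{HSketch.h(x)} direct=#{direct}")
end
```

The library may organize the walk across the `k` loop differently (e.g. carrying state between arguments); this sketch only demonstrates the numerics.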

## Verification

Spot-checked against `UltraLogLog.MAXIMUM_LIKELIHOOD_ESTIMATOR`
v0.17.0 on the same 16 register snapshots used for FGRA. See
`test/mle_test.exs` for both the tolerance and statistical-correctness
battery and the convergence-speed test (mean iterations < 10, max < 50
across 100 random sketches).

# `estimate`

```elixir
@spec estimate(UltraLogLog.t()) :: float()
```

Estimate cardinality from the current sketch state, per Ertl 2024
§3.1 (conference numbering; equivalent to arXiv §4.2).

# `estimate_with_iterations`

```elixir
@spec estimate_with_iterations(UltraLogLog.t()) :: {float(), non_neg_integer()}
```

Same as `estimate/1`, but also returns the secant iteration count.
Used by `test/mle_test.exs` to assert convergence speed; not intended
for production use.

---

*Consult [api-reference.md](api-reference.md) for the complete listing*
