# `ExDataSketch.HLL`
[🔗](https://github.com/thanos/ex_data_sketch/blob/main/lib/ex_data_sketch/hll.ex#L1)

HyperLogLog (HLL) sketch for cardinality estimation.

HLL provides approximate distinct-count estimates using sublinear memory.
The precision parameter `p` controls the trade-off between memory usage and
accuracy: higher `p` means more memory but better estimates.

## Memory and Accuracy

- Register count: `m = 2^p`
- Memory: `m` bytes (one byte per register in v1 format)
- Relative standard error: approximately `1.04 / sqrt(m)`

| p  | Registers | Memory  | ~Error |
|----|-----------|---------|--------|
| 10 | 1,024     | 1 KiB  | 3.25%  |
| 12 | 4,096     | 4 KiB  | 1.63%  |
| 14 | 16,384    | 16 KiB | 0.81%  |
| 16 | 65,536    | 64 KiB | 0.41%  |

## Binary State Layout (v1)

All multi-byte fields are little-endian.

    Offset  Size    Field
    ------  ------  -----
    0       1       Version (u8, currently 1)
    1       1       Precision p (u8, 4..16)
    2       2       Reserved flags (u16 little-endian, must be 0)
    4       m       Registers (m = 2^p bytes, one u8 per register)

Total: 4 + 2^p bytes.

## Options

- `:p` - precision parameter, integer 4..16 (default: 14)
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`)

## Merge Properties

HLL merge is **associative** and **commutative** (register-wise max).
This means sketches can be merged in any order or grouping and produce the
same result, making HLL safe for parallel and distributed aggregation.

# `t`

```elixir
@type t() :: %ExDataSketch.HLL{backend: module(), opts: keyword(), state: binary()}
```

# `deserialize`

```elixir
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an EXSK binary into an HLL sketch.

Returns `{:ok, sketch}` on success or `{:error, reason}` on failure.

## Examples

    iex> ExDataSketch.HLL.deserialize(<<"invalid">>)
    {:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}

# `deserialize_datasketches`

```elixir
@spec deserialize_datasketches(binary()) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an Apache DataSketches HLL binary.

Not implemented. See `serialize_datasketches/1` for details.

## Examples

    iex> try do
    ...>   ExDataSketch.HLL.deserialize_datasketches(<<>>)
    ...> rescue
    ...>   e in ExDataSketch.Errors.NotImplementedError -> e.message
    ...> end
    "ExDataSketch.HLL.deserialize_datasketches is not yet implemented"

# `estimate`

```elixir
@spec estimate(t()) :: float()
```

Estimates the number of distinct items in the sketch.

Returns a floating-point estimate. The accuracy depends on the precision
parameter `p`.

## Examples

    iex> ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.estimate()
    0.0

# `from_enumerable`

```elixir
@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()
```

Creates a new HLL sketch from an enumerable of items.

Equivalent to `new(opts) |> update_many(enumerable)`.

## Options

Same as `new/1`.

## Examples

    iex> sketch = ExDataSketch.HLL.from_enumerable(["a", "b", "c"], p: 10)
    iex> ExDataSketch.HLL.estimate(sketch) > 0.0
    true

# `merge`

```elixir
@spec merge(t(), t()) :: t()
```

Merges two HLL sketches.

Both sketches must have the same precision `p`. The result contains the
register-wise maximum, which corresponds to the union of the two input
multisets.

Returns the merged sketch. Raises `ExDataSketch.Errors.IncompatibleSketchesError`
if the sketches have different parameters.

## Examples

    iex> a = ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.update("x")
    iex> b = ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.update("y")
    iex> merged = ExDataSketch.HLL.merge(a, b)
    iex> ExDataSketch.HLL.estimate(merged) >= ExDataSketch.HLL.estimate(a)
    true

# `merge_many`

```elixir
@spec merge_many(Enumerable.t()) :: t()
```

Merges a non-empty enumerable of HLL sketches into one.

Raises `Enum.EmptyError` if the enumerable is empty.

## Examples

    iex> a = ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.update("x")
    iex> b = ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.update("y")
    iex> merged = ExDataSketch.HLL.merge_many([a, b])
    iex> ExDataSketch.HLL.estimate(merged) > 0.0
    true

# `merger`

```elixir
@spec merger(keyword()) :: (t(), t() -&gt; t())
```

Returns a 2-arity merge function suitable for combining sketches.

The returned function calls `merge/2` on two sketches.

## Examples

    iex> is_function(ExDataSketch.HLL.merger(), 2)
    true

# `new`

```elixir
@spec new(keyword()) :: t()
```

Creates a new HLL sketch.

## Options

- `:p` - precision parameter, integer 4..16 (default: 14).
  Higher values use more memory but give better accuracy.
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`).
- `:hash_fn` - custom hash function `(term -> non_neg_integer)`.
- `:seed` - hash seed (default: 0).

## Examples

    iex> sketch = ExDataSketch.HLL.new(p: 10)
    iex> sketch.opts[:p]
    10
    iex> ExDataSketch.HLL.size_bytes(sketch)
    1028

# `reducer`

```elixir
@spec reducer() :: (term(), t() -&gt; t())
```

Returns a 2-arity reducer function suitable for `Enum.reduce/3` and similar.

The returned function calls `update/2` on each item.

## Examples

    iex> is_function(ExDataSketch.HLL.reducer(), 2)
    true

# `serialize`

```elixir
@spec serialize(t()) :: binary()
```

Serializes the sketch to the ExDataSketch-native EXSK binary format.

The serialized binary includes magic bytes, version, sketch type,
parameters, and state. See `ExDataSketch.Codec` for format details.

## Examples

    iex> sketch = ExDataSketch.HLL.new(p: 10)
    iex> binary = ExDataSketch.HLL.serialize(sketch)
    iex> <<"EXSK", _rest::binary>> = binary
    iex> byte_size(binary) > 0
    true

# `serialize_datasketches`

```elixir
@spec serialize_datasketches(t()) :: binary()
```

Serializes the sketch to Apache DataSketches HLL format.

Not implemented. Apache DataSketches HLL interop is not planned for the
current release series. Only Theta sketches support DataSketches interop
via `ExDataSketch.Theta.serialize_datasketches/1`. For HLL serialization,
use `serialize/1` (ExDataSketch-native EXSK format).

## Examples

    iex> try do
    ...>   sketch = %ExDataSketch.HLL{state: <<>>, opts: [p: 14], backend: nil}
    ...>   ExDataSketch.HLL.serialize_datasketches(sketch)
    ...> rescue
    ...>   e in ExDataSketch.Errors.NotImplementedError -> e.message
    ...> end
    "ExDataSketch.HLL.serialize_datasketches is not yet implemented"

# `size_bytes`

```elixir
@spec size_bytes(t()) :: non_neg_integer()
```

Returns the size of the sketch state in bytes.

## Examples

    iex> ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.size_bytes()
    1028

# `update`

```elixir
@spec update(t(), term()) :: t()
```

Updates the sketch with a single item.

The item is hashed using `ExDataSketch.Hash.hash64/1` before being
inserted into the sketch.

## Examples

    iex> sketch = ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.update("hello")
    iex> ExDataSketch.HLL.estimate(sketch) > 0.0
    true

# `update_many`

```elixir
@spec update_many(t(), Enumerable.t()) :: t()
```

Updates the sketch with multiple items in a single pass.

More efficient than calling `update/2` repeatedly because it minimizes
intermediate binary allocations.

## Examples

    iex> sketch = ExDataSketch.HLL.new(p: 10) |> ExDataSketch.HLL.update_many(["a", "b", "c"])
    iex> ExDataSketch.HLL.estimate(sketch) > 0.0
    true

---

*Consult [api-reference.md](api-reference.md) for complete listing*
