# `ExDataSketch.Theta`

Theta Sketch for cardinality estimation with set operations.

Theta sketches estimate cardinality and also support set operations (union,
intersection, and difference) on the sets they summarize. Many other
cardinality sketches, such as HyperLogLog, natively support only union.
This makes Theta sketches well suited to queries like "how many users
visited both page A and page B?"

## How It Works

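A Theta sketch retains the hash values of inserted items that fall below a
threshold, theta. When the number of retained hashes exceeds the nominal
size `k`, theta is lowered and entries at or above the new threshold are
discarded. The cardinality estimate is derived from the number of retained
entries and the current theta value: roughly, the retained count divided by
the fraction of the 64-bit hash space that lies below theta.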
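The loop described above can be sketched in plain Elixir. This is a minimal, self-contained illustration, not the ExDataSketch implementation: the module `MinimalTheta`, its functions, and its `hash64/1` helper (built from two `:erlang.phash2/2` calls as a stand-in for `ExDataSketch.Hash.hash64/1`) are hypothetical, and it trims back to exactly `k` entries on every overflow (KMV-style) rather than following whatever rebuild policy the real backend uses.

```elixir
defmodule MinimalTheta do
  @moduledoc false
  import Bitwise

  # Largest u64; as in the v1 state layout, this theta means "no threshold".
  @max_theta 0xFFFFFFFFFFFFFFFF

  defstruct k: 4096, theta: @max_theta, entries: MapSet.new()

  def new(k), do: %__MODULE__{k: k}

  # Hypothetical 64-bit hash made from two 32-bit :erlang.phash2/2 values.
  # This is a stand-in, NOT ExDataSketch.Hash.hash64/1.
  def hash64(term) do
    hi = :erlang.phash2({:hi, term}, 1 <<< 32)
    lo = :erlang.phash2({:lo, term}, 1 <<< 32)
    (hi <<< 32) ||| lo
  end

  # Hashes at or above theta are ignored; the rest join the retained set.
  def update(%__MODULE__{theta: theta} = sketch, item) do
    h = hash64(item)

    if h < theta do
      shrink(%{sketch | entries: MapSet.put(sketch.entries, h)})
    else
      sketch
    end
  end

  # Once more than k hashes are retained, lower theta to the largest one and
  # drop it, leaving exactly k entries, all strictly below the new theta.
  defp shrink(%__MODULE__{k: k, entries: entries} = sketch) do
    if MapSet.size(entries) > k do
      new_theta = Enum.max(entries)
      %{sketch | theta: new_theta, entries: MapSet.delete(entries, new_theta)}
    else
      sketch
    end
  end

  # Retained count divided by theta as a fraction of the hash space.
  def estimate(%__MODULE__{theta: theta, entries: entries}) do
    MapSet.size(entries) / (theta / @max_theta)
  end
end
```

Below `k` distinct items, theta stays at its maximum and the estimate is the exact count. Feeding `1..20_000` into a `k = 1024` sketch keeps exactly 1024 hashes, and the estimate lands near 20,000; the relative error of this estimator is on the order of `1/sqrt(k)`.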

## Options

- `:k` - nominal number of entries (default: 4096). Controls accuracy.
  Higher values use more memory but give better estimates.
  Must be a power of 2, between 16 and 67,108,864 (2^26).
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`).

## Binary State Layout (v1)

All multi-byte fields are little-endian.

    Offset  Size    Field
    ------  ------  -----
    0       1       Version (u8, currently 1)
    1       4       k nominal entries (u32 little-endian)
    5       8       Theta value (u64 little-endian, max = 2^64-1 = "no threshold")
    13      4       Entry count (u32 little-endian)
    17      N*8     Entries (sorted array of u64 little-endian hash values)

Total: 17 + entry_count * 8 bytes.
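The layout maps directly onto Elixir binary pattern matching. The `ThetaStateLayout` module below is a hypothetical helper written from the table above, not part of the library; it assumes the state binary consists of exactly the listed fields with no padding.

```elixir
defmodule ThetaStateLayout do
  # Hypothetical encoder for the v1 state layout: version, k, theta,
  # entry count, then the sorted u64 entries, all little-endian.
  def encode(k, theta, entries) do
    sorted = Enum.sort(entries)
    entry_bin = for h <- sorted, into: <<>>, do: <<h::unsigned-little-64>>

    <<1::unsigned-8, k::unsigned-little-32, theta::unsigned-little-64,
      length(sorted)::unsigned-little-32, entry_bin::binary>>
  end

  # Decodes a v1 state binary; the guard checks that the entry section is
  # exactly entry_count * 8 bytes long.
  def decode(
        <<1::unsigned-8, k::unsigned-little-32, theta::unsigned-little-64,
          count::unsigned-little-32, rest::binary>>
      )
      when byte_size(rest) == count * 8 do
    entries = for <<h::unsigned-little-64 <- rest>>, do: h
    {:ok, %{k: k, theta: theta, entries: entries}}
  end

  def decode(_other), do: {:error, :invalid_state}
end
```

An empty state is 17 bytes, which matches the `size_bytes/1` result for a new sketch; each retained entry adds 8 bytes.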

## DataSketches Interop

Theta is the primary target for Apache DataSketches interop.
`serialize_datasketches/1` and `deserialize_datasketches/1` implement
the CompactSketch binary format, enabling cross-language compatibility
with Java, C++, and Python DataSketches libraries.

## Merge Properties

Theta merge (union) is **associative** and **commutative**.
This means sketches can be merged in any order or grouping and produce the
same result, making Theta safe for parallel and distributed aggregation.
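The standard Theta union rule makes these properties easy to see: take the smaller of the two thetas, keep the union of retained hashes below it, and if more than `k` remain, keep the `k` smallest and lower theta to the next one. The sketch below assumes that rule and a hypothetical map-based model of a sketch; it is an illustration, not the library's `merge/2`.

```elixir
defmodule ThetaUnion do
  # Hypothetical model: %{k: pos_integer, theta: integer, entries: MapSet.t()},
  # where every retained entry is below theta and there are at most k entries.
  # Binding `k` in both heads enforces the "same k" requirement.
  def union(%{k: k} = a, %{k: k} = b) do
    theta = min(a.theta, b.theta)

    candidates =
      MapSet.union(a.entries, b.entries)
      |> Enum.filter(&(&1 < theta))
      |> Enum.sort()

    if length(candidates) > k do
      # Keep the k smallest hashes; the next one becomes the new theta.
      {kept, [new_theta | _]} = Enum.split(candidates, k)
      %{k: k, theta: new_theta, entries: MapSet.new(kept)}
    else
      %{k: k, theta: theta, entries: MapSet.new(candidates)}
    end
  end
end
```

Merging three such sketches as `(a ∪ b) ∪ c` or `a ∪ (b ∪ c)`, or two of them in either order, yields the same map: every grouping ends up keeping the smallest hashes below the same final theta.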

# `t`

```elixir
@type t() :: %ExDataSketch.Theta{backend: module(), opts: keyword(), state: binary()}
```

# `compact`

```elixir
@spec compact(t()) :: t()
```

Compacts the sketch into a read-only form with sorted entries.

Compacting discards any entries above the current theta threshold and
sorts the remaining entries. This is required before serialization to
the DataSketches CompactSketch format.

## Examples

    iex> sketch = ExDataSketch.Theta.new() |> ExDataSketch.Theta.update("x") |> ExDataSketch.Theta.compact()
    iex> ExDataSketch.Theta.estimate(sketch) > 0.0
    true
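Conceptually, compaction is a filter followed by a sort. The sketch below assumes a hypothetical model in which a sketch is a map with `:theta` and an unordered `:entries` collection; in an implementation that lowers theta lazily, stale entries at or above theta may still be present, and this step removes them.

```elixir
defmodule ThetaCompact do
  # Hypothetical model: %{theta: integer, entries: Enumerable.t()}.
  # Keeps only hashes strictly below theta and stores them as a sorted list.
  def compact(%{theta: theta, entries: entries} = sketch) do
    kept = entries |> Enum.filter(&(&1 < theta)) |> Enum.sort()
    %{sketch | entries: kept}
  end
end
```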

# `deserialize`

```elixir
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an EXSK binary into a Theta sketch.

Returns `{:ok, sketch}` on success or `{:error, exception}` on failure, where
the exception is, for example, an `ExDataSketch.Errors.DeserializationError`.

## Examples

    iex> ExDataSketch.Theta.deserialize(<<"invalid">>)
    {:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}

# `deserialize_datasketches`

```elixir
@spec deserialize_datasketches(
  binary(),
  keyword()
) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an Apache DataSketches CompactSketch binary into a Theta sketch.

Returns `{:ok, sketch}` on success or `{:error, exception}` on failure.

## Options

- `:seed` - expected seed value for seed hash verification (default: 9001).

## Examples

    iex> sketch = ExDataSketch.Theta.new(k: 1024) |> ExDataSketch.Theta.update("test")
    iex> binary = ExDataSketch.Theta.serialize_datasketches(sketch)
    iex> {:ok, restored} = ExDataSketch.Theta.deserialize_datasketches(binary)
    iex> ExDataSketch.Theta.estimate(restored) == ExDataSketch.Theta.estimate(sketch)
    true

# `estimate`

```elixir
@spec estimate(t()) :: float()
```

Estimates the cardinality (distinct count) from the sketch.
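For reference, the standard Theta estimator is shown below. It is assumed here rather than stated by this documentation; `n` is the number of retained entries and `\theta` is the stored u64 threshold from the state layout:

$$
\hat{N} = \frac{n}{\theta / (2^{64} - 1)}
$$

When theta is `2^64 - 1` ("no threshold"), the denominator is 1 and the estimate is exactly `n`, so an empty sketch estimates `0.0`.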

## Examples

    iex> ExDataSketch.Theta.new() |> ExDataSketch.Theta.estimate()
    0.0

# `from_enumerable`

```elixir
@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()
```

Creates a new Theta sketch from an enumerable of items.

Equivalent to `new(opts) |> update_many(enumerable)`.

## Options

Same as `new/1`.

## Examples

    iex> sketch = ExDataSketch.Theta.from_enumerable(["a", "b", "c"], k: 1024)
    iex> ExDataSketch.Theta.estimate(sketch) > 0.0
    true

# `merge`

```elixir
@spec merge(t(), t()) :: t()
```

Merges two Theta sketches (set union).

Both sketches must have the same `k` value. Returns the merged sketch.
Raises `ExDataSketch.Errors.IncompatibleSketchesError` if the sketches
have different parameters.

## Examples

    iex> a = ExDataSketch.Theta.new(k: 1024) |> ExDataSketch.Theta.update("x")
    iex> b = ExDataSketch.Theta.new(k: 1024) |> ExDataSketch.Theta.update("y")
    iex> merged = ExDataSketch.Theta.merge(a, b)
    iex> ExDataSketch.Theta.estimate(merged) >= ExDataSketch.Theta.estimate(a)
    true

# `merge_many`

```elixir
@spec merge_many(Enumerable.t()) :: t()
```

Merges a non-empty enumerable of Theta sketches into one.

Raises `Enum.EmptyError` if the enumerable is empty.

## Examples

    iex> a = ExDataSketch.Theta.new(k: 1024) |> ExDataSketch.Theta.update("x")
    iex> b = ExDataSketch.Theta.new(k: 1024) |> ExDataSketch.Theta.update("y")
    iex> merged = ExDataSketch.Theta.merge_many([a, b])
    iex> ExDataSketch.Theta.estimate(merged) > 0.0
    true

# `merger`

```elixir
@spec merger(keyword()) :: (t(), t() -> t())
```

Returns a 2-arity merge function suitable for combining sketches.

The returned function calls `merge/2` on two sketches.

## Examples

    iex> is_function(ExDataSketch.Theta.merger(), 2)
    true

# `new`

```elixir
@spec new(keyword()) :: t()
```

Creates a new Theta sketch.

## Options

- `:k` - nominal number of entries (default: 4096). Must be a
  power of 2, between 16 and 67,108,864 (2^26).
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`).
- `:hash_fn` - custom hash function `(term -> non_neg_integer)`.
- `:seed` - hash seed (default: 0).

## Examples

    iex> sketch = ExDataSketch.Theta.new(k: 1024)
    iex> sketch.opts[:k]
    1024
    iex> ExDataSketch.Theta.size_bytes(sketch)
    17

# `reducer`

```elixir
@spec reducer() :: (term(), t() -> t())
```

Returns a 2-arity reducer function suitable for `Enum.reduce/3` and similar.

The returned function calls `update/2` on each item.

## Examples

    iex> is_function(ExDataSketch.Theta.reducer(), 2)
    true
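`reducer/0` and `merger/1` are shaped for a partition-then-merge pattern: reduce each partition into its own sketch, then combine the partial results. The sketch below shows that pattern with an exact `MapSet` as a stand-in for the sketch so it runs without the library; `PartitionedCount` is hypothetical, and swapping in `ExDataSketch.Theta.new/1`, `ExDataSketch.Theta.reducer/0`, and `ExDataSketch.Theta.merger/1` would give the approximate version.

```elixir
defmodule PartitionedCount do
  # Stand-in for Theta.reducer/0: fn item, acc -> acc with item added.
  def reducer, do: fn item, acc -> MapSet.put(acc, item) end

  # Stand-in for Theta.merger/1: combines two partial accumulators.
  def merger, do: fn a, b -> MapSet.union(a, b) end

  # Reduce each partition independently, then merge the partial results.
  def count(partitions) do
    partitions
    |> Enum.map(&Enum.reduce(&1, MapSet.new(), reducer()))
    |> Enum.reduce(merger())
    |> MapSet.size()
  end
end
```

Because the merge is associative and commutative, the partitions can be processed in any order or in parallel. Like `merge_many/1`, `Enum.reduce/2` raises `Enum.EmptyError` when there are no partitions.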

# `serialize`

```elixir
@spec serialize(t()) :: binary()
```

Serializes the sketch to the ExDataSketch-native EXSK binary format.

The serialized binary includes magic bytes, version, sketch type,
parameters, and state. See `ExDataSketch.Codec` for format details.

## Examples

    iex> sketch = ExDataSketch.Theta.new(k: 1024)
    iex> binary = ExDataSketch.Theta.serialize(sketch)
    iex> <<"EXSK", _rest::binary>> = binary
    iex> byte_size(binary) > 0
    true

# `serialize_datasketches`

```elixir
@spec serialize_datasketches(
  t(),
  keyword()
) :: binary()
```

Serializes the sketch to Apache DataSketches CompactSketch format.

This is the primary interop target for cross-language compatibility.
The CompactSketch format uses 64-bit hashes with a seed hash for
compatibility verification.

## Options

- `:seed` - the seed value for seed hash computation (default: 9001).

## Examples

    iex> sketch = ExDataSketch.Theta.new(k: 1024) |> ExDataSketch.Theta.update("hello")
    iex> binary = ExDataSketch.Theta.serialize_datasketches(sketch)
    iex> is_binary(binary) and byte_size(binary) > 0
    true

# `size_bytes`

```elixir
@spec size_bytes(t()) :: non_neg_integer()
```

Returns the size of the sketch state in bytes.

## Examples

    iex> ExDataSketch.Theta.new() |> ExDataSketch.Theta.size_bytes()
    17

# `update`

```elixir
@spec update(t(), term()) :: t()
```

Updates the sketch with a single item.

The item is hashed with `ExDataSketch.Hash.hash64/1` (or with the custom
`:hash_fn` passed to `new/1`), and the hash is retained only if it falls
below the current theta.

## Examples

    iex> sketch = ExDataSketch.Theta.new() |> ExDataSketch.Theta.update("hello")
    iex> ExDataSketch.Theta.estimate(sketch) > 0.0
    true

# `update_many`

```elixir
@spec update_many(t(), Enumerable.t()) :: t()
```

Updates the sketch with multiple items in a single pass.

More efficient than calling `update/2` repeatedly because it minimizes
intermediate binary allocations.

## Examples

    iex> sketch = ExDataSketch.Theta.new() |> ExDataSketch.Theta.update_many(["a", "b", "c"])
    iex> ExDataSketch.Theta.estimate(sketch) > 0.0
    true

