# `ExDataSketch.CMS`
[🔗](https://github.com/thanos/ex_data_sketch/blob/main/lib/ex_data_sketch/cms.ex#L1)

Count-Min Sketch (CMS) for frequency estimation.

CMS provides approximate frequency estimates for items in a data stream.
It answers: "approximately how many times has this item appeared?"

The sketch uses `depth` independent hash functions and a `width x depth`
counter matrix. Estimates are guaranteed to never undercount (for non-negative
increments) but may overcount.

## Memory and Accuracy

- Counter matrix: `width * depth * (counter_width / 8)` bytes.
- Error bound: estimates are within `e * N / width` of the true count
  with probability at least `1 - (1/2)^depth`, where `N` is total count
  and `e` is Euler's number.

| Width | Depth | Counter | Memory     |
|-------|-------|---------|------------|
| 2,048 | 5     | 32-bit  | 40 KiB    |
| 8,192 | 7     | 32-bit  | 224 KiB   |
| 2,048 | 5     | 64-bit  | 80 KiB    |

## Binary State Layout (v1)

All multi-byte fields are little-endian.

    Offset  Size      Field
    ------  --------  -----
    0       1         Version (u8, currently 1)
    1       4         Width (u32 little-endian)
    5       2         Depth (u16 little-endian)
    7       1         Counter width in bits (u8, 32 or 64)
    8       1         Reserved (u8, must be 0)
    9       W*D*C     Counters (row-major, little-endian per counter)

Where C = counter_width / 8 (4 or 8 bytes per counter).
Total: 9 + width * depth * C bytes.

## Options

- `:width` - number of counters per row, pos_integer (default: 2048).
- `:depth` - number of rows (hash functions), pos_integer (default: 5).
- `:counter_width` - bits per counter, 32 or 64 (default: 32).
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`).

## Overflow Policy

Default: saturating. When a counter reaches its maximum value (2^32-1 or
2^64-1), further increments leave it at the maximum. This prevents wrap-around
errors at the cost of potential undercounting at extreme values.

## Merge Properties

CMS merge is **associative** and **commutative** (element-wise counter addition).
This means sketches can be merged in any order or grouping and produce the
same result, making CMS safe for parallel and distributed aggregation.

# `t`

```elixir
@type t() :: %ExDataSketch.CMS{backend: module(), opts: keyword(), state: binary()}
```

# `deserialize`

```elixir
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an EXSK binary into a CMS sketch.

Returns `{:ok, sketch}` on success or `{:error, reason}` on failure.

## Examples

    iex> ExDataSketch.CMS.deserialize(<<"invalid">>)
    {:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}

# `deserialize_datasketches`

```elixir
@spec deserialize_datasketches(binary()) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an Apache DataSketches CMS binary.

Not implemented. See `serialize_datasketches/1` for details.

## Examples

    iex> try do
    ...>   ExDataSketch.CMS.deserialize_datasketches(<<>>)
    ...> rescue
    ...>   e in ExDataSketch.Errors.NotImplementedError -> e.message
    ...> end
    "ExDataSketch.CMS.deserialize_datasketches is not yet implemented"

# `estimate`

```elixir
@spec estimate(t(), term()) :: non_neg_integer()
```

Estimates the frequency of an item in the sketch.

Returns a non-negative integer. The estimate is guaranteed to be at least
the true count (no undercounting) but may overcount.

## Examples

    iex> sketch = ExDataSketch.CMS.new() |> ExDataSketch.CMS.update("hello", 5)
    iex> ExDataSketch.CMS.estimate(sketch, "hello")
    5

# `from_enumerable`

```elixir
@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()
```

Creates a new CMS sketch from an enumerable of items.

Equivalent to `new(opts) |> update_many(enumerable)`.

## Options

Same as `new/1`.

## Examples

    iex> sketch = ExDataSketch.CMS.from_enumerable(["a", "b", "a"])
    iex> ExDataSketch.CMS.estimate(sketch, "a")
    2

# `merge`

```elixir
@spec merge(t(), t()) :: t()
```

Merges two CMS sketches.

Both sketches must have the same width, depth, and counter_width.
Counters are added element-wise with saturating arithmetic.

## Examples

    iex> a = ExDataSketch.CMS.new() |> ExDataSketch.CMS.update("x", 3)
    iex> b = ExDataSketch.CMS.new() |> ExDataSketch.CMS.update("x", 5)
    iex> merged = ExDataSketch.CMS.merge(a, b)
    iex> ExDataSketch.CMS.estimate(merged, "x")
    8

# `merge_many`

```elixir
@spec merge_many(Enumerable.t()) :: t()
```

Merges a non-empty enumerable of CMS sketches into one.

Raises `Enum.EmptyError` if the enumerable is empty.

## Examples

    iex> a = ExDataSketch.CMS.new() |> ExDataSketch.CMS.update("x")
    iex> b = ExDataSketch.CMS.new() |> ExDataSketch.CMS.update("x")
    iex> merged = ExDataSketch.CMS.merge_many([a, b])
    iex> ExDataSketch.CMS.estimate(merged, "x")
    2

# `merger`

```elixir
@spec merger(keyword()) :: (t(), t() -&gt; t())
```

Returns a 2-arity merge function suitable for combining sketches.

The returned function calls `merge/2` on two sketches.

## Examples

    iex> is_function(ExDataSketch.CMS.merger(), 2)
    true

# `new`

```elixir
@spec new(keyword()) :: t()
```

Creates a new CMS sketch.

## Options

- `:width` - counters per row (default: 2048). Must be positive.
- `:depth` - number of rows (default: 5). Must be positive.
- `:counter_width` - bits per counter, 32 or 64 (default: 32).
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`).
- `:hash_fn` - custom hash function `(term -> non_neg_integer)`.
- `:seed` - hash seed (default: 0).

## Examples

    iex> sketch = ExDataSketch.CMS.new()
    iex> ExDataSketch.CMS.estimate(sketch, "anything")
    0

# `reducer`

```elixir
@spec reducer() :: (term(), t() -&gt; t())
```

Returns a 2-arity reducer function suitable for `Enum.reduce/3` and similar.

The returned function calls `update/2` on each item.

## Examples

    iex> is_function(ExDataSketch.CMS.reducer(), 2)
    true

# `serialize`

```elixir
@spec serialize(t()) :: binary()
```

Serializes the sketch to the ExDataSketch-native EXSK binary format.

## Examples

    iex> sketch = ExDataSketch.CMS.new(width: 100, depth: 3, counter_width: 32)
    iex> binary = ExDataSketch.CMS.serialize(sketch)
    iex> <<"EXSK", _rest::binary>> = binary
    iex> byte_size(binary) > 0
    true

# `serialize_datasketches`

```elixir
@spec serialize_datasketches(t()) :: binary()
```

Serializes the sketch to Apache DataSketches CMS format.

Not implemented. Apache DataSketches does not define a standard CMS binary
format. Only Theta sketches support DataSketches interop via
`ExDataSketch.Theta.serialize_datasketches/1`. For CMS serialization,
use `serialize/1` (ExDataSketch-native EXSK format).

## Examples

    iex> try do
    ...>   sketch = %ExDataSketch.CMS{state: <<>>, opts: [], backend: nil}
    ...>   ExDataSketch.CMS.serialize_datasketches(sketch)
    ...> rescue
    ...>   e in ExDataSketch.Errors.NotImplementedError -> e.message
    ...> end
    "ExDataSketch.CMS.serialize_datasketches is not yet implemented"

# `size_bytes`

```elixir
@spec size_bytes(t()) :: non_neg_integer()
```

Returns the size of the sketch state in bytes.

## Examples

    iex> sketch = ExDataSketch.CMS.new(width: 100, depth: 3, counter_width: 32)
    iex> ExDataSketch.CMS.size_bytes(sketch)
    1209

# `update`

```elixir
@spec update(t(), term(), pos_integer()) :: t()
```

Updates the sketch with a single item.

The item is hashed using `ExDataSketch.Hash.hash64/1` before being
recorded in the sketch.

## Parameters

- `sketch` - the CMS sketch to update.
- `item` - any Elixir term to count.
- `increment` - positive integer to add (default: 1).

## Examples

    iex> sketch = ExDataSketch.CMS.new() |> ExDataSketch.CMS.update("hello")
    iex> ExDataSketch.CMS.estimate(sketch, "hello")
    1

# `update_many`

```elixir
@spec update_many(t(), Enumerable.t()) :: t()
```

Updates the sketch with multiple items in a single pass.

Accepts an enumerable of items (each with implicit increment of 1) or
an enumerable of `{item, increment}` tuples.

## Examples

    iex> sketch = ExDataSketch.CMS.new() |> ExDataSketch.CMS.update_many(["a", "b", "a"])
    iex> ExDataSketch.CMS.estimate(sketch, "a")
    2

---

*Consult [api-reference.md](api-reference.md) for complete listing*
