# `ExDataSketch.FrequentItems`
[🔗](https://github.com/thanos/ex_data_sketch/blob/main/lib/ex_data_sketch/frequent_items.ex#L1)

FrequentItems sketch for approximate heavy-hitter detection using the
SpaceSaving algorithm.

Tracks the top-k most frequent items in a data stream using at most `k`
counters. Each counter stores an item, its estimated count, and a maximum
overcount error. The sketch provides approximate frequency estimates with
bounded memory and deterministic tie-breaking.

## Algorithm (SpaceSaving)

The SpaceSaving algorithm maintains a fixed set of at most `k` counters:

- **Update(x)**: If x is already tracked, increment its count. If there is
  room (fewer than k entries), insert x with count=1 and error=0. Otherwise,
  evict the entry with the minimum count (ties broken by lexicographically
  smallest `item_bytes`), and replace it with x, count=min_count+1,
  error=min_count.

- **Weighted update(x, w)**: Same logic but increment by w. Replacement
  count = min_count + w, error = min_count.

- **Batch optimization**: Pre-aggregate incoming items into a frequency map,
  then apply weighted updates for each unique item in sorted key order.
  Single decode + single encode of the binary state.

## Key Encoding

Items are encoded to binary at the public API boundary using a configurable
key encoding policy:

| Encoding | Description |
|----------|-------------|
| `:binary` (default) | Keys are raw binaries, passed through as-is |
| `:int` | Keys are integers, encoded as signed 64-bit little-endian |
| `{:term, :external}` | Keys are arbitrary Erlang terms, encoded via `:erlang.term_to_binary/1` |

The backend always receives and returns raw `item_bytes`. Decoding happens
at the public API boundary when returning results.

## Merge Properties

FrequentItems merge is **commutative**. Both sketches must have the same
`k` and `key_encoding` parameters. Merge combines counts and errors
additively across the union of keys, then retains the top-k entries by
count (ties broken by lexicographically smallest key) to enforce the
capacity invariant. Count (`n`) is always exactly additive regardless of
whether entries are dropped.

Associativity holds for count totals but not necessarily for the retained
entry set, since intermediate merges may drop entries that a different
grouping would retain. See `docs/frequent_items_format.md` for details.

## Binary State Format (FI1)

See `docs/frequent_items_format.md` for the complete binary layout
specification, canonicalization rules, and merge algebra proof sketch.

## Options

- `:k` - maximum number of counters (default: 10, must be >= 1).
- `:key_encoding` - key encoding policy: `:binary` (default), `:int`,
  or `{:term, :external}`.
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`).

# `t`

```elixir
@type t() :: %ExDataSketch.FrequentItems{
  backend: module(),
  opts: keyword(),
  state: binary()
}
```

# `count`

```elixir
@spec count(t()) :: non_neg_integer()
```

Returns the total number of items observed by the sketch.

This is the sum of all weights (each `update/2` call contributes 1).

## Examples

    iex> ExDataSketch.FrequentItems.new(k: 5) |> ExDataSketch.FrequentItems.count()
    0

# `deserialize`

```elixir
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an EXSK binary into a FrequentItems sketch.

Returns `{:ok, sketch}` on success or `{:error, reason}` on failure.

## Examples

    iex> ExDataSketch.FrequentItems.deserialize(<<"invalid">>)
    {:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}

# `entry_count`

```elixir
@spec entry_count(t()) :: non_neg_integer()
```

Returns the number of distinct items currently tracked by the sketch.

This is always <= k.

## Examples

    iex> ExDataSketch.FrequentItems.new(k: 5) |> ExDataSketch.FrequentItems.entry_count()
    0

# `estimate`

```elixir
@spec estimate(t(), term()) :: {:ok, map()} | {:error, :not_tracked}
```

Returns the frequency estimate for a given item.

Returns `{:ok, estimate_map}` if the item is tracked, where `estimate_map`
contains `:estimate`, `:error`, `:lower`, and `:upper` fields.
Returns `{:error, :not_tracked}` if the item is not in the sketch.

The estimate map fields:
- `:estimate` - the estimated count (may overcount but never undercount)
- `:error` - the maximum possible overcount
- `:lower` - `max(estimate - error, 0)`, guaranteed lower bound
- `:upper` - same as estimate, guaranteed upper bound

## Examples

    iex> sketch = ExDataSketch.FrequentItems.new(k: 10) |> ExDataSketch.FrequentItems.update("x")
    iex> {:ok, est} = ExDataSketch.FrequentItems.estimate(sketch, "x")
    iex> est.estimate
    1

# `frequent`

```elixir
@spec frequent(t(), non_neg_integer()) :: [map()]
```

Returns items whose estimated frequency exceeds the given threshold.

The threshold is compared against the lower bound (estimate - error).
Only items with `lower >= threshold` are returned.

## Examples

    iex> sketch = ExDataSketch.FrequentItems.new(k: 10)
    iex> sketch = ExDataSketch.FrequentItems.update_many(sketch, List.duplicate("a", 100) ++ ["b"])
    iex> frequent = ExDataSketch.FrequentItems.frequent(sketch, 50)
    iex> Enum.any?(frequent, fn e -> e.item == "a" end)
    true

# `from_enumerable`

```elixir
@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()
```

Creates a new FrequentItems sketch from an enumerable of items.

Equivalent to `new(opts) |> update_many(enumerable)`.

## Examples

    iex> sketch = ExDataSketch.FrequentItems.from_enumerable(["a", "b", "a"], k: 10)
    iex> ExDataSketch.FrequentItems.count(sketch)
    3

# `merge`

```elixir
@spec merge(t(), t()) :: t()
```

Merges two FrequentItems sketches.

Both sketches must have the same `k` and `key_encoding` parameters. The
merge combines counts and errors additively across the union of tracked
items, then retains the top-k entries by count to enforce the capacity
invariant.

Raises `ExDataSketch.Errors.IncompatibleSketchesError` if the sketches
have different `k` or `key_encoding` values.

## Examples

    iex> a = ExDataSketch.FrequentItems.new(k: 5) |> ExDataSketch.FrequentItems.update_many(["x", "x"])
    iex> b = ExDataSketch.FrequentItems.new(k: 5) |> ExDataSketch.FrequentItems.update_many(["y", "y"])
    iex> merged = ExDataSketch.FrequentItems.merge(a, b)
    iex> ExDataSketch.FrequentItems.count(merged)
    4

# `merge_many`

```elixir
@spec merge_many(Enumerable.t()) :: t()
```

Merges a non-empty enumerable of FrequentItems sketches into one.

Raises `Enum.EmptyError` if the enumerable is empty.

## Examples

    iex> sketches = Enum.map(1..3, fn i ->
    ...>   ExDataSketch.FrequentItems.new(k: 5) |> ExDataSketch.FrequentItems.update(to_string(i))
    ...> end)
    iex> merged = ExDataSketch.FrequentItems.merge_many(sketches)
    iex> ExDataSketch.FrequentItems.count(merged)
    3

# `merger`

```elixir
@spec merger(keyword()) :: (t(), t() -&gt; t())
```

Returns a 2-arity merge function suitable for combining sketches.

## Examples

    iex> is_function(ExDataSketch.FrequentItems.merger(), 2)
    true

# `new`

```elixir
@spec new(keyword()) :: t()
```

Creates a new FrequentItems sketch.

## Options

- `:k` - maximum number of counters (default: 10, must be >= 1).
- `:key_encoding` - key encoding policy: `:binary` (default), `:int`,
  or `{:term, :external}`.
- `:backend` - backend module (default: `ExDataSketch.Backend.Pure`).

## Examples

    iex> sketch = ExDataSketch.FrequentItems.new(k: 10)
    iex> sketch.opts[:k]
    10
    iex> ExDataSketch.FrequentItems.count(sketch)
    0

# `reducer`

```elixir
@spec reducer() :: (term(), t() -&gt; t())
```

Returns a 2-arity reducer function suitable for `Enum.reduce/3` and similar.

The returned function calls `update/2` on each item.

## Examples

    iex> is_function(ExDataSketch.FrequentItems.reducer(), 2)
    true

# `serialize`

```elixir
@spec serialize(t()) :: binary()
```

Serializes the sketch to the ExDataSketch-native EXSK binary format.

## Examples

    iex> sketch = ExDataSketch.FrequentItems.new(k: 10)
    iex> binary = ExDataSketch.FrequentItems.serialize(sketch)
    iex> <<"EXSK", _rest::binary>> = binary
    iex> byte_size(binary) > 0
    true

# `top_k`

```elixir
@spec top_k(
  t(),
  keyword()
) :: [map()]
```

Returns the top-k most frequent items, sorted by estimated count descending.

Ties in count are broken by key ascending (lexicographic on raw bytes,
then decoded via the key encoding policy).

## Options

- `:limit` - maximum number of items to return (default: all entries).

Returns a list of maps with `:item`, `:estimate`, `:error`, `:lower`,
and `:upper` fields.

## Examples

    iex> sketch = ExDataSketch.FrequentItems.new(k: 10)
    iex> sketch = ExDataSketch.FrequentItems.update_many(sketch, ["a", "a", "b"])
    iex> [first | _] = ExDataSketch.FrequentItems.top_k(sketch)
    iex> first.item
    "a"

# `update`

```elixir
@spec update(t(), term()) :: t()
```

Updates the sketch with a single item.

Delegates to `update_many/2` with a single-element list.

## Examples

    iex> sketch = ExDataSketch.FrequentItems.new(k: 5)
    iex> sketch = ExDataSketch.FrequentItems.update(sketch, "hello")
    iex> ExDataSketch.FrequentItems.count(sketch)
    1

# `update_many`

```elixir
@spec update_many(t(), Enumerable.t()) :: t()
```

Updates the sketch with multiple items in a single pass.

More efficient than calling `update/2` repeatedly because it pre-aggregates
items into a frequency map, decodes and encodes the binary state only once,
and applies weighted updates for each unique item.

## Examples

    iex> sketch = ExDataSketch.FrequentItems.new(k: 5)
    iex> sketch = ExDataSketch.FrequentItems.update_many(sketch, ["a", "b", "a"])
    iex> ExDataSketch.FrequentItems.count(sketch)
    3

---

*Consult [api-reference.md](api-reference.md) for complete listing*
