ExDataSketch.MisraGries (ExDataSketch v0.7.1)

Copy Markdown View Source

MisraGries sketch for deterministic heavy hitter detection.

The Misra-Gries algorithm maintains at most k counters to track frequent items in a data stream. It provides a deterministic guarantee: any item whose true frequency exceeds n/k (where n is the total count) is guaranteed to be tracked.

Algorithm

  • Update(x): If x is tracked, increment its counter. If there are fewer than k entries, insert x with count 1. Otherwise, decrement all counters by 1 and remove any that reach zero.

  • Guarantee: If an item appears more than n/k times, it will be in the counter set when queried. The estimated count is a lower bound on the true count, with error at most n/k.

Comparison with FrequentItems (SpaceSaving)

FeatureMisraGriesFrequentItems
AlgorithmDecrement-allSpaceSaving (min-replacement)
GuaranteeDeterministic: freq > n/k always trackedProbabilistic with error bounds
Counter countAt most kExactly k
EstimateLower boundEstimate with overcount error

Binary State Layout (MG01)

All multi-byte fields are little-endian.

HEADER (22 bytes):
  magic:       4 bytes  "MG01"
  version:     u8       1
  reserved:    u8       0
  k:           u32 LE   max counters
  n:           u64 LE   total count
  entry_count: u32 LE   number of entries

ENTRIES (variable):
  entry_count x:
    key_len:   u32 LE
    key:       key_len bytes
    count:     u64 LE

Options

  • :k - maximum number of counters (default: 10, must be >= 1).
  • :key_encoding - key encoding policy: :binary (default), :int, or {:term, :external}.
  • :backend - backend module (default: ExDataSketch.Backend.Pure).

Merge Properties

MisraGries merge is commutative. Both sketches must have the same k parameter. Count (n) is always exactly additive.

Summary

Functions

Returns the total number of items inserted into the sketch.

Deserializes an EXSK binary into a MisraGries sketch.

Returns the number of distinct tracked entries.

Returns the estimated frequency of an item.

Returns entries whose estimated frequency exceeds the given threshold.

Creates a new MisraGries sketch from an enumerable.

Merges two MisraGries instances.

Merges a non-empty enumerable of MisraGries instances into one.

Returns a 2-arity merge function suitable for combining sketches.

Creates a new MisraGries sketch.

Returns a 2-arity reducer function suitable for Enum.reduce/3.

Serializes the sketch to the ExDataSketch-native EXSK binary format.

Returns the size of the sketch state in bytes.

Returns the top entries sorted by count descending.

Updates the sketch with a single item.

Updates the sketch with multiple items in a single pass.

Types

t()

@type t() :: %ExDataSketch.MisraGries{
  backend: module(),
  opts: keyword(),
  state: binary()
}

Functions

count(misra_gries)

@spec count(t()) :: non_neg_integer()

Returns the total number of items inserted into the sketch.

Examples

iex> ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.count()
0

deserialize(binary)

@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}

Deserializes an EXSK binary into a MisraGries sketch.

Returns {:ok, sketch} on success or {:error, reason} on failure.

Examples

iex> ExDataSketch.MisraGries.deserialize(<<"invalid">>)
{:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}

entry_count(misra_gries)

@spec entry_count(t()) :: non_neg_integer()

Returns the number of distinct tracked entries.

Examples

iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "b", "c"])
iex> ExDataSketch.MisraGries.entry_count(sketch)
3

estimate(misra_gries, item)

@spec estimate(t(), term()) :: non_neg_integer()

Returns the estimated frequency of an item.

The estimate is a lower bound on the true count. If the item is not tracked, returns 0.

Examples

iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "a", "b"])
iex> ExDataSketch.MisraGries.estimate(sketch, "a")
2

frequent(sketch, threshold)

@spec frequent(t(), float()) :: [{term(), non_neg_integer()}]

Returns entries whose estimated frequency exceeds the given threshold.

The threshold is a fraction in (0.0, 1.0). Returns entries whose count is greater than threshold * count(sketch).

Examples

iex> sketch = ExDataSketch.MisraGries.new(k: 5)
iex> sketch = Enum.reduce(1..100, sketch, fn _, s -> ExDataSketch.MisraGries.update(s, "heavy") end)
iex> sketch = Enum.reduce(1..10, sketch, fn i, s -> ExDataSketch.MisraGries.update(s, "light_#{i}") end)
iex> frequent = ExDataSketch.MisraGries.frequent(sketch, 0.5)
iex> Enum.any?(frequent, fn {item, _count} -> item == "heavy" end)
true

from_enumerable(enumerable, opts \\ [])

@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()

Creates a new MisraGries sketch from an enumerable.

Examples

iex> sketch = ExDataSketch.MisraGries.from_enumerable(["a", "b", "a"], k: 5)
iex> ExDataSketch.MisraGries.count(sketch)
3

merge(sketch, misra_gries)

@spec merge(t(), t()) :: t()

Merges two MisraGries instances.

Both sketches must have the same k parameter.

Examples

iex> a = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "a"])
iex> b = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "b"])
iex> merged = ExDataSketch.MisraGries.merge(a, b)
iex> ExDataSketch.MisraGries.count(merged)
4

merge_many(sketches)

@spec merge_many(Enumerable.t()) :: t()

Merges a non-empty enumerable of MisraGries instances into one.

Examples

iex> sketches = Enum.map(1..3, fn _ ->
...>   ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update("x")
...> end)
iex> merged = ExDataSketch.MisraGries.merge_many(sketches)
iex> ExDataSketch.MisraGries.count(merged)
3

merger(opts \\ [])

@spec merger(keyword()) :: (t(), t() -> t())

Returns a 2-arity merge function suitable for combining sketches.

Examples

iex> is_function(ExDataSketch.MisraGries.merger(), 2)
true

new(opts \\ [])

@spec new(keyword()) :: t()

Creates a new MisraGries sketch.

Options

  • :k - maximum number of counters (default: 10, must be >= 1).
  • :key_encoding - :binary (default), :int, or {:term, :external}.
  • :backend - backend module (default: ExDataSketch.Backend.Pure).

Examples

iex> sketch = ExDataSketch.MisraGries.new(k: 10)
iex> sketch.opts[:k]
10
iex> ExDataSketch.MisraGries.count(sketch)
0

reducer()

@spec reducer() :: (term(), t() -> t())

Returns a 2-arity reducer function suitable for Enum.reduce/3.

Examples

iex> is_function(ExDataSketch.MisraGries.reducer(), 2)
true

serialize(misra_gries)

@spec serialize(t()) :: binary()

Serializes the sketch to the ExDataSketch-native EXSK binary format.

Examples

iex> sketch = ExDataSketch.MisraGries.new()
iex> binary = ExDataSketch.MisraGries.serialize(sketch)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true

size_bytes(misra_gries)

@spec size_bytes(t()) :: non_neg_integer()

Returns the size of the sketch state in bytes.

Examples

iex> sketch = ExDataSketch.MisraGries.new()
iex> ExDataSketch.MisraGries.size_bytes(sketch) > 0
true

top_k(misra_gries, limit)

@spec top_k(t(), non_neg_integer()) :: [{term(), non_neg_integer()}]

Returns the top entries sorted by count descending.

Each entry is a {item, count} tuple where the item is decoded using the configured key encoding.

Examples

iex> sketch = ExDataSketch.MisraGries.new()
iex> sketch = ExDataSketch.MisraGries.update_many(sketch, ["a", "a", "a", "b", "b", "c"])
iex> [{top_item, top_count} | _] = ExDataSketch.MisraGries.top_k(sketch, 2)
iex> top_item
"a"
iex> top_count
3

update(sketch, item)

@spec update(t(), term()) :: t()

Updates the sketch with a single item.

The item is encoded using the configured key encoding before insertion.

Examples

iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update("hello")
iex> ExDataSketch.MisraGries.count(sketch)
1

update_many(sketch, items)

@spec update_many(t(), Enumerable.t()) :: t()

Updates the sketch with multiple items in a single pass.

Examples

iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "b", "a"])
iex> ExDataSketch.MisraGries.count(sketch)
3