MisraGries sketch for deterministic heavy hitter detection.
The Misra-Gries algorithm maintains at most k counters to track frequent
items in a data stream. It provides a deterministic guarantee: any item
whose true frequency exceeds n/k (where n is the total count) is
guaranteed to be tracked.
Algorithm
Update(x): If x is tracked, increment its counter. If there are fewer than k entries, insert x with count 1. Otherwise, decrement all counters by 1 and remove any that reach zero.
Guarantee: If an item appears more than
n/ktimes, it will be in the counter set when queried. The estimated count is a lower bound on the true count, with error at mostn/k.
Comparison with FrequentItems (SpaceSaving)
| Feature | MisraGries | FrequentItems |
|---|---|---|
| Algorithm | Decrement-all | SpaceSaving (min-replacement) |
| Guarantee | Deterministic: freq > n/k always tracked | Probabilistic with error bounds |
| Counter count | At most k | Exactly k |
| Estimate | Lower bound | Estimate with overcount error |
Binary State Layout (MG01)
All multi-byte fields are little-endian.
HEADER (22 bytes):
magic: 4 bytes "MG01"
version: u8 1
reserved: u8 0
k: u32 LE max counters
n: u64 LE total count
entry_count: u32 LE number of entries
ENTRIES (variable):
entry_count x:
key_len: u32 LE
key: key_len bytes
count: u64 LEOptions
:k- maximum number of counters (default: 10, must be >= 1).:key_encoding- key encoding policy::binary(default),:int, or{:term, :external}.:backend- backend module (default:ExDataSketch.Backend.Pure).
Merge Properties
MisraGries merge is commutative. Both sketches must have the same
k parameter. Count (n) is always exactly additive.
Summary
Functions
Returns the total number of items inserted into the sketch.
Deserializes an EXSK binary into a MisraGries sketch.
Returns the number of distinct tracked entries.
Returns the estimated frequency of an item.
Returns entries whose estimated frequency exceeds the given threshold.
Creates a new MisraGries sketch from an enumerable.
Merges two MisraGries instances.
Merges a non-empty enumerable of MisraGries instances into one.
Returns a 2-arity merge function suitable for combining sketches.
Creates a new MisraGries sketch.
Returns a 2-arity reducer function suitable for Enum.reduce/3.
Serializes the sketch to the ExDataSketch-native EXSK binary format.
Returns the size of the sketch state in bytes.
Returns the top entries sorted by count descending.
Updates the sketch with a single item.
Updates the sketch with multiple items in a single pass.
Types
Functions
@spec count(t()) :: non_neg_integer()
Returns the total number of items inserted into the sketch.
Examples
iex> ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.count()
0
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
Deserializes an EXSK binary into a MisraGries sketch.
Returns {:ok, sketch} on success or {:error, reason} on failure.
Examples
iex> ExDataSketch.MisraGries.deserialize(<<"invalid">>)
{:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}
@spec entry_count(t()) :: non_neg_integer()
Returns the number of distinct tracked entries.
Examples
iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "b", "c"])
iex> ExDataSketch.MisraGries.entry_count(sketch)
3
@spec estimate(t(), term()) :: non_neg_integer()
Returns the estimated frequency of an item.
The estimate is a lower bound on the true count. If the item is not tracked, returns 0.
Examples
iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "a", "b"])
iex> ExDataSketch.MisraGries.estimate(sketch, "a")
2
@spec frequent(t(), float()) :: [{term(), non_neg_integer()}]
Returns entries whose estimated frequency exceeds the given threshold.
The threshold is a fraction in (0.0, 1.0). Returns entries whose count
is greater than threshold * count(sketch).
Examples
iex> sketch = ExDataSketch.MisraGries.new(k: 5)
iex> sketch = Enum.reduce(1..100, sketch, fn _, s -> ExDataSketch.MisraGries.update(s, "heavy") end)
iex> sketch = Enum.reduce(1..10, sketch, fn i, s -> ExDataSketch.MisraGries.update(s, "light_#{i}") end)
iex> frequent = ExDataSketch.MisraGries.frequent(sketch, 0.5)
iex> Enum.any?(frequent, fn {item, _count} -> item == "heavy" end)
true
@spec from_enumerable( Enumerable.t(), keyword() ) :: t()
Creates a new MisraGries sketch from an enumerable.
Examples
iex> sketch = ExDataSketch.MisraGries.from_enumerable(["a", "b", "a"], k: 5)
iex> ExDataSketch.MisraGries.count(sketch)
3
Merges two MisraGries instances.
Both sketches must have the same k parameter.
Examples
iex> a = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "a"])
iex> b = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "b"])
iex> merged = ExDataSketch.MisraGries.merge(a, b)
iex> ExDataSketch.MisraGries.count(merged)
4
@spec merge_many(Enumerable.t()) :: t()
Merges a non-empty enumerable of MisraGries instances into one.
Examples
iex> sketches = Enum.map(1..3, fn _ ->
...> ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update("x")
...> end)
iex> merged = ExDataSketch.MisraGries.merge_many(sketches)
iex> ExDataSketch.MisraGries.count(merged)
3
Returns a 2-arity merge function suitable for combining sketches.
Examples
iex> is_function(ExDataSketch.MisraGries.merger(), 2)
true
Creates a new MisraGries sketch.
Options
:k- maximum number of counters (default: 10, must be >= 1).:key_encoding-:binary(default),:int, or{:term, :external}.:backend- backend module (default:ExDataSketch.Backend.Pure).
Examples
iex> sketch = ExDataSketch.MisraGries.new(k: 10)
iex> sketch.opts[:k]
10
iex> ExDataSketch.MisraGries.count(sketch)
0
Returns a 2-arity reducer function suitable for Enum.reduce/3.
Examples
iex> is_function(ExDataSketch.MisraGries.reducer(), 2)
true
Serializes the sketch to the ExDataSketch-native EXSK binary format.
Examples
iex> sketch = ExDataSketch.MisraGries.new()
iex> binary = ExDataSketch.MisraGries.serialize(sketch)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true
@spec size_bytes(t()) :: non_neg_integer()
Returns the size of the sketch state in bytes.
Examples
iex> sketch = ExDataSketch.MisraGries.new()
iex> ExDataSketch.MisraGries.size_bytes(sketch) > 0
true
@spec top_k(t(), non_neg_integer()) :: [{term(), non_neg_integer()}]
Returns the top entries sorted by count descending.
Each entry is a {item, count} tuple where the item is decoded using
the configured key encoding.
Examples
iex> sketch = ExDataSketch.MisraGries.new()
iex> sketch = ExDataSketch.MisraGries.update_many(sketch, ["a", "a", "a", "b", "b", "c"])
iex> [{top_item, top_count} | _] = ExDataSketch.MisraGries.top_k(sketch, 2)
iex> top_item
"a"
iex> top_count
3
Updates the sketch with a single item.
The item is encoded using the configured key encoding before insertion.
Examples
iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update("hello")
iex> ExDataSketch.MisraGries.count(sketch)
1
@spec update_many(t(), Enumerable.t()) :: t()
Updates the sketch with multiple items in a single pass.
Examples
iex> sketch = ExDataSketch.MisraGries.new() |> ExDataSketch.MisraGries.update_many(["a", "b", "a"])
iex> ExDataSketch.MisraGries.count(sketch)
3