ExDataSketch.CQF (ExDataSketch v0.7.1)

Copy Markdown View Source

Counting Quotient Filter (CQF) for multiset membership with approximate counting.

A CQF extends the Quotient filter with variable-length counter encoding, enabling not just "is this item present?" but "how many times has this item been inserted?" It uses the same quotient/remainder hash split as a standard Quotient filter, but stores counts inline using a monotonicity-violation encoding scheme within runs.

Counter Encoding

Remainders within a run are stored in strictly increasing order. A slot value that violates this monotonicity is interpreted as a counter for the preceding remainder:

  • Count = 1: no extra slots (absence of counter means count 1).
  • Count = 2: one extra slot containing the same remainder value (a duplicate).
  • Count >= 3: the remainder value appears twice (bracketing), with intermediate slots encoding the count value.

Parameters

  • :q -- quotient bits (default: 16). Determines the number of slots: 2^q.
  • :r -- remainder bits (default: 8). Determines false positive rate: ~1/2^r. Constraint: q + r <= 64.
  • :seed -- hash seed (default: 0).

Merge Semantics

CQF merge is a multiset union: counts for identical fingerprints are summed, not OR'd. This enables distributed counting use cases where partial counts from multiple workers are combined.

Binary State Layout (CQF1)

40-byte header followed by a packed slot array. Each slot contains 3 metadata bits (is_occupied, is_continuation, is_shifted) and r remainder bits, packed LSB-first. The header includes a 64-bit total_count field tracking the sum of all item multiplicities.

Summary

Functions

Returns the set of capabilities supported by CQF.

Returns true if two CQFs have compatible parameters.

Returns the total count of all items (sum of all multiplicities).

Deletes a single occurrence of an item (decrements its count).

Deserializes an EXSK binary into a CQF.

Returns the estimated count (multiplicity) of an item.

Creates a CQF from an enumerable of items.

Tests whether an item may be a member of the multiset.

Merges two CQFs via multiset union (counts are summed).

Merges a non-empty enumerable of CQFs into one.

Returns a 2-arity merge function for combining filters.

Creates a new empty Counting Quotient Filter.

Inserts a single item into the filter, incrementing its count.

Inserts multiple items in a single pass.

Returns a 2-arity reducer function for use with Enum.reduce/3.

Serializes the filter to the EXSK binary format.

Returns the byte size of the state binary.

Types

t()

@type t() :: %ExDataSketch.CQF{backend: module(), opts: keyword(), state: binary()}

Functions

capabilities()

Returns the set of capabilities supported by CQF.

compatible_with?(cqf1, cqf2)

@spec compatible_with?(t(), t()) :: boolean()

Returns true if two CQFs have compatible parameters.

Examples

iex> a = ExDataSketch.CQF.new(q: 10, r: 8)
iex> b = ExDataSketch.CQF.new(q: 10, r: 8)
iex> ExDataSketch.CQF.compatible_with?(a, b)
true

count(cqf)

@spec count(t()) :: non_neg_integer()

Returns the total count of all items (sum of all multiplicities).

Examples

iex> ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.count()
0

delete(cqf, item)

@spec delete(t(), term()) :: t()

Deletes a single occurrence of an item (decrements its count).

If the item's count reaches 0, it is removed entirely. Deleting a non-member is a no-op.

Examples

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put("hello")
iex> cqf = ExDataSketch.CQF.delete(cqf, "hello")
iex> ExDataSketch.CQF.member?(cqf, "hello")
false

deserialize(binary)

@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}

Deserializes an EXSK binary into a CQF.

Returns {:ok, cqf} on success or {:error, reason} on failure.

Examples

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put("test")
iex> {:ok, recovered} = ExDataSketch.CQF.deserialize(ExDataSketch.CQF.serialize(cqf))
iex> ExDataSketch.CQF.member?(recovered, "test")
true

estimate_count(cqf, item)

@spec estimate_count(t(), term()) :: non_neg_integer()

Returns the estimated count (multiplicity) of an item.

Returns 0 if the item is not present. Due to hash collisions, the count may be an overestimate but never an underestimate.

Examples

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8)
iex> cqf = cqf |> ExDataSketch.CQF.put("x") |> ExDataSketch.CQF.put("x") |> ExDataSketch.CQF.put("x")
iex> ExDataSketch.CQF.estimate_count(cqf, "x")
3

from_enumerable(enumerable, opts \\ [])

@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()

Creates a CQF from an enumerable of items.

Equivalent to new(opts) |> put_many(enumerable).

Examples

iex> cqf = ExDataSketch.CQF.from_enumerable(["a", "b", "c"])
iex> ExDataSketch.CQF.member?(cqf, "a")
true

member?(cqf, item)

@spec member?(t(), term()) :: boolean()

Tests whether an item may be a member of the multiset.

Returns true if the item is possibly in the set (may be a false positive), false if the item is definitely not in the set.

Examples

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put("hello")
iex> ExDataSketch.CQF.member?(cqf, "hello")
true

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8)
iex> ExDataSketch.CQF.member?(cqf, "hello")
false

merge(cqf, cqf)

@spec merge(t(), t()) :: t()

Merges two CQFs via multiset union (counts are summed).

Both filters must have identical q, r, and seed. Raises ExDataSketch.Errors.IncompatibleSketchesError if parameters differ.

Examples

iex> a = ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put("x")
iex> b = ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put("y")
iex> merged = ExDataSketch.CQF.merge(a, b)
iex> ExDataSketch.CQF.member?(merged, "x") and ExDataSketch.CQF.member?(merged, "y")
true

merge_many(filters)

@spec merge_many(Enumerable.t()) :: t()

Merges a non-empty enumerable of CQFs into one.

Examples

iex> filters = Enum.map(1..3, fn i ->
...>   ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put("item_#{i}")
...> end)
iex> merged = ExDataSketch.CQF.merge_many(filters)
iex> ExDataSketch.CQF.member?(merged, "item_1")
true

merger(opts \\ [])

@spec merger(keyword()) :: (t(), t() -> t())

Returns a 2-arity merge function for combining filters.

Examples

iex> is_function(ExDataSketch.CQF.merger(), 2)
true

new(opts \\ [])

@spec new(keyword()) :: t()

Creates a new empty Counting Quotient Filter.

Options

  • :q -- quotient bits (default: 16). Range: 1..28.
  • :r -- remainder bits (default: 8). Range: 1..32. Constraint: q + r <= 64.
  • :seed -- hash seed (default: 0).
  • :backend -- backend module (default: ExDataSketch.Backend.Pure).
  • :hash_fn -- custom hash function (term -> non_neg_integer).

Examples

iex> cqf = ExDataSketch.CQF.new()
iex> cqf.opts[:q]
16

iex> cqf = ExDataSketch.CQF.new(q: 12, r: 10)
iex> cqf.opts[:r]
10

put(cqf, item)

@spec put(t(), term()) :: t()

Inserts a single item into the filter, incrementing its count.

Examples

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put("hello")
iex> ExDataSketch.CQF.member?(cqf, "hello")
true

put_many(cqf, items)

@spec put_many(t(), Enumerable.t()) :: t()

Inserts multiple items in a single pass.

Examples

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8) |> ExDataSketch.CQF.put_many(["a", "b"])
iex> ExDataSketch.CQF.member?(cqf, "a")
true

reducer()

@spec reducer() :: (term(), t() -> t())

Returns a 2-arity reducer function for use with Enum.reduce/3.

Examples

iex> is_function(ExDataSketch.CQF.reducer(), 2)
true

serialize(cqf)

@spec serialize(t()) :: binary()

Serializes the filter to the EXSK binary format.

Examples

iex> cqf = ExDataSketch.CQF.new(q: 10, r: 8)
iex> binary = ExDataSketch.CQF.serialize(cqf)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true

size_bytes(cqf)

@spec size_bytes(t()) :: non_neg_integer()

Returns the byte size of the state binary.

Examples

iex> cqf = ExDataSketch.CQF.new()
iex> ExDataSketch.CQF.size_bytes(cqf) > 0
true