ExDataSketch.Cuckoo (ExDataSketch v0.7.1)

Copy Markdown View Source

Cuckoo filter for probabilistic membership testing with deletion support.

A Cuckoo filter is a space-efficient probabilistic data structure that supports insertion, deletion, and membership queries. It uses partial-key cuckoo hashing to store compact fingerprints in a bucket-based hash table.

Unlike Bloom filters, Cuckoo filters support deletion. Unlike XOR filters, Cuckoo filters support incremental insertion. The trade-off is that insertion can fail when the filter approaches capacity.

Parameters

  • :capacity -- expected number of items (default: 10,000). Used to derive the number of buckets.
  • :fingerprint_size -- fingerprint width in bits (default: 8). Supported values: 8, 12, 16. Determines the false positive rate: FPR ~= 2 * bucket_size / 2^fingerprint_size.
  • :bucket_size -- slots per bucket (default: 4). Supported values: 2, 4. Higher values improve space efficiency at the cost of lookup latency.
  • :max_kicks -- maximum relocation attempts before declaring full (default: 500). Range: 100..2000.
  • :seed -- hash seed (default: 0).

False Positive Rates

With default bucket_size=4:

fingerprint_sizeFPR
8~3.1%
12~0.2%
16~0.012%

Hash Strategy

Items are hashed via ExDataSketch.Hash.hash64/1 at the API boundary. The bucket index is derived from the lower bits and the fingerprint from the upper bits of the 64-bit hash. Alternate bucket index uses XOR with a hash of the fingerprint (partial-key cuckoo hashing).

Deletion Hazard

Only delete items that were definitely inserted. Deleting a false-positive item removes a legitimate fingerprint, creating a false negative for a different item. This hazard is inherent to all Cuckoo filters.

Binary State Layout (CKO1)

See plans/adr/ADR-102-cuckoo-binary-format.md for the full specification. 32-byte header followed by a flat bucket array of fingerprint entries.

Merge Policy

Merge is explicitly not supported. Cuckoo filter merge would require re-inserting all fingerprints which can fail due to capacity limits. For mergeable membership filters, use ExDataSketch.Bloom.

Summary

Functions

Returns the set of capabilities supported by the Cuckoo filter.

Returns true if two Cuckoo filters have compatible parameters.

Returns the number of items currently stored in the filter.

Deletes a single item from the filter.

Deserializes an EXSK binary into a Cuckoo filter.

Creates a Cuckoo filter from an enumerable of items.

Tests whether an item may be a member of the set.

Creates a new empty Cuckoo filter.

Inserts a single item into the filter.

Inserts a single item, raising on failure.

Inserts multiple items in a single pass.

Returns a 2-arity reducer function for use with Enum.reduce/3.

Serializes the filter to the ExDataSketch-native EXSK binary format.

Returns the byte size of the state binary.

Types

t()

@type t() :: %ExDataSketch.Cuckoo{backend: module(), opts: keyword(), state: binary()}

Functions

capabilities()

Returns the set of capabilities supported by the Cuckoo filter.

Examples

iex> caps = ExDataSketch.Cuckoo.capabilities()
iex> :put in caps and :delete in caps and :member? in caps
true

compatible_with?(cuckoo1, cuckoo2)

@spec compatible_with?(t(), t()) :: boolean()

Returns true if two Cuckoo filters have compatible parameters.

Compatible filters have the same bucket_count, fingerprint_size, bucket_size, and seed.

Examples

iex> a = ExDataSketch.Cuckoo.new(capacity: 100)
iex> b = ExDataSketch.Cuckoo.new(capacity: 100)
iex> ExDataSketch.Cuckoo.compatible_with?(a, b)
true

count(cuckoo)

@spec count(t()) :: non_neg_integer()

Returns the number of items currently stored in the filter.

Examples

iex> ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.count()
0

delete(cuckoo, item)

@spec delete(t(), term()) :: {:ok, t()} | {:error, :not_found}

Deletes a single item from the filter.

Returns {:ok, cuckoo} if the item's fingerprint was found and removed, or {:error, :not_found} if no matching fingerprint exists.

WARNING: Only delete items that were definitely inserted. Deleting a false-positive item removes a legitimate fingerprint belonging to a different item, creating a false negative.

Examples

iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.put("hello")
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.delete(cuckoo, "hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
false

deserialize(binary)

@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}

Deserializes an EXSK binary into a Cuckoo filter.

Returns {:ok, cuckoo} on success or {:error, reason} on failure.

Examples

iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.put("test")
iex> {:ok, recovered} = ExDataSketch.Cuckoo.deserialize(ExDataSketch.Cuckoo.serialize(cuckoo))
iex> ExDataSketch.Cuckoo.member?(recovered, "test")
true

from_enumerable(enumerable, opts \\ [])

@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: {:ok, t()} | {:error, :full, t()}

Creates a Cuckoo filter from an enumerable of items.

Equivalent to new(opts) |> put_many(enumerable).

Returns {:ok, cuckoo} or {:error, :full, cuckoo}.

Examples

iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.from_enumerable(["a", "b", "c"], capacity: 100)
iex> ExDataSketch.Cuckoo.member?(cuckoo, "a")
true

member?(cuckoo, item)

@spec member?(t(), term()) :: boolean()

Tests whether an item may be a member of the set.

Returns true if the item is possibly in the set (may be a false positive), false if the item is definitely not in the set.

Examples

iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.put("hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
true

iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
false

new(opts \\ [])

@spec new(keyword()) :: t()

Creates a new empty Cuckoo filter.

Options

  • :capacity -- expected number of items (default: 10000).
  • :fingerprint_size -- fingerprint width in bits (default: 8). Supported values: 8, 12, 16.
  • :bucket_size -- slots per bucket (default: 4). Supported values: 2, 4.
  • :max_kicks -- maximum relocation attempts (default: 500). Range: 100..2000.
  • :seed -- hash seed (default: 0).
  • :backend -- backend module (default: ExDataSketch.Backend.Pure).
  • :hash_fn -- custom hash function (term -> non_neg_integer).

Examples

iex> cuckoo = ExDataSketch.Cuckoo.new()
iex> cuckoo.opts[:capacity]
10000

iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 1000, fingerprint_size: 16)
iex> cuckoo.opts[:fingerprint_size]
16

put(cuckoo, item)

@spec put(t(), term()) :: {:ok, t()} | {:error, :full}

Inserts a single item into the filter.

Returns {:ok, cuckoo} on success or {:error, :full} if the filter cannot accommodate the item after max_kicks relocation attempts.

Examples

iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.put(cuckoo, "hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
true

put!(cuckoo, item)

@spec put!(t(), term()) :: t()

Inserts a single item, raising on failure.

Examples

iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> cuckoo = ExDataSketch.Cuckoo.put!(cuckoo, "hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
true

put_many(cuckoo, items)

@spec put_many(t(), Enumerable.t()) :: {:ok, t()} | {:error, :full, t()}

Inserts multiple items in a single pass.

Returns {:ok, cuckoo} if all items were inserted, or {:error, :full, cuckoo} with the partially updated filter if insertion failed partway through.

Examples

iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.put_many(cuckoo, ["a", "b", "c"])
iex> ExDataSketch.Cuckoo.member?(cuckoo, "a")
true

reducer()

@spec reducer() :: (term(), t() -> t())

Returns a 2-arity reducer function for use with Enum.reduce/3.

The reducer calls put!/2 and raises if the filter becomes full.

Examples

iex> is_function(ExDataSketch.Cuckoo.reducer(), 2)
true

serialize(cuckoo)

@spec serialize(t()) :: binary()

Serializes the filter to the ExDataSketch-native EXSK binary format.

Examples

iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> binary = ExDataSketch.Cuckoo.serialize(cuckoo)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true

size_bytes(cuckoo)

@spec size_bytes(t()) :: non_neg_integer()

Returns the byte size of the state binary.

Examples

iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> ExDataSketch.Cuckoo.size_bytes(cuckoo) > 0
true