Cuckoo filter for probabilistic membership testing with deletion support.
A Cuckoo filter is a space-efficient probabilistic data structure that supports insertion, deletion, and membership queries. It uses partial-key cuckoo hashing to store compact fingerprints in a bucket-based hash table.
Unlike Bloom filters, Cuckoo filters support deletion. Unlike XOR filters, Cuckoo filters support incremental insertion. The trade-off is that insertion can fail when the filter approaches capacity.
Parameters
:capacity-- expected number of items (default: 10,000). Used to derive the number of buckets.:fingerprint_size-- fingerprint width in bits (default: 8). Supported values: 8, 12, 16. Determines the false positive rate: FPR ~= 2 * bucket_size / 2^fingerprint_size.:bucket_size-- slots per bucket (default: 4). Supported values: 2, 4. Higher values improve space efficiency at the cost of lookup latency.:max_kicks-- maximum relocation attempts before declaring full (default: 500). Range: 100..2000.:seed-- hash seed (default: 0).
False Positive Rates
With default bucket_size=4:
| fingerprint_size | FPR |
|---|---|
| 8 | ~3.1% |
| 12 | ~0.2% |
| 16 | ~0.012% |
Hash Strategy
Items are hashed via ExDataSketch.Hash.hash64/1 at the API boundary.
The bucket index is derived from the lower bits and the fingerprint from
the upper bits of the 64-bit hash. Alternate bucket index uses XOR with
a hash of the fingerprint (partial-key cuckoo hashing).
Deletion Hazard
Only delete items that were definitely inserted. Deleting a false-positive item removes a legitimate fingerprint, creating a false negative for a different item. This hazard is inherent to all Cuckoo filters.
Binary State Layout (CKO1)
See plans/adr/ADR-102-cuckoo-binary-format.md for the full specification.
32-byte header followed by a flat bucket array of fingerprint entries.
Merge Policy
Merge is explicitly not supported. Cuckoo filter merge would require
re-inserting all fingerprints which can fail due to capacity limits.
For mergeable membership filters, use ExDataSketch.Bloom.
Summary
Functions
Returns the set of capabilities supported by the Cuckoo filter.
Returns true if two Cuckoo filters have compatible parameters.
Returns the number of items currently stored in the filter.
Deletes a single item from the filter.
Deserializes an EXSK binary into a Cuckoo filter.
Creates a Cuckoo filter from an enumerable of items.
Tests whether an item may be a member of the set.
Creates a new empty Cuckoo filter.
Inserts a single item into the filter.
Inserts a single item, raising on failure.
Inserts multiple items in a single pass.
Returns a 2-arity reducer function for use with Enum.reduce/3.
Serializes the filter to the ExDataSketch-native EXSK binary format.
Returns the byte size of the state binary.
Types
Functions
Returns the set of capabilities supported by the Cuckoo filter.
Examples
iex> caps = ExDataSketch.Cuckoo.capabilities()
iex> :put in caps and :delete in caps and :member? in caps
true
Returns true if two Cuckoo filters have compatible parameters.
Compatible filters have the same bucket_count, fingerprint_size, bucket_size, and seed.
Examples
iex> a = ExDataSketch.Cuckoo.new(capacity: 100)
iex> b = ExDataSketch.Cuckoo.new(capacity: 100)
iex> ExDataSketch.Cuckoo.compatible_with?(a, b)
true
@spec count(t()) :: non_neg_integer()
Returns the number of items currently stored in the filter.
Examples
iex> ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.count()
0
Deletes a single item from the filter.
Returns {:ok, cuckoo} if the item's fingerprint was found and removed,
or {:error, :not_found} if no matching fingerprint exists.
WARNING: Only delete items that were definitely inserted. Deleting a false-positive item removes a legitimate fingerprint belonging to a different item, creating a false negative.
Examples
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.put("hello")
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.delete(cuckoo, "hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
false
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
Deserializes an EXSK binary into a Cuckoo filter.
Returns {:ok, cuckoo} on success or {:error, reason} on failure.
Examples
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.put("test")
iex> {:ok, recovered} = ExDataSketch.Cuckoo.deserialize(ExDataSketch.Cuckoo.serialize(cuckoo))
iex> ExDataSketch.Cuckoo.member?(recovered, "test")
true
@spec from_enumerable( Enumerable.t(), keyword() ) :: {:ok, t()} | {:error, :full, t()}
Creates a Cuckoo filter from an enumerable of items.
Equivalent to new(opts) |> put_many(enumerable).
Returns {:ok, cuckoo} or {:error, :full, cuckoo}.
Examples
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.from_enumerable(["a", "b", "c"], capacity: 100)
iex> ExDataSketch.Cuckoo.member?(cuckoo, "a")
true
Tests whether an item may be a member of the set.
Returns true if the item is possibly in the set (may be a false positive),
false if the item is definitely not in the set.
Examples
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.new(capacity: 100) |> ExDataSketch.Cuckoo.put("hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
true
iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
false
Creates a new empty Cuckoo filter.
Options
:capacity-- expected number of items (default: 10000).:fingerprint_size-- fingerprint width in bits (default: 8). Supported values: 8, 12, 16.:bucket_size-- slots per bucket (default: 4). Supported values: 2, 4.:max_kicks-- maximum relocation attempts (default: 500). Range: 100..2000.:seed-- hash seed (default: 0).:backend-- backend module (default:ExDataSketch.Backend.Pure).:hash_fn-- custom hash function(term -> non_neg_integer).
Examples
iex> cuckoo = ExDataSketch.Cuckoo.new()
iex> cuckoo.opts[:capacity]
10000
iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 1000, fingerprint_size: 16)
iex> cuckoo.opts[:fingerprint_size]
16
Inserts a single item into the filter.
Returns {:ok, cuckoo} on success or {:error, :full} if the filter
cannot accommodate the item after max_kicks relocation attempts.
Examples
iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.put(cuckoo, "hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
true
Inserts a single item, raising on failure.
Examples
iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> cuckoo = ExDataSketch.Cuckoo.put!(cuckoo, "hello")
iex> ExDataSketch.Cuckoo.member?(cuckoo, "hello")
true
@spec put_many(t(), Enumerable.t()) :: {:ok, t()} | {:error, :full, t()}
Inserts multiple items in a single pass.
Returns {:ok, cuckoo} if all items were inserted, or
{:error, :full, cuckoo} with the partially updated filter
if insertion failed partway through.
Examples
iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> {:ok, cuckoo} = ExDataSketch.Cuckoo.put_many(cuckoo, ["a", "b", "c"])
iex> ExDataSketch.Cuckoo.member?(cuckoo, "a")
true
Returns a 2-arity reducer function for use with Enum.reduce/3.
The reducer calls put!/2 and raises if the filter becomes full.
Examples
iex> is_function(ExDataSketch.Cuckoo.reducer(), 2)
true
Serializes the filter to the ExDataSketch-native EXSK binary format.
Examples
iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> binary = ExDataSketch.Cuckoo.serialize(cuckoo)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true
@spec size_bytes(t()) :: non_neg_integer()
Returns the byte size of the state binary.
Examples
iex> cuckoo = ExDataSketch.Cuckoo.new(capacity: 100)
iex> ExDataSketch.Cuckoo.size_bytes(cuckoo) > 0
true