Xor filter for static, immutable probabilistic membership testing.
An Xor filter is a space-efficient probabilistic data structure that answers
"is this item in the set?" with no false negatives and a tunable false positive
rate. Unlike Bloom, Cuckoo, or Quotient filters, Xor filters are immutable:
all items must be provided at construction time via build/2. After construction,
only member?/2 queries are supported -- no insertion, deletion, or merge.
Algorithm
Construction builds a 3-partite hypergraph from the input set, peels it
(iteratively removing degree-1 positions), and assigns fingerprints via
XOR equations. Queries check if B[h0(x)] XOR B[h1(x)] XOR B[h2(x)]
equals the fingerprint of x -- exactly 3 memory accesses, O(1) time.
Variants
| Variant | Fingerprint | Bits/item | FPR |
|---|---|---|---|
| Xor8 | 8-bit | ~9.84 | ~1/256 |
| Xor16 | 16-bit | ~19.68 | ~1/65536 |
Build-once Semantics
This module intentionally does not define new/1, put/2, delete/2, or
merge/2. The struct has no meaningful empty state. Use build/2 to construct
a filter from a complete set of items.
Parameters
:fingerprint_bits-- 8 (default, Xor8) or 16 (Xor16).:seed-- hash seed (default: 0).:backend-- backend module (default:ExDataSketch.Backend.Pure).:hash_fn-- custom hash function(term -> non_neg_integer).
Examples
iex> {:ok, filter} = ExDataSketch.XorFilter.build(["a", "b", "c"])
iex> ExDataSketch.XorFilter.member?(filter, "a")
true
iex> {:ok, filter} = ExDataSketch.XorFilter.build(1..100)
iex> ExDataSketch.XorFilter.count(filter)
100
Summary
Functions
Builds an Xor filter from an enumerable of items.
Returns the set of capabilities supported by XorFilter.
Returns true if two Xor filters have compatible parameters.
Returns the number of items the filter was built from.
Deserializes an EXSK binary into an XorFilter.
Tests whether an item may be a member of the set.
Serializes the filter to the EXSK binary format.
Returns the byte size of the state binary.
Types
Functions
@spec build( Enumerable.t(), keyword() ) :: {:ok, t()} | {:error, :build_failed}
Builds an Xor filter from an enumerable of items.
All items are hashed, deduplicated, and used to construct the filter.
Returns {:ok, filter} on success or {:error, :build_failed} if
the hypergraph cannot be peeled after 100 seed retries.
Options
:fingerprint_bits-- 8 (default) or 16.:seed-- hash seed (default: 0).:backend-- backend module.:hash_fn-- custom hash function(term -> non_neg_integer).
Examples
iex> {:ok, filter} = ExDataSketch.XorFilter.build(["x", "y", "z"])
iex> ExDataSketch.XorFilter.member?(filter, "x")
true
Returns the set of capabilities supported by XorFilter.
Returns true if two Xor filters have compatible parameters.
Compatible filters have the same fingerprint_bits and seed.
Examples
iex> {:ok, a} = ExDataSketch.XorFilter.build(["a"], fingerprint_bits: 8)
iex> {:ok, b} = ExDataSketch.XorFilter.build(["b"], fingerprint_bits: 8)
iex> ExDataSketch.XorFilter.compatible_with?(a, b)
true
@spec count(t()) :: non_neg_integer()
Returns the number of items the filter was built from.
Examples
iex> {:ok, filter} = ExDataSketch.XorFilter.build(["a", "b"])
iex> ExDataSketch.XorFilter.count(filter)
2
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
Deserializes an EXSK binary into an XorFilter.
Returns {:ok, filter} on success or {:error, reason} on failure.
Examples
iex> {:ok, filter} = ExDataSketch.XorFilter.build(["test"])
iex> {:ok, recovered} = ExDataSketch.XorFilter.deserialize(ExDataSketch.XorFilter.serialize(filter))
iex> ExDataSketch.XorFilter.member?(recovered, "test")
true
Tests whether an item may be a member of the set.
Returns true if the item is possibly in the set (may be a false positive),
false if the item is definitely not in the set. Never returns false negatives
for items that were included in build/2.
Examples
iex> {:ok, filter} = ExDataSketch.XorFilter.build(["hello"])
iex> ExDataSketch.XorFilter.member?(filter, "hello")
true
Serializes the filter to the EXSK binary format.
Examples
iex> {:ok, filter} = ExDataSketch.XorFilter.build(["a"])
iex> binary = ExDataSketch.XorFilter.serialize(filter)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true
@spec size_bytes(t()) :: non_neg_integer()
Returns the byte size of the state binary.
Examples
iex> {:ok, xor} = ExDataSketch.XorFilter.build(["a", "b"])
iex> ExDataSketch.XorFilter.size_bytes(xor) > 0
true