ExDataSketch.XorFilter (ExDataSketch v0.7.1)

Copy Markdown View Source

Xor filter for static, immutable probabilistic membership testing.

An Xor filter is a space-efficient probabilistic data structure that answers "is this item in the set?" with no false negatives and a tunable false positive rate. Unlike Bloom, Cuckoo, or Quotient filters, Xor filters are immutable: all items must be provided at construction time via build/2. After construction, only member?/2 queries are supported -- no insertion, deletion, or merge.

Algorithm

Construction builds a 3-partite hypergraph from the input set, peels it (iteratively removing degree-1 positions), and assigns fingerprints via XOR equations. Queries check if B[h0(x)] XOR B[h1(x)] XOR B[h2(x)] equals the fingerprint of x -- exactly 3 memory accesses, O(1) time.

Variants

VariantFingerprintBits/itemFPR
Xor88-bit~9.84~1/256
Xor1616-bit~19.68~1/65536

Build-once Semantics

This module intentionally does not define new/1, put/2, delete/2, or merge/2. The struct has no meaningful empty state. Use build/2 to construct a filter from a complete set of items.

Parameters

  • :fingerprint_bits -- 8 (default, Xor8) or 16 (Xor16).
  • :seed -- hash seed (default: 0).
  • :backend -- backend module (default: ExDataSketch.Backend.Pure).
  • :hash_fn -- custom hash function (term -> non_neg_integer).

Examples

iex> {:ok, filter} = ExDataSketch.XorFilter.build(["a", "b", "c"])
iex> ExDataSketch.XorFilter.member?(filter, "a")
true

iex> {:ok, filter} = ExDataSketch.XorFilter.build(1..100)
iex> ExDataSketch.XorFilter.count(filter)
100

Summary

Functions

Builds an Xor filter from an enumerable of items.

Returns the set of capabilities supported by XorFilter.

Returns true if two Xor filters have compatible parameters.

Returns the number of items the filter was built from.

Deserializes an EXSK binary into an XorFilter.

Tests whether an item may be a member of the set.

Serializes the filter to the EXSK binary format.

Returns the byte size of the state binary.

Types

t()

@type t() :: %ExDataSketch.XorFilter{
  backend: module(),
  opts: keyword(),
  state: binary()
}

Functions

build(items, opts \\ [])

@spec build(
  Enumerable.t(),
  keyword()
) :: {:ok, t()} | {:error, :build_failed}

Builds an Xor filter from an enumerable of items.

All items are hashed, deduplicated, and used to construct the filter. Returns {:ok, filter} on success or {:error, :build_failed} if the hypergraph cannot be peeled after 100 seed retries.

Options

  • :fingerprint_bits -- 8 (default) or 16.
  • :seed -- hash seed (default: 0).
  • :backend -- backend module.
  • :hash_fn -- custom hash function (term -> non_neg_integer).

Examples

iex> {:ok, filter} = ExDataSketch.XorFilter.build(["x", "y", "z"])
iex> ExDataSketch.XorFilter.member?(filter, "x")
true

capabilities()

Returns the set of capabilities supported by XorFilter.

compatible_with?(xor_filter1, xor_filter2)

@spec compatible_with?(t(), t()) :: boolean()

Returns true if two Xor filters have compatible parameters.

Compatible filters have the same fingerprint_bits and seed.

Examples

iex> {:ok, a} = ExDataSketch.XorFilter.build(["a"], fingerprint_bits: 8)
iex> {:ok, b} = ExDataSketch.XorFilter.build(["b"], fingerprint_bits: 8)
iex> ExDataSketch.XorFilter.compatible_with?(a, b)
true

count(xor_filter)

@spec count(t()) :: non_neg_integer()

Returns the number of items the filter was built from.

Examples

iex> {:ok, filter} = ExDataSketch.XorFilter.build(["a", "b"])
iex> ExDataSketch.XorFilter.count(filter)
2

deserialize(binary)

@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}

Deserializes an EXSK binary into an XorFilter.

Returns {:ok, filter} on success or {:error, reason} on failure.

Examples

iex> {:ok, filter} = ExDataSketch.XorFilter.build(["test"])
iex> {:ok, recovered} = ExDataSketch.XorFilter.deserialize(ExDataSketch.XorFilter.serialize(filter))
iex> ExDataSketch.XorFilter.member?(recovered, "test")
true

member?(xor_filter, item)

@spec member?(t(), term()) :: boolean()

Tests whether an item may be a member of the set.

Returns true if the item is possibly in the set (may be a false positive), false if the item is definitely not in the set. Never returns false negatives for items that were included in build/2.

Examples

iex> {:ok, filter} = ExDataSketch.XorFilter.build(["hello"])
iex> ExDataSketch.XorFilter.member?(filter, "hello")
true

serialize(xor_filter)

@spec serialize(t()) :: binary()

Serializes the filter to the EXSK binary format.

Examples

iex> {:ok, filter} = ExDataSketch.XorFilter.build(["a"])
iex> binary = ExDataSketch.XorFilter.serialize(filter)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true

size_bytes(xor_filter)

@spec size_bytes(t()) :: non_neg_integer()

Returns the byte size of the state binary.

Examples

iex> {:ok, xor} = ExDataSketch.XorFilter.build(["a", "b"])
iex> ExDataSketch.XorFilter.size_bytes(xor) > 0
true