Bloom filter for probabilistic membership testing.
A Bloom filter is a space-efficient probabilistic data structure that tests
whether an element is a member of a set. False positives are possible, but
false negatives are not: if member?/2 returns false, the item was
definitely not inserted; if it returns true, the item was probably inserted.
Parameters
:capacity-- expected number of elements (default: 10,000). Used to derive the optimal bitset size.:false_positive_rate-- target false positive rate (default: 0.01). Must be between 0 and 1 exclusive.:seed-- hash seed (default: 0). Filters with different seeds are incompatible for merge.
The optimal bit_count and hash_count are derived automatically:
bit_count = ceil(-capacity * ln(fpr) / ln(2)^2)
hash_count = max(1, round(bit_count / capacity * ln(2)))Hash Strategy
Items are hashed via ExDataSketch.Hash.hash64/1 at the API boundary.
The 64-bit hash is split into two 32-bit halves for double hashing
(Kirsch-Mitzenmacher optimization):
h1 = hash >>> 32
h2 = hash &&& 0xFFFFFFFF
position_i = rem(h1 + i * h2, bit_count)Binary State Layout (BLM1)
See plans/adr/ADR-001-bloom-binary-format.md for the full specification.
40-byte header followed by a packed bitset (LSB-first byte order).
Merge Properties
Bloom merge is commutative and associative (bitwise OR). Two filters
can merge only if they have identical bit_count, hash_count, and seed.
Summary
Functions
Returns the set of capabilities supported by Bloom filter.
Returns the configured capacity (expected number of elements).
Returns true if two Bloom filters have compatible parameters.
Returns the number of set bits (popcount) in the bitset.
Deserializes an EXSK binary into a Bloom filter.
Returns the configured target false positive rate.
Creates a Bloom filter from an enumerable of items.
Tests whether an item may be a member of the set.
Merges two Bloom filters via bitwise OR.
Merges a non-empty enumerable of Bloom filters into one.
Returns a 2-arity merge function for combining filters.
Creates a new empty Bloom filter.
Inserts a single item into the filter.
Inserts multiple items in a single pass.
Returns a 2-arity reducer function for use with Enum.reduce/3.
Serializes the filter to the ExDataSketch-native EXSK binary format.
Returns the byte size of the state binary.
Types
Functions
Returns the set of capabilities supported by Bloom filter.
@spec capacity(t()) :: pos_integer()
Returns the configured capacity (expected number of elements).
Examples
iex> ExDataSketch.Bloom.new(capacity: 5000) |> ExDataSketch.Bloom.capacity()
5000
Returns true if two Bloom filters have compatible parameters.
Examples
iex> a = ExDataSketch.Bloom.new(capacity: 100)
iex> b = ExDataSketch.Bloom.new(capacity: 100)
iex> ExDataSketch.Bloom.compatible_with?(a, b)
true
@spec count(t()) :: non_neg_integer()
Returns the number of set bits (popcount) in the bitset.
This is NOT the number of inserted elements. Useful for computing fill ratio.
Examples
iex> ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.count()
0
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
Deserializes an EXSK binary into a Bloom filter.
Returns {:ok, bloom} on success or {:error, reason} on failure.
Examples
iex> bloom = ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("test")
iex> {:ok, recovered} = ExDataSketch.Bloom.deserialize(ExDataSketch.Bloom.serialize(bloom))
iex> ExDataSketch.Bloom.member?(recovered, "test")
true
Returns the configured target false positive rate.
Examples
iex> ExDataSketch.Bloom.new(false_positive_rate: 0.05) |> ExDataSketch.Bloom.error_rate()
0.05
@spec from_enumerable( Enumerable.t(), keyword() ) :: t()
Creates a Bloom filter from an enumerable of items.
Equivalent to new(opts) |> put_many(enumerable).
Examples
iex> bloom = ExDataSketch.Bloom.from_enumerable(["a", "b", "c"], capacity: 100)
iex> ExDataSketch.Bloom.member?(bloom, "a")
true
Tests whether an item may be a member of the set.
Returns true if the item is possibly in the set (may be a false positive),
false if the item is definitely not in the set.
Examples
iex> bloom = ExDataSketch.Bloom.new() |> ExDataSketch.Bloom.put("hello")
iex> ExDataSketch.Bloom.member?(bloom, "hello")
true
iex> bloom = ExDataSketch.Bloom.new()
iex> ExDataSketch.Bloom.member?(bloom, "hello")
false
Merges two Bloom filters via bitwise OR.
Both filters must have identical bit_count, hash_count, and seed.
Raises ExDataSketch.Errors.IncompatibleSketchesError if parameters differ.
Examples
iex> a = ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("x")
iex> b = ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("y")
iex> merged = ExDataSketch.Bloom.merge(a, b)
iex> ExDataSketch.Bloom.member?(merged, "x") and ExDataSketch.Bloom.member?(merged, "y")
true
@spec merge_many(Enumerable.t()) :: t()
Merges a non-empty enumerable of Bloom filters into one.
Raises Enum.EmptyError if the enumerable is empty.
Examples
iex> filters = Enum.map(1..3, fn i ->
...> ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("item_#{i}")
...> end)
iex> merged = ExDataSketch.Bloom.merge_many(filters)
iex> ExDataSketch.Bloom.member?(merged, "item_1")
true
Returns a 2-arity merge function for combining filters.
Examples
iex> is_function(ExDataSketch.Bloom.merger(), 2)
true
Creates a new empty Bloom filter.
Options
:capacity-- expected number of elements (default: 10000).:false_positive_rate-- target FPR (default: 0.01).:seed-- hash seed (default: 0).:backend-- backend module (default:ExDataSketch.Backend.Pure).:hash_fn-- custom hash function(term -> non_neg_integer).
Examples
iex> bloom = ExDataSketch.Bloom.new()
iex> bloom.opts[:capacity]
10000
iex> bloom = ExDataSketch.Bloom.new(capacity: 1000, false_positive_rate: 0.001)
iex> bloom.opts[:capacity]
1000
Inserts a single item into the filter.
The item is hashed via ExDataSketch.Hash.hash64/1 before insertion.
Examples
iex> bloom = ExDataSketch.Bloom.new() |> ExDataSketch.Bloom.put("hello")
iex> ExDataSketch.Bloom.member?(bloom, "hello")
true
@spec put_many(t(), Enumerable.t()) :: t()
Inserts multiple items in a single pass.
More efficient than calling put/2 repeatedly because it minimizes
intermediate binary allocations.
Examples
iex> bloom = ExDataSketch.Bloom.new() |> ExDataSketch.Bloom.put_many(["a", "b", "c"])
iex> ExDataSketch.Bloom.member?(bloom, "a")
true
Returns a 2-arity reducer function for use with Enum.reduce/3.
Examples
iex> is_function(ExDataSketch.Bloom.reducer(), 2)
true
Serializes the filter to the ExDataSketch-native EXSK binary format.
Examples
iex> bloom = ExDataSketch.Bloom.new(capacity: 100)
iex> binary = ExDataSketch.Bloom.serialize(bloom)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true
@spec size_bytes(t()) :: non_neg_integer()
Returns the byte size of the state binary.
Examples
iex> bloom = ExDataSketch.Bloom.new(capacity: 100)
iex> ExDataSketch.Bloom.size_bytes(bloom) > 0
true