# `ExDataSketch.Bloom`
[🔗](https://github.com/thanos/ex_data_sketch/blob/main/lib/ex_data_sketch/bloom.ex#L1)

Bloom filter for probabilistic membership testing.

A Bloom filter is a space-efficient probabilistic data structure that tests
whether an element is a member of a set. False positives are possible, but
false negatives are not: if `member?/2` returns `false`, the item was
definitely not inserted; if it returns `true`, the item was probably inserted.

## Parameters

- `:capacity` -- expected number of elements (default: 10,000). Used to
  derive the optimal bitset size.
- `:false_positive_rate` -- target false positive rate (default: 0.01).
  Must be between 0 and 1 exclusive.
- `:seed` -- hash seed (default: 0). Filters with different seeds are
  incompatible for merge.

The optimal `bit_count` and `hash_count` are derived automatically:

    bit_count  = ceil(-capacity * ln(fpr) / ln(2)^2)
    hash_count = max(1, round(bit_count / capacity * ln(2)))

## Hash Strategy

Items are hashed via `ExDataSketch.Hash.hash64/1` at the API boundary.
The 64-bit hash is split into two 32-bit halves for double hashing
(Kirsch-Mitzenmacher optimization):

    h1 = hash >>> 32
    h2 = hash &&& 0xFFFFFFFF
    position_i = rem(h1 + i * h2, bit_count)

## Binary State Layout (BLM1)

See `plans/adr/ADR-001-bloom-binary-format.md` for the full specification.
40-byte header followed by a packed bitset (LSB-first byte order).

## Merge Properties

Bloom merge is **commutative** and **associative** (bitwise OR). Two filters
can merge only if they have identical `bit_count`, `hash_count`, and `seed`.

# `t`

```elixir
@type t() :: %ExDataSketch.Bloom{backend: module(), opts: keyword(), state: binary()}
```

# `capabilities`

Returns the set of capabilities supported by Bloom filter.

# `capacity`

```elixir
@spec capacity(t()) :: pos_integer()
```

Returns the configured capacity (expected number of elements).

## Examples

    iex> ExDataSketch.Bloom.new(capacity: 5000) |> ExDataSketch.Bloom.capacity()
    5000

# `compatible_with?`

```elixir
@spec compatible_with?(t(), t()) :: boolean()
```

Returns `true` if two Bloom filters have compatible parameters.

## Examples

    iex> a = ExDataSketch.Bloom.new(capacity: 100)
    iex> b = ExDataSketch.Bloom.new(capacity: 100)
    iex> ExDataSketch.Bloom.compatible_with?(a, b)
    true

# `count`

```elixir
@spec count(t()) :: non_neg_integer()
```

Returns the number of set bits (popcount) in the bitset.

This is NOT the number of inserted elements. Useful for computing fill ratio.

## Examples

    iex> ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.count()
    0

# `deserialize`

```elixir
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
```

Deserializes an EXSK binary into a Bloom filter.

Returns `{:ok, bloom}` on success or `{:error, reason}` on failure.

## Examples

    iex> bloom = ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("test")
    iex> {:ok, recovered} = ExDataSketch.Bloom.deserialize(ExDataSketch.Bloom.serialize(bloom))
    iex> ExDataSketch.Bloom.member?(recovered, "test")
    true

# `error_rate`

```elixir
@spec error_rate(t()) :: float()
```

Returns the configured target false positive rate.

## Examples

    iex> ExDataSketch.Bloom.new(false_positive_rate: 0.05) |> ExDataSketch.Bloom.error_rate()
    0.05

# `from_enumerable`

```elixir
@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()
```

Creates a Bloom filter from an enumerable of items.

Equivalent to `new(opts) |> put_many(enumerable)`.

## Examples

    iex> bloom = ExDataSketch.Bloom.from_enumerable(["a", "b", "c"], capacity: 100)
    iex> ExDataSketch.Bloom.member?(bloom, "a")
    true

# `member?`

```elixir
@spec member?(t(), term()) :: boolean()
```

Tests whether an item may be a member of the set.

Returns `true` if the item is possibly in the set (may be a false positive),
`false` if the item is definitely not in the set.

## Examples

    iex> bloom = ExDataSketch.Bloom.new() |> ExDataSketch.Bloom.put("hello")
    iex> ExDataSketch.Bloom.member?(bloom, "hello")
    true

    iex> bloom = ExDataSketch.Bloom.new()
    iex> ExDataSketch.Bloom.member?(bloom, "hello")
    false

# `merge`

```elixir
@spec merge(t(), t()) :: t()
```

Merges two Bloom filters via bitwise OR.

Both filters must have identical `bit_count`, `hash_count`, and `seed`.
Raises `ExDataSketch.Errors.IncompatibleSketchesError` if parameters differ.

## Examples

    iex> a = ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("x")
    iex> b = ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("y")
    iex> merged = ExDataSketch.Bloom.merge(a, b)
    iex> ExDataSketch.Bloom.member?(merged, "x") and ExDataSketch.Bloom.member?(merged, "y")
    true

# `merge_many`

```elixir
@spec merge_many(Enumerable.t()) :: t()
```

Merges a non-empty enumerable of Bloom filters into one.

Raises `Enum.EmptyError` if the enumerable is empty.

## Examples

    iex> filters = Enum.map(1..3, fn i ->
    ...>   ExDataSketch.Bloom.new(capacity: 100) |> ExDataSketch.Bloom.put("item_#{i}")
    ...> end)
    iex> merged = ExDataSketch.Bloom.merge_many(filters)
    iex> ExDataSketch.Bloom.member?(merged, "item_1")
    true

# `merger`

```elixir
@spec merger(keyword()) :: (t(), t() -&gt; t())
```

Returns a 2-arity merge function for combining filters.

## Examples

    iex> is_function(ExDataSketch.Bloom.merger(), 2)
    true

# `new`

```elixir
@spec new(keyword()) :: t()
```

Creates a new empty Bloom filter.

## Options

- `:capacity` -- expected number of elements (default: 10000).
- `:false_positive_rate` -- target FPR (default: 0.01).
- `:seed` -- hash seed (default: 0).
- `:backend` -- backend module (default: `ExDataSketch.Backend.Pure`).
- `:hash_fn` -- custom hash function `(term -> non_neg_integer)`.

## Examples

    iex> bloom = ExDataSketch.Bloom.new()
    iex> bloom.opts[:capacity]
    10000

    iex> bloom = ExDataSketch.Bloom.new(capacity: 1000, false_positive_rate: 0.001)
    iex> bloom.opts[:capacity]
    1000

# `put`

```elixir
@spec put(t(), term()) :: t()
```

Inserts a single item into the filter.

The item is hashed via `ExDataSketch.Hash.hash64/1` before insertion.

## Examples

    iex> bloom = ExDataSketch.Bloom.new() |> ExDataSketch.Bloom.put("hello")
    iex> ExDataSketch.Bloom.member?(bloom, "hello")
    true

# `put_many`

```elixir
@spec put_many(t(), Enumerable.t()) :: t()
```

Inserts multiple items in a single pass.

More efficient than calling `put/2` repeatedly because it minimizes
intermediate binary allocations.

## Examples

    iex> bloom = ExDataSketch.Bloom.new() |> ExDataSketch.Bloom.put_many(["a", "b", "c"])
    iex> ExDataSketch.Bloom.member?(bloom, "a")
    true

# `reducer`

```elixir
@spec reducer() :: (term(), t() -&gt; t())
```

Returns a 2-arity reducer function for use with `Enum.reduce/3`.

## Examples

    iex> is_function(ExDataSketch.Bloom.reducer(), 2)
    true

# `serialize`

```elixir
@spec serialize(t()) :: binary()
```

Serializes the filter to the ExDataSketch-native EXSK binary format.

## Examples

    iex> bloom = ExDataSketch.Bloom.new(capacity: 100)
    iex> binary = ExDataSketch.Bloom.serialize(bloom)
    iex> <<"EXSK", _rest::binary>> = binary
    iex> byte_size(binary) > 0
    true

# `size_bytes`

```elixir
@spec size_bytes(t()) :: non_neg_integer()
```

Returns the byte size of the state binary.

## Examples

    iex> bloom = ExDataSketch.Bloom.new(capacity: 100)
    iex> ExDataSketch.Bloom.size_bytes(bloom) > 0
    true

---

*Consult [api-reference.md](api-reference.md) for complete listing*
