# `Talan.BloomFilter`
[🔗](https://github.com/preciz/talan/blob/v0.2.1/lib/talan/bloom_filter.ex#L1)

Bloom filter implementation with **concurrent accessibility**,
powered by [:atomics](http://erlang.org/doc/man/atomics.html) module.

"A Bloom filter is a space-efficient probabilistic data structure,
conceived by Burton Howard Bloom in 1970,
that is used to test whether an element is a member of a set"

[Bloom filter on Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter#CITEREFZhiwangJungangJian2010)

## Credit

Partly inspired by [Blex](https://github.com/gyson/blex)

## Features

  * Fixed size Bloom filter
  * Concurrent reads & writes
  * Custom & default hash functions
  * Merge multiple Bloom filters into one
  * Intersect multiple Bloom filters into one
  * Estimate number of unique elements
  * Estimate current false positive probability

## Examples

    iex> b = Talan.BloomFilter.new(1000)
    iex> b |> Talan.BloomFilter.put("Barna")
    iex> b |> Talan.BloomFilter.member?("Barna")
    true
    iex> b |> Talan.BloomFilter.member?("Kovacs")
    false

# `t`

```elixir
@type t() :: %Talan.BloomFilter{
  atomics_ref: reference(),
  filter_length: non_neg_integer(),
  hash_functions: list()
}
```

# `bits_info`

```elixir
@spec bits_info(t()) :: map()
```

Returns a map representing the bit state of the `atomics_ref`.

Use this for debugging purposes.

## Examples

    iex> b = Talan.BloomFilter.new(1000)
    iex> b |> Talan.BloomFilter.bits_info()
    %{total_bits: 9536, set_bits_count: 0, set_ratio: 0.0}

# `cardinality`

```elixir
@spec cardinality(t()) :: non_neg_integer()
```

Returns a non negative integer representing the
estimated cardinality count of unique elements in the filter.

## Examples

    iex> b = Talan.BloomFilter.new(1000)
    iex> b |> Talan.BloomFilter.cardinality()
    0
    iex> b |> Talan.BloomFilter.put("Barna")
    iex> b |> Talan.BloomFilter.cardinality()
    1
    iex> b |> Talan.BloomFilter.put("Barna")
    iex> b |> Talan.BloomFilter.cardinality()
    1
    iex> b |> Talan.BloomFilter.put("Kovacs")
    iex> b |> Talan.BloomFilter.cardinality()
    2

# `deserialize`
*since 0.1.3* 

```elixir
@spec deserialize(binary()) :: t()
```

Deserializes a binary into a Bloom filter.

This function takes a binary that was previously created by `serialize/1`
and reconstructs the Bloom filter structure.

## Examples

    iex> bloom_filter = Talan.BloomFilter.new(1000)
    iex> serialized = Talan.BloomFilter.serialize(bloom_filter)
    iex> deserialized = Talan.BloomFilter.deserialize(serialized)
    iex> is_struct(deserialized, Talan.BloomFilter)
    true

# `false_positive_probability`

```elixir
@spec false_positive_probability(t()) :: float()
```

Returns a float representing current estimated
false positive probability.

## Examples

    iex> b = Talan.BloomFilter.new(1000)
    iex> b |> Talan.BloomFilter.false_positive_probability()
    0.0 # fpp zero when bloom filter is empty
    iex> b |> Talan.BloomFilter.put("Barna") # fpp increases
    iex> b |> Talan.BloomFilter.put("Kovacs")
    iex> fpp = b |> Talan.BloomFilter.false_positive_probability()
    iex> fpp > 0 && fpp < 1
    true

# `hash_term`

```elixir
@spec hash_term(t(), any()) :: [integer()]
```

Hashes `term` with all `hash_functions` of `%Talan.BloomFilter{}`.

Returns a list of hashed values.

## Examples

    b = Talan.BloomFilter.new(1000)
    Talan.BloomFilter.hash_term(b, :any_term_can_be_hashed)
    [9386, 8954, 8645, 4068, 5445, 6914, 2844]

# `intersection`

```elixir
@spec intersection([t(), ...]) :: t()
```

Intersection of `%Talan.BloomFilter{}` structs's atomics into one new struct.

Note: To work correctly filters with identical size & hash functions must be used.

Returns a new `%BloomFilter{}` struct which set bits are the intersection
the bloom filters in the `list`.

## Examples

    iex> hash_functions = Talan.seed_n_murmur_hash_fun(7)
    iex> b1 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
    iex> b1 |> Talan.BloomFilter.put("GitHub")
    iex> b2 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
    iex> b2 |> Talan.BloomFilter.put("GitHub")
    iex> b2 |> Talan.BloomFilter.put("Octocat")
    :ok
    iex> b3 = Talan.BloomFilter.intersection([b1, b2])
    iex> b3 |> Talan.BloomFilter.member?("GitHub")
    true
    iex> b3 |> Talan.BloomFilter.member?("Octocat")
    false

# `member?`

```elixir
@spec member?(t(), any()) :: boolean()
```

Checks for membership of `term` in `bloom_filter`.

Returns `false` if not a member. (definitely not member)
Returns `true` if maybe a member. (possibly member)

## Examples

    iex> b = Talan.BloomFilter.new(1000)
    iex> b |> Talan.BloomFilter.member?("Barna Kovacs")
    false
    iex> b |> Talan.BloomFilter.put("Barna Kovacs")
    iex> b |> Talan.BloomFilter.member?("Barna Kovacs")
    true

# `merge`

```elixir
@spec merge([t(), ...]) :: t()
```

Merge multiple `%Talan.BloomFilter{}` structs's atomics into one new struct.

Note: To work correctly filters with identical size & hash functions must be used.

Returns a new `%Talan.BloomFilter{}` struct which set bits are the merged set bits of
the bloom filters in the `list`.

## Examples

    iex> hash_functions = Talan.seed_n_murmur_hash_fun(7)
    iex> b1 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
    iex> b1 |> Talan.BloomFilter.put("GitHub")
    iex> b2 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
    iex> b2 |> Talan.BloomFilter.put("Octocat")
    :ok
    iex> b3 = Talan.BloomFilter.merge([b1, b2])
    iex> b3 |> Talan.BloomFilter.member?("GitHub")
    true
    iex> b3 |> Talan.BloomFilter.member?("Octocat")
    true

# `new`

```elixir
@spec new(pos_integer(), list()) :: t()
```

Returns a new `%Talan.BloomFilter{}` for the desired `cardinality`.

`cardinality` is the expected number of unique items. Duplicated items
can be infinite.

## Options
  * `:false_positive_probability` - a float, defaults to 0.01
  * `:hash_functions` - a list of hash functions, defaults to randomly seeded murmur

## Examples

    iex> bloom_filter = Talan.BloomFilter.new(1_000_000)
    iex> bloom_filter |> Talan.BloomFilter.put("Barna Kovacs")
    :ok

# `put`

```elixir
@spec put(t(), any()) :: :ok
```

Puts `term` into `bloom_filter` a `%Talan.BloomFilter{}` struct.

After this the `member?` function will always return `true`
for the membership of `term`.

Returns `:ok`.

## Examples

    iex> b = Talan.BloomFilter.new(1000)
    iex> b |> Talan.BloomFilter.put("Chris McCord")
    :ok
    iex> b |> Talan.BloomFilter.put("Jose Valim")
    :ok

# `required_filter_length`

```elixir
@spec required_filter_length(non_neg_integer(), float()) :: non_neg_integer()
```

Returns the required bit count given

* `cardinality` - Number of unique elements that will be inserted
* `false_positive_probability` - Desired false positive probability of membership

[Wikipedia - Bloom filter - Optimal number of hash functions](https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions)

## Examples

    iex> Talan.BloomFilter.required_filter_length(10_000, 0.01)
    95851

# `required_hash_function_count`

```elixir
@spec required_hash_function_count(float()) :: non_neg_integer()
```

Returns the count of required hash functions for the
given `false_positive_probability`.

[Wikipedia - Bloom filter - Optimal number of hash functions](https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions)

## Examples

    iex> Talan.BloomFilter.required_hash_function_count(0.01)
    7
    iex> Talan.BloomFilter.required_hash_function_count(0.001)
    10
    iex> Talan.BloomFilter.required_hash_function_count(0.0001)
    14

# `serialize`
*since 0.1.3* 

```elixir
@spec serialize(t()) :: binary()
```

Serializes the Bloom filter into a binary.

This function converts the Bloom filter structure into a binary format,
which can be used for storage or transmission.

## Examples

    iex> bloom_filter = Talan.BloomFilter.new(1000)
    iex> serialized = Talan.BloomFilter.serialize(bloom_filter)
    iex> is_binary(serialized)
    true

---

*Consult [api-reference.md](api-reference.md) for complete listing*
