talan v0.1.1 Talan.BloomFilter View Source

Bloom filter implementation with concurrent accessibility, powered by :atomics module.

"A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set"

Bloom filter on Wikipedia

Credit

Partly inspired by Blex

Features

  • Fixed size Bloom filter
  • Concurrent reads & writes
  • Custom & default hash functions
  • Merge multiple Bloom filters into one
  • Intersect multiple Bloom filters into one
  • Estimate number of unique elements
  • Estimate current false positive probability

Examples

iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.member?("Barna")
true
iex> b |> Talan.BloomFilter.member?("Kovacs")
false

Link to this section Summary

Functions

Returns a map representing the bit state of the atomics_ref.

Returns an non negative integer representing the estimated cardinality count of unique elements in the filter.

Returns a float representing current estimated false positivity probability.

Hashes term with all hash_functions of %Talan.BloomFilter{}.

Intersection of %Talan.BloomFilter{} structs's atomics into one new struct.

Checks for membership of term in bloom_filter.

Merge multiple %Talan.BloomFilter{} structs's atomics into one new struct.

Returns a new %Talan.BloomFilter{} for the desired cardinality.

Puts term into bloom_filter a %Talan.BloomFilter{} struct.

Returns count of required hash functions for the given false_positive_probability.

Link to this section Types

Link to this type

t()

View Source
t() :: %Talan.BloomFilter{
  atomics_ref: reference(),
  filter_length: non_neg_integer(),
  hash_functions: list()
}

Link to this section Functions

Link to this function

bits_info(bloom_filter)

View Source
bits_info(t()) :: map()

Returns a map representing the bit state of the atomics_ref.

Use this for debugging purposes.

Examples

iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.bits_info()
%{total_bits: 9536, set_bits_count: 0, set_ratio: 0.0}
Link to this function

cardinality(bloom_filter)

View Source
cardinality(t()) :: non_neg_integer()

Returns an non negative integer representing the estimated cardinality count of unique elements in the filter.

Examples

iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.cardinality()
0
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.cardinality()
1
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.cardinality()
1
iex> b |> Talan.BloomFilter.put("Kovacs")
iex> b |> Talan.BloomFilter.cardinality()
2
Link to this function

false_positive_probability(bloom_filter)

View Source
false_positive_probability(t()) :: float()

Returns a float representing current estimated false positivity probability.

Examples

iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.false_positive_probability()
0.0 # fpp zero when bloom filter is empty
iex> b |> Talan.BloomFilter.put("Barna") # fpp increase
iex> b |> Talan.BloomFilter.put("Kovacs")
iex> fpp = b |> Talan.BloomFilter.false_positive_probability()
iex> fpp > 0 && fpp < 1
true
Link to this function

hash_term(bloom_filter, term)

View Source
hash_term(t(), any()) :: [integer()]

Hashes term with all hash_functions of %Talan.BloomFilter{}.

Returns a list of hashed values.

Examples

b = Talan.BloomFilter.new(1000)
Talan.BloomFilter.hash_term(b, :any_term_can_be_hashed)
[9386, 8954, 8645, 4068, 5445, 6914, 2844]
Link to this function

intersection(list)

View Source
intersection([t(), ...]) :: t()

Intersection of %Talan.BloomFilter{} structs's atomics into one new struct.

Note: To work correctly filters with identical size & hash functions must be used.

Returns a new %BloomFilter{} struct which set bits are the intersection the bloom filters in the list.

Examples

iex> hash_functions = Talan.seed_n_murmur_hash_fun(7)
iex> b1 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b1 |> Talan.BloomFilter.put("GitHub")
iex> b2 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b2 |> Talan.BloomFilter.put("GitHub")
iex> b2 |> Talan.BloomFilter.put("Octocat")
:ok
iex> b3 = Talan.BloomFilter.intersection([b1, b2])
iex> b3 |> Talan.BloomFilter.member?("GitHub")
true
iex> b3 |> Talan.BloomFilter.member?("Octocat")
false
Link to this function

member?(bloom_filter, term)

View Source
member?(t(), any()) :: boolean()

Checks for membership of term in bloom_filter.

Returns false if not a member. (definitely not member) Returns true if maybe a member. (possibly member)

Examples

iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.member?("Barna Kovacs")
false
iex> b |> Talan.BloomFilter.put("Barna Kovacs")
iex> b |> Talan.BloomFilter.member?("Barna Kovacs")
true
Link to this function

merge(list)

View Source
merge([t(), ...]) :: t()

Merge multiple %Talan.BloomFilter{} structs's atomics into one new struct.

Note: To work correctly filters with identical size & hash functions must be used.

Returns a new %Talan.BloomFilter{} struct which set bits are the merged set bits of the bloom filters in the list.

Examples

iex> hash_functions = Talan.seed_n_murmur_hash_fun(7)
iex> b1 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b1 |> Talan.BloomFilter.put("GitHub")
iex> b2 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b2 |> Talan.BloomFilter.put("Octocat")
:ok
iex> b3 = Talan.BloomFilter.merge([b1, b2])
iex> b3 |> Talan.BloomFilter.member?("GitHub")
true
iex> b3 |> Talan.BloomFilter.member?("Octocat")
true
Link to this function

new(cardinality, options \\ [])

View Source
new(pos_integer(), list()) :: t()

Returns a new %Talan.BloomFilter{} for the desired cardinality.

cardinality is the expected number of unique items. Duplicated items can be infinite.

Options

  • :false_positive_probability - a float, defaults to 0.01
  • :hash_functions - a list of hash functions, defaults to randomly seeded murmur

Examples

iex> bloom_filter = Talan.BloomFilter.new(1_000_000)
iex> bloom_filter |> Talan.BloomFilter.put("Barna Kovacs")
:ok
Link to this function

put(bloom_filter, term)

View Source
put(t(), any()) :: :ok

Puts term into bloom_filter a %Talan.BloomFilter{} struct.

After this the member? function will always return true for the membership of term.

Returns :ok.

Examples

iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.put("Chris McCord")
:ok
iex> b |> Talan.BloomFilter.put("Jose Valim")
:ok
Link to this function

required_filter_length(cardinality, false_positive_probability)

View Source
required_filter_length(non_neg_integer(), float()) :: non_neg_integer()

Returns the required bit count given

  • cardinality - Number of unique elements that will be inserted
  • false_positive_probability - Desired false positive probability of membership

Wikipedia - Bloom filter - Optimal number of hash functions

Examples

iex> Talan.BloomFilter.required_filter_length(10_000, 0.01)
95851
Link to this function

required_hash_function_count(false_positive_probability)

View Source
required_hash_function_count(float()) :: non_neg_integer()

Returns count of required hash functions for the given false_positive_probability.

Wikipedia - Bloom filter - Optimal number of hash functions

Examples

iex> Talan.BloomFilter.required_hash_function_count(0.01)
7
iex> Talan.BloomFilter.required_hash_function_count(0.001)
10
iex> Talan.BloomFilter.required_hash_function_count(0.0001)
14