View Source Talan.BloomFilter (talan v0.1.2)
Bloom filter implementation with concurrent accessibility, powered by :atomics module.
"A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set"
credit
Credit
Partly inspired by Blex
features
Features
- Fixed size Bloom filter
- Concurrent reads & writes
- Custom & default hash functions
- Merge multiple Bloom filters into one
- Intersect multiple Bloom filters into one
- Estimate number of unique elements
- Estimate current false positive probability
examples
Examples
iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.member?("Barna")
true
iex> b |> Talan.BloomFilter.member?("Kovacs")
false
Link to this section Summary
Functions
Returns a map representing the bit state of the atomics_ref
.
Returns an non negative integer representing the estimated cardinality count of unique elements in the filter.
Returns a float representing current estimated false positivity probability.
Hashes term
with all hash_functions
of %Talan.BloomFilter{}
.
Intersection of %Talan.BloomFilter{}
structs's atomics into one new struct.
Checks for membership of term
in bloom_filter
.
Merge multiple %Talan.BloomFilter{}
structs's atomics into one new struct.
Returns a new %Talan.BloomFilter{}
for the desired cardinality
.
Puts term
into bloom_filter
a %Talan.BloomFilter{}
struct.
Returns the required bit count given
Returns count of required hash functions for the
given false_positive_probability
.
Link to this section Types
@type t() :: %Talan.BloomFilter{ atomics_ref: reference(), filter_length: non_neg_integer(), hash_functions: list() }
Link to this section Functions
Returns a map representing the bit state of the atomics_ref
.
Use this for debugging purposes.
examples
Examples
iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.bits_info()
%{total_bits: 9536, set_bits_count: 0, set_ratio: 0.0}
@spec cardinality(t()) :: non_neg_integer()
Returns an non negative integer representing the estimated cardinality count of unique elements in the filter.
examples
Examples
iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.cardinality()
0
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.cardinality()
1
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.cardinality()
1
iex> b |> Talan.BloomFilter.put("Kovacs")
iex> b |> Talan.BloomFilter.cardinality()
2
Returns a float representing current estimated false positivity probability.
examples
Examples
iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.false_positive_probability()
0.0 # fpp zero when bloom filter is empty
iex> b |> Talan.BloomFilter.put("Barna") # fpp increase
iex> b |> Talan.BloomFilter.put("Kovacs")
iex> fpp = b |> Talan.BloomFilter.false_positive_probability()
iex> fpp > 0 && fpp < 1
true
Hashes term
with all hash_functions
of %Talan.BloomFilter{}
.
Returns a list of hashed values.
examples
Examples
b = Talan.BloomFilter.new(1000)
Talan.BloomFilter.hash_term(b, :any_term_can_be_hashed)
[9386, 8954, 8645, 4068, 5445, 6914, 2844]
Intersection of %Talan.BloomFilter{}
structs's atomics into one new struct.
Note: To work correctly filters with identical size & hash functions must be used.
Returns a new %BloomFilter{}
struct which set bits are the intersection
the bloom filters in the list
.
examples
Examples
iex> hash_functions = Talan.seed_n_murmur_hash_fun(7)
iex> b1 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b1 |> Talan.BloomFilter.put("GitHub")
iex> b2 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b2 |> Talan.BloomFilter.put("GitHub")
iex> b2 |> Talan.BloomFilter.put("Octocat")
:ok
iex> b3 = Talan.BloomFilter.intersection([b1, b2])
iex> b3 |> Talan.BloomFilter.member?("GitHub")
true
iex> b3 |> Talan.BloomFilter.member?("Octocat")
false
Checks for membership of term
in bloom_filter
.
Returns false
if not a member. (definitely not member)
Returns true
if maybe a member. (possibly member)
examples
Examples
iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.member?("Barna Kovacs")
false
iex> b |> Talan.BloomFilter.put("Barna Kovacs")
iex> b |> Talan.BloomFilter.member?("Barna Kovacs")
true
Merge multiple %Talan.BloomFilter{}
structs's atomics into one new struct.
Note: To work correctly filters with identical size & hash functions must be used.
Returns a new %Talan.BloomFilter{}
struct which set bits are the merged set bits of
the bloom filters in the list
.
examples
Examples
iex> hash_functions = Talan.seed_n_murmur_hash_fun(7)
iex> b1 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b1 |> Talan.BloomFilter.put("GitHub")
iex> b2 = Talan.BloomFilter.new(1000, hash_functions: hash_functions)
iex> b2 |> Talan.BloomFilter.put("Octocat")
:ok
iex> b3 = Talan.BloomFilter.merge([b1, b2])
iex> b3 |> Talan.BloomFilter.member?("GitHub")
true
iex> b3 |> Talan.BloomFilter.member?("Octocat")
true
@spec new(pos_integer(), list()) :: t()
Returns a new %Talan.BloomFilter{}
for the desired cardinality
.
cardinality
is the expected number of unique items. Duplicated items
can be infinite.
options
Options
:false_positive_probability
- a float, defaults to 0.01:hash_functions
- a list of hash functions, defaults to randomly seeded murmur
examples
Examples
iex> bloom_filter = Talan.BloomFilter.new(1_000_000)
iex> bloom_filter |> Talan.BloomFilter.put("Barna Kovacs")
:ok
Puts term
into bloom_filter
a %Talan.BloomFilter{}
struct.
After this the member?
function will always return true
for the membership of term
.
Returns :ok
.
examples
Examples
iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.put("Chris McCord")
:ok
iex> b |> Talan.BloomFilter.put("Jose Valim")
:ok
@spec required_filter_length(non_neg_integer(), float()) :: non_neg_integer()
Returns the required bit count given
cardinality
- Number of unique elements that will be insertedfalse_positive_probability
- Desired false positive probability of membership
Wikipedia - Bloom filter - Optimal number of hash functions
examples
Examples
iex> Talan.BloomFilter.required_filter_length(10_000, 0.01)
95851
@spec required_hash_function_count(float()) :: non_neg_integer()
Returns count of required hash functions for the
given false_positive_probability
.
Wikipedia - Bloom filter - Optimal number of hash functions
examples
Examples
iex> Talan.BloomFilter.required_hash_function_count(0.01)
7
iex> Talan.BloomFilter.required_hash_function_count(0.001)
10
iex> Talan.BloomFilter.required_hash_function_count(0.0001)
14