Blex v0.2.1 Blex View Source
Blex is a fast Bloom filter with concurrent accessibility, powered by :atomics
module.
Features
- Fixed size Bloom filter
- Concurrent reads & writes
- Serialization
- Merge multiple Bloom filters into one
- Only one copy of data because data is saved in either
:atomics
or binary (if > 64 bytes) - Custom hash functions
Example
iex> b = Blex.new(1000, 0.01)
iex> Task.async(fn -> Blex.put(b, "hello") end) |> Task.await()
iex> Task.async(fn -> Blex.put(b, "world") end) |> Task.await()
iex> Blex.member?(b, "hello")
true
iex> Blex.member?(b, "world")
true
iex> Blex.member?(b, "others")
false
Blex struct and Blex binary
Blex struct is the struct that contains Blex info, created via Blex.new/2
, Blex.new/3
,
Blex.decode/1
and Blex.merge/1
. Data is saved in atomics array.
Blex binary is encoded binary from Blex struct via Blex.encode/1
or Blex.merge_encode/1
.
It supports most operations (e.g. Blex.member?/2
) except Blex.put/2
(obviously, we cannot mutate binary).
This is useful when we collect Bloom filters from other nodes, we can avoid deserialization
if we do not need to add more memebers to it.
How to start ?
Checkout Blex.new/2
, Blex.put/2
and Blex.member?/2
.
How to do serialization ?
Checkout Blex.encode/1
, Blex.decode/1
, Blex.merge/1
and Blex.merge_encode/1
.
How to merge mutliple bloom filters ?
Checkout Blex.merge/1
and Blex.merge_encode/1
.
How to use custom hash functions ?
Checkout Blex.register/2
and Blex.new/3
.
How to collect meta info ?
Checkout Blex.estimate_size/1
, Blex.estimate_memory/1
and Blex.estimate_capacity/1
.
Link to this section Summary
Functions
Decode Blex binary to Blex struct
Encode Blex struct to Blex binary
Estimate actual capacity of Blex struct or Blex binary
Estimate memory consumption in bytes for Blex struct or Blex binary
Estimate actual size of unique items that Blex struct or Blex binary contains
Check if item is member of Blex struct or Blex binary
Merge multiple Blex struct or Blex binary into one Blex struct
Merge multiple Blex struct or Blex binary into one Blex binary
Merge multiple Blex struct or Blex binary into given Blex struct
Create a Bloom filter with default hash function. It returns a Blex struct
Create a Bloom filter with custom hash id. It returns a Blex struct
Put item into Bloom filter (Blex struct)
Register a custom function with given id
Link to this section Types
hash_function()
View Source
hash_function() :: (non_neg_integer(), any() -> {non_neg_integer(), any()})
hash_function() :: (non_neg_integer(), any() -> {non_neg_integer(), any()})
hash_id()
View Source
hash_id() :: non_neg_integer()
hash_id() :: non_neg_integer()
t()
View Source
t() :: %Blex{
a: :atomics.atomics_ref(),
b: pos_integer(),
hash_fn: hash_function(),
hash_id: hash_id(),
k: pos_integer(),
m: pos_integer()
}
t() :: %Blex{ a: :atomics.atomics_ref(), b: pos_integer(), hash_fn: hash_function(), hash_id: hash_id(), k: pos_integer(), m: pos_integer() }
Link to this section Functions
decode(blex_binary) View Source
Decode Blex binary to Blex struct.
Example
iex> b = Blex.new(40, 0.5)
iex> Blex.put(b, "hello")
:ok
iex> encoded = Blex.encode(b)
<<201, 1, 6, 0, 0, 0, 0, 1, 0, 0, 0>>
iex> decoded = Blex.decode(encoded)
iex> Blex.member?(decoded, "hello")
true
encode(blex_struct) View Source
Encode Blex struct to Blex binary.
Example
iex> b = Blex.new(40, 0.5)
iex> Blex.encode(b)
<<201, 1, 6, 0, 0, 0, 0, 0, 0, 0, 0>>
iex> Blex.put(b, "hello")
iex> Blex.encode(b)
<<201, 1, 6, 0, 0, 0, 0, 1, 0, 0, 0>>
estimate_capacity(blex)
View Source
estimate_capacity(t() | binary()) :: non_neg_integer()
estimate_capacity(t() | binary()) :: non_neg_integer()
Estimate actual capacity of Blex struct or Blex binary.
Capacity grows in power of 2. Sometimes, the actual capacity
could be bigger than specified capacity in Blex.new/2
and Blex.new/3
.
It's estimated value and it could be slightly smaller than specified capacity.
Example
iex> b = Blex.new(1000, 0.01)
iex> Blex.estimate_capacity(b)
1419
iex> encoded = Blex.encode(b)
iex> Blex.estimate_capacity(encoded)
1419
estimate_memory(blex)
View Source
estimate_memory(t() | binary()) :: non_neg_integer()
estimate_memory(t() | binary()) :: non_neg_integer()
Estimate memory consumption in bytes for Blex struct or Blex binary.
Example
iex> b = Blex.new(1000, 0.01)
iex> Blex.estimate_memory(b)
1832
iex> encoded = Blex.encode(b)
iex> Blex.estimate_memory(encoded)
1795
estimate_size(blex)
View Source
estimate_size(t() | binary()) :: non_neg_integer()
estimate_size(t() | binary()) :: non_neg_integer()
Estimate actual size of unique items that Blex struct or Blex binary contains.
Example
iex> b = Blex.new(1000, 0.01)
iex> Blex.estimate_size(b)
0
iex> Blex.put(b, "hello")
:ok
iex> Blex.estimate_size(b)
1
iex> Blex.put(b, "world")
:ok
iex> Blex.estimate_size(b)
2
iex> encoded = Blex.encode(b)
iex> Blex.estimate_size(encoded)
2
member?(blex, item) View Source
Check if item is member of Blex struct or Blex binary.
Example
iex> b = Blex.new(1000, 0.01)
iex> Blex.member?(b, "hello")
false
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
iex> encoded = Blex.encode(b)
iex> Blex.member?(encoded, "hello")
true
merge(list) View Source
Merge multiple Blex struct or Blex binary into one Blex struct.
Example
iex> b1 = Blex.new(1000, 0.01)
iex> b2 = Blex.new(1000, 0.01)
iex> b3 = Blex.new(1000, 0.01)
iex> Blex.put(b1, "hello")
:ok
iex> Blex.put(b2, "world")
:ok
iex> Blex.put(b3, "okk")
:ok
iex> encoded_b3 = Blex.encode(b3)
iex> merged = Blex.merge([b1, b2, encoded_b3])
iex> Blex.member?(merged, "hello")
true
iex> Blex.member?(merged, "world")
true
iex> Blex.member?(merged, "okk")
true
iex> Blex.member?(merged, "others")
false
merge_encode(list) View Source
Merge multiple Blex struct or Blex binary into one Blex binary.
It does list |> Blex.merge() |> Blex.encode()
without intermeidate step.
Example
iex> b1 = Blex.new(1000, 0.01)
iex> b2 = Blex.new(1000, 0.01)
iex> b3 = Blex.new(1000, 0.01)
iex> Blex.put(b1, "hello")
:ok
iex> Blex.put(b2, "world")
:ok
iex> Blex.put(b3, "okk")
:ok
iex> encoded_b3 = Blex.encode(b3)
iex> merged = Blex.merge_encode([b1, b2, encoded_b3])
iex> is_binary(merged)
true
iex> Blex.member?(merged, "hello")
true
iex> Blex.member?(merged, "world")
true
iex> Blex.member?(merged, "okk")
true
iex> Blex.member?(merged, "others")
false
merge_into(blexes, blex_struct) View Source (since 0.2.0)
Merge multiple Blex struct or Blex binary into given Blex struct.
Example
iex> b1 = Blex.new(1000, 0.01)
iex> b2 = Blex.new(1000, 0.01)
iex> b3 = Blex.new(1000, 0.01)
iex> Blex.put(b1, "hello")
:ok
iex> Blex.put(b2, "world")
:ok
iex> Blex.put(b3, "okk")
:ok
iex> encoded_b3 = Blex.encode(b3)
iex> Blex.merge_into([b2, encoded_b3], b1)
:ok
iex> Blex.member?(b1, "hello")
true
iex> Blex.member?(b1, "world")
true
iex> Blex.member?(b1, "okk")
true
iex> Blex.member?(b1, "others")
false
new(capacity, false_positive_probability)
View Source
new(pos_integer(), float()) :: t()
new(pos_integer(), float()) :: t()
Create a Bloom filter with default hash function. It returns a Blex struct.
capacity
should be a positive integer.
false_positive_probability
should be a float that greater than 0 and smaller than 1.
Example
To create a Bloom filter with 1000 capacity and 1% false positive probability, we can do:
iex> b = Blex.new(1000, 0.01)
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
iex> Blex.member?(b, "others")
false
new(capacity, false_positive_probability, custom_hash_id)
View Source
new(pos_integer(), float(), hash_id()) :: t()
new(pos_integer(), float(), hash_id()) :: t()
Create a Bloom filter with custom hash id. It returns a Blex struct.
capacity
should be a positive integer.
false_positive_probability
should be a float that greater than 0 and smaller than 1.
Before we use a custom hash id, we need to do Blex.register/2
to register it first.
Example
To create a Bloom filter with custom hash function, 1000 capacity and 1% false positive probability, we can do:
iex> custom_hash_id = 1
iex> Blex.register(custom_hash_id, fn
...> 0, {item, b, range} ->
...> <<h1::size(b), h2::size(b), _::bits>> = <<Murmur.hash_x86_128(item)::128>>
...> {h1, {range, h1, h2}}
...>
...> i, {range, h1, h2} = acc ->
...> {rem(h1 + i * h2, range), acc}
...> end)
:ok
iex> b = Blex.new(1000, 0.01, custom_hash_id)
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
iex> Blex.member?(b, "others")
false
put(blex_struct, item) View Source
Put item into Bloom filter (Blex struct).
Example
iex> b = Blex.new(1000, 0.01)
iex> Blex.member?(b, "hello")
false
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
register(custom_hash_id, hash_function)
View Source
register(hash_id(), hash_function()) :: :ok
register(hash_id(), hash_function()) :: :ok
Register a custom function with given id.
Custom hash id must be integer and within range from 0 to 200. So we can have max 201 custom hash functions. Adding more custom hash functions is possible but not supported yet because 201 custom functions should be enough in practice.
The signature of hash function is similar to fun
from Enum.map_reduce/3
.
The spec of hash function is (non_neg_integer(), any() -> {non_neg_integer(), any()})
.
The hash function would be invoked k time if Bloom filter has k hash functions.
The first parameter is integer from 0
to k-1
.
The second parameter is the accumulator. The first accumulator is {item, b, range}
where
item
is the value to hash. range
indicates that returned position should be in
range from 0
to range-1
. b
is bit size of range and we have (1 <<< b) == range
.
The returned value is a tuple of two element. The first element is the position of the bit.
The second element is the accumulator that would be passed to next interation.
The hash id and hash function pair would be saved in :persistent_term
. We should only
register it once at the beginning.
Example
iex> custom_hash_id = 1
iex> Blex.register(custom_hash_id, fn
...> 0, {item, b, range} ->
...> <<h1::size(b), h2::size(b), _::bits>> = <<Murmur.hash_x86_128(item)::128>>
...> {h1, {range, h1, h2}}
...>
...> i, {range, h1, h2} = acc ->
...> {rem(h1 + i * h2, range), acc}
...> end)
:ok