Blex v0.2.1 Blex View Source

Blex is a fast Bloom filter with concurrent accessibility, powered by :atomics module.

Features

  • Fixed size Bloom filter
  • Concurrent reads & writes
  • Serialization
  • Merge multiple Bloom filters into one
  • Only one copy of data because data is saved in either :atomics or binary (if > 64 bytes)
  • Custom hash functions

Example

iex> b = Blex.new(1000, 0.01)
iex> Task.async(fn -> Blex.put(b, "hello") end) |> Task.await()
iex> Task.async(fn -> Blex.put(b, "world") end) |> Task.await()
iex> Blex.member?(b, "hello")
true
iex> Blex.member?(b, "world")
true
iex> Blex.member?(b, "others")
false

Blex struct and Blex binary

Blex struct is the struct that contains Blex info, created via Blex.new/2, Blex.new/3, Blex.decode/1 and Blex.merge/1. Data is saved in atomics array.

Blex binary is encoded binary from Blex struct via Blex.encode/1 or Blex.merge_encode/1. It supports most operations (e.g. Blex.member?/2) except Blex.put/2 (obviously, we cannot mutate binary). This is useful when we collect Bloom filters from other nodes, we can avoid deserialization if we do not need to add more memebers to it.

How to start ?

Checkout Blex.new/2, Blex.put/2 and Blex.member?/2.

How to do serialization ?

Checkout Blex.encode/1, Blex.decode/1, Blex.merge/1 and Blex.merge_encode/1.

How to merge mutliple bloom filters ?

Checkout Blex.merge/1 and Blex.merge_encode/1.

How to use custom hash functions ?

Checkout Blex.register/2 and Blex.new/3.

How to collect meta info ?

Checkout Blex.estimate_size/1, Blex.estimate_memory/1 and Blex.estimate_capacity/1.

Link to this section Summary

Functions

Decode Blex binary to Blex struct

Encode Blex struct to Blex binary

Estimate actual capacity of Blex struct or Blex binary

Estimate memory consumption in bytes for Blex struct or Blex binary

Estimate actual size of unique items that Blex struct or Blex binary contains

Check if item is member of Blex struct or Blex binary

Merge multiple Blex struct or Blex binary into one Blex struct

Merge multiple Blex struct or Blex binary into one Blex binary

Merge multiple Blex struct or Blex binary into given Blex struct

Create a Bloom filter with default hash function. It returns a Blex struct

Create a Bloom filter with custom hash id. It returns a Blex struct

Put item into Bloom filter (Blex struct)

Register a custom function with given id

Link to this section Types

Link to this type

hash_function() View Source
hash_function() :: (non_neg_integer(), any() -> {non_neg_integer(), any()})

Link to this type

t() View Source
t() :: %Blex{
  a: :atomics.atomics_ref(),
  b: pos_integer(),
  hash_fn: hash_function(),
  hash_id: hash_id(),
  k: pos_integer(),
  m: pos_integer()
}

Link to this section Functions

Link to this function

decode(blex_binary) View Source
decode(binary()) :: t()

Decode Blex binary to Blex struct.

Example

iex> b = Blex.new(40, 0.5)
iex> Blex.put(b, "hello")
:ok
iex> encoded = Blex.encode(b)
<<201, 1, 6, 0, 0, 0, 0, 1, 0, 0, 0>>
iex> decoded = Blex.decode(encoded)
iex> Blex.member?(decoded, "hello")
true
Link to this function

encode(blex_struct) View Source
encode(t()) :: binary()

Encode Blex struct to Blex binary.

Example

iex> b = Blex.new(40, 0.5)
iex> Blex.encode(b)
<<201, 1, 6, 0, 0, 0, 0, 0, 0, 0, 0>>
iex> Blex.put(b, "hello")
iex> Blex.encode(b)
<<201, 1, 6, 0, 0, 0, 0, 1, 0, 0, 0>>
Link to this function

estimate_capacity(blex) View Source
estimate_capacity(t() | binary()) :: non_neg_integer()

Estimate actual capacity of Blex struct or Blex binary.

Capacity grows in power of 2. Sometimes, the actual capacity could be bigger than specified capacity in Blex.new/2 and Blex.new/3.

It's estimated value and it could be slightly smaller than specified capacity.

Example

iex> b = Blex.new(1000, 0.01)
iex> Blex.estimate_capacity(b)
1419
iex> encoded = Blex.encode(b)
iex> Blex.estimate_capacity(encoded)
1419
Link to this function

estimate_memory(blex) View Source
estimate_memory(t() | binary()) :: non_neg_integer()

Estimate memory consumption in bytes for Blex struct or Blex binary.

Example

iex> b = Blex.new(1000, 0.01)
iex> Blex.estimate_memory(b)
1832
iex> encoded = Blex.encode(b)
iex> Blex.estimate_memory(encoded)
1795
Link to this function

estimate_size(blex) View Source
estimate_size(t() | binary()) :: non_neg_integer()

Estimate actual size of unique items that Blex struct or Blex binary contains.

Example

iex> b = Blex.new(1000, 0.01)
iex> Blex.estimate_size(b)
0
iex> Blex.put(b, "hello")
:ok
iex> Blex.estimate_size(b)
1
iex> Blex.put(b, "world")
:ok
iex> Blex.estimate_size(b)
2
iex> encoded = Blex.encode(b)
iex> Blex.estimate_size(encoded)
2
Link to this function

member?(blex, item) View Source
member?(t() | binary(), any()) :: boolean()

Check if item is member of Blex struct or Blex binary.

Example

iex> b = Blex.new(1000, 0.01)
iex> Blex.member?(b, "hello")
false
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
iex> encoded = Blex.encode(b)
iex> Blex.member?(encoded, "hello")
true
Link to this function

merge(list) View Source
merge([t() | binary()]) :: t()

Merge multiple Blex struct or Blex binary into one Blex struct.

Example

iex> b1 = Blex.new(1000, 0.01)
iex> b2 = Blex.new(1000, 0.01)
iex> b3 = Blex.new(1000, 0.01)
iex> Blex.put(b1, "hello")
:ok
iex> Blex.put(b2, "world")
:ok
iex> Blex.put(b3, "okk")
:ok
iex> encoded_b3 = Blex.encode(b3)
iex> merged = Blex.merge([b1, b2, encoded_b3])
iex> Blex.member?(merged, "hello")
true
iex> Blex.member?(merged, "world")
true
iex> Blex.member?(merged, "okk")
true
iex> Blex.member?(merged, "others")
false
Link to this function

merge_encode(list) View Source
merge_encode([t() | binary()]) :: binary()

Merge multiple Blex struct or Blex binary into one Blex binary.

It does list |> Blex.merge() |> Blex.encode() without intermeidate step.

Example

iex> b1 = Blex.new(1000, 0.01)
iex> b2 = Blex.new(1000, 0.01)
iex> b3 = Blex.new(1000, 0.01)
iex> Blex.put(b1, "hello")
:ok
iex> Blex.put(b2, "world")
:ok
iex> Blex.put(b3, "okk")
:ok
iex> encoded_b3 = Blex.encode(b3)
iex> merged = Blex.merge_encode([b1, b2, encoded_b3])
iex> is_binary(merged)
true
iex> Blex.member?(merged, "hello")
true
iex> Blex.member?(merged, "world")
true
iex> Blex.member?(merged, "okk")
true
iex> Blex.member?(merged, "others")
false
Link to this function

merge_into(blexes, blex_struct) View Source (since 0.2.0)
merge_into([t() | binary()], t()) :: :ok

Merge multiple Blex struct or Blex binary into given Blex struct.

Example

iex> b1 = Blex.new(1000, 0.01)
iex> b2 = Blex.new(1000, 0.01)
iex> b3 = Blex.new(1000, 0.01)
iex> Blex.put(b1, "hello")
:ok
iex> Blex.put(b2, "world")
:ok
iex> Blex.put(b3, "okk")
:ok
iex> encoded_b3 = Blex.encode(b3)
iex> Blex.merge_into([b2, encoded_b3], b1)
:ok
iex> Blex.member?(b1, "hello")
true
iex> Blex.member?(b1, "world")
true
iex> Blex.member?(b1, "okk")
true
iex> Blex.member?(b1, "others")
false
Link to this function

new(capacity, false_positive_probability) View Source
new(pos_integer(), float()) :: t()

Create a Bloom filter with default hash function. It returns a Blex struct.

capacity should be a positive integer.

false_positive_probability should be a float that greater than 0 and smaller than 1.

Example

To create a Bloom filter with 1000 capacity and 1% false positive probability, we can do:

iex> b = Blex.new(1000, 0.01)
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
iex> Blex.member?(b, "others")
false
Link to this function

new(capacity, false_positive_probability, custom_hash_id) View Source
new(pos_integer(), float(), hash_id()) :: t()

Create a Bloom filter with custom hash id. It returns a Blex struct.

capacity should be a positive integer.

false_positive_probability should be a float that greater than 0 and smaller than 1.

Before we use a custom hash id, we need to do Blex.register/2 to register it first.

Example

To create a Bloom filter with custom hash function, 1000 capacity and 1% false positive probability, we can do:

iex> custom_hash_id = 1
iex> Blex.register(custom_hash_id, fn
...>   0, {item, b, range} ->
...>     <<h1::size(b), h2::size(b), _::bits>> = <<Murmur.hash_x86_128(item)::128>>
...>     {h1, {range, h1, h2}}
...>
...>   i, {range, h1, h2} = acc ->
...>     {rem(h1 + i * h2, range), acc}
...> end)
:ok
iex> b = Blex.new(1000, 0.01, custom_hash_id)
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
iex> Blex.member?(b, "others")
false
Link to this function

put(blex_struct, item) View Source
put(t(), any()) :: :ok

Put item into Bloom filter (Blex struct).

Example

iex> b = Blex.new(1000, 0.01)
iex> Blex.member?(b, "hello")
false
iex> Blex.put(b, "hello")
:ok
iex> Blex.member?(b, "hello")
true
Link to this function

register(custom_hash_id, hash_function) View Source
register(hash_id(), hash_function()) :: :ok

Register a custom function with given id.

Custom hash id must be integer and within range from 0 to 200. So we can have max 201 custom hash functions. Adding more custom hash functions is possible but not supported yet because 201 custom functions should be enough in practice.

The signature of hash function is similar to fun from Enum.map_reduce/3. The spec of hash function is (non_neg_integer(), any() -> {non_neg_integer(), any()}). The hash function would be invoked k time if Bloom filter has k hash functions. The first parameter is integer from 0 to k-1. The second parameter is the accumulator. The first accumulator is {item, b, range} where item is the value to hash. range indicates that returned position should be in range from 0 to range-1. b is bit size of range and we have (1 <<< b) == range. The returned value is a tuple of two element. The first element is the position of the bit. The second element is the accumulator that would be passed to next interation.

The hash id and hash function pair would be saved in :persistent_term. We should only register it once at the beginning.

Example

iex> custom_hash_id = 1
iex> Blex.register(custom_hash_id, fn
...>   0, {item, b, range} ->
...>     <<h1::size(b), h2::size(b), _::bits>> = <<Murmur.hash_x86_128(item)::128>>
...>     {h1, {range, h1, h2}}
...>
...>   i, {range, h1, h2} = acc ->
...>     {rem(h1 + i * h2, range), acc}
...> end)
:ok