Flower

This is an implementation of Bloom Filters for Elixir. It uses NIFs written in Rust for better performance, since Bloom Filters rely highly on mutability.

What are Bloom Filter?

TL;DR: Huge amount of data ➜ small Bloom Filter : Was X not in Huge amount of data?

For more about this topic consider, checking out:

Installation

The package can be installed by adding flower to your list of dependencies in mix.exs:

def deps do
  [
    {:flower, "~> 0.1.4"},
  ]
end

Also, you need to have Rust installed for development, since this uses NIFs.

Docs can be found at https://hexdocs.pm/flower.

Documentation can be generated with ExDoc.

Usage

There is also a small walk through in the Example Section

alias Flower.Bloom, as: Bloom

Create new Bloom Filter

expected_number_of_unique_elements = 1000
# With a size atom
iex> filter = Bloom.new(:"8 KB", expected_number_of_unique_elements)
...
# Or by a maximum byte size:
iex> filter = Bloom.new_by_byte_size(8 * 1024, expected_number_of_unique_elements)
...
# Or by a bit address length:
iex> filter = Bloom.new(16, expected_number_of_unique_elements)
...

Bit Address Length	Size Atom	Bit Address Length	Size Atom	Bit Address Length	Size Atom
		13	1 KB	23	1 MB
		14	2 KB	24	2 MB
		15	4 KB	25	4 MB
6	8 Byte	16	8 KB	26	8 MB
7	16 Byte	17	16 KB	27	16 MB
8	32 Byte	18	32 KB	28	32 MB
9	64 Byte	19	64 KB	29	64 MB
10	128 Byte	20	128 KB	30	128 MB
11	256 Byte	21	256 KB	31	256 MB
12	512 Byte	22	512 KB	32	512 MB

Insert Elements

iex> Bloom.insert(filter, 42)
:ok
iex> Bloom.insert(filter, [1,2,{:atom, <<1,2,3,4>>}])
:ok
iex> Bloom.insert(filter, "Hello!")
:ok

Check for Elements

iex> Bloom.has_not?(filter, "Hello!")
false
iex> Bloom.has?(filter, [1,2,{:atom, <<1,2,3,4>>}])
true
iex> Bloom.has?(filter, :atom)
false
# `has?` is always the opposite of `has_not?`.
iex> Bloom.has?(filter, 42) != Bloom.has_not?(filter, 42)
true

Was actually inserted?	has?	has_not?
yes	yes	no
no	most of the times: no	most of the times: yes

Write To Disk

filename = "prime_filter.bloom"
file = File.stream!(filename, [:delayed_write, :binary], 8096)

(Bloom.stream(filter) |> Stream.into(file) |> Stream.run)

Read From Disk

filename = "prime_filter.bloom"
file = File.stream!(filename, [:read_ahead, :binary], 8096)

{:ok, new_filter} = Bloom.from_stream(file)

Funky Stuff

iex> Bloom.estimate_count(filter)
3
iex> Bloom.false_positive_probability(filter)
3.2348227494719115e-28
# This number is only that small, because we have just 3 elements in 8 KBs.

Example

Prime Numbers

Checking if a number is not a prime and below 100_000.

# helper module
defmodule Step do
    def throught(from, to, step, func) when from < to do
        func.(from)
        throught(from+step, to, step, func)
    end
    def throught(_,_,_,_), do: :ok
end && :ok

# alias so we need to write less
alias Flower.Bloom, as: Bloom


# Select appropriate size.
# We will put about 100_000 elements in.
# 64 KB = 512 KBit
# 512 KBit / 100_000 ~= 5.25 Bits per element.
# Meeh. Let's see...
non_primes = Bloom.new(:"64 KB", 100_000)
# We can also write
non_primes = Bloom.new(19, 100_000)
# or (it will choose the next smaller size)
non_primes = Bloom.new_by_byte_size(64 * 1024, 100_000)

# Let's put some non primes in:
Bloom.insert(non_primes, 42)
Bloom.insert(non_primes, "one hundred")
Bloom.insert(non_primes, [:ok, {Anything, 1.0e9}])

# Let's double check:
Bloom.has?(non_primes, 42)
Bloom.has?(non_primes, "one hundred")
Bloom.has?(non_primes, 7)
Bloom.has?(non_primes, [:ok, {Anything, 1.0e9}])
Bloom.has?(non_primes, 52)
# Works!


# Now we can get a estimate of
# the number of items we
# put in.
Bloom.estimate_count(non_primes)


# Apply Sieve of Eratosthenes.
# This may take a few seconds.
# At least we can do it with
# constant memory.
for x <- 2..50_000 do
  # Skip multiples of previous numbers.
  # This is actually not safe to do
  # with a bloom filter since they are
  # approximations. But let's don't care for now.
  if(Bloom.has_not?(non_primes, x)) do
    Step.throught(x*2, 100_000, x, fn non_prime ->
        # Much of the time is used to calculate
        # hashes.
        Bloom.insert(non_primes, non_prime)
    end)
 end
end && :ok

non_primes |> Bloom.has_not?(12) # not a prime
non_primes |> Bloom.has_not?(11) # prime
non_primes |> Bloom.has_not?(6719) # no prime
non_primes |> Bloom.has_not?(4245) # prime
non_primes |> Bloom.has_not?(9973) # no prime
non_primes |> Bloom.has_not?(3549) # prime
non_primes |> Bloom.has_not?(89591) # prime
non_primes |> Bloom.has_not?(84949) # no prime

# Let's write it to disk:
filename = "prime_filter.bloom"
file = File.stream!(filename, [:delayed_write, :read_ahead, :binary], 8096)
(Bloom.stream(non_primes)
|> Stream.into(file)
|> Stream.run)
# Let's inspect the size of the filter:
File.lstat!(filename).size

# We can also read from disk:
{:ok, new_non_primes} = Bloom.from_stream(file)
new_non_primes |> Bloom.has_not?(12) # not a prime
new_non_primes |> Bloom.has_not?(11) # prime
# Works!

# Maybe a bit high, 64 KB for 100_000 Numbers
# is not that much.
Bloom.false_positive_probability(non_primes)

# Let's reestimate .
# There are 9_592 primes below 100_000, so this
# should yield about 100_000 - 9_592 = 90_408.
Bloom.estimate_count(non_primes)

Side note: If you want to check primes, ~~google~~ search for Miller–Rabin primality test.

flower

v0.1.4