Hypex v1.1.1 Hypex View Source
This module provides an Elixir implementation of HyperLogLog as described within http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf. Various implementations are provided in order to account for performance and memory optimizations.
A Hypex instance is simply a three-element Tuple, which provides a slight speed
improvement over using a struct (roughly 10% at last benchmark). This tuple
should only ever be constructed via Hypex.new/2
otherwise you run the risk of
pattern matching errors throughout modification.
Link to this section Summary
Functions
Calculates a cardinality based upon a passed in Hypex
Merges together many Hypex instances with the same seed
Merges together two Hypex instances with the same seed
Create a new Hypex using a width when 16 >= width >= 4
Updates a Hypex instance with a value
Link to this section Types
A Hypex interface structure
Link to this section Functions
Calculates a cardinality based upon a passed in Hypex.
We use the reduce function of the module representing the registers, and track the number of zeroes alongside the initial value needed to create a raw estimate.
Once we have these values we just apply the correction by using the m
value,
the zero count, and the raw estimate.
Examples
iex> hypex = Hypex.new(4)
iex> hypex = Hypex.update(hypex, "one")
iex> hypex = Hypex.update(hypex, "two")
iex> hypex = Hypex.update(hypex, "three")
iex> Hypex.cardinality(hypex) |> round
3
Merges together many Hypex instances with the same seed.
This is done by converting the underlying register structure to a list of bits and taking the max of each index into a new list, before converting back into the register structure.
We accept an arbitrary number of Hypex instances to merge and due to the use of List zipping this comes naturally. We catch empty and single entry Lists to avoid wasting computation.
If you have a scenario in which you have to merge a lot of Hypex structures, you should typically buffer up your merges and then pass them all as a list to this function. This is far more efficient than merging two structures repeatedly.
Examples
iex> h1 = Hypex.new(4)
iex> h1 = Hypex.update(h1, "one")
iex> h1 = Hypex.update(h1, "two")
iex> h2 = Hypex.new(4)
iex> h2 = Hypex.update(h2, "three")
iex> h3 = Hypex.merge([h1, h2])
iex> Hypex.cardinality(h3) |> round
3
Merges together two Hypex instances with the same seed.
Internally this function just wraps the two instances in a list and passes them
throguh to merge/1
.
Create a new Hypex using a width when 16 >= width >= 4
.
The type of register is determined by the module backing the Hypex instance. We normalize to ensure we have a valid module and then initialize the module with the widths.
Once the registers are initialized, we return them inside a Tuple alongside the width and module name.
Examples
iex> Hypex.new(4)
{ Hypex.Array, 4, { :array, 16, 0, 0, 100 } }
iex> Hypex.new(4, Bitstring)
{ Hypex.Bitstring, 4, << 0, 0, 0, 0, 0, 0, 0, 0 >> }
Updates a Hypex instance with a value.
Internally :erlang.phash2/2
is used as a 32-bit hash function due to it being
both readily available and relatively fast. Everything here is done via pattern
matching to achieve fast speeds.
The main performance hit of this function comes when there’s a need to modify a bit inside the register, so we protect against doing this unnecessarily by pre-determining whether the modification will be a no-op.
Examples
iex> 4 |> Hypex.new(Bitstring) |> Hypex.update("one")
{ Hypex.Bitstring, 4, << 0, 0, 0, 0, 0, 0, 0, 2 >> }