Hypex v1.1.1 Hypex View Source

This module provides an Elixir implementation of HyperLogLog as described within http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf. Various implementations are provided in order to account for performance and memory optimizations.

A Hypex instance is simply a three-element Tuple, which provides a slight speed improvement over using a struct (roughly 10% at last benchmark). This tuple should only ever be constructed via Hypex.new/2 otherwise you run the risk of pattern matching errors throughout modification.

Link to this section Summary

Types

t()

A Hypex interface structure

Functions

Calculates a cardinality based upon a passed in Hypex

Merges together many Hypex instances with the same seed

Merges together two Hypex instances with the same seed

Create a new Hypex using a width when 16 >= width >= 4

Updates a Hypex instance with a value

Link to this section Types

A Hypex interface structure

Link to this section Functions

Link to this function cardinality(hypex) View Source
cardinality(hypex :: Hypex.t()) :: cardinality :: number()

Calculates a cardinality based upon a passed in Hypex.

We use the reduce function of the module representing the registers, and track the number of zeroes alongside the initial value needed to create a raw estimate.

Once we have these values we just apply the correction by using the m value, the zero count, and the raw estimate.

Examples

iex> hypex = Hypex.new(4)
iex> hypex = Hypex.update(hypex, "one")
iex> hypex = Hypex.update(hypex, "two")
iex> hypex = Hypex.update(hypex, "three")
iex> Hypex.cardinality(hypex) |> round
3
Link to this function merge(hypices) View Source
merge([hypex :: Hypex.t()]) :: hypex :: Hypex.t()

Merges together many Hypex instances with the same seed.

This is done by converting the underlying register structure to a list of bits and taking the max of each index into a new list, before converting back into the register structure.

We accept an arbitrary number of Hypex instances to merge and due to the use of List zipping this comes naturally. We catch empty and single entry Lists to avoid wasting computation.

If you have a scenario in which you have to merge a lot of Hypex structures, you should typically buffer up your merges and then pass them all as a list to this function. This is far more efficient than merging two structures repeatedly.

Examples

iex> h1 = Hypex.new(4)
iex> h1 = Hypex.update(h1, "one")
iex> h1 = Hypex.update(h1, "two")
iex> h2 = Hypex.new(4)
iex> h2 = Hypex.update(h2, "three")
iex> h3 = Hypex.merge([h1, h2])
iex> Hypex.cardinality(h3) |> round
3
Link to this function merge(h1, h2) View Source
merge(hypex :: Hypex.t(), hypex :: Hypex.t()) :: hypex :: Hypex.t()

Merges together two Hypex instances with the same seed.

Internally this function just wraps the two instances in a list and passes them throguh to merge/1.

Link to this function new(width \\ 16, mod \\ nil) View Source

Create a new Hypex using a width when 16 >= width >= 4.

The type of register is determined by the module backing the Hypex instance. We normalize to ensure we have a valid module and then initialize the module with the widths.

Once the registers are initialized, we return them inside a Tuple alongside the width and module name.

Examples

iex> Hypex.new(4)
{ Hypex.Array, 4, { :array, 16, 0, 0, 100 } }

iex> Hypex.new(4, Bitstring)
{ Hypex.Bitstring, 4, << 0, 0, 0, 0, 0, 0, 0, 0 >> }
Link to this function update(hypex, value) View Source
update(hypex :: Hypex.t(), value :: any()) :: hypex :: Hypex.t()

Updates a Hypex instance with a value.

Internally :erlang.phash2/2 is used as a 32-bit hash function due to it being both readily available and relatively fast. Everything here is done via pattern matching to achieve fast speeds.

The main performance hit of this function comes when there’s a need to modify a bit inside the register, so we protect against doing this unnecessarily by pre-determining whether the modification will be a no-op.

Examples

iex> 4 |> Hypex.new(Bitstring) |> Hypex.update("one")
{ Hypex.Bitstring, 4, << 0, 0, 0, 0, 0, 0, 0, 2 >> }