Module exor_filter

Nif wrapper for the xor_filter: https://github.com/FastFilter/xor_singleheader.

Copyright © (C) 2019, Matthew Pope

Authors: Matthew Pope.

Description

Nif wrapper for the xor_filter: https://github.com/FastFilter/xor_singleheader

They're 'Faster and Smaller Than Bloom and Cuckoo Filters'.

Be wary of memory usage when using this module.

Example Usage:

  Filter = exor_filter:xor8(["test1", "test2", "test3"]),
  true   = exor_filter:xor8_contain(Filter, "test1"),
  false  = exor_filter:xor8_contain(Filter, "test6"),
Filters are initialized independently:
  Filter1 = exor_filter:xor8([1, 2, 3]),
  Filter2 = exor_filter:xor8([4, 5, 6]),
 
  false   = exor_filter:xor8_contain(Filter1, 6),
  true    = exor_filter:xor8_contain(Filter1, 2),
  false   = exor_filter:xor8_contain(Filter2, 2),
  true    = exor_filter:xor8_contain(Filter2, 5),
Example usage from Elixir:
  ...
  Alias :exor_filter, as: XorFilter
  ...
  true =
     [1, 2, 3, 4]
     |> XorFilter.xor8()
     |> XorFilter.xor8_contain(1)

The usage of the xor16 is the same. That structure is larger, but has a smaller false positive rate.

The buffered versions of initialize are provided for larger data sets. This can be faster. See xor8_buffered/1 for more information.

Convinience modules xor8 and xor16 are provided.

Data Types

hash_function()

hash_function() = default_hash | none | fun((any()) -> non_neg_integer())

Function Index

xor16/1See the xor8/2 documentation.
xor16/2Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements.
xor16_buffered/1Similar to the initialize function, but is a buffered version for lists that are large.
xor16_buffered/2Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys.
xor16_contain/2Tests to see if the passed argument is in the filter.
xor16_contain/3Tests to see if the passed argument is in the filter.
xor16_from_bin/1Deserialize the filter from a previous xor16_to_bin call.
xor16_to_bin/1Serialize the filter to a binary.
xor8/1Initializes the xor filter, and runs the default hash function on each of the elements in the list.
xor8/2Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements.
xor8_buffered/1Similar to the initialize function, but is a buffered version for lists that are large.
xor8_buffered/2Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys.
xor8_contain/2Tests to see if the passed argument is in the filter.
xor8_contain/3Tests to see if the passed argument is in the filter.
xor8_from_bin/1Deserialize the filter from a previous xor8_to_bin call.
xor8_to_bin/1Serialize the filter to a binary.

Function Details

xor16/1

xor16(List::list()) -> {reference(), default_hash} | {error, atom()}

See the xor8/2 documentation.

xor16/2

xor16(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}

Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements.

See the xor8/2 documentation.

xor16_buffered/1

xor16_buffered(List::list()) -> {reference(), atom()} | {error, atom()}

Similar to the initialize function, but is a buffered version for lists that are large. This version uses the default hash.

xor16_buffered/2

xor16_buffered(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}

Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys. Use for greater speed.

See xor16/1 for example usage.

Returns a Ref<> to a filter to be used in contain.

xor16_contain/2

xor16_contain(X1::{reference() | binary(), hash_function()}, Key::term()) -> true | false

Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.

DO NOT PASS PRE-HASHED VALUES. The method / fun passed to the initialization function is saved, and is used to compute the hash.

Filters previously serialized with xor16_to_bin are allowed.

Returns true if the element exists (or if there is a false positive). False if not.

xor16_contain/3

xor16_contain(X1::{reference() | binary(), hash_function()}, Key::term(), ReturnValue::any()) -> true | any()

Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.

Filters previously serialized with xor16_to_bin are allowed.

Returns true if the element exists (or there is a false positive). The third argument will be returned instead of false if the element is not in the filter.

xor16_from_bin/1

xor16_from_bin(X1::{binary(), hash_function()}) -> {reference(), hash_function()}

Deserialize the filter from a previous xor16_to_bin call.

Returns {reference(), hash_function()}

xor16_to_bin/1

xor16_to_bin(X1::{reference(), hash_function()}) -> {binary(), hash_function()}

Serialize the filter to a binary

Returns binary().

xor8/1

xor8(List::list()) -> {reference(), atom()} | {error, atom()}

Initializes the xor filter, and runs the default hash function on each of the elements in the list. This should be fine for the general case.

Returns a {Ref<>, default_hash} to be later used, or an error.

xor8/2

xor8(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}

Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements.

There is a predefined hashing function provided, can can be specified by using default_hash To pass pre-hashed data, use none.

OR

Initializes the xor filter, and runs the passed hash function on each of the elements. The hash function must output unsigned 64 bit numbers, or an error will occur. This could be 'safer' than passing raw data, because of the minimal checks on the output of the function are done.

The function must be of arity /1. If you need to pass more data, consider using a list of tuples to transform:

  exor_filter:xor8_initialize([{1, 1}, {2, 2}],
     fun({Num1, Num1} ->
        Num1 + Num2
     end)).

Returns a {Ref<>, hash_method} to a filter to be used in contain if a predefined hash function is specified.

Returns a {Ref<>, hash_function()} if a custom function is passed.

Otherwise, an {error, reason} be returned.

xor8_buffered/1

xor8_buffered(List::list()) -> {reference(), atom()} | {error, atom()}

Similar to the initialize function, but is a buffered version for lists that are large. This version uses the default hash.

xor8_buffered/2

xor8_buffered(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}

Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys. Use for greater speed.

See xor8/1 for example usage.

Returns a Ref<> to a filter to be used in contain.

xor8_contain/2

xor8_contain(X1::{reference() | binary(), hash_function()}, Key::term()) -> true | false

Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.

DO NOT PASS PRE-HASHED VALUES unless you've specified a pre-hashed filter. The method / fun passed to the initialization function is saved, and is used to compute the hash.

Filters previously serialized with xor8_to_bin are allowed.

Returns true if the element exists (or if there is a false positive). False if not.

xor8_contain/3

xor8_contain(X1::{reference() | binary(), hash_function()}, Key::term(), ReturnValue::any()) -> true | any()

Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.

Filters previously serialized with xor8_to_bin are allowed.

Returns true if the element exists (or there is a false positive). The third argument will be returned instead of false if the element is not in the filter.

xor8_from_bin/1

xor8_from_bin(X1::{binary(), hash_function()}) -> {reference(), hash_function()}

Deserialize the filter from a previous xor8_to_bin call.

Returns {reference(), hash_function()}

xor8_to_bin/1

xor8_to_bin(X1::{reference(), hash_function()}) -> {binary(), hash_function()}

Serialize the filter to a binary

Returns binary().


Generated by EDoc