Copyright © (C) 2019, Matthew Pope
Authors: Matthew Pope.
Nif wrapper for the xor_filter: https://github.com/FastFilter/xor_singleheader
They're 'Faster and Smaller Than Bloom and Cuckoo Filters'.
Be wary of memory usage when using this module.
Example Usage:
Filter = exor_filter:xor8(["test1", "test2", "test3"]), true = exor_filter:xor8_contain(Filter, "test1"), false = exor_filter:xor8_contain(Filter, "test6"),Filters are initialized independently:
Filter1 = exor_filter:xor8([1, 2, 3]), Filter2 = exor_filter:xor8([4, 5, 6]), false = exor_filter:xor8_contain(Filter1, 6), true = exor_filter:xor8_contain(Filter1, 2), false = exor_filter:xor8_contain(Filter2, 2), true = exor_filter:xor8_contain(Filter2, 5),Example usage from Elixir:
... Alias :exor_filter, as: XorFilter ... true = [1, 2, 3, 4] |> XorFilter.xor8() |> XorFilter.xor8_contain(1)
The usage of the xor16 is the same. That structure is larger, but has a smaller false positive rate.
The buffered versions of initialize are provided for larger data sets. This can be faster. See xor8_buffered/1 for more information.
Convinience modulesxor8
and xor16
are provided.
hash_function() = default_hash | none | fun((any()) -> non_neg_integer())
xor16/1 | See the xor8/2 documentation. |
xor16/2 | Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements. |
xor16_buffered/1 | Similar to the initialize function, but is a buffered version for lists that are large. |
xor16_buffered/2 | Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys. |
xor16_contain/2 | Tests to see if the passed argument is in the filter. |
xor16_contain/3 | Tests to see if the passed argument is in the filter. |
xor16_from_bin/1 | Deserialize the filter from a previous xor16_to_bin call. |
xor16_to_bin/1 | Serialize the filter to a binary. |
xor8/1 | Initializes the xor filter, and runs the default hash function on each of the elements in the list. |
xor8/2 | Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements. |
xor8_buffered/1 | Similar to the initialize function, but is a buffered version for lists that are large. |
xor8_buffered/2 | Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys. |
xor8_contain/2 | Tests to see if the passed argument is in the filter. |
xor8_contain/3 | Tests to see if the passed argument is in the filter. |
xor8_from_bin/1 | Deserialize the filter from a previous xor8_to_bin call. |
xor8_to_bin/1 | Serialize the filter to a binary. |
xor16(List::list()) -> {reference(), default_hash} | {error, atom()}
See the xor8/2 documentation.
xor16(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}
Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements.
See the xor8/2 documentation.xor16_buffered(List::list()) -> {reference(), atom()} | {error, atom()}
Similar to the initialize function, but is a buffered version for lists that are large. This version uses the default hash.
xor16_buffered(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}
Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys. Use for greater speed.
See xor16/1 for example usage.
Returns aRef<>
to a filter to be used in contain
.
xor16_contain(X1::{reference() | binary(), hash_function()}, Key::term()) -> true | false
Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.
DO NOT PASS PRE-HASHED VALUES. The method / fun passed to the initialization function is saved, and is used to compute the hash.
Filters previously serialized with xor16_to_bin
are allowed.
xor16_contain(X1::{reference() | binary(), hash_function()}, Key::term(), ReturnValue::any()) -> true | any()
Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.
Filters previously serialized with xor16_to_bin
are allowed.
true
if the element exists (or there is a false positive).
The third argument will be returned instead of false
if the element is
not in the filter.
xor16_from_bin(X1::{binary(), hash_function()}) -> {reference(), hash_function()}
Deserialize the filter from a previous xor16_to_bin
call.
{reference(), hash_function()}
xor16_to_bin(X1::{reference(), hash_function()}) -> {binary(), hash_function()}
Serialize the filter to a binary
Returnsbinary()
.
xor8(List::list()) -> {reference(), atom()} | {error, atom()}
Initializes the xor filter, and runs the default hash function on each of the elements in the list. This should be fine for the general case.
Returns a {Ref<>
, default_hash
} to be later used, or an error.
xor8(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}
Initializes the xor filter, and runs the specified pre-defined hash function on each of the elements.
There is a predefined hashing function provided, can can be specified by
using default_hash
To pass pre-hashed data, use none
.
OR
Initializes the xor filter, and runs the passed hash function on each of the elements. The hash function must output unsigned 64 bit numbers, or an error will occur. This could be 'safer' than passing raw data, because of the minimal checks on the output of the function are done.
The function must be of arity /1. If you need to pass more data, consider using a list of tuples to transform:
exor_filter:xor8_initialize([{1, 1}, {2, 2}], fun({Num1, Num1} -> Num1 + Num2 end)).
Returns a {Ref<>
, hash_method} to a filter to be used in contain
if a predefined hash function is specified.
Returns a {Ref<>
, hash_function()} if a custom function is passed.
{error, reason}
be returned.
xor8_buffered(List::list()) -> {reference(), atom()} | {error, atom()}
Similar to the initialize function, but is a buffered version for lists that are large. This version uses the default hash.
xor8_buffered(List::list(), HashFunction::hash_function()) -> {reference(), hash_function()} | {error, atom()}
Similar to the initialize function, but is a buffered version for lists that are over 100,000,000 keys. Use for greater speed.
See xor8/1 for example usage.
Returns aRef<>
to a filter to be used in contain
.
xor8_contain(X1::{reference() | binary(), hash_function()}, Key::term()) -> true | false
Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.
DO NOT PASS PRE-HASHED VALUES unless you've specified a pre-hashed filter. The method / fun passed to the initialization function is saved, and is used to compute the hash.
Filters previously serialized with xor8_to_bin
are allowed.
xor8_contain(X1::{reference() | binary(), hash_function()}, Key::term(), ReturnValue::any()) -> true | any()
Tests to see if the passed argument is in the filter. The first argument must be the pre-initialized filter.
Filters previously serialized with xor8_to_bin
are allowed.
true
if the element exists (or there is a false positive).
The third argument will be returned instead of false
if the element is
not in the filter.
xor8_from_bin(X1::{binary(), hash_function()}) -> {reference(), hash_function()}
Deserialize the filter from a previous xor8_to_bin
call.
{reference(), hash_function()}
xor8_to_bin(X1::{reference(), hash_function()}) -> {binary(), hash_function()}
Serialize the filter to a binary
Returnsbinary()
.
Generated by EDoc