View Source cuckoo_filter (cuckoo_filter v1.0.0)

High-performance, concurrent, and mutable Cuckoo Filter implemented using atomics for Erlang and Elixir.

Summary

Functions

Adds an element to a filter.

Adds an element to a filter by its hash.

Returns the maximum capacity of a filter.
Checks if an element is in a filter.
Checks whether a filter contains a fingerprint at the given index or its alternative index.
Checks if an element is in a filter by its hash.

Deletes an element from a filter.

Deletes an element from a filter by its hash.

Exports a filter as a binary.

Returns the hash value of an element using the hash function of the filter.

Imports filter data from a binary created using export/1.

Creates a new cuckoo filter with the given capacity and options

Returns number of items in a filter.
Retrieves a cuckoo_filter from persistent_term by its name.

Types

-type cuckoo_filter() :: #cuckoo_filter{}.
-type filter_name() :: term().
-type fingerprint() :: pos_integer().
-type hash() :: non_neg_integer().
-type index() :: non_neg_integer().
-type option() ::
    {name, filter_name()} |
    {fingerprint_size, 4 | 8 | 16 | 32 | 64} |
    {bucket_size, pos_integer()} |
    {max_evictions, non_neg_integer()} |
    {hash_function, fun((binary()) -> hash())}.
-type options() :: [option()].

Functions

-spec add(cuckoo_filter() | filter_name(), term()) -> ok | {error, not_enough_space}.

Equivalent to add(Filter, Element, false).

Link to this function

add(Filter, Element, Forced)

View Source
-spec add(cuckoo_filter() | filter_name(), term(), false) -> ok | {error, not_enough_space};
   (cuckoo_filter() | filter_name(), term(), true) ->
       ok | {ok, Evicted :: {index(), fingerprint()}}.

Adds an element to a filter.

Returns ok if the insertion was successful, but could return {error, not_enough_space}, when the filter is nearing its capacity.

Forced argument is a boolean that indicates whether the insertion should be forced or not. A forced insertion will never return a not_enough_space error, instead when the filter is full, another random element is removed, and the removed element is returned as {ok, {Index, Fingerprint}}. In this case, elements are not relocated, and no lock is acquired.

Forced insertion can only be used with max_evictions set to 0.
-spec add_hash(cuckoo_filter() | filter_name(), hash()) -> ok | {error, not_enough_space}.

Equivalent to add_hash(Filter, Element, false).

Link to this function

add_hash(Filter, Hash, Forced)

View Source
-spec add_hash(cuckoo_filter() | filter_name(), hash(), false) -> ok | {error, not_enough_space};
        (cuckoo_filter() | filter_name(), hash(), true) ->
            ok | {ok, Evicted :: {index(), fingerprint()}}.

Adds an element to a filter by its hash.

Same as add/3 except that it accepts the hash of the element instead of the element.
-spec capacity(cuckoo_filter() | filter_name()) -> pos_integer().
Returns the maximum capacity of a filter.
Link to this function

contains(Filter, Element)

View Source
-spec contains(cuckoo_filter() | filter_name(), term()) -> boolean().
Checks if an element is in a filter.
Link to this function

contains_fingerprint(Filter, Index, Fingerprint)

View Source
-spec contains_fingerprint(cuckoo_filter() | filter_name(), index(), fingerprint()) -> boolean().
Checks whether a filter contains a fingerprint at the given index or its alternative index.
Link to this function

contains_hash(Filter, Hash)

View Source
-spec contains_hash(cuckoo_filter() | filter_name(), hash()) -> boolean().
Checks if an element is in a filter by its hash.
-spec delete(cuckoo_filter() | filter_name(), term()) -> ok | {error, not_found}.

Deletes an element from a filter.

Returns ok if the deletion was successful, and returns {error, not_found} if the element could not be found in the filter.

Note: A cuckoo filter can only delete items that are known to be inserted before. Deleting non inserted items might lead to deletion of another element in case of a hash collision.
Link to this function

delete_hash(Filter, Hash)

View Source
-spec delete_hash(cuckoo_filter() | filter_name(), hash()) -> ok | {error, not_found}.

Deletes an element from a filter by its hash.

Same as delete/2 except that it uses the hash of the element instead of the element.
-spec export(cuckoo_filter() | filter_name()) -> binary().

Exports a filter as a binary.

Returned binary can be used to reconstruct the filter again, using import/2 function.
Link to this function

hash(Cuckoo_filter, Element)

View Source
-spec hash(cuckoo_filter() | filter_name(), term()) -> hash().
Returns the hash value of an element using the hash function of the filter.
-spec import(cuckoo_filter() | filter_name(), binary()) -> ok | {error, invalid_data_size}.

Imports filter data from a binary created using export/1.

Returns ok if the import was successful, but could return {ok, invalid_data_size} if the size of the given binary does not match the size of the filter.
-spec new(pos_integer()) -> cuckoo_filter().

Equivalent to new(Capacity, []).

-spec new(pos_integer(), options()) -> cuckoo_filter().

Creates a new cuckoo filter with the given capacity and options

Note that the actual capacity might be higher than the given capacity, because internally number of buckets in a cuckoo filter must be a power of 2.

Possible options are:
  • {name, Name}

    If you give it a name, created filter instance will be stored in persistent_term, and later you can access the filter by its name.

  • {fingerprint_size, FingerprintSize}

    FingerprintSize can be one of 4, 8, 16, 32, and 64 bits. Default fingerprint size is 16 bits.

  • {bucket_size, BucketSize}

    BucketSize must be a non negative integer, and the default value is 4. Higher bucket sizes can reduce insert time considerably since it reduces the number of relocations of existing fingerprints in occupied buckets, but it increases the lookup time, and false positive rate.

  • {max_evictions, MaxEvictions}

    MaxEvictions indicates the maximum number of relocation attemps before giving up when inserting a new element.

  • {hash_function, HashFunction}

    You can specify a custom hash function that accepts a binary as argument and returns hash value as an integer. By default xxh3 hash functions are used, and you need to manually add xxh3 to the list of your project dependencies.

-spec size(cuckoo_filter() | filter_name()) -> non_neg_integer().
Returns number of items in a filter.
-spec whereis(filter_name()) -> cuckoo_filter().
Retrieves a cuckoo_filter from persistent_term by its name.