Similarity.Simhash (Similarity v0.4.0)
Simhash string similarity algorithm. Description of Simhash
iex> Similarity.simhash("Barna", "Kovacs")
0.59375
iex> Similarity.simhash("Austria", "Australia")
0.65625
Summary
Functions
Returns the Hamming distance between the left and right hash, given as lists of bits.
Returns the hash for the given string and hash_function in the given return_type.
Calculates the similarity between the left and right string, using Simhash.
Returns a float representing similarity between left and right strings.
Functions
Returns the Hamming distance between the left and right hash, given as lists of bits.
Examples
iex> Similarity.Simhash.hamming_distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
2
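The Hamming distance of two equal-length bit lists is the number of positions at which they differ. The following is a minimal sketch of that idea, not the library's own code; the module name `HammingSketch` is hypothetical, and it assumes both lists have the same length.

```elixir
defmodule HammingSketch do
  # Hypothetical helper, not part of the Similarity library.
  # Counts the positions at which two equal-length bit lists differ.
  def distance(left, right) when length(left) == length(right) do
    left
    |> Enum.zip(right)
    |> Enum.count(fn {l, r} -> l != r end)
  end
end

IO.inspect(HammingSketch.distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0]))
# → 2
```

The lists in the example differ at the first and third positions, which gives the same result as the documented example.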
Returns the hash for the given string and hash_function in the given return_type.
Options
:ngram_size - defaults to 3
:hash_function - defaults to :siphash, available options are :siphash, :md5, :sha256
:return_type - defaults to :list, available options are :list, :int64_unsigned, :int64_signed, :binary
The return types :int64_unsigned and :int64_signed are only available for the :siphash hash function.
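To illustrate how an n-gram size and a hash function can combine into a Simhash, here is a rough sketch of the general technique. It is not the library's implementation: the module `SimhashSketch` and its functions are hypothetical, it assumes character n-grams over graphemes, and it uses the first 64 bits of MD5 per n-gram, so it will not reproduce the hash values shown in this documentation.

```elixir
defmodule SimhashSketch do
  # Hypothetical module showing the general Simhash technique.
  # Not the Similarity library's implementation.

  # Splits a string into overlapping character n-grams (assumption: graphemes).
  def ngrams(string, n) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Hashes one n-gram to a list of 64 bits, taken from the first 8 bytes of MD5.
  def feature_bits(ngram) do
    <<first64::binary-size(8), _rest::binary>> = :crypto.hash(:md5, ngram)
    for <<bit::1 <- first64>>, do: bit
  end

  # Each n-gram votes +1 (bit set) or -1 (bit clear) per bit position;
  # the final bit is 1 where the summed vote is positive.
  def hash(string, n \\ 3) do
    string
    |> ngrams(n)
    |> Enum.reduce(List.duplicate(0, 64), fn ngram, counts ->
      ngram
      |> feature_bits()
      |> Enum.zip(counts)
      |> Enum.map(fn {bit, count} -> count + (2 * bit - 1) end)
    end)
    |> Enum.map(fn count -> if count > 0, do: 1, else: 0 end)
  end
end

IO.inspect(SimhashSketch.ngrams("alma", 3))
# → ["alm", "lma"]
```

Because the per-bit votes are summed, the result in this sketch depends only on which n-grams occur and how often, not on their order.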
Examples
iex> Similarity.Simhash.hash("alma korte")
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]
iex> Similarity.Simhash.hash("alma korte", ngram_size: 3, hash_function: :siphash, return_type: :int64_unsigned)
15012197954348909067
iex> Similarity.Simhash.hash("alma korte", ngram_size: 3, hash_function: :siphash, return_type: :int64_signed)
-3434546119360642549
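The two integer results above are the same 64 bits read in two ways: the signed value equals the unsigned value minus 2^64. A small sketch using Elixir's bitstring syntax shows the reinterpretation; the module name `Int64Sketch` is hypothetical and is not part of the library.

```elixir
defmodule Int64Sketch do
  # Hypothetical helper, not part of the Similarity library.
  # Reinterprets an unsigned 64-bit integer as a two's-complement signed one.
  def to_signed(unsigned) when unsigned in 0..0xFFFFFFFFFFFFFFFF do
    <<signed::signed-integer-size(64)>> = <<unsigned::unsigned-integer-size(64)>>
    signed
  end
end

IO.inspect(Int64Sketch.to_signed(15_012_197_954_348_909_067))
# → -3434546119360642549
```

A signed 64-bit value can be convenient when the hash is stored in a database column that has no unsigned 64-bit integer type.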
@spec similarity(String.t(), String.t(), pos_integer()) :: float()
Calculates the similarity between the left and right string, using Simhash.
Returns a float representing similarity between left and right strings.
Options
:ngram_size - defaults to 3
:hash_function - defaults to :siphash, available options are :siphash, :md5, :sha256
Examples
iex> Similarity.simhash("khan academy", "khan academia")
0.890625
iex> Similarity.simhash("khan academy", "academy khan", ngram_size: 1)
1.0
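The example results in this documentation are all multiples of 1/64. That is consistent with the similarity being one minus the fraction of differing bits between two 64-bit hashes, although the source does not state the formula. The sketch below works through that assumed relationship; the module `SimilaritySketch` is hypothetical and the bit lists are made up for illustration.

```elixir
defmodule SimilaritySketch do
  # Hypothetical helper, not part of the Similarity library.
  # Assumes similarity = 1 - hamming_distance / bit_count.
  def from_hashes(left_bits, right_bits)
      when length(left_bits) == length(right_bits) and left_bits != [] do
    differing =
      left_bits
      |> Enum.zip(right_bits)
      |> Enum.count(fn {l, r} -> l != r end)

    1.0 - differing / length(left_bits)
  end
end

# Two made-up 64-bit hashes that differ in exactly 7 positions.
left = List.duplicate(0, 64)
right = List.duplicate(1, 7) ++ List.duplicate(0, 57)

IO.inspect(SimilaritySketch.from_hashes(left, right))
# → 0.890625
```

Under this assumption, a result of 0.890625 corresponds to 7 differing bits out of 64, and 1.0 corresponds to identical hashes.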