Similarity.Simhash (Similarity v0.4.0)
Simhash string similarity algorithm. Description of Simhash
iex> Similarity.simhash("Barna", "Kovacs")
0.59375
iex> Similarity.simhash("Austria", "Australia")
0.65625
Summary
Functions
hamming_distance/2
Returns the Hamming distance between the left and right hash, given as lists of bits.
hash/2
Returns the hash for the given string and hash_function in the given return_type.
similarity/3
Calculates the similarity between the left and right string, using Simhash. Returns a float representing similarity between left and right strings.
Functions
hamming_distance/2
Returns the Hamming distance between the left and right hash, given as lists of bits.
Examples
iex> Similarity.Simhash.hamming_distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
2
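As a rough illustration of what this function computes, the Hamming distance of two equal-length bit lists is the number of positions at which they differ. The module name HammingSketch below is hypothetical and is not part of the library; this is a minimal sketch, not the library's implementation.

```elixir
defmodule HammingSketch do
  # Counts the positions at which two equal-length bit lists differ.
  def distance(left, right) when length(left) == length(right) do
    left
    |> Enum.zip(right)
    |> Enum.count(fn {l, r} -> l != r end)
  end
end

IO.inspect(HammingSketch.distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0]))
# 2
```

The lists differ at the first and third positions, which gives the same result as the documented example.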
hash/2
Returns the hash for the given string and hash_function in the given return_type.
Options
:ngram_size - defaults to 3
:hash_function - defaults to :siphash, available options are :siphash, :md5, :sha256
:return_type - defaults to :list, available options are :list, :int64_unsigned, :int64_signed, :binary
The return types :int64_unsigned and :int64_signed are only available for the :siphash hash function.
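To show how :ngram_size and :hash_function typically enter a Simhash computation, here is a minimal sketch of the general technique: split the string into n-grams, hash each n-gram, let every hash vote per bit position, and keep the bits with a positive vote. It is not the library's code. The module SimhashSketch and its helpers are hypothetical, it uses the first 64 bits of an MD5 digest (via :erlang.md5/1) instead of SipHash, and its n-gram and tie handling are assumptions, so its output will not match the values shown in this page.

```elixir
defmodule SimhashSketch do
  import Bitwise

  @bits 64

  # Splits a string into overlapping character n-grams.
  def ngrams(string, n) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Returns the fingerprint as a list of 64 bits, most significant bit first.
  def hash(string, ngram_size \\ 3) do
    hashes = string |> ngrams(ngram_size) |> Enum.map(&hash64/1)

    for i <- (@bits - 1)..0//-1 do
      # Each n-gram votes +1 if bit i is set in its hash, -1 otherwise.
      votes =
        Enum.reduce(hashes, 0, fn h, acc ->
          if ((h >>> i) &&& 1) == 1, do: acc + 1, else: acc - 1
        end)

      # Assumption: ties and negative totals both map to 0.
      if votes > 0, do: 1, else: 0
    end
  end

  # Assumption: take the first 8 bytes of the MD5 digest as a 64-bit integer.
  defp hash64(ngram) do
    <<value::unsigned-integer-64, _rest::binary>> = :erlang.md5(ngram)
    value
  end
end

IO.inspect(SimhashSketch.ngrams("alma", 3))
# ["alm", "lma"]
```

Because similar strings share most of their n-grams, most bit positions receive nearly the same votes, which is why their fingerprints differ in only a few bits.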
Examples
Similarity.Simhash.hash("alma korte")
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]
iex> Similarity.Simhash.hash("alma korte", ngram_size: 3, hash_function: :siphash, return_type: :int64_unsigned)
15012197954348909067
iex> Similarity.Simhash.hash("alma korte", ngram_size: 3, hash_function: :siphash, return_type: :int64_signed)
-3434546119360642549
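The three example results above appear to be the same 64 bits in different representations. The sketch below checks this with plain Elixir binary pattern matching; the module ReturnTypeSketch is hypothetical and not part of the library, and the most-significant-bit-first ordering is an inference from the documented values rather than a documented guarantee.

```elixir
defmodule ReturnTypeSketch do
  # Reinterprets an unsigned 64-bit value as a two's-complement signed integer.
  def to_signed(unsigned) do
    <<signed::signed-integer-64>> = <<unsigned::unsigned-integer-64>>
    signed
  end

  # Expands an unsigned 64-bit value into a list of bits, most significant first.
  def to_bits(unsigned) do
    binary = <<unsigned::unsigned-integer-64>>
    for <<bit::1 <- binary>>, do: bit
  end
end

unsigned = 15012197954348909067

IO.inspect(ReturnTypeSketch.to_signed(unsigned))
# -3434546119360642549

IO.inspect(unsigned |> ReturnTypeSketch.to_bits() |> Enum.take(12))
# [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

The signed result and the first twelve bits agree with the :int64_signed and :list examples shown for "alma korte".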
@spec similarity(String.t(), String.t(), pos_integer()) :: float()
Calculates the similarity between the left and right string, using Simhash.
Returns a float representing similarity between left and right strings.
Options
:ngram_size - defaults to 3
:hash_function - defaults to :siphash, available options are :siphash, :md5, :sha256
Examples
iex> Similarity.simhash("khan academy", "khan academia")
0.890625
iex> Similarity.simhash("khan academy", "academy khan", ngram_size: 1)
1.0
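The example values are all multiples of 1/64 (0.890625 is 57/64), which is consistent with the similarity being the fraction of matching bits between two 64-bit fingerprints, that is, 1 minus the Hamming distance divided by the number of bits. The sketch below illustrates that relationship under this assumption; SimilaritySketch is a hypothetical module, not the library's implementation, and the bit lists are made up for the demonstration.

```elixir
defmodule SimilaritySketch do
  # Fraction of bit positions on which two equal-length fingerprints agree.
  def from_bits(left, right) when length(left) == length(right) and left != [] do
    differing =
      left
      |> Enum.zip(right)
      |> Enum.count(fn {l, r} -> l != r end)

    1 - differing / length(left)
  end
end

# Two made-up 64-bit fingerprints that differ in exactly 7 positions.
left = List.duplicate(0, 64)
right = List.duplicate(1, 7) ++ List.duplicate(0, 57)

IO.inspect(SimilaritySketch.from_bits(left, right))
# 0.890625
```

Under the same assumption, a result of 1.0 means the two fingerprints are identical. That is plausible for the ngram_size: 1 example, since "khan academy" and "academy khan" contain exactly the same characters.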