Nasty.Lexical.WordNet.Similarity (Nasty v0.3.0)
Semantic similarity metrics for WordNet synsets.
Provides various algorithms for measuring semantic similarity between words or synsets based on their position in the WordNet hierarchy and their definitions.
Metrics
- Path Similarity - Based on shortest path length in hypernym hierarchy
- Wu-Palmer Similarity - Based on depth of LCS (Least Common Subsumer)
- Lesk Similarity - Based on definition overlap
- Depth - Distance from root in taxonomy
Example
alias Nasty.Lexical.WordNet
alias Nasty.Lexical.WordNet.Similarity
# Compare "dog" and "cat"
dog_synset = WordNet.synsets("dog", :noun) |> hd()
cat_synset = WordNet.synsets("cat", :noun) |> hd()
# Path similarity
Similarity.path_similarity(dog_synset.id, cat_synset.id) # ~0.2
# Wu-Palmer similarity
Similarity.wup_similarity(dog_synset.id, cat_synset.id) # ~0.857
Summary
Functions
- combined_similarity - Combines multiple similarity metrics with optional weights.
- depth - Calculates the depth of a synset in the taxonomy.
- lcs - Finds the Least Common Subsumer (LCS) of two synsets.
- lesk_similarity - Calculates Lesk similarity based on definition overlap.
- path_similarity - Calculates path-based similarity between two synsets.
- word_similarity - Calculates similarity between two words (not synsets).
- wup_similarity - Calculates Wu-Palmer similarity between two synsets.
Types
Functions
@spec combined_similarity(synset_id(), synset_id(), language(), keyword()) :: similarity_score()
Combines multiple similarity metrics with optional weights.
Returns a weighted average of specified similarity metrics.
Options
- :metrics - List of metrics to use (default: all)
- :weights - Weights for each metric (default: equal weights)
Examples
iex> Similarity.combined_similarity(
...> "oewn-02084071-n",
...> "oewn-02121620-n",
...> :en,
...> metrics: [:path, :wup, :lesk],
...> weights: [0.3, 0.5, 0.2]
...> )
0.654
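The weighted average described above can be sketched as follows. This is a minimal illustration, not Nasty's implementation: the module `WeightedAverageSketch`, the function `combine/2`, and the input scores are all made up for the example, and dividing by the sum of the weights is an assumption about how unnormalized weights would be handled.

```elixir
defmodule WeightedAverageSketch do
  # Hypothetical helper (not part of Nasty): weighted mean of metric scores.
  # Dividing by the total weight means the weights need not sum to 1.
  def combine(scores, weights) when length(scores) == length(weights) do
    total = Enum.sum(weights)

    scores
    |> Enum.zip(weights)
    |> Enum.map(fn {score, weight} -> score * weight end)
    |> Enum.sum()
    |> Kernel./(total)
  end
end

# Made-up path, Wu-Palmer, and Lesk scores with weights 0.3, 0.5, 0.2.
# 0.5 * 0.3 + 0.8 * 0.5 + 0.2 * 0.2 is approximately 0.59.
WeightedAverageSketch.combine([0.5, 0.8, 0.2], [0.3, 0.5, 0.2])
```

With equal weights the result reduces to the plain mean of the scores.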
@spec depth(synset_id(), language()) :: non_neg_integer()
Calculates the depth of a synset in the taxonomy.
Depth is measured as the length of the longest path from the synset to a root node (a synset with no hypernyms).
Returns a non-negative integer representing depth.
Examples
iex> Similarity.depth("oewn-00001740-n", :en) # entity (root)
0
iex> Similarity.depth("oewn-02084071-n", :en) # dog
13
Finds the Least Common Subsumer (LCS) of two synsets.
The LCS is the most specific common ancestor (deepest common hypernym) of two synsets in the taxonomy.
Returns the synset ID of the LCS, or nil if no common ancestor exists.
Examples
iex> Similarity.lcs("oewn-02084071-n", "oewn-02121620-n", :en) # dog, cat
"oewn-02075296-n" # carnivore
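The definitions of depth (longest path to a root) and LCS (deepest common hypernym) can be sketched on a toy taxonomy. Everything here is an assumption made for illustration: the module `TaxonomySketch`, the hand-written hypernym map, and the plain string labels stand in for real synset IDs and WordNet data, and the code is not taken from Nasty.

```elixir
defmodule TaxonomySketch do
  # Toy hypernym map (synset => direct hypernyms); not real WordNet data.
  @hypernyms %{
    "entity" => [],
    "animal" => ["entity"],
    "pet" => ["animal"],
    "carnivore" => ["animal"],
    "canine" => ["carnivore"],
    "feline" => ["carnivore"],
    "dog" => ["canine", "pet"],
    "cat" => ["feline"]
  }

  # Depth is the longest hypernym path to a root; roots have depth 0.
  def depth(synset) do
    case Map.get(@hypernyms, synset, []) do
      [] -> 0
      parents -> 1 + (parents |> Enum.map(&depth/1) |> Enum.max())
    end
  end

  # Every ancestor of a synset, including the synset itself.
  def ancestors(synset) do
    @hypernyms
    |> Map.get(synset, [])
    |> Enum.reduce(MapSet.new([synset]), fn parent, acc ->
      MapSet.union(acc, ancestors(parent))
    end)
  end

  # Deepest shared ancestor, or nil when the synsets share none.
  def lcs(a, b) do
    common = MapSet.intersection(ancestors(a), ancestors(b))
    if MapSet.size(common) == 0, do: nil, else: Enum.max_by(common, &depth/1)
  end
end

TaxonomySketch.depth("dog")      # => 4 (via canine, the longer of two paths)
TaxonomySketch.lcs("dog", "cat") # => "carnivore"
```

Because "dog" has two hypernyms in this toy map, its depth follows the longer chain, which matches the longest-path definition given for depth.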
@spec lesk_similarity(synset_id(), synset_id(), language()) :: similarity_score()
Calculates Lesk similarity based on definition overlap.
Measures similarity by counting the words that the two synset definitions share. Unlike the other metrics, it is based on definition text rather than on the hierarchy.
Returns a score from 0.0 to 1.0, where:
- Higher values = more overlapping words in definitions
- 0.0 = no overlap
Examples
iex> Similarity.lesk_similarity("oewn-02084071-n", "oewn-02121620-n", :en) # dog, cat
0.15 # Some overlap in definitions (animal-related words)
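A definition-overlap score could be computed along these lines. This is a sketch under stated assumptions, not Nasty's implementation: the module `LeskSketch` is hypothetical, the two definitions are invented, and normalizing the overlap as shared words divided by all distinct words (Jaccard) is only one way to keep the score between 0.0 and 1.0. The library's own normalization and any stop-word handling are not documented here.

```elixir
defmodule LeskSketch do
  # Lowercase a definition and split it into a set of distinct words.
  def tokens(definition) do
    definition
    |> String.downcase()
    |> String.split(~r/[^a-z]+/, trim: true)
    |> MapSet.new()
  end

  # Assumption: overlap is normalized as shared words / all distinct words.
  def overlap(def_a, def_b) do
    a = tokens(def_a)
    b = tokens(def_b)
    union = MapSet.size(MapSet.union(a, b))

    if union == 0 do
      0.0
    else
      MapSet.size(MapSet.intersection(a, b)) / union
    end
  end
end

# Invented definitions: 3 shared words out of 6 distinct words.
LeskSketch.overlap("a domesticated canine mammal", "a small domesticated feline mammal")
# => 0.5
```

Note that function words such as "a" count toward the overlap in this sketch, which inflates the score compared with a version that filters them out.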
@spec path_similarity(synset_id(), synset_id(), language()) :: similarity_score()
Calculates path-based similarity between two synsets.
Uses the shortest path length in the hypernym/hyponym hierarchy.
Formula: 1 / (path_length + 1)
Returns a score from 0.0 to 1.0, where:
- 1.0 = identical synsets
- Higher values = more similar
- 0.0 = no path exists
Examples
iex> Similarity.path_similarity("oewn-02084071-n", "oewn-02084071-n") # dog == dog
1.0
iex> Similarity.path_similarity("oewn-02084071-n", "oewn-02083346-n") # dog -> canine
0.5
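The formula can be checked by hand with a small helper. The module `PathScoreSketch` is hypothetical and takes the path length as input rather than searching the hierarchy; using nil to stand for "no path exists" is an assumption of this sketch.

```elixir
defmodule PathScoreSketch do
  # nil stands for "no path exists" in this sketch.
  def score(nil), do: 0.0

  # Formula from the docs: 1 / (path_length + 1).
  def score(path_length) when is_integer(path_length) and path_length >= 0 do
    1 / (path_length + 1)
  end
end

PathScoreSketch.score(0) # => 1.0 (identical synsets)
PathScoreSketch.score(1) # => 0.5 (direct hypernym, such as dog -> canine)
PathScoreSketch.score(4) # => 0.2 (four edges apart)
```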
@spec word_similarity(String.t(), String.t(), atom() | nil, language(), keyword()) :: similarity_score()
Calculates similarity between two words (not synsets).
Finds the maximum similarity across all synset pairs for the two words.
Examples
iex> Similarity.word_similarity("dog", "cat", :noun)
0.857
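Taking the maximum over all synset pairs might look like the sketch below. The module `WordSimilaritySketch`, the synset labels, and the lookup-table metric are all made up for illustration, and returning 0.0 when either word has no synsets is an assumption rather than documented behavior.

```elixir
defmodule WordSimilaritySketch do
  # `metric` is any two-argument function that returns a score.
  def best_pair_score(synsets_a, synsets_b, metric) do
    scores = for a <- synsets_a, b <- synsets_b, do: metric.(a, b)

    case scores do
      # Assumption: a word with no synsets is treated as having no similarity.
      [] -> 0.0
      _ -> Enum.max(scores)
    end
  end
end

# Made-up pairwise scores standing in for a real synset metric.
table = %{{"dog.1", "cat.1"} => 0.8, {"dog.2", "cat.1"} => 0.3}
metric = fn a, b -> Map.get(table, {a, b}, 0.0) end

WordSimilaritySketch.best_pair_score(["dog.1", "dog.2"], ["cat.1"], metric)
# => 0.8
```

Taking the maximum means the score reflects the closest pair of senses, so a rare sense of either word can dominate the result.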
@spec wup_similarity(synset_id(), synset_id(), language()) :: similarity_score()
Calculates Wu-Palmer similarity between two synsets.
Based on the depth of the Least Common Subsumer (LCS) and the depths of the two synsets in the taxonomy.
Formula: 2 * depth(LCS) / (depth(synset1) + depth(synset2))
Returns a score from 0.0 to 1.0, where:
- 1.0 = identical synsets
- Higher values = more similar
- 0.0 = no common ancestor
This metric often gives more intuitive results than path similarity because it considers depth in the taxonomy.
Examples
iex> Similarity.wup_similarity("oewn-02084071-n", "oewn-02121620-n", :en) # dog, cat
0.857 # High similarity (both are carnivores)
iex> Similarity.wup_similarity("oewn-02084071-n", "oewn-12345678-n", :en) # dog, tree
0.133 # Low similarity (different domains)
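The Wu-Palmer formula can be worked through with plain depth values. The module `WupSketch` is hypothetical and takes depths as arguments instead of looking them up. Using nil for "no common ancestor" and returning 1.0 when both synsets are at depth 0 are assumptions of this sketch, made to avoid a division by zero.

```elixir
defmodule WupSketch do
  # nil LCS depth stands for "no common ancestor" in this sketch.
  def score(nil, _depth_a, _depth_b), do: 0.0

  # Assumption: two synsets at depth 0 are both the root, so treat them as identical.
  def score(_lcs_depth, 0, 0), do: 1.0

  # Formula from the docs: 2 * depth(LCS) / (depth(synset1) + depth(synset2)).
  def score(lcs_depth, depth_a, depth_b) do
    2 * lcs_depth / (depth_a + depth_b)
  end
end

WupSketch.score(3, 3, 3) # => 1.0 (a synset compared with itself)
WupSketch.score(2, 3, 3) # 2 * 2 / 6, roughly 0.667
WupSketch.score(0, 3, 3) # => 0.0 (the LCS is the root)
```

With the formula as written and a root depth of 0, any pair whose only common ancestor is the root scores 0.0, the same as a pair with no common ancestor at all.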