Nasty.Semantic.WordSenseDisambiguation behaviour (Nasty v0.3.0)

Word Sense Disambiguation (WSD) determines which meaning of a word is intended in a given context.

This module provides a simplified, knowledge-based approach suitable for a pure Elixir implementation. State-of-the-art WSD would require neural models trained on large corpora.

Approach

Lesk algorithm: Score each sense by the overlap between its definition and the context
Part-of-speech filtering: Use POS tags to narrow the sense candidates
Context similarity: Compare surrounding words with sense definitions
Frequency-based fallback: Default to the most common sense
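
The following minimal sketch shows how these four ideas can fit together. It is not Nasty's implementation: the module name LeskSketch, the stopword list, and the plain-map sense shape (with :definition and :pos keys) are assumptions made for illustration, and the senses are assumed to be ordered from most to least frequent.

```elixir
defmodule LeskSketch do
  # Hypothetical, self-contained sketch; not part of Nasty.

  # Small stopword list so function words do not inflate the overlap.
  @stopwords MapSet.new(~w(a an the of or and to in on for with that is))

  # Lowercases the text, splits on non-letters, drops stopwords,
  # and returns the remaining words as a set.
  def content_words(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^a-z]+/, trim: true)
    |> Enum.reject(&MapSet.member?(@stopwords, &1))
    |> MapSet.new()
  end

  # Lesk score: number of content words shared by the definition
  # and the (already lowercased) context words.
  def overlap(definition, context_words) do
    definition
    |> content_words()
    |> MapSet.intersection(MapSet.new(context_words))
    |> MapSet.size()
  end

  # Filters senses by part of speech, then picks the highest-scoring one.
  # Enum.max_by/2 returns the first maximal element, so a tie (including
  # an all-zero score) falls back to the earliest, most frequent sense.
  def pick(senses, pos, context_words) do
    case Enum.filter(senses, &(&1.pos == pos)) do
      [] -> {:error, :no_senses}
      candidates -> {:ok, Enum.max_by(candidates, &overlap(&1.definition, context_words))}
    end
  end
end
```

Given a financial sense listed first and a second noun sense defined as "sloping land alongside a body of water", calling LeskSketch.pick(senses, :noun, ["river", "water"]) selects the second sense because "water" overlaps with its definition, while an empty context falls back to the first sense.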

Example

iex> tokens = [%Token{text: "bank", pos_tag: :noun}, %Token{text: "river", pos_tag: :noun}]
iex> sense = WSD.disambiguate("bank", tokens, language: :en)
{:ok, %Sense{word: "bank", definition: "land alongside a body of water", pos: :noun}}

Summary

Types

sense()

Callbacks

get_related_words(sense)

Callback that returns words related to a sense, such as synonyms and hypernyms.

get_senses(t, atom)

Callback that provides the sense definitions for a word. Returns a list of possible senses with their definitions.
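
As a rough illustration of what an implementing module might look like, here is a toy in-memory sense inventory. Everything in it is an assumption: the module name ToySenses, the plain-map sense shape, the :related key, and the reading of the atom argument as a language code such as :en. A real implementation would also declare @behaviour Nasty.Semantic.WordSenseDisambiguation, which is omitted here so the sketch runs without the library.

```elixir
defmodule ToySenses do
  # Hypothetical sense inventory keyed by {word, language}.
  # The definitions and related words are made up for illustration.
  @senses %{
    {"bank", :en} => [
      %{
        word: "bank",
        pos: :noun,
        definition: "financial institution that accepts deposits",
        related: ["money", "loan", "account"]
      },
      %{
        word: "bank",
        pos: :noun,
        definition: "land alongside a body of water",
        related: ["river", "shore", "slope"]
      }
    ]
  }

  # Assumption: the second argument is a language atom such as :en.
  # Unknown words or languages yield an empty list.
  def get_senses(word, language) do
    Map.get(@senses, {String.downcase(word), language}, [])
  end

  # Assumption: related words are stored directly on the sense map.
  def get_related_words(sense), do: Map.get(sense, :related, [])
end
```

Related words matter for overlap-based scoring because a short definition such as "land alongside a body of water" shares no word with a context like "river"; adding synonyms and hypernyms to the comparison gives the scorer more to match against.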

Functions

calculate_sense_score(impl, sense, context_words)

Calculates the overlap score between a sense and the context words.

disambiguate(impl, target_word, context_tokens, opts \\ [])

Disambiguates the sense of a target word given its context.

disambiguate_all(impl, tokens, opts \\ [])

Disambiguates all content words in a list of tokens.

score_senses(impl, senses, context_tokens, window_size)

Scores senses using the Lesk algorithm (overlap between the context and each sense definition).
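
The summary does not say how window_size is applied. One plausible reading, sketched below as an assumption only, is that scoring considers just the tokens within window_size positions of the target word. The module WindowSketch and its explicit index argument are hypothetical and do not appear in the signature above.

```elixir
defmodule WindowSketch do
  # Hypothetical helper; not part of Nasty.
  # Returns up to `window_size` tokens on each side of the token at
  # position `index` (zero-based), excluding the target token itself.
  def context_window(tokens, index, window_size) do
    # Tokens before the target, trimmed to the last `window_size` of them.
    before = tokens |> Enum.take(index) |> Enum.take(-window_size)
    # Tokens after the target, trimmed to the first `window_size` of them.
    following = tokens |> Enum.drop(index + 1) |> Enum.take(window_size)
    before ++ following
  end
end
```

A small window keeps the comparison focused on nearby words, at the cost of missing distant cues; in the sentence "the boat reached the bank of the river", a window of 2 around "bank" excludes "river", while a window of 3 includes it.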