Nasty.Semantic.WordSenseDisambiguation behaviour (Nasty v0.3.0)
Word Sense Disambiguation (WSD) - determining which meaning of a word is used in a given context.
This module provides a simplified, knowledge-based approach suitable for a pure Elixir implementation. State-of-the-art WSD would require neural models trained on large corpora.
Approach
- Lesk Algorithm: Overlap between word definitions and context
- Part-of-Speech filtering: Use POS tags to narrow sense candidates
- Context similarity: Compare surrounding words with sense definitions
- Frequency-based: Default to most common sense
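The first and last items above can be sketched together as a simplified Lesk scorer. This is a minimal illustration, not the module's actual implementation: the module name `LeskSketch`, the stopword list, and the plain maps standing in for senses are all assumptions, and senses are assumed to be ordered from most to least frequent.

```elixir
defmodule LeskSketch do
  # Hypothetical stopword list; a real one would be much larger.
  @stopwords ~w(a an the of or and to in on for with that is)

  # Lowercase, split on non-letters, drop stopwords, return a set of words.
  def bag(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^a-z]+/, trim: true)
    |> Enum.reject(&(&1 in @stopwords))
    |> MapSet.new()
  end

  # Overlap score: number of distinct words shared by definition and context.
  def overlap(definition, context_words) do
    context = context_words |> Enum.join(" ") |> bag()
    definition |> bag() |> MapSet.intersection(context) |> MapSet.size()
  end

  # Pick the highest-scoring sense from a non-empty list. Ties (including
  # all-zero scores) go to the earliest sense, i.e. the assumed most common one.
  def best_sense(senses, context_words) do
    senses
    |> Enum.with_index()
    |> Enum.max_by(fn {sense, index} ->
      {overlap(sense.definition, context_words), -index}
    end)
    |> elem(0)
  end
end
```

Encoding the position as `-index` in the comparison tuple is what makes the frequency-based default fall out of the same `Enum.max_by/2` call, so no separate fallback branch is needed.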
Example
iex> tokens = [%Token{text: "bank", pos_tag: :noun}, %Token{text: "river", pos_tag: :noun}]
iex> sense = WSD.disambiguate("bank", tokens, language: :en)
{:ok, %Sense{word: "bank", definition: "land alongside a body of water", pos: :noun}}
Summary
Callbacks
Callback for getting related words for a sense (synonyms, hypernyms).
Callback for providing sense definitions for a word. Returns a list of possible senses with definitions.
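The callback names and signatures did not survive on this page, so the following is only a sketch of what an implementation module might look like. The behaviour `SenseProviderSketch`, the callback names `get_senses/2` and `get_related_words/1`, and the map-based senses are all hypothetical stand-ins, not Nasty's API.

```elixir
defmodule SenseProviderSketch do
  # Hypothetical behaviour mirroring the two callbacks described above.
  # Names and argument lists are assumptions, not Nasty's actual callbacks.
  @callback get_senses(word :: String.t(), pos_tag :: atom() | nil) :: [map()]
  @callback get_related_words(sense :: map()) :: [String.t()]
end

defmodule TinyLexicon do
  @behaviour SenseProviderSketch

  # Toy sense inventory, listed from most to least frequent.
  @senses %{
    "bank" => [
      %{word: "bank", pos: :noun, definition: "a financial institution that accepts deposits"},
      %{word: "bank", pos: :noun, definition: "land alongside a body of water"},
      %{word: "bank", pos: :verb, definition: "to tilt an aircraft when turning"}
    ]
  }

  # Toy related-word table keyed by definition.
  @related %{
    "land alongside a body of water" => ["shore", "riverside"]
  }

  # Returns all senses for the word; a non-nil POS tag narrows the list.
  @impl true
  def get_senses(word, pos_tag) do
    @senses
    |> Map.get(String.downcase(word), [])
    |> Enum.filter(fn sense -> is_nil(pos_tag) or sense.pos == pos_tag end)
  end

  # Returns related words for a sense, or an empty list if none are known.
  @impl true
  def get_related_words(sense), do: Map.get(@related, sense.definition, [])
end
```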
Types
Callbacks
Functions
Calculates overlap score between sense and context.
@spec disambiguate(module(), String.t(), [Nasty.AST.Token.t()], keyword()) :: {:ok, sense()} | {:error, term()}
Disambiguates the sense of a target word given its context.
Parameters
- impl - Implementation module providing sense definitions
- target_word - The word to disambiguate
- context_tokens - List of tokens in the surrounding context
- opts - Options
  - :pos_tag - POS tag of target word (helps filter senses)
  - :window_size - Context window size (default: 10)
Returns {:ok, sense} or {:error, reason}.
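How the two options might shape the candidate senses and the context can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `ContextSketch` and its function names are hypothetical, plain maps stand in for `Nasty.AST.Token` structs and senses, the window is assumed to apply to each side of the target, and falling back to all senses when no sense matches the tag is a design choice of this sketch.

```elixir
defmodule ContextSketch do
  # Keep up to `window_size` tokens on each side of the first occurrence
  # of the target word, excluding the target itself.
  def context_window(tokens, target_word, window_size \\ 10) do
    target = String.downcase(target_word)

    case Enum.find_index(tokens, &(String.downcase(&1.text) == target)) do
      # Target not found: use the whole token list as context.
      nil ->
        tokens

      index ->
        start = max(index - window_size, 0)

        tokens
        |> Enum.slice(start, index - start + window_size + 1)
        |> Enum.reject(&(String.downcase(&1.text) == target))
    end
  end

  # No POS tag given: keep every sense.
  def filter_by_pos(senses, nil), do: senses

  # Keep senses matching the tag; if none match, keep all of them rather
  # than returning an empty candidate list.
  def filter_by_pos(senses, pos_tag) do
    case Enum.filter(senses, &(&1.pos == pos_tag)) do
      [] -> senses
      matching -> matching
    end
  end
end
```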
@spec disambiguate_all(module(), [Nasty.AST.Token.t()], keyword()) :: [{Nasty.AST.Token.t(), sense()}]
Disambiguates all content words in a list of tokens.
Returns a list of {token, sense} tuples.
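A rough sketch of the filter-then-disambiguate loop this implies is given below. Several details are assumptions: the set of tags treated as content words, the hypothetical `pick_sense` function standing in for a per-word disambiguation call, and the choice to drop tokens for which no sense is found (suggested by the return type, which has no error slot).

```elixir
defmodule DisambiguateAllSketch do
  # Assumed content-word tags; the tag set Nasty actually uses may differ.
  @content_tags [:noun, :verb, :adj, :adv]

  # `pick_sense` is a hypothetical 2-arity function taking a token and the
  # full token list, returning {:ok, sense} or {:error, reason}.
  def run(tokens, pick_sense) do
    tokens
    |> Enum.filter(&(&1.pos_tag in @content_tags))
    |> Enum.flat_map(fn token ->
      case pick_sense.(token, tokens) do
        {:ok, sense} -> [{token, sense}]
        # Tokens without a resolvable sense are left out of the result.
        {:error, _reason} -> []
      end
    end)
  end
end
```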
Scores senses using Lesk algorithm (context-definition overlap).