Nasty.Semantic.WordSenseDisambiguation behaviour (Nasty v0.3.0)


Word Sense Disambiguation (WSD): determining which meaning of a word is used in a given context.

This module provides a simplified, knowledge-based approach suitable for a pure Elixir implementation. State-of-the-art WSD would require neural models trained on large corpora.

Approach

  1. Lesk algorithm: Score each sense by the overlap between its definition and the context
  2. Part-of-speech filtering: Use POS tags to narrow the sense candidates
  3. Context similarity: Compare the surrounding words with the sense definitions
  4. Frequency-based fallback: Default to the most common sense
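
The four steps above can be combined in a few lines. The following is a minimal sketch under stated assumptions, not the module's actual implementation: `LeskSketch` and `pick/3` are hypothetical names, senses are plain maps shaped like the `sense()` type, and the most common sense is assumed to carry the lowest `frequency_rank`.

```elixir
defmodule LeskSketch do
  # Hypothetical helper, not part of Nasty.

  # Picks a sense for one word from a list of sense maps.
  def pick(senses, context_words, pos) do
    context = context_words |> Enum.map(&String.downcase/1) |> MapSet.new()

    # Step 2: keep only senses whose part of speech matches.
    candidates = Enum.filter(senses, &(&1.pos == pos))

    case candidates do
      [] ->
        {:error, :no_senses}

      _ ->
        # Steps 1 and 3: prefer the largest definition/context overlap.
        # Step 4: on ties, prefer the lowest frequency_rank (assumed most common).
        best = Enum.min_by(candidates, fn s -> {-overlap(s, context), s.frequency_rank} end)
        {:ok, best}
    end
  end

  # Counts the words shared by the sense definition and the context.
  defp overlap(sense, context) do
    sense.definition |> words() |> MapSet.intersection(context) |> MapSet.size()
  end

  # Lowercases text and splits it into a set of alphabetic words.
  defp words(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^a-z]+/, trim: true)
    |> MapSet.new()
  end
end
```

With an empty context every sense scores zero, so the tie-break alone decides and the sketch returns the sense with the lowest rank.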

Example

iex> # impl is any module that implements this behaviour
iex> tokens = [%Token{text: "bank", pos_tag: :noun}, %Token{text: "river", pos_tag: :noun}]
iex> WSD.disambiguate(impl, "bank", tokens, language: :en)
{:ok, %Sense{word: "bank", definition: "land alongside a body of water", pos: :noun}}

Summary

Callbacks

Callback for getting related words for a sense (synonyms, hypernyms).

get_senses(t, atom)

Callback for providing sense definitions for a word. Returns a list of possible senses with definitions.

Functions

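calculate_sense_score(impl, sense, context_words)

Calculates overlap score between sense and context.

disambiguate(impl, target_word, context_tokens, opts \\ [])

Disambiguates the sense of a target word given its context.

disambiguate_all(impl, tokens, opts \\ [])

Disambiguates all content words in a list of tokens.

score_senses(impl, senses, context_tokens, window_size)

Scores senses using Lesk algorithm (context-definition overlap).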

Types

sense()

@type sense() :: %{
  word: String.t(),
  definition: String.t(),
  pos: atom(),
  examples: [String.t()],
  frequency_rank: integer()
}

Callbacks

get_senses(t, atom)

@callback get_senses(String.t(), atom()) :: [sense()]

Callback for providing sense definitions for a word. Returns a list of the possible senses, each with its definition.
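
An implementation module could back this callback with an in-memory dictionary. The sketch below is illustrative only: `InMemorySenses` and its data are hypothetical, and because the documentation does not say what the second (atom) argument means, the sketch simply ignores it.

```elixir
defmodule InMemorySenses do
  # Hypothetical implementation module; the name and entries are illustrative.
  # Inside a project that depends on Nasty you would also declare:
  #   @behaviour Nasty.Semantic.WordSenseDisambiguation

  @senses %{
    "bank" => [
      %{
        word: "bank",
        definition: "a financial institution that accepts deposits",
        pos: :noun,
        examples: ["she opened an account at the bank"],
        frequency_rank: 1
      },
      %{
        word: "bank",
        definition: "land alongside a body of water",
        pos: :noun,
        examples: ["they fished from the river bank"],
        frequency_rank: 2
      }
    ]
  }

  # Returns the stored senses for `word` (case-insensitive), or [] if unknown.
  # The second argument is accepted but unused in this sketch.
  def get_senses(word, _unused_atom) do
    Map.get(@senses, String.downcase(word), [])
  end
end
```

Returning an empty list for unknown words keeps the return type `[sense()]` uniform, so callers need no separate error branch for missing entries.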

Functions

calculate_sense_score(impl, sense, context_words)

@spec calculate_sense_score(module(), sense(), MapSet.t()) :: float()

Calculates overlap score between sense and context.
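
One plausible way to compute such a score is to count the context words that also occur in the sense's definition and examples. This is a sketch of that idea, not the library's formula: `OverlapSketch.score/2` is a hypothetical name, it takes no `impl` argument, and it applies no normalization or weighting.

```elixir
defmodule OverlapSketch do
  # Hypothetical stand-in for an overlap score; Nasty's formula may differ.
  def score(sense, %MapSet{} = context_words) do
    # Gather every word from the definition and the example sentences.
    sense_words =
      [sense.definition | sense.examples]
      |> Enum.join(" ")
      |> String.downcase()
      |> String.split(~r/[^a-z]+/, trim: true)
      |> MapSet.new()

    shared = MapSet.intersection(sense_words, context_words)

    # Multiplying by 1.0 turns the integer count into a float.
    MapSet.size(shared) * 1.0
  end
end
```

The context set is expected to hold lowercase words, since the sense text is lowercased before comparison.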

disambiguate(impl, target_word, context_tokens, opts \\ [])

@spec disambiguate(module(), String.t(), [Nasty.AST.Token.t()], keyword()) ::
  {:ok, sense()} | {:error, term()}

Disambiguates the sense of a target word given its context.

Parameters

  • impl - Implementation module providing sense definitions
  • target_word - The word to disambiguate
  • context_tokens - List of tokens in the surrounding context
  • opts - Options
    • :pos_tag - POS tag of target word (helps filter senses)
    • :window_size - Context window size (default: 10)

Returns {:ok, sense} or {:error, reason}.
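
The documentation does not say how `:window_size` is applied. The sketch below assumes it means the number of tokens kept on each side of the target word; `WindowSketch.context_window/3` is a hypothetical helper and tokens are plain maps with a `:text` key rather than `Nasty.AST.Token` structs.

```elixir
defmodule WindowSketch do
  # Hypothetical helper: keeps up to `window_size` tokens on each side of the
  # first occurrence of `target`, dropping the target word itself.
  def context_window(tokens, target, window_size \\ 10) do
    case Enum.find_index(tokens, &(&1.text == target)) do
      nil ->
        # Target not present: fall back to the first `window_size` tokens.
        Enum.take(tokens, window_size)

      index ->
        start = max(index - window_size, 0)

        tokens
        |> Enum.slice(start, index - start + window_size + 1)
        |> Enum.reject(&(&1.text == target))
    end
  end
end
```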

disambiguate_all(impl, tokens, opts \\ [])

@spec disambiguate_all(module(), [Nasty.AST.Token.t()], keyword()) :: [
  {Nasty.AST.Token.t(), sense()}
]

Disambiguates all content words in a list of tokens.

Returns a list of {token, sense} tuples.
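
Which tokens count as content words is not specified here. As a rough sketch, assuming content words are those tagged as nouns, verbs, adjectives, or adverbs, the selection step might look like this; `ContentWordSketch` and its tag list are hypothetical, and only `:noun` is confirmed by the example above.

```elixir
defmodule ContentWordSketch do
  # Assumed tag set; Nasty's actual POS tag atoms may differ.
  @content_tags [:noun, :verb, :adj, :adv]

  # Keeps only the tokens whose POS tag marks a content word.
  def content_words(tokens) do
    Enum.filter(tokens, &(&1.pos_tag in @content_tags))
  end
end

tokens = [
  %{text: "the", pos_tag: :det},
  %{text: "bank", pos_tag: :noun},
  %{text: "flooded", pos_tag: :verb}
]

# Function words such as determiners are skipped before any sense lookup.
content = ContentWordSketch.content_words(tokens)
IO.inspect(Enum.map(content, & &1.text))
```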

score_senses(impl, senses, context_tokens, window_size)

@spec score_senses(module(), [sense()], [Nasty.AST.Token.t()], integer()) :: [
  {sense(), float()}
]

Scores senses using Lesk algorithm (context-definition overlap).