Nasty.Language.English.Morphology (Nasty v0.3.0)

Morphological analyzer for English tokens.

Provides lemmatization (finding the base form of words) using:

  • Dictionary lookup for irregular forms
  • Rule-based suffix removal for regular forms
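The two-stage approach above (irregular-form lookup first, suffix rules second) can be sketched as follows. This is a hypothetical module, LemmaSketch, written only for illustration: its irregular-form table, suffix rules, and minimum-stem-length thresholds are assumptions and are not taken from Nasty's source. It follows the same contract as lemmatize/2: a lowercase word and a POS atom in, a lemma out.

```elixir
defmodule LemmaSketch do
  # Hypothetical sketch, NOT Nasty's implementation: look the word up in an
  # irregular-form table first, otherwise try ordered suffix rules for its POS.

  # Hypothetical irregular-form table keyed by {word, pos_tag}.
  @irregular %{
    {"ran", :verb} => "run",
    {"went", :verb} => "go",
    {"children", :noun} => "child",
    {"mice", :noun} => "mouse",
    {"better", :adj} => "good"
  }

  # Plural / third-person-singular rules: {suffix, replacement, min_stem_length}.
  # The identity rule for "ss" keeps words like "glass" from losing their final s.
  @s_rules [
    {"ies", "y", 2},
    {"sses", "ss", 1},
    {"ss", "ss", 1},
    {"xes", "x", 1},
    {"ches", "ch", 1},
    {"shes", "sh", 1},
    {"s", "", 3}
  ]

  @rules %{
    noun: @s_rules,
    verb: [{"ing", "", 3}, {"ed", "", 3} | @s_rules],
    adj: [{"iest", "y", 2}, {"ier", "y", 2}, {"est", "", 3}, {"er", "", 3}]
  }

  # After stripping these suffixes, a doubled final consonant is undoubled
  # ("runn" -> "run"), except for vowels, y, l, s and z ("fall", "miss", "buzz").
  @undouble_after ~w(ing ed er est)
  @keep_doubled ~c"aeiouylsz"

  @spec lemmatize(String.t(), atom()) :: String.t()
  def lemmatize(word, pos_tag) do
    case Map.fetch(@irregular, {word, pos_tag}) do
      {:ok, lemma} -> lemma
      :error -> apply_rules(word, Map.get(@rules, pos_tag, []))
    end
  end

  # Returns the result of the first rule whose suffix matches and whose stem is
  # long enough; falls back to the word itself (e.g. for POS tags with no rules).
  defp apply_rules(word, rules) do
    Enum.find_value(rules, word, fn {suffix, replacement, min_stem} ->
      if String.ends_with?(word, suffix) do
        stem = binary_part(word, 0, byte_size(word) - byte_size(suffix))

        cond do
          String.length(stem) < min_stem -> nil
          suffix in @undouble_after -> undouble(stem) <> replacement
          true -> stem <> replacement
        end
      end
    end)
  end

  # Drops one letter of a doubled final consonant, e.g. "stopp" -> "stop".
  defp undouble(stem) do
    case String.reverse(stem) do
      <<a, b, _::binary>> when a == b and a not in @keep_doubled ->
        binary_part(stem, 0, byte_size(stem) - 1)

      _ ->
        stem
    end
  end
end
```

Rule order matters here: more specific suffixes are tried first and the first match wins. The rules are deliberately naive; for example, they do not restore a dropped final "e" (LemmaSketch.lemmatize("making", :verb) yields "mak"), which is one reason a practical analyzer needs a larger exception dictionary.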

Examples

iex> alias Nasty.Language.English.{Tokenizer, POSTagger, Morphology}
iex> {:ok, tokens} = Tokenizer.tokenize("running")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> {:ok, analyzed} = Morphology.analyze(tagged)
iex> hd(analyzed).lemma
"run"

Summary

Functions

analyze(tokens)

Analyzes tokens to add lemma and morphological features.

lemmatize(word, pos_tag)

Lemmatizes a word based on its part-of-speech tag.

Functions

analyze(tokens)

@spec analyze([Nasty.AST.Token.t()]) :: {:ok, [Nasty.AST.Token.t()]}

Analyzes tokens to add lemma and morphological features.

Updates each token with:

  • :lemma - Base form of the word
  • :morphology - Map of morphological features (tense, number, etc.)

Parameters

  • tokens - List of Token structs that have already been POS-tagged (for example, by POSTagger.tag_pos/1)

Returns

  • {:ok, tokens} - Tokens with lemma and morphology fields updated
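A pass in the spirit of analyze/1 can be sketched over plain maps. Everything here is an assumption for illustration: the hypothetical AnalyzeSketch module, the :text and :pos_tag keys, and the feature names placed in the morphology map. Nasty's Token struct and its actual feature set may differ. The lemmatizer is passed in as a function so the sketch does not depend on any particular lemmatization strategy.

```elixir
defmodule AnalyzeSketch do
  # Hypothetical sketch of an analyze/1-style pass over plain maps; not Nasty's
  # implementation. Each token is assumed to carry :text and :pos_tag.

  @spec analyze([map()], (String.t(), atom() -> String.t())) :: {:ok, [map()]}
  def analyze(tokens, lemmatize) do
    analyzed =
      Enum.map(tokens, fn %{text: text, pos_tag: pos} = token ->
        # Lemmatize the lowercase form, keeping the original :text untouched.
        word = String.downcase(text)
        lemma = lemmatize.(word, pos)
        Map.merge(token, %{lemma: lemma, morphology: features(word, lemma, pos)})
      end)

    {:ok, analyzed}
  end

  # Illustrative features only: number for nouns, coarse tense/aspect for verbs.
  defp features(word, lemma, :noun),
    do: %{number: if(word == lemma, do: :singular, else: :plural)}

  defp features(word, _lemma, :verb) do
    cond do
      String.ends_with?(word, "ing") -> %{aspect: :progressive}
      String.ends_with?(word, "ed") -> %{tense: :past}
      true -> %{}
    end
  end

  defp features(_word, _lemma, _pos), do: %{}
end
```

Deriving noun number by comparing the word with its lemma is a shortcut: it misclassifies zero-plural nouns such as "sheep", and "-ing"/"-ed" endings are ambiguous between verb forms, so treat these features as illustrative only.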

lemmatize(word, pos_tag)

@spec lemmatize(String.t(), atom()) :: String.t()

Lemmatizes a word based on its part-of-speech tag.

Returns the base form (lemma) of a word using dictionary lookup for irregular forms and rule-based suffix removal for regular forms.

Parameters

  • word - The word to lemmatize (lowercase string)
  • pos_tag - Part-of-speech tag atom (:verb, :noun, :adj, etc.)

Returns

  • String.t() - The lemmatized form of the word

Examples

iex> Nasty.Language.English.Morphology.lemmatize("running", :verb)
"run"

iex> Nasty.Language.English.Morphology.lemmatize("cats", :noun)
"cat"

iex> Nasty.Language.English.Morphology.lemmatize("better", :adj)
"good"