Nasty.Language.English.Morphology (Nasty v0.3.0)
Morphological analyzer for English tokens.
Provides lemmatization (finding the base form of words) using:
- Dictionary lookup for irregular forms
- Rule-based suffix removal for regular forms
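The two-stage strategy above can be sketched as follows. This is a minimal illustration, not Nasty's actual implementation: the module name `LemmaSketch`, the small irregular-form table, and the suffix heuristics are all assumptions made for the example.

```elixir
defmodule LemmaSketch do
  # Hypothetical irregular-form table keyed by {word, pos_tag}.
  # The real dictionary used by Nasty is not shown on this page.
  @irregular %{
    {"went", :verb} => "go",
    {"ran", :verb} => "run",
    {"mice", :noun} => "mouse",
    {"better", :adj} => "good"
  }

  # Stage 1: dictionary lookup. Stage 2: suffix rules. Otherwise the word is returned as-is.
  def lemmatize(word, pos_tag) do
    case Map.fetch(@irregular, {word, pos_tag}) do
      {:ok, lemma} -> lemma
      :error -> strip_suffix(word, pos_tag)
    end
  end

  # Verbs: drop "-ing" or "-ed" from words of six or more characters,
  # then undo consonant doubling ("running" -> "runn" -> "run").
  defp strip_suffix(word, :verb) do
    cond do
      String.length(word) < 6 -> word
      String.ends_with?(word, "ing") -> word |> String.replace_suffix("ing", "") |> undouble()
      String.ends_with?(word, "ed") -> word |> String.replace_suffix("ed", "") |> undouble()
      true -> word
    end
  end

  # Nouns: "-ies" -> "-y", keep "-ss" words intact, otherwise drop a final "-s".
  defp strip_suffix(word, :noun) do
    cond do
      String.ends_with?(word, "ies") -> String.replace_suffix(word, "ies", "y")
      String.ends_with?(word, "ss") -> word
      String.ends_with?(word, "s") -> String.replace_suffix(word, "s", "")
      true -> word
    end
  end

  # Other tags: no rule in this sketch.
  defp strip_suffix(word, _pos_tag), do: word

  # Removes one letter from a doubled final consonant, except l, s and z.
  defp undouble(stem) do
    case stem |> String.graphemes() |> Enum.reverse() do
      [c, c | _] when c not in ["a", "e", "i", "o", "u", "l", "s", "z"] ->
        String.slice(stem, 0, String.length(stem) - 1)

      _ ->
        stem
    end
  end
end
```

Rules this crude mis-handle many words (for example, verbs that drop a final "e" before "-ing"), which is why a real analyzer would check an irregular-form dictionary first and would likely use a richer rule set.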
Examples
iex> alias Nasty.Language.English.{Tokenizer, POSTagger, Morphology}
iex> {:ok, tokens} = Tokenizer.tokenize("running")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> {:ok, analyzed} = Morphology.analyze(tagged)
iex> hd(analyzed).lemma
"run"
Summary
Functions
analyze(tokens)
Analyzes tokens to add lemma and morphological features.
lemmatize(word, pos_tag)
Lemmatizes a word based on its part-of-speech tag.
Functions
analyze(tokens)
@spec analyze([Nasty.AST.Token.t()]) :: {:ok, [Nasty.AST.Token.t()]}
Analyzes tokens to add lemma and morphological features.
Updates each token with:
- :lemma - Base form of the word
- :morphology - Map of morphological features (tense, number, etc.)
Parameters
- tokens - List of Token structs (with POS tags)
Returns
- {:ok, tokens} - Tokens with lemma and morphology fields updated
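The per-token update described above might look roughly like the sketch below. It is only an illustration under stated assumptions: tokens are plain maps with hypothetical `:text` and `:pos_tag` keys rather than `Nasty.AST.Token` structs, the lemmatizer is passed in as a function, and the feature values (`:past`, `:plural`, and so on) are invented for the example.

```elixir
defmodule AnalyzeSketch do
  # Hypothetical stand-in for analyze/1: maps over the tokens and fills in
  # :lemma and :morphology, returning the same {:ok, tokens} shape.
  def analyze(tokens, lemmatizer) when is_list(tokens) and is_function(lemmatizer, 2) do
    analyzed =
      Enum.map(tokens, fn token ->
        word = String.downcase(token.text)

        token
        |> Map.put(:lemma, lemmatizer.(word, token.pos_tag))
        |> Map.put(:morphology, features(word, token.pos_tag))
      end)

    {:ok, analyzed}
  end

  # Assumed feature extraction based on surface suffixes only.
  defp features(word, :verb) do
    cond do
      String.ends_with?(word, "ing") -> %{aspect: :progressive}
      String.ends_with?(word, "ed") -> %{tense: :past}
      true -> %{}
    end
  end

  defp features(word, :noun) do
    if String.ends_with?(word, "s") and not String.ends_with?(word, "ss") do
      %{number: :plural}
    else
      %{number: :singular}
    end
  end

  defp features(_word, _pos_tag), do: %{}
end

# Usage with a toy lemmatizer that only knows one word.
toy = fn
  "cats", :noun -> "cat"
  word, _pos_tag -> word
end

{:ok, [token]} = AnalyzeSketch.analyze([%{text: "Cats", pos_tag: :noun}], toy)
"cat" = token.lemma
%{number: :plural} = token.morphology
```

Taking the lemmatizer as an argument keeps the sketch self-contained; in the real module, `analyze/1` presumably calls its own lemmatization logic directly.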
lemmatize(word, pos_tag)
Lemmatizes a word based on its part-of-speech tag.
Returns the base form (lemma) of a word using dictionary lookup for irregular forms and rule-based suffix removal for regular forms.
Parameters
- word - The word to lemmatize (lowercase string)
- pos_tag - Part-of-speech tag atom (:verb, :noun, :adj, etc.)
Returns
- String.t() - The lemmatized form of the word
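Because the word parameter is documented as a lowercase string, callers holding raw text may want to normalize it first. The helper below is a hypothetical wrapper, not part of Nasty; it takes the lemmatizer as a function so the sketch runs on its own.

```elixir
defmodule LemmaInput do
  # Hypothetical helper: trims and lowercases the word before delegating
  # to any two-argument lemmatizer function.
  def lemmatize_normalized(word, pos_tag, lemmatizer)
      when is_binary(word) and is_atom(pos_tag) and is_function(lemmatizer, 2) do
    normalized = word |> String.trim() |> String.downcase()
    lemmatizer.(normalized, pos_tag)
  end
end

# With Nasty available, the capture below could be passed as the lemmatizer:
#   LemmaInput.lemmatize_normalized("Running", :verb,
#     &Nasty.Language.English.Morphology.lemmatize/2)
```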
Examples
iex> Nasty.Language.English.Morphology.lemmatize("running", :verb)
"run"
iex> Nasty.Language.English.Morphology.lemmatize("cats", :noun)
"cat"
iex> Nasty.Language.English.Morphology.lemmatize("better", :adj)
"good"