Nasty.Language.Spanish.Morphology (Nasty v0.3.0)

Morphological analyzer for Spanish tokens.

Provides lemmatization (finding the base form of words) using:

  • Dictionary lookup for irregular forms
  • Rule-based suffix removal for regular conjugations/declensions
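The two-step strategy above can be sketched as follows. This is a minimal illustration, not Nasty's actual implementation: the module name `LemmaSketch`, the irregular-form table, and the suffix rules are all hypothetical and far smaller than a real analyzer would need.

```elixir
defmodule LemmaSketch do
  # Hypothetical irregular-form table (assumption, not Nasty's dictionary).
  @irregular %{"fui" => "ir", "soy" => "ser", "tengo" => "tener"}

  # Hypothetical {suffix, replacement} rules for regular -ar verbs, tried in order.
  @verb_rules [{"ando", "ar"}, {"aron", "ar"}, {"amos", "ar"}]

  # Step 1: dictionary lookup. Step 2: fall back to suffix rules.
  def lemmatize(word) do
    case Map.fetch(@irregular, word) do
      {:ok, lemma} -> lemma
      :error -> apply_rules(word, @verb_rules)
    end
  end

  # No rule matched: return the word unchanged.
  defp apply_rules(word, []), do: word

  defp apply_rules(word, [{suffix, replacement} | rest]) do
    if String.ends_with?(word, suffix) do
      String.replace_suffix(word, suffix, replacement)
    else
      apply_rules(word, rest)
    end
  end
end
```

Checking the dictionary first matters because suffix rules would otherwise mangle irregular forms. Pure suffix matching also over-applies (for instance, a non-verb such as "cuando" ends in "ando"), which is one reason the real function takes a part-of-speech tag.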

Spanish-Specific Features

  • Verb lemmatization: all conjugations → infinitive (-ar, -er, -ir)
  • Noun lemmatization: plural → singular, gender variations
  • Adjective lemmatization: gender/number agreement forms → base form (e.g. buena → bueno)
  • Morphological features: gender, number, tense, mood, person

Examples

iex> alias Nasty.Language.Spanish.{Tokenizer, POSTagger, Morphology}
iex> {:ok, tokens} = Tokenizer.tokenize("hablando")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> {:ok, analyzed} = Morphology.analyze(tagged)
iex> hd(analyzed).lemma
"hablar"

Summary

Functions

analyze(tokens)

Analyzes tokens to add lemma and morphological features.

lemmatize(word, pos_tag)

Lemmatizes a Spanish word based on its part-of-speech tag.

Functions

analyze(tokens)

@spec analyze([Nasty.AST.Token.t()]) :: {:ok, [Nasty.AST.Token.t()]}

Analyzes tokens to add lemma and morphological features.

Updates each token with:

  • :lemma - Base form of the word (infinitive for verbs, singular for nouns)
  • :morphology - Map of morphological features (gender, number, tense, etc.)

Parameters

  • tokens - List of Nasty.AST.Token structs that already carry POS tags (for example, the output of POSTagger.tag_pos/1)

Returns

  • {:ok, tokens} - Tokens with lemma and morphology fields updated
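To illustrate the shape of this contract, here is a self-contained sketch that maps over a token list and fills in `:lemma` and `:morphology`. It does not use Nasty itself: the `AnalyzeSketch` module and its stand-in `Token` struct are hypothetical, the `:text` and `:pos_tag` field names are assumptions, and the single plural rule is deliberately naive.

```elixir
defmodule AnalyzeSketch do
  defmodule Token do
    # Stand-in for Nasty.AST.Token; :text and :pos_tag are assumed field names.
    defstruct [:text, :pos_tag, lemma: nil, morphology: %{}]
  end

  # Mirrors the documented return shape: {:ok, tokens}.
  def analyze(tokens) do
    {:ok, Enum.map(tokens, &annotate/1)}
  end

  # Nouns: strip a trailing "s" and record grammatical number (naive rule).
  defp annotate(%Token{text: text, pos_tag: :noun} = token) do
    word = String.downcase(text)

    if String.ends_with?(word, "s") do
      %{token | lemma: String.replace_suffix(word, "s", ""), morphology: %{number: :plural}}
    else
      %{token | lemma: word, morphology: %{number: :singular}}
    end
  end

  # Everything else: lowercase the surface form and leave morphology untouched.
  defp annotate(%Token{text: text} = token) do
    %{token | lemma: String.downcase(text)}
  end
end
```

With this sketch, a noun token with text "Casas" comes back with lemma "casa" and morphology `%{number: :plural}`. The feature keys the real analyzer emits may differ.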

lemmatize(word, pos_tag)

@spec lemmatize(String.t(), atom()) :: String.t()

Lemmatizes a Spanish word based on its part-of-speech tag.

Returns the base form (lemma) of a word using dictionary lookup for irregular forms and rule-based suffix removal for regular forms.

Parameters

  • word - The word to lemmatize (lowercase string)
  • pos_tag - Part-of-speech tag atom (:verb, :noun, :adj, etc.)

Returns

  • String.t() - The lemmatized form of the word

Examples

iex> Nasty.Language.Spanish.Morphology.lemmatize("hablando", :verb)
"hablar"

iex> Nasty.Language.Spanish.Morphology.lemmatize("casas", :noun)
"casa"

iex> Nasty.Language.Spanish.Morphology.lemmatize("buena", :adj)
"bueno"
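The three examples above differ only in which rules the part-of-speech tag selects. A rough sketch of that dispatch, assuming a hypothetical `PosDispatchSketch` module and made-up rule tables (not the library's own):

```elixir
defmodule PosDispatchSketch do
  # Hypothetical rule tables keyed by POS tag; each rule is {suffix, replacement}.
  # Longer suffixes come first so "as" is tried before "a".
  @rules %{
    verb: [{"ando", "ar"}, {"aron", "ar"}],
    noun: [{"s", ""}],
    adj: [{"as", "o"}, {"os", "o"}, {"a", "o"}]
  }

  # Picks the rule list for the tag, applies the first matching rule,
  # and returns the word unchanged when no rule (or no tag entry) applies.
  def lemmatize(word, pos_tag) do
    @rules
    |> Map.get(pos_tag, [])
    |> Enum.find(fn {suffix, _replacement} -> String.ends_with?(word, suffix) end)
    |> case do
      {suffix, replacement} -> String.replace_suffix(word, suffix, replacement)
      nil -> word
    end
  end
end
```

Keeping the rules in a map keyed by tag means an unknown tag falls through to the identity case instead of raising, which matches the `String.t()` return type in the spec.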