Nasty.Language.Spanish.Morphology (Nasty v0.3.0)
Morphological analyzer for Spanish tokens.
Provides lemmatization (finding the base form of words) using:
- Dictionary lookup for irregular forms
- Rule-based suffix removal for regular conjugations/declensions
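The two-stage strategy above (dictionary first, suffix rules as a fallback) can be sketched roughly as follows. This is a minimal illustration, not Nasty's actual implementation: the module name `SketchLemmatizer`, the dictionary entries, and the suffix table are all hypothetical.

```elixir
defmodule SketchLemmatizer do
  # Hypothetical irregular-form dictionary; the library's real table is not shown in these docs.
  @irregular_verbs %{
    "soy" => "ser",
    "fui" => "ir",
    "tengo" => "tener"
  }

  # Hypothetical suffix rules as {suffix, replacement}, tried in order.
  # "iendo" is ambiguous between -er and -ir verbs; this sketch always picks -er.
  @verb_suffixes [
    {"ando", "ar"},
    {"iendo", "er"},
    {"aron", "ar"},
    {"amos", "ar"}
  ]

  # Dictionary lookup first; fall back to suffix rules for regular forms.
  def lemmatize_verb(word) do
    case Map.fetch(@irregular_verbs, word) do
      {:ok, lemma} -> lemma
      :error -> strip_suffix(word, @verb_suffixes)
    end
  end

  # Applies the first matching rule; returns the word unchanged if none match.
  defp strip_suffix(word, rules) do
    Enum.find_value(rules, word, fn {suffix, replacement} ->
      if String.ends_with?(word, suffix) do
        String.replace_suffix(word, suffix, replacement)
      end
    end)
  end
end
```

Checking the dictionary before the rules keeps irregular forms such as "soy" from being mangled by a suffix rule that happens to match.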
Spanish-Specific Features
- Verb lemmatization: all conjugations → infinitive (-ar, -er, -ir)
- Noun lemmatization: plural → singular, gender variations
- Adjective lemmatization: gender/number agreement
- Morphological features: gender, number, tense, mood, person
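To make the idea of morphological features concrete, here is a rough sketch that guesses number and gender for a noun or adjective from its ending. The module name `SketchFeatures`, the feature atoms (`:plural`, `:feminine`, and so on), and the heuristic itself are assumptions for illustration; the feature values Nasty actually emits are not documented on this page.

```elixir
defmodule SketchFeatures do
  # Hypothetical heuristic: a trailing "s" marks plural, and the final vowel of
  # the singular form hints at gender. Real Spanish has many exceptions
  # (e.g. "lunes", "mano", "problema") that this sketch ignores.
  def features(word) do
    {number, singular} =
      if String.ends_with?(word, "s") do
        {:plural, String.replace_suffix(word, "s", "")}
      else
        {:singular, word}
      end

    gender =
      cond do
        String.ends_with?(singular, "a") -> :feminine
        String.ends_with?(singular, "o") -> :masculine
        true -> :unknown
      end

    %{number: number, gender: gender}
  end
end
```

Returning `:unknown` rather than guessing keeps the heuristic honest for endings that carry no gender signal.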
Examples
iex> alias Nasty.Language.Spanish.{Tokenizer, POSTagger, Morphology}
iex> {:ok, tokens} = Tokenizer.tokenize("hablando")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> {:ok, analyzed} = Morphology.analyze(tagged)
iex> hd(analyzed).lemma
"hablar"
Summary
Functions
analyze(tokens)
Analyzes tokens to add lemma and morphological features.
lemmatize(word, pos_tag)
Lemmatizes a Spanish word based on its part-of-speech tag.
Functions
analyze(tokens)
@spec analyze([Nasty.AST.Token.t()]) :: {:ok, [Nasty.AST.Token.t()]}
Analyzes tokens to add lemma and morphological features.
Updates each token with:
- :lemma - Base form of the word (infinitive for verbs, singular for nouns)
- :morphology - Map of morphological features (gender, number, tense, etc.)
Parameters
- tokens - List of Token structs (with POS tags)
Returns
- {:ok, tokens} - Tokens with lemma and morphology fields updated
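The update step described above amounts to mapping over the token list and setting two fields on each struct. The sketch below shows that shape with a stand-in struct: `SketchToken`, its `:text` and `:pos_tag` fields, and the toy noun rule are hypothetical, and only the `:lemma` and `:morphology` field names come from this page.

```elixir
defmodule SketchToken do
  # Hypothetical stand-in for Nasty.AST.Token; :text and :pos_tag are assumed field names.
  defstruct [:text, :pos_tag, :lemma, :morphology]
end

defmodule SketchAnalyzer do
  # Returns {:ok, tokens} with :lemma and :morphology filled in on every token.
  def analyze(tokens) do
    analyzed =
      Enum.map(tokens, fn %SketchToken{} = token ->
        %{token | lemma: lemma_for(token), morphology: morphology_for(token)}
      end)

    {:ok, analyzed}
  end

  # Toy rule: nouns lose a trailing "s"; every other tag keeps its surface form.
  defp lemma_for(%SketchToken{text: text, pos_tag: :noun}) do
    String.replace_suffix(text, "s", "")
  end

  defp lemma_for(%SketchToken{text: text}), do: text

  # Toy rule: only nouns get a :number feature in this sketch.
  defp morphology_for(%SketchToken{text: text, pos_tag: :noun}) do
    number = if String.ends_with?(text, "s"), do: :plural, else: :singular
    %{number: number}
  end

  defp morphology_for(_token), do: %{}
end
```

The struct update syntax `%{token | ...}` leaves all other fields untouched, so earlier pipeline stages such as POS tagging keep their results.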
lemmatize(word, pos_tag)
Lemmatizes a Spanish word based on its part-of-speech tag.
Returns the base form (lemma) of a word using dictionary lookup for irregular forms and rule-based suffix removal for regular forms.
Parameters
- word - The word to lemmatize (lowercase string)
- pos_tag - Part-of-speech tag atom (:verb, :noun, :adj, etc.)
Returns
- String.t() - The lemmatized form of the word
Examples
iex> Nasty.Language.Spanish.Morphology.lemmatize("hablando", :verb)
"hablar"
iex> Nasty.Language.Spanish.Morphology.lemmatize("casas", :noun)
"casa"
iex> Nasty.Language.Spanish.Morphology.lemmatize("buena", :adj)
"bueno"