Nasty.Language.Catalan.Morphology (Nasty v0.3.0)
Morphological analyzer for Catalan tokens.
Provides lemmatization (finding the base form of words) using:
- Dictionary lookup for irregular forms
- Rule-based suffix removal for regular conjugations/declensions
Catalan-Specific Features
- Verb lemmatization: all conjugations → infinitive (-ar, -re, -ir)
- Noun lemmatization: plural → singular, gender variations
- Adjective lemmatization: gender/number agreement
- Morphological features: gender, number, tense, mood, person
- Clitic handling (em, et, es, el, la, etc.)
Functions
@spec analyze([Nasty.AST.Token.t()]) :: {:ok, [Nasty.AST.Token.t()]}
Analyzes tokens to add lemma and morphological features.
Updates each token with:
- :lemma - Base form of the word (infinitive for verbs, singular for nouns)
- :morphology - Map of morphological features (gender, number, tense, etc.)
Parameters
- tokens - List of Token structs (with POS tags)
Returns
- {:ok, tokens} - Tokens with lemma and morphology fields updated
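The contract above (tagged tokens in, `{:ok, tokens}` out, with `:lemma` and `:morphology` filled in) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `MorphologySketch.Token` is a hypothetical stand-in for `Nasty.AST.Token` (the real struct may have different fields), and the tiny lexicon and its feature keys are invented, not taken from Nasty.

```elixir
defmodule MorphologySketch.Token do
  # Hypothetical stand-in for Nasty.AST.Token; the real struct's fields may differ.
  defstruct [:text, :pos_tag, lemma: nil, morphology: %{}]
end

defmodule MorphologySketch.Analyzer do
  alias MorphologySketch.Token

  # Toy lookup table: surface form => {lemma, features}. Illustrative only.
  @lexicon %{
    "gats" => {"gat", %{gender: :masculine, number: :plural}},
    "canta" => {"cantar", %{tense: :present, person: 3, number: :singular}}
  }

  # Mirrors the documented shape: takes tagged tokens, returns {:ok, tokens}.
  def analyze(tokens) when is_list(tokens) do
    {:ok, Enum.map(tokens, &annotate/1)}
  end

  defp annotate(%Token{text: text} = token) do
    key = String.downcase(text)
    # Unknown words keep their lowercased form as lemma and an empty feature map.
    {lemma, features} = Map.get(@lexicon, key, {key, %{}})
    %{token | lemma: lemma, morphology: features}
  end
end

tokens = [
  %MorphologySketch.Token{text: "Gats", pos_tag: :noun},
  %MorphologySketch.Token{text: "canta", pos_tag: :verb}
]

{:ok, analyzed} = MorphologySketch.Analyzer.analyze(tokens)
Enum.map(analyzed, & &1.lemma)
# => ["gat", "cantar"]
```

Because the function always returns an `{:ok, tokens}` tuple, callers can pattern-match on the result directly, as the last lines of the sketch do.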
Lemmatizes a Catalan word based on its part-of-speech tag.
Returns the base form (lemma) of a word using dictionary lookup for irregular forms and rule-based suffix removal for regular forms.
Parameters
- word - The word to lemmatize (lowercase string)
- pos_tag - Part-of-speech tag atom (:verb, :noun, :adj, etc.)
Returns
- String.t() - The lemmatized form of the word
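The two-stage strategy described above (dictionary lookup for irregular forms first, rule-based suffix removal otherwise) might look roughly like the following sketch. It is not Nasty's implementation: the module name `MorphologySketch.Lemmatizer`, the irregular-form table, and the suffix rules are all hypothetical and cover only a handful of Catalan forms.

```elixir
defmodule MorphologySketch.Lemmatizer do
  # Hypothetical irregular-form table keyed by {word, pos_tag}.
  @irregular %{
    {"és", :verb} => "ser",
    {"vaig", :verb} => "anar",
    {"mans", :noun} => "mà"
  }

  # Hypothetical suffix rules per tag: {suffix, replacement}, tried in order.
  @suffix_rules %{
    verb: [{"ant", "ar"}, {"ava", "ar"}, {"int", "ir"}],
    noun: [{"s", ""}]
  }

  def lemmatize(word, pos_tag) when is_binary(word) and is_atom(pos_tag) do
    case Map.fetch(@irregular, {word, pos_tag}) do
      # Stage 1: the irregular dictionary wins when it has an entry.
      {:ok, lemma} -> lemma
      # Stage 2: fall back to the suffix rules for this tag (none for unknown tags).
      :error -> apply_rules(word, Map.get(@suffix_rules, pos_tag, []))
    end
  end

  # Applies the first matching rule; returns the word unchanged if none match.
  defp apply_rules(word, rules) do
    Enum.find_value(rules, word, fn {suffix, replacement} ->
      if String.ends_with?(word, suffix) and String.length(word) > String.length(suffix) do
        String.replace_suffix(word, suffix, replacement)
      end
    end)
  end
end

MorphologySketch.Lemmatizer.lemmatize("vaig", :verb)     # => "anar"
MorphologySketch.Lemmatizer.lemmatize("cantava", :verb)  # => "cantar"
MorphologySketch.Lemmatizer.lemmatize("mans", :noun)     # => "mà"
MorphologySketch.Lemmatizer.lemmatize("gats", :noun)     # => "gat"
```

Checking the dictionary first matters for forms like "mans", where naive suffix stripping would yield "man" rather than "mà". A naive rule such as stripping a final "-s" will also mis-handle many real plurals, which is why a real analyzer needs a larger rule set and dictionary than this sketch carries.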