Nasty.Language.Catalan.POSTagger (Nasty v0.3.0)
Part-of-Speech tagger for Catalan using rule-based pattern matching.
Tags tokens with Universal Dependencies POS tags based on:
- Lexical lookup (closed-class words: articles, pronouns, prepositions)
- Morphological patterns (verb endings, gender/number markers)
- Context-based disambiguation
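The first two stages listed above can be sketched as follows. This is a minimal, hypothetical illustration, not Nasty's code. The module name, the plain-map token shape (`%{text: ..., pos_tag: ...}`), the lowercase-atom UD tags, and the tiny lexicon and suffix tables are all assumptions. Context-based disambiguation would run as a separate pass over its output.

```elixir
defmodule CatalanTaggerSketch do
  # Hypothetical sketch of lexical lookup + suffix patterns; NOT the Nasty
  # implementation. Word lists and suffix rules are small illustrative samples.

  # Closed-class words mapped to UD tags (lowercase atoms are an assumption).
  @lexicon %{
    "el" => :det, "la" => :det, "els" => :det, "les" => :det,
    "un" => :det, "una" => :det,
    "jo" => :pron, "tu" => :pron, "ell" => :pron, "ella" => :pron,
    "de" => :adp, "a" => :adp, "amb" => :adp, "per" => :adp,
    "i" => :cconj, "o" => :cconj, "però" => :cconj
  }

  # Suffix heuristics, tried in order; the first match wins.
  @suffix_rules [
    {"ment", :adv},                              # ràpidament
    {"ava", :verb}, {"aria", :verb},             # imperfect, conditional
    {"ar", :verb}, {"er", :verb}, {"ir", :verb}  # infinitives
  ]

  # Turns a list of words into tagged token maps.
  def tag(words) do
    Enum.map(words, fn word ->
      %{text: word, pos_tag: lookup(word) || by_suffix(word)}
    end)
  end

  # Stage 1: closed-class lookup (case-insensitive); nil if unknown.
  defp lookup(word), do: Map.get(@lexicon, String.downcase(word))

  # Stage 2: suffix patterns; falls back to :noun when nothing matches.
  defp by_suffix(word) do
    down = String.downcase(word)

    Enum.find_value(@suffix_rules, :noun, fn {suffix, tag} ->
      if String.ends_with?(down, suffix), do: tag
    end)
  end
end
```

Ordering the suffix rules matters because the first match wins. A real rule set would also need exceptions, since endings like -ment or -ar also occur on nouns (parlament, mar).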
Catalan-Specific Features
- Verb conjugations (present, preterite, imperfect, future, conditional, subjunctive)
- Gender agreement (masculine/feminine: -o/-a, -e endings)
- Number agreement (singular/plural: -s/-es endings)
- Clitic pronouns (em, et, es, el, la, etc.)
- Contractions (del = de + el, al = a + el, pel = per + el)
- Interpunct words (col·laborar, intel·ligent)
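Contractions and interpunct words mostly affect how text is split before tagging. The sketch below uses a hypothetical `CatalanPreprocessSketch` module that is not part of Nasty. It keeps `l·l` words whole by allowing a middle dot only between letters, and it expands the preposition + article contractions listed above. The plural forms dels, als and pels are added here for illustration. Splitting is only one design option; a tokenizer could instead keep the contraction as one token and record its parts.

```elixir
defmodule CatalanPreprocessSketch do
  # Hypothetical preprocessing helpers; NOT part of the Nasty API.

  # Preposition + article contractions and their expansions.
  @contractions %{
    "del" => ["de", "el"], "dels" => ["de", "els"],
    "al" => ["a", "el"], "als" => ["a", "els"],
    "pel" => ["per", "el"], "pels" => ["per", "els"]
  }

  # Splits text into words: runs of letters, optionally joined by an
  # interpunct (·), so "col·laborar" stays one word. Punctuation is dropped.
  def words(text) do
    ~r/\p{L}+(?:·\p{L}+)*/u
    |> Regex.scan(text)
    |> Enum.map(&hd/1)
  end

  # Replaces each contraction with its parts; other words pass through.
  # Note: expanded parts are lowercase, so "Del" becomes ["de", "el"].
  def expand_contractions(words) do
    Enum.flat_map(words, fn word ->
      Map.get(@contractions, String.downcase(word), [word])
    end)
  end
end
```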
Functions
@spec tag_pos([Nasty.AST.Token.t()], keyword()) :: {:ok, [Nasty.AST.Token.t()]}
Tags a list of tokens with POS tags.
Uses:
- Lexical lookup for known words (articles, pronouns, prepositions)
- Morphological patterns (verb endings, gender/number markers)
- Context rules (e.g., a word following an article is likely a noun)
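A context rule of this kind can be sketched as a second pass over already-tagged tokens. This version is hypothetical: the module name and the plain-map token shape are assumptions. It targets one Catalan-specific ambiguity, which is that el, la, els and les are both articles and clitic pronouns. An infinitive after a determiner is read as a nominalized noun (el menjar), while a finite verb after el/la/els/les marks that word as a clitic pronoun (la cantava).

```elixir
defmodule CatalanContextSketch do
  # Hypothetical context pass over tagged tokens (maps with :text and
  # :pos_tag); NOT the Nasty implementation.

  # Forms that can be either an article or a clitic pronoun.
  @clitic_articles ["el", "la", "els", "les"]

  def disambiguate([%{pos_tag: :det} = det, %{pos_tag: :verb} = next | rest]) do
    cond do
      # "el menjar": an infinitive after a determiner acts as a noun.
      infinitive?(next.text) ->
        [det, %{next | pos_tag: :noun} | disambiguate(rest)]

      # "la cantava": el/la/els/les before a finite verb is a clitic.
      String.downcase(det.text) in @clitic_articles ->
        [%{det | pos_tag: :pron}, next | disambiguate(rest)]

      true ->
        [det | disambiguate([next | rest])]
    end
  end

  def disambiguate([token | rest]), do: [token | disambiguate(rest)]
  def disambiguate([]), do: []

  # Rough infinitive check by ending (-ar, -er, -ir, -re).
  defp infinitive?(word) do
    String.ends_with?(String.downcase(word), ["ar", "er", "ir", "re"])
  end
end
```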
Parameters
- tokens - List of Token structs (from the tokenizer)
- opts - Keyword list of options:
  - :model - Model type: :rule_based (default; currently the only option)
Returns
- {:ok, tokens} - The tokens with their pos_tag field updated
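The parameter and return contract can be mirrored in a small stand-in. Everything here is hypothetical: the module name, the plain maps standing in for Nasty.AST.Token structs, and the placeholder tagging rule. Raising ArgumentError for an unsupported :model value is also an assumption, chosen because the documented spec has only an {:ok, tokens} return.

```elixir
defmodule TagPosContractSketch do
  # Hypothetical stand-in mirroring the documented tag_pos/2 contract:
  # tokens + keyword opts in, {:ok, tokens} with :pos_tag set out.
  # Tokens are plain maps here, not Nasty.AST.Token structs.

  @spec tag_pos([map()], keyword()) :: {:ok, [map()]}
  def tag_pos(tokens, opts \\ []) do
    case Keyword.get(opts, :model, :rule_based) do
      :rule_based ->
        {:ok, Enum.map(tokens, fn token -> %{token | pos_tag: guess(token.text)} end)}

      other ->
        # Assumption: unknown models raise, since the spec has no error tuple.
        raise ArgumentError, "unsupported model: #{inspect(other)}"
    end
  end

  # Placeholder rule so the sketch runs; a real tagger would apply the
  # lexical, morphological, and context rules described above.
  defp guess(text) do
    if String.downcase(text) in ["el", "la", "els", "les"], do: :det, else: :noun
  end
end
```

Because :rule_based is the default in this sketch, `tag_pos(tokens)` and `tag_pos(tokens, model: :rule_based)` behave identically.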