Nasty.Language.Spanish.POSTagger (Nasty v0.3.0)
Part-of-Speech tagger for Spanish using rule-based pattern matching.
Tags tokens with Universal Dependencies POS tags based on:
- Lexical lookup (closed-class words: articles, pronouns, prepositions)
- Morphological patterns (verb endings, gender/number markers)
- Context-based disambiguation
As a rule-based tagger, it reaches roughly 80-85% accuracy. Statistical or neural models may be added in the future to improve on this.
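The first two stages listed above (lexical lookup, then a morphological fallback) can be illustrated with a minimal sketch. It is not the module's actual implementation: the `TaggerSketch` module, its word list, and its suffix rules are all hypothetical and chosen only to show the shape of the approach.

```elixir
defmodule TaggerSketch do
  # Hypothetical closed-class lexicon; the real tagger's word lists are not
  # shown on this page.
  @lexicon %{
    "el" => :det, "la" => :det, "los" => :det, "las" => :det,
    "un" => :det, "una" => :det,
    "yo" => :pron, "ella" => :pron,
    "de" => :adp, "en" => :adp, "con" => :adp
  }

  # Hypothetical suffix rules, checked in order; the first match wins.
  @suffix_rules [
    {"mente", :adv},
    {"ción", :noun},
    {"ar", :verb},
    {"er", :verb},
    {"ir", :verb}
  ]

  # Stage 1: look the lowercased word up in the lexicon.
  # Stage 2: fall back to suffix patterns if the word is not listed.
  def tag(word) do
    lower = String.downcase(word)

    case Map.fetch(@lexicon, lower) do
      {:ok, tag} -> tag
      :error -> tag_by_suffix(lower)
    end
  end

  # Returns the tag of the first matching suffix, or :noun when none match
  # (an assumed default for open-class words).
  defp tag_by_suffix(word) do
    Enum.find_value(@suffix_rules, :noun, fn {suffix, tag} ->
      if String.ends_with?(word, suffix), do: tag
    end)
  end
end
```

Suffix rules of this kind misfire on words such as "mar" (a noun ending in -ar), which is one reason a purely rule-based tagger tops out well below the accuracy of a trained model.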
Spanish-Specific Features
- Verb conjugations (present, preterite, imperfect, future, conditional, subjunctive)
- Gender agreement (masculine/feminine: -o/-a endings)
- Number agreement (singular/plural: -s/-es endings)
- Clitic pronouns (me, te, se, lo, la, etc.)
- Contractions (del = de + el, al = a + el)
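As a rough illustration of the contraction, gender, and number features listed above, the following sketch uses a hypothetical `SpanishMorphSketch` module. The function names and the deliberately naive heuristics are assumptions for illustration and do not reflect how Nasty implements these features.

```elixir
defmodule SpanishMorphSketch do
  # The two contractions named in the feature list.
  @contractions %{"del" => ["de", "el"], "al" => ["a", "el"]}

  # Expand a contraction into its parts; any other word is returned
  # unchanged as a one-element list.
  def expand(word) do
    Map.get(@contractions, String.downcase(word), [word])
  end

  # Guess number from a final "s" (this covers both -s and -es plurals).
  def number(word) do
    if String.ends_with?(word, "s"), do: :plural, else: :singular
  end

  # Guess gender from an -o/-a ending after stripping trailing "s".
  def gender(word) do
    stem = String.trim_trailing(word, "s")

    cond do
      String.ends_with?(stem, "o") -> :masculine
      String.ends_with?(stem, "a") -> :feminine
      true -> :unknown
    end
  end
end
```

These heuristics ignore exceptions such as "día" (masculine despite the -a ending) or "lunes" (singular despite the final -s), so a real tagger would need exception lists alongside them.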
Examples
iex> alias Nasty.Language.Spanish.{Tokenizer, POSTagger}
iex> {:ok, tokens} = Tokenizer.tokenize("la casa")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> [art, noun] = tagged
iex> art.pos_tag
:det
iex> noun.pos_tag
:noun
Summary
Functions
@spec tag_pos([Nasty.AST.Token.t()], keyword()) :: {:ok, [Nasty.AST.Token.t()]}
Tags a list of tokens with POS tags.
Uses:
- Lexical lookup for known words (articles, pronouns, prepositions)
- Morphological patterns (verb endings, gender/number markers)
- Context rules (e.g., word after article is likely a noun)
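The context rule mentioned above (a word after an article is likely a noun) could be sketched as a single left-to-right pass. This is an assumption-laden illustration: tokens are modelled as plain maps rather than `Nasty.AST.Token` structs, and both the `ContextSketch` module and the `:unknown` placeholder tag are hypothetical.

```elixir
defmodule ContextSketch do
  # Tokens are plain maps with :text and :pos_tag keys (an assumption; the
  # library itself uses Nasty.AST.Token structs).
  def disambiguate(tokens), do: do_pass(tokens, nil)

  defp do_pass([], _prev), do: []

  # A token that earlier stages left as :unknown and that directly follows
  # a determiner is retagged as a noun.
  defp do_pass([%{pos_tag: :unknown} = tok | rest], :det) do
    [%{tok | pos_tag: :noun} | do_pass(rest, :noun)]
  end

  # Every other token is kept as-is; its tag becomes the context for the
  # next token.
  defp do_pass([tok | rest], _prev) do
    [tok | do_pass(rest, tok.pos_tag)]
  end
end
```

Only the token immediately after the determiner is affected; later unresolved tokens keep their placeholder tag unless another rule applies.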
Parameters
- tokens - List of Token structs (from tokenizer)
- opts - Options
  - :model - Model type: :rule_based (default, only option for now)
Returns
- {:ok, tokens} - Tokens with updated pos_tag field
Rule-based POS tagging for Spanish.