Nasty.Language.Spanish.POSTagger (Nasty v0.3.0)

Part-of-Speech tagger for Spanish using rule-based pattern matching.

Tags tokens with Universal Dependencies POS tags based on:

Lexical lookup (closed-class words: articles, pronouns, prepositions)
Morphological patterns (verb endings, gender/number markers)
Context-based disambiguation
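The sketch below roughly illustrates the first two sources. It tags each word in isolation: it looks the word up in a closed-class lexicon, then checks for a verb suffix, and falls back to :noun for unknown open-class words. The module name, word list, and suffix list are hypothetical samples, not Nasty's actual lexicon or rules. Context is deliberately ignored here.

```elixir
defmodule SpanishLexMorphSketch do
  # Hypothetical sketch: the word list and suffixes below are small
  # illustrative samples, not Nasty's actual lexicon or rules.

  # Source 1: closed-class lexicon (lowercase keys -> UD tags as atoms).
  @lexicon %{
    "el" => :det, "la" => :det, "los" => :det, "las" => :det,
    "un" => :det, "una" => :det,
    "yo" => :pron, "ella" => :pron, "nosotros" => :pron,
    "de" => :adp, "en" => :adp, "con" => :adp, "por" => :adp, "para" => :adp,
    "y" => :cconj, "pero" => :cconj
  }

  # Source 2: a few verb endings (infinitives, gerunds, some past forms).
  @verb_suffixes ~w(ar er ir ando iendo aba aron ieron ían)

  # Tags each word in isolation and returns {word, tag} pairs.
  def tag(words) when is_list(words) do
    Enum.map(words, fn word -> {word, tag_word(String.downcase(word))} end)
  end

  defp tag_word(word) do
    case Map.fetch(@lexicon, word) do
      {:ok, tag} -> tag
      # Not a closed-class word: guess :verb from the ending, else :noun.
      :error -> if String.ends_with?(word, @verb_suffixes), do: :verb, else: :noun
    end
  end
end
```

With these samples, SpanishLexMorphSketch.tag(["en", "la", "casa"]) returns [{"en", :adp}, {"la", :det}, {"casa", :noun}]. Suffix rules on their own misfire on nouns such as "mujer", which ends in -er. This is one reason context-based disambiguation is needed.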

This rule-based tagger achieves roughly 80-85% accuracy. Higher accuracy would require statistical or neural models, which may be added in the future.

Spanish-Specific Features

Verb conjugations (present, preterite, imperfect, future, conditional, subjunctive)
Gender agreement (masculine/feminine: -o/-a endings)
Number agreement (singular/plural: -s/-es endings)
Clitic pronouns (me, te, se, lo, la, etc.)
Contractions (del = de + el, al = a + el)
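Several of these cues can be approximated with simple string checks. The following sketch uses hypothetical module and helper names that are not part of Nasty's API. It expands the two contractions, guesses number from a final -s, guesses gender from -o/-a after stripping a plural -s, and recognizes a small clitic set. These are heuristics only. For example, "mes" is singular despite its -s, "mano" is feminine despite its -o, and "la" can be either a clitic or an article.

```elixir
defmodule SpanishMorphSketch do
  # Hypothetical helpers illustrating Spanish-specific cues; not Nasty's API.

  @contractions %{"del" => ["de", "el"], "al" => ["a", "el"]}
  @clitics ~w(me te se lo la le nos os los las les)

  # Splits "del"/"al" into preposition + article; other words pass through.
  def expand_contractions(words) do
    Enum.flat_map(words, fn word ->
      Map.get(@contractions, String.downcase(word), [word])
    end)
  end

  # Number heuristic: a final -s (which also covers -es) marks a plural.
  def number(word) do
    if String.ends_with?(String.downcase(word), "s"), do: :plural, else: :singular
  end

  # Gender heuristic: strip a plural -s, then look for -o / -a.
  def gender(word) do
    stem = word |> String.downcase() |> String.replace_suffix("s", "")

    cond do
      String.ends_with?(stem, "o") -> :masculine
      String.ends_with?(stem, "a") -> :feminine
      true -> :unknown
    end
  end

  # Membership in a small clitic set. Some entries (la, los, las) are also
  # articles, so this check alone cannot decide the tag.
  def clitic?(word), do: String.downcase(word) in @clitics
end
```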

Examples

iex> alias Nasty.Language.Spanish.{Tokenizer, POSTagger}
iex> {:ok, tokens} = Tokenizer.tokenize("la casa")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> [art, noun] = tagged
iex> art.pos_tag
:det
iex> noun.pos_tag
:noun

Summary

Functions

tag_pos(tokens, opts \\ [])

Tags a list of tokens with POS tags.

tag_pos_rule_based(tokens)

Rule-based POS tagging for Spanish.

Functions

tag_pos(tokens, opts \\ [])

@spec tag_pos(
  [Nasty.AST.Token.t()],
  keyword()
) :: {:ok, [Nasty.AST.Token.t()]}

Tags a list of tokens with POS tags.

The tagger combines:

Lexical lookup for known words (articles, pronouns, prepositions)
Morphological patterns (verb endings, gender/number markers)
Context rules (e.g., a word following an article is likely a noun)
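The context rule can be sketched as a second pass over {word, tag} pairs produced by an earlier lexical and morphological pass. In this hypothetical example, a word tagged :verb immediately after a determiner is retagged as :noun. An example is "el canto", where "canto" could otherwise be read as the verb form "(I) sing". The module name and the single rule are illustrative and do not reflect Nasty's actual rule set.

```elixir
defmodule SpanishContextSketch do
  # Hypothetical second pass over {word, tag} pairs; the single rule is
  # illustrative, not Nasty's rule set.

  # Walks left to right, carrying the previous (already corrected) tag.
  def disambiguate(tagged) when is_list(tagged) do
    {result, _last_tag} =
      Enum.map_reduce(tagged, nil, fn {word, tag}, prev_tag ->
        new_tag = apply_rule(tag, prev_tag)
        {{word, new_tag}, new_tag}
      end)

    result
  end

  # A verb guess right after a determiner is retagged as a noun.
  defp apply_rule(:verb, :det), do: :noun
  defp apply_rule(tag, _prev_tag), do: tag
end
```

Because the rule only looks at the previous tag, it misfires when an earlier pass tagged a clitic as an article. For example, in "la vi" ("I saw her"), "vi" would be wrongly retagged as a noun.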

Parameters

tokens - list of Nasty.AST.Token structs, as produced by the tokenizer
opts - keyword list of options:
- :model - tagging model; :rule_based is the default and currently the only supported value

Returns

{:ok, tokens} - the input tokens with their pos_tag field set
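The small stand-in below illustrates the parameter and return contract. It is hypothetical in several ways:

- Tokens are plain maps with :text and :pos_tag keys rather than Nasty.AST.Token structs.
- The tagging rule is a toy determiner check.
- Raising ArgumentError for an unknown :model is an assumption, since the documentation does not say how unsupported values are handled.

```elixir
defmodule TagPosOptionsSketch do
  # Hypothetical stand-in for the documented contract of tag_pos/2:
  # read :model from opts (default :rule_based) and return {:ok, tokens}
  # with pos_tag filled in. Tokens are plain maps, not Nasty.AST.Token.

  @determiners ~w(el la los las un una)

  def tag_pos(tokens, opts \\ []) do
    case Keyword.get(opts, :model, :rule_based) do
      :rule_based -> {:ok, Enum.map(tokens, &tag_token/1)}
      # Assumption: unsupported models raise; the real behavior is undocumented.
      other -> raise ArgumentError, "unsupported :model #{inspect(other)}"
    end
  end

  # Toy rule: known determiners get :det, everything else :noun.
  defp tag_token(%{text: text} = token) do
    tag = if String.downcase(text) in @determiners, do: :det, else: :noun
    %{token | pos_tag: tag}
  end
end
```

Since :rule_based is the default, TagPosOptionsSketch.tag_pos(tokens) and TagPosOptionsSketch.tag_pos(tokens, model: :rule_based) behave identically.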

tag_pos_rule_based(tokens)

Rule-based POS tagging for Spanish.