Nasty.Language.Spanish.POSTagger (Nasty v0.3.0)

Part-of-Speech tagger for Spanish using rule-based pattern matching.

Tags tokens with Universal Dependencies POS tags based on:

  • Lexical lookup (closed-class words: articles, pronouns, prepositions)
  • Morphological patterns (verb endings, gender/number markers)
  • Context-based disambiguation

The rule-based approach reaches roughly 80-85% accuracy. Statistical or neural models may be added in the future to improve on this.

Spanish-Specific Features

  • Verb conjugations (present, preterite, imperfect, future, conditional, subjunctive)
  • Gender agreement (masculine/feminine: -o/-a endings)
  • Number agreement (singular/plural: -s/-es endings)
  • Clitic pronouns (me, te, se, lo, la, etc.)
  • Contractions (del = de + el, al = a + el)
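The contraction, gender, and number features above can be illustrated with a small, self-contained sketch. The module name `SpanishMorphSketch` and its functions are hypothetical and are not part of Nasty; they only show the kind of ending-based heuristics a rule-based tagger might rely on, assuming plain lowercase-insensitive string matching.

```elixir
defmodule SpanishMorphSketch do
  # Hypothetical helpers for illustration only; not Nasty's implementation.

  # The two contractions named in the feature list.
  @contractions %{"del" => ["de", "el"], "al" => ["a", "el"]}

  # Expand a contraction into its parts; any other word passes through unchanged.
  def expand(word) do
    Map.get(@contractions, String.downcase(word), [word])
  end

  # Guess grammatical number from the ending (-s / -es suggests plural).
  def number(word) do
    if String.ends_with?(word, "s"), do: :plural, else: :singular
  end

  # Guess gender from the final vowel after stripping any trailing "s".
  def gender(word) do
    stem = String.trim_trailing(word, "s")

    cond do
      String.ends_with?(stem, "o") -> :masculine
      String.ends_with?(stem, "a") -> :feminine
      true -> :unknown
    end
  end
end
```

Heuristics like these are only approximations: "día" and "problema" end in -a but are masculine, "mano" ends in -o but is feminine, and "lunes" ends in -s but is singular. A real tagger would need exception lists or context to handle such words.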

Examples

iex> alias Nasty.Language.Spanish.{Tokenizer, POSTagger}
iex> {:ok, tokens} = Tokenizer.tokenize("la casa")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> [art, noun] = tagged
iex> art.pos_tag
:det
iex> noun.pos_tag
:noun

Summary

Functions

tag_pos(tokens, opts \\ [])

Tags a list of tokens with POS tags.

tag_pos_rule_based(tokens)

Rule-based POS tagging for Spanish.

Functions

tag_pos(tokens, opts \\ [])

@spec tag_pos(
  [Nasty.AST.Token.t()],
  keyword()
) :: {:ok, [Nasty.AST.Token.t()]}

Tags a list of tokens with POS tags.

Uses:

  1. Lexical lookup for known words (articles, pronouns, prepositions)
  2. Morphological patterns (verb endings, gender/number markers)
  3. Context rules (e.g., word after article is likely a noun)
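The three steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the actual Nasty implementation: the module `SketchTagger`, its tiny lexicon, the suffix list, and the tag atoms other than `:det` and `:noun` are all hypothetical, and it works on plain strings rather than `Nasty.AST.Token` structs.

```elixir
defmodule SketchTagger do
  # Hypothetical illustration; not the Nasty implementation.

  # Step 1: closed-class lexicon (tiny sample).
  @lexicon %{
    "el" => :det, "la" => :det, "los" => :det, "las" => :det,
    "un" => :det, "una" => :det,
    "de" => :adp, "en" => :adp, "con" => :adp, "a" => :adp,
    "yo" => :pron, "ella" => :pron, "nosotros" => :pron,
    "y" => :cconj, "o" => :cconj
  }

  # Step 2: suffix heuristics, checked in order; first match wins.
  @suffixes [
    {"mente", :adv},
    {"ción", :noun},
    {"ar", :verb},
    {"er", :verb},
    {"ir", :verb},
    {"amos", :verb},
    {"aron", :verb}
  ]

  # Returns a list of {word, tag} tuples for a list of word strings.
  def tag(words) when is_list(words) do
    words
    |> Enum.map(&String.downcase/1)
    |> Enum.map(fn w -> {w, lexical(w) || morphological(w)} end)
    |> apply_context()
  end

  defp lexical(word), do: Map.get(@lexicon, word)

  # Falls back to :noun when no suffix matches.
  defp morphological(word) do
    Enum.find_value(@suffixes, :noun, fn {suffix, tag} ->
      if String.ends_with?(word, suffix), do: tag
    end)
  end

  # Step 3: a word guessed as a verb directly after a determiner
  # is re-tagged as a noun (e.g. "el poder").
  defp apply_context(tagged) do
    {result, _prev} =
      Enum.map_reduce(tagged, nil, fn {word, tag}, prev ->
        new_tag = if prev == :det and tag == :verb, do: :noun, else: tag
        {{word, new_tag}, new_tag}
      end)

    result
  end
end
```

In this sketch, `SketchTagger.tag(["El", "poder"])` tags "poder" as `:verb` in the morphological step because of its -er ending, and the context step then corrects it to `:noun` because it follows a determiner. Running lexical lookup first keeps closed-class words from being misread by the suffix rules.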

Parameters

  • tokens - List of Token structs (from tokenizer)
  • opts - Options
    • :model - Model type: :rule_based (the default and currently the only supported value)

Returns

  • {:ok, tokens} - Tokens with updated pos_tag field

tag_pos_rule_based(tokens)

Rule-based POS tagging for Spanish.