Nasty.Language.English.POSTagger (Nasty v0.3.0)
Part-of-Speech tagger for English using rule-based pattern matching.
Tags tokens with Universal Dependencies POS tags based on:
- Lexical lookup (closed-class words)
- Morphological patterns (suffixes)
- Context-based disambiguation
By default this is a simple rule-based tagger. For better accuracy, select a statistical (HMM), neural, or transformer model through the :model option of tag_pos.
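The three stages listed above (lexical lookup, suffix patterns, context-based disambiguation) can be sketched as follows. This is a minimal illustration, not Nasty's implementation: the module name `RuleTaggerSketch`, the tiny lexicon, the suffix table, and every tag other than `:det` are assumptions made for the example, and it works on plain strings rather than `Nasty.AST.Token` structs.

```elixir
defmodule RuleTaggerSketch do
  # Hypothetical illustration only; not Nasty's actual rules or data.

  # Stage 1: closed-class words resolved by lexical lookup (assumed entries).
  @lexicon %{
    "the" => :det, "a" => :det, "an" => :det,
    "he" => :pron, "she" => :pron, "it" => :pron,
    "in" => :adp, "on" => :adp, "of" => :adp,
    "and" => :cconj, "or" => :cconj
  }

  # Stage 2: suffix patterns, tried in order when the lookup misses.
  @suffixes [
    {"ly", :adv},
    {"ing", :verb},
    {"ed", :verb},
    {"ous", :adj},
    {"ful", :adj},
    {"tion", :noun},
    {"ness", :noun}
  ]

  # Returns a list of {word, tag} tuples.
  def tag(words) when is_list(words) do
    words
    |> Enum.map(fn word -> {word, initial_tag(String.downcase(word))} end)
    |> apply_context()
  end

  defp initial_tag(word) do
    case Map.fetch(@lexicon, word) do
      {:ok, tag} -> tag
      :error -> suffix_tag(word)
    end
  end

  # Unknown words with no matching suffix default to :noun.
  defp suffix_tag(word) do
    Enum.find_value(@suffixes, :noun, fn {suffix, tag} ->
      if String.ends_with?(word, suffix), do: tag
    end)
  end

  # Stage 3: a word tagged :verb directly after a determiner is retagged :noun.
  defp apply_context(tagged) do
    {result, _last_tag} =
      Enum.map_reduce(tagged, nil, fn {word, tag}, prev ->
        tag = if prev == :det and tag == :verb, do: :noun, else: tag
        {{word, tag}, tag}
      end)

    result
  end
end
```

In this sketch, `RuleTaggerSketch.tag(["The", "building", "collapsed", "quickly"])` tags "building" as `:verb` from its suffix and then corrects it to `:noun` because it follows a determiner.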
Examples
iex> alias Nasty.Language.English.{Tokenizer, POSTagger}
iex> {:ok, tokens} = Tokenizer.tokenize("the")
iex> {:ok, tagged} = POSTagger.tag_pos(tokens)
iex> hd(tagged).pos_tag
:det
Functions
@spec tag_pos( [Nasty.AST.Token.t()], keyword() ) :: {:ok, [Nasty.AST.Token.t()]}
Tags a list of tokens with POS tags.
Uses:
- Lexical lookup for known words (determiners, pronouns, etc.)
- Morphological patterns (suffixes for verbs, nouns, adjectives)
- Context rules (e.g., word after determiner is likely a noun)
- Statistical models (HMM)
- Neural models (BiLSTM-CRF)
Parameters
- tokens - List of Token structs (from tokenizer)
- opts - Options:
  - :model - Model type: :rule_based (default), :hmm, :neural, :ensemble, :neural_ensemble, :transformer, or a specific transformer model name (e.g., :roberta_base)
  - :hmm_model - Trained HMM model (optional)
  - :neural_model - Trained neural model (optional)
Returns
- {:ok, tokens} - Tokens with the pos_tag field updated
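How the :model option might be interpreted can be sketched as below. This is a hypothetical illustration under stated assumptions, not the library's dispatch code: the module `ModelOptionSketch`, the `resolve/1` function, the `{:transformer, name}` shape, and the error tuple are all invented for the example. The only facts taken from the documentation are the option name, the default of :rule_based, and the list of accepted model types.

```elixir
defmodule ModelOptionSketch do
  # Hypothetical illustration only; not Nasty's actual option handling.

  # Model types named in the documentation.
  @known [:rule_based, :hmm, :neural, :ensemble, :neural_ensemble, :transformer]

  # Returns the tagging strategy that a given opts keyword list selects.
  def resolve(opts \\ []) do
    case Keyword.get(opts, :model, :rule_based) do
      model when model in @known -> {:ok, model}
      # Assumption: any other atom names a specific transformer model.
      name when is_atom(name) -> {:ok, {:transformer, name}}
      # Assumption: non-atom values are rejected.
      other -> {:error, {:invalid_model, other}}
    end
  end
end
```

With this sketch, an empty option list resolves to `{:ok, :rule_based}`, and `model: :roberta_base` resolves to `{:ok, {:transformer, :roberta_base}}`.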
Ensemble POS tagging combining rule-based and HMM.
Uses HMM predictions but falls back to rule-based for punctuation and other deterministic cases.
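The combination rule described here can be illustrated with a small sketch that merges two tag sequences of equal length. It is an assumption-laden example, not the library's code: `EnsembleSketch`, `merge/2`, and the choice of `:punct` and `:num` as the "deterministic" tags are hypothetical.

```elixir
defmodule EnsembleSketch do
  # Hypothetical illustration only; not Nasty's actual ensemble logic.

  # Assumption: tags the rule-based tagger decides deterministically.
  @deterministic [:punct, :num]

  # Takes the rule-based tags and the HMM tags for the same tokens and
  # keeps the HMM tag unless the rule-based tag is a deterministic one.
  def merge(rule_tags, hmm_tags) when length(rule_tags) == length(hmm_tags) do
    rule_tags
    |> Enum.zip(hmm_tags)
    |> Enum.map(fn {rule, hmm} ->
      if rule in @deterministic, do: rule, else: hmm
    end)
  end
end
```

For example, `EnsembleSketch.merge([:noun, :punct, :verb], [:verb, :noun, :verb])` returns `[:verb, :punct, :verb]`: the HMM wins for ordinary words, and the rule-based tag wins for the punctuation token.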
HMM-based POS tagging.
If no model is provided via :hmm_model option, attempts to load
the latest English POS tagging model from the registry. Falls back
to rule-based tagging if no model is available.
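The model selection order described above (explicit option, then registry, then rule-based fallback) might look like the following sketch. The module `ModelLookupSketch`, the `select/2` function, the `:latest_hmm` key, and the use of a plain map in place of the real model registry are all hypothetical stand-ins.

```elixir
defmodule ModelLookupSketch do
  # Hypothetical illustration only; `registry` is a plain map standing in
  # for the model registry, whose real API is not shown in this document.

  # Prefers the :hmm_model option, then the registry's latest model,
  # and signals a rule-based fallback when neither is available.
  def select(opts, registry) do
    case Keyword.get(opts, :hmm_model) || Map.get(registry, :latest_hmm) do
      nil -> :rule_based
      model -> {:hmm, model}
    end
  end
end
```

Here `ModelLookupSketch.select([], %{})` returns `:rule_based`, while passing `hmm_model: :my_model` always wins over whatever the registry holds.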
Neural POS tagging using BiLSTM-CRF.
If no model is provided via :neural_model option, attempts to load
the latest neural POS tagging model from the registry. Falls back
to HMM or rule-based tagging if no model is available.
Neural ensemble POS tagging combining neural, HMM, and rule-based models.
Uses neural predictions as the primary source, with the fallback chain neural -> HMM -> rule-based.
Prefers rule-based tags for high-confidence cases such as punctuation and numbers.
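A fallback chain of this kind can be sketched by trying taggers in order and stopping at the first one that succeeds. This is an illustrative sketch under assumptions, not the library's implementation: `FallbackSketch`, `tag_with_fallback/2`, the `{:ok, name, tags}` return shape, and the stub taggers are hypothetical.

```elixir
defmodule FallbackSketch do
  # Hypothetical illustration only; not Nasty's actual fallback logic.

  # `taggers` is an ordered keyword list of functions, each returning
  # {:ok, tags} or {:error, reason}. The first success ends the search.
  def tag_with_fallback(tokens, taggers) do
    Enum.reduce_while(taggers, {:error, :no_tagger_available}, fn {name, tagger}, acc ->
      case tagger.(tokens) do
        {:ok, tags} -> {:halt, {:ok, name, tags}}
        {:error, _reason} -> {:cont, acc}
      end
    end)
  end
end

# Stub taggers: the neural and HMM models are unavailable in this example.
taggers = [
  neural: fn _tokens -> {:error, :model_not_found} end,
  hmm: fn _tokens -> {:error, :model_not_found} end,
  rule_based: fn tokens -> {:ok, Enum.map(tokens, fn _ -> :noun end)} end
]

FallbackSketch.tag_with_fallback(["cats", "sleep"], taggers)
# → {:ok, :rule_based, [:noun, :noun]}
```

Returning the name of the tagger that produced the result is a design choice of the sketch; it makes it easy to see which stage of the chain was actually used.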
Rule-based POS tagging (original implementation).
Transformer-based POS tagging using pre-trained models.
Uses BERT, RoBERTa, or other pre-trained transformer models for state-of-the-art accuracy (reported as 98-99%). Falls back to neural tagging if the transformer model fails.