Nasty.Language.English.TransformerPOSTagger (Nasty v0.3.0)

Transformer-based Part-of-Speech tagger for English.

Uses pre-trained transformer models (BERT, RoBERTa, etc.) fine-tuned for POS tagging to achieve state-of-the-art accuracy (98-99%).

The tagger supports multiple transformer models and provides seamless integration with the existing Nasty POS tagging API.

Summary

Functions

label_map()
Gets the label map (ID to UPOS tag).

num_labels()
Returns the number of POS labels.

tag_pos(tokens, opts \\ [])
Tags tokens with POS tags using a transformer model.

tag_to_id()
Gets the tag to ID map (UPOS tag to ID).

Functions

label_map()

@spec label_map() :: %{required(integer()) => String.t()}

Gets the label map (ID to UPOS tag).

Examples

TransformerPOSTagger.label_map()
# => %{0 => "ADJ", 1 => "ADP", ...}

num_labels()

@spec num_labels() :: integer()

Returns the number of POS labels.

Examples

TransformerPOSTagger.num_labels()
# => 17

tag_pos(tokens, opts \\ [])

@spec tag_pos(
  [Nasty.AST.Token.t()],
  keyword()
) :: {:ok, [Nasty.AST.Token.t()]} | {:error, term()}

Tags tokens with POS tags using a transformer model.

Options

  • :model - Model to use: an atom name (e.g., :roberta_base), or :transformer to use the default model
  • :cache_dir - Directory for model caching
  • :device - Device to use (:cpu or :cuda, default: :cpu)
  • :use_cache - Whether to use prediction caching (default: true)

Examples

{:ok, tokens} = Tokenizer.tokenize("The cat sat")
{:ok, tagged} = TransformerPOSTagger.tag_pos(tokens)

# Use specific model
{:ok, tagged} = TransformerPOSTagger.tag_pos(tokens, model: :bert_base_cased)

# Disable caching for variable inputs
{:ok, tagged} = TransformerPOSTagger.tag_pos(tokens, use_cache: false)
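Since tag_pos/2 returns {:ok, tokens} or {:error, reason}, callers typically branch with case. A minimal, self-contained sketch of consuming that return convention; the plain maps and the :pos field below are stand-ins for real Nasty.AST.Token structs, not the actual struct layout:

```elixir
# `result` stands in for a TransformerPOSTagger.tag_pos(tokens) call;
# the map shape and the :pos key are assumptions for illustration.
result = {:ok, [%{text: "The", pos: :det}, %{text: "cat", pos: :noun}]}

tags =
  case result do
    # On success, collect the predicted POS tag from each token
    {:ok, tagged} -> Enum.map(tagged, & &1.pos)
    # On failure, fall back to an empty tag list
    {:error, _reason} -> []
  end

# tags == [:det, :noun]
```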

tag_to_id()

@spec tag_to_id() :: %{required(atom()) => integer()}

Gets the tag to ID map (UPOS tag to ID).

Examples

TransformerPOSTagger.tag_to_id()
# => %{adj: 0, adp: 1, ...}
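The two lookup tables are inverses of each other up to string/atom conversion: tag_to_id/0 maps each lowercase atom to the ID whose label_map/0 entry is the uppercase string. A self-contained sketch with an abbreviated map standing in for the real label_map/0 (the derivation is illustrative, not the library's implementation):

```elixir
# Abbreviated stand-in for TransformerPOSTagger.label_map/0
label_map = %{0 => "ADJ", 1 => "ADP"}

# Invert it: downcase each UPOS string and convert it to an atom key
tag_to_id =
  Map.new(label_map, fn {id, tag} ->
    {tag |> String.downcase() |> String.to_atom(), id}
  end)

# tag_to_id == %{adj: 0, adp: 1}
```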