Nasty.Language.English.TransformerNER (Nasty v0.3.0)

Transformer-based Named Entity Recognition for English.

Uses pre-trained transformer models fine-tuned for NER to identify and classify named entities (persons, organizations, locations, etc.) using the BIO (Begin-Inside-Outside) tagging scheme.
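To make the BIO scheme concrete, here is a minimal sketch of how per-token BIO tags could be grouped into entity spans. The `BIOSketch` module and its `{token_text, tag}` input shape are hypothetical illustrations, not part of Nasty's API; the library's own decoding may differ.

```elixir
defmodule BIOSketch do
  @moduledoc "Hypothetical helper: groups BIO-tagged tokens into entity spans."

  # Takes a list of {token_text, tag} pairs and returns {type, text} tuples.
  def decode(tagged) do
    {done, current} = Enum.reduce(tagged, {[], nil}, &step/2)

    done
    |> flush(current)
    |> Enum.reverse()
    |> Enum.map(fn {type, words} -> {type, words |> Enum.reverse() |> Enum.join(" ")} end)
  end

  # "B-X" closes any open entity and opens a new one of type X.
  defp step({word, "B-" <> type}, {done, current}), do: {flush(done, current), {type, [word]}}

  # "I-X" extends the open entity only when its type is also X.
  defp step({word, "I-" <> type}, {done, {type, words}}), do: {done, {type, [word | words]}}

  # A stray "I-X" with no matching open entity is treated as the start of a new entity.
  defp step({word, "I-" <> type}, {done, current}), do: {flush(done, current), {type, [word]}}

  # "O" (or any other tag) closes the open entity.
  defp step({_word, _tag}, {done, current}), do: {flush(done, current), nil}

  defp flush(done, nil), do: done
  defp flush(done, entity), do: [entity | done]
end

BIOSketch.decode([
  {"John", "B-PER"},
  {"Smith", "I-PER"},
  {"lives", "O"},
  {"in", "O"},
  {"Paris", "B-LOC"}
])
# => [{"PER", "John Smith"}, {"LOC", "Paris"}]
```

Treating a stray `I-` tag as the start of a new entity is one common lenient choice; a stricter decoder could discard it instead.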

Expected F1 scores: 93-95% on CoNLL-2003.

Summary

Functions

label_map()

Gets the label map (ID to BIO tag).

num_labels()

Returns the number of NER labels.

recognize_entities(tokens, opts \\ [])

Recognizes named entities in tokens using a transformer model.

tag_to_id()

Gets the tag to ID map (BIO tag to ID).

Functions

label_map()

@spec label_map() :: %{required(integer()) => String.t()}

Gets the label map (ID to BIO tag).

Examples

TransformerNER.label_map()
# => %{0 => "O", 1 => "B-PER", 2 => "I-PER", ...}

num_labels()

@spec num_labels() :: integer()

Returns the number of NER labels.

Examples

TransformerNER.num_labels()
# => 9
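The count of 9 is consistent with a BIO scheme over the four CoNLL-2003 entity types (PER, ORG, LOC, MISC): one "O" tag plus a B- and an I- tag per type. The sketch below builds such a label set; the ID ordering beyond the first three entries is an assumption, not taken from the library.

```elixir
# The four CoNLL-2003 entity types.
entity_types = ["PER", "ORG", "LOC", "MISC"]

# "O" first, then a B-/I- pair per entity type (assumed ordering).
tags = ["O" | Enum.flat_map(entity_types, fn type -> ["B-" <> type, "I-" <> type] end)]

# Build an ID => tag map from the list positions.
label_map = tags |> Enum.with_index() |> Map.new(fn {tag, id} -> {id, tag} end)

map_size(label_map)
# => 9
```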

recognize_entities(tokens, opts \\ [])

@spec recognize_entities(
  [Nasty.AST.Token.t()],
  keyword()
) :: {:ok, [Nasty.AST.Semantic.Entity.t()]} | {:error, term()}

Recognizes named entities in tokens using a transformer model.

Options

  • :model - Model to use: a model name atom (e.g., :roberta_base), or :transformer to use the default model
  • :cache_dir - Directory for model caching
  • :device - Device to use (:cpu or :cuda, default: :cpu)
  • :use_cache - Whether to use prediction caching (default: true)
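As an illustration of how the documented defaults interact with caller-supplied options, the following sketch normalizes an options list with `Keyword.validate!/2` (available from Elixir 1.13). `OptsSketch` is a hypothetical module and does not reflect how Nasty handles options internally; only the option names and the two stated defaults come from the list above.

```elixir
defmodule OptsSketch do
  # :model and :cache_dir have no documented default, so they are listed as bare keys.
  @defaults [:model, :cache_dir, device: :cpu, use_cache: true]

  # Raises ArgumentError on unknown keys, otherwise fills in the missing defaults.
  def normalize(opts), do: Keyword.validate!(opts, @defaults)
end

opts = OptsSketch.normalize(model: :roberta_base, device: :cuda)
Keyword.fetch!(opts, :device)     # => :cuda
Keyword.fetch!(opts, :use_cache)  # => true
```

Validating up front means a misspelled key such as `:devise` fails loudly instead of silently falling back to the default.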

Examples

{:ok, tokens} = Tokenizer.tokenize("John lives in Paris")
{:ok, entities} = TransformerNER.recognize_entities(tokens)

# Use a specific model
{:ok, entities} = TransformerNER.recognize_entities(tokens, model: :bert_base_cased)

tag_to_id()

@spec tag_to_id() :: %{required(atom()) => integer()}

Gets the tag to ID map (BIO tag to ID).

Examples

TransformerNER.tag_to_id()
# => %{o: 0, b_per: 1, i_per: 2, ...}
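Judging by the example outputs, this map looks like the inverse of the label map, with string tags such as "B-PER" turned into atoms such as :b_per. The sketch below shows one way that relationship could be expressed. `TagMapSketch`, its three-entry label excerpt, and the downcase-and-underscore conversion are assumptions for illustration, not the library's implementation.

```elixir
defmodule TagMapSketch do
  # Hypothetical three-entry excerpt of the ID => BIO tag map; the real map has more entries.
  @labels %{0 => "O", 1 => "B-PER", 2 => "I-PER"}

  # Inverts the map and converts "B-PER" style strings to :b_per style atoms.
  def tag_to_id(labels \\ @labels) do
    Map.new(labels, fn {id, tag} -> {to_key(tag), id} end)
  end

  # Assumed conversion: lowercase the tag and replace the hyphen with an underscore.
  defp to_key(tag) do
    tag |> String.downcase() |> String.replace("-", "_") |> String.to_atom()
  end
end

TagMapSketch.tag_to_id()
# => %{b_per: 1, i_per: 2, o: 0}
```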