Nasty.Language.English.EntityRecognizer (Nasty v0.3.0)

Named Entity Recognition (NER) for English.

Supports two approaches:

  • Rule-based NER (default)
  • CRF-based NER (statistical sequence labeling)

Examples

# Rule-based (default)
iex> tokens = tag_pos("John Smith lives in New York")
iex> entities = EntityRecognizer.recognize(tokens)
[
  %Entity{type: :person, text: "John Smith", ...},
  %Entity{type: :gpe, text: "New York", ...}
]

# CRF-based
iex> entities = EntityRecognizer.recognize(tokens, model: :crf)
[
  %Entity{type: :person, text: "John Smith", ...},
  %Entity{type: :gpe, text: "New York", ...}
]

Summary

Functions

recognize(tokens, opts \\ [])
    Recognizes named entities in a list of POS-tagged tokens.

recognize_crf(tokens, opts)
    CRF-based entity recognition using statistical sequence labeling.

recognize_rule_based(tokens)
    Rule-based entity recognition (original implementation).

Functions

recognize(tokens, opts \\ [])

@spec recognize(
  [Nasty.AST.Token.t()],
  keyword()
) :: [Nasty.AST.Semantic.Entity.t()]

Recognizes named entities in a list of POS-tagged tokens.

Options

  • :model - Recognition approach: :rule_based (default) or :crf
  • :crf_model - Trained CRF model, used when model: :crf is set (optional; if omitted, the latest NER CRF model is loaded from the registry)

Returns

  • A list of Nasty.AST.Semantic.Entity structs
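
The option handling described above can be pictured as a dispatch on :model. The following is a minimal sketch, not Nasty's implementation: DispatchSketch is a hypothetical module, and its two recognizer functions are stubs that only report which path was taken. How the real function treats an unknown :model value is not documented here, so the sketch leaves that case unhandled.

```elixir
defmodule DispatchSketch do
  # Hypothetical stand-in for the documented :model option handling.
  # The functions below are stubs, not Nasty's recognizers.

  def recognize(tokens, opts \\ []) do
    # :rule_based is the documented default when :model is not given.
    case Keyword.get(opts, :model, :rule_based) do
      :rule_based -> recognize_rule_based(tokens)
      :crf -> recognize_crf(tokens, opts)
    end
  end

  # Stub: tags the result with the path that was chosen.
  def recognize_rule_based(tokens), do: {:rule_based, tokens}

  # Stub: also echoes back the :crf_model option (nil when absent).
  def recognize_crf(tokens, opts) do
    {:crf, tokens, Keyword.get(opts, :crf_model)}
  end
end
```

Calling DispatchSketch.recognize(tokens) takes the rule-based path, while passing model: :crf forwards the full option list, so a :crf_model entry reaches the CRF path unchanged.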

recognize_crf(tokens, opts)

@spec recognize_crf(
  [Nasty.AST.Token.t()],
  keyword()
) :: [Nasty.AST.Semantic.Entity.t()]

CRF-based entity recognition using statistical sequence labeling.

If no model is provided via the :crf_model option, the function attempts to load the latest NER CRF model from the registry. If no model is available there either, it falls back to rule-based recognition.
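
The lookup order described above (explicit model, then registry, then rule-based fallback) might be sketched as follows. This is an illustration under stated assumptions, not the library's code: FallbackSketch is a hypothetical module, and because the registry API is not shown on this page, the registry lookup is represented by an injected :load_latest function, which is not a real Nasty option.

```elixir
defmodule FallbackSketch do
  # Hypothetical sketch of the documented fallback order.
  # :load_latest is an assumption standing in for the model registry;
  # it is NOT an option of the real recognize_crf/2.

  def recognize_crf(tokens, opts) do
    # By default, pretend the registry has no model.
    load_latest = Keyword.get(opts, :load_latest, fn -> :error end)

    case Keyword.fetch(opts, :crf_model) do
      # 1. An explicitly supplied model wins.
      {:ok, model} when not is_nil(model) ->
        {:crf, model, tokens}

      # 2. Otherwise ask the (stand-in) registry for the latest model.
      _ ->
        case load_latest.() do
          {:ok, model} -> {:crf, model, tokens}
          # 3. No model anywhere: fall back to the rule-based path.
          _ -> {:rule_based, tokens}
        end
    end
  end
end
```

The returned tuples only name the path taken; a real implementation would return a list of Entity structs on every path, as the spec requires.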

recognize_rule_based(tokens)

@spec recognize_rule_based([Nasty.AST.Token.t()]) :: [Nasty.AST.Semantic.Entity.t()]

Rule-based entity recognition (original implementation).