Nasty.Utils.Query (Nasty v0.3.0)

High-level query API for extracting information from an AST.

Provides convenient functions for common AST queries without requiring explicit traversal logic.

Examples

# Find all noun phrases
iex> Nasty.Utils.Query.find_all(document, :noun_phrase)
[%Nasty.AST.NounPhrase{}, ...]

# Extract entities
iex> Nasty.Utils.Query.extract_entities(document, type: :PERSON)
[%Nasty.AST.Semantic.Entity{text: "John Smith", type: :PERSON}, ...]

# Find subject of sentence
iex> Nasty.Utils.Query.find_subject(sentence)
%Nasty.AST.NounPhrase{head: %Nasty.AST.Token{text: "cat"}}

Summary

Functions

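all?(node, type, predicate)

Checks if all nodes of a type match a predicate.

any?(node, predicate)

Checks if any node in the tree matches a predicate.

content_words(node)

Gets all content words (nouns, verbs, adjectives, adverbs).

count(node, type)

Counts nodes of a specific type in the tree.

extract_entities(node, opts \\ [])

Extracts all named entities from the document.

extract_spans(node, source_text, predicate)

Extracts text spans for all nodes matching a predicate.

filter(node, predicate)

Filters nodes by a custom predicate function.

find_all(node, type)

Finds all nodes of a specific type.

find_by_lemma(node, lemma)

Finds all tokens with a specific lemma.

find_by_pos(node, pos_tag)

Finds all tokens with a specific POS tag.

find_by_text(node, pattern)

Finds all tokens matching a text pattern.

find_main_verb(arg1)

Finds the main verb of a sentence or clause.

find_objects(arg1)

Finds all objects (complements) of a verb phrase.

find_subject(arg1)

Finds the subject of a sentence or clause.

function_words(node)

Gets all function words (determiners, prepositions, conjunctions, etc.).

sentences(doc)

Gets all sentences from a document.

tokens(node)

Gets all tokens from any node.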

Functions

all?(node, type, predicate)

@spec all?(term(), atom(), (term() -> boolean())) :: boolean()

Checks if all nodes of a type match a predicate.

Examples

iex> all_lowercase? = fn %Nasty.AST.Token{text: text} -> text == String.downcase(text) end
iex> Nasty.Utils.Query.all?(document, :token, all_lowercase?)
false

any?(node, predicate)

@spec any?(term(), (term() -> boolean())) :: boolean()

Checks if any node in the tree matches a predicate.

Examples

iex> has_verb? = &match?(%Nasty.AST.Token{pos_tag: :verb}, &1)
iex> Nasty.Utils.Query.any?(document, has_verb?)
true
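
As a rough illustration of what a predicate check over a tree involves, the sketch below walks a toy tree and stops at the first match. It does not use Nasty: the module name `AnySketch` and the plain-map node shape with a `:children` key are assumptions made for this example, and the real implementation may traverse the AST differently.

```elixir
defmodule AnySketch do
  # Hypothetical node shape: a map with an optional :children list.
  # Nasty's real AST nodes are structs; this is only a stand-in.
  def any?(node, predicate) do
    # `or` short-circuits, so children are only visited if the node itself fails.
    predicate.(node) or
      Enum.any?(Map.get(node, :children, []), &any?(&1, predicate))
  end
end

tree = %{
  type: :sentence,
  children: [
    %{type: :token, text: "cat", pos_tag: :noun},
    %{type: :token, text: "runs", pos_tag: :verb}
  ]
}

has_verb? = &match?(%{pos_tag: :verb}, &1)
IO.inspect(AnySketch.any?(tree, has_verb?))
# prints: true
```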

content_words(node)

@spec content_words(term()) :: [Nasty.AST.Token.t()]

Gets all content words (nouns, verbs, adjectives, adverbs).

Examples

iex> Nasty.Utils.Query.content_words(document)
[%Nasty.AST.Token{text: "cat", pos_tag: :noun}, ...]
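
The split between content words and function words can be pictured as a partition on the POS tag. The sketch below is a stand-alone illustration under assumptions: `WordClassSketch` is a hypothetical module, tokens are plain maps, and the tag names `:adj` and `:adv` are guesses (only `:noun`, `:verb`, and `:det` appear in the examples on this page).

```elixir
defmodule WordClassSketch do
  # Assumed tag names for content words; :adj and :adv are not confirmed by the docs.
  @content_tags [:noun, :verb, :adj, :adv]

  def content_word?(%{pos_tag: tag}), do: tag in @content_tags

  # Returns {content_words, function_words}, preserving token order in each list.
  def split(tokens), do: Enum.split_with(tokens, &content_word?/1)
end

tokens = [
  %{text: "the", pos_tag: :det},
  %{text: "cat", pos_tag: :noun},
  %{text: "runs", pos_tag: :verb}
]

{content, function} = WordClassSketch.split(tokens)
IO.inspect(Enum.map(content, & &1.text))
# prints: ["cat", "runs"]
IO.inspect(Enum.map(function, & &1.text))
# prints: ["the"]
```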

count(node, type)

@spec count(term(), atom()) :: non_neg_integer()

Counts nodes of a specific type in the tree.

Examples

iex> Nasty.Utils.Query.count(document, :token)
42

iex> Nasty.Utils.Query.count(document, :sentence)
7

extract_entities(node, opts \\ [])

@spec extract_entities(
  term(),
  keyword()
) :: [Nasty.AST.Semantic.Entity.t()]

Extracts all named entities from the document.

Options

  • :type - Filter by entity type (e.g., :PERSON, :ORG, :LOC)

Examples

iex> Nasty.Utils.Query.extract_entities(document)
[%Nasty.AST.Semantic.Entity{text: "John", type: :PERSON}, ...]

iex> Nasty.Utils.Query.extract_entities(document, type: :PERSON)
[%Nasty.AST.Semantic.Entity{text: "John", type: :PERSON}, ...]
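
One plausible way to read the optional `:type` filter is shown in the sketch below. It is not taken from Nasty's source: `EntityFilterSketch` is a hypothetical module and the entities are plain maps standing in for entity structs. It only shows the common keyword-option pattern in which an absent option means "no filtering".

```elixir
defmodule EntityFilterSketch do
  # entities: list of maps standing in for entity structs (hypothetical shape).
  def filter_entities(entities, opts \\ []) do
    case Keyword.get(opts, :type) do
      # No :type option given: return every entity unchanged.
      nil -> entities
      # Otherwise keep only entities whose type matches.
      type -> Enum.filter(entities, &(&1.type == type))
    end
  end
end

entities = [
  %{text: "John", type: :PERSON},
  %{text: "Acme", type: :ORG}
]

IO.inspect(EntityFilterSketch.filter_entities(entities))
# prints both entities
IO.inspect(EntityFilterSketch.filter_entities(entities, type: :PERSON))
# prints: [%{text: "John", type: :PERSON}]
```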

extract_spans(node, source_text, predicate)

@spec extract_spans(term(), String.t(), (term() -> boolean())) :: [
  {String.t(), map()}
]

Extracts text spans for all nodes matching a predicate.

Returns a list of {text, span} tuples.

Examples

iex> is_noun? = &match?(%Nasty.AST.Token{pos_tag: :noun}, &1)
iex> Nasty.Utils.Query.extract_spans(document, source_text, is_noun?)
[{"cat", %{start_pos: {1, 4}, end_pos: {1, 7}, ...}}, ...]

filter(node, predicate)

@spec filter(term(), (term() -> boolean())) :: [term()]

Filters nodes by a custom predicate function.

Examples

iex> is_question? = &match?(%Nasty.AST.Sentence{function: :interrogative}, &1)
iex> Nasty.Utils.Query.filter(document, is_question?)
[%Nasty.AST.Sentence{function: :interrogative}, ...]

find_all(node, type)

@spec find_all(term(), atom()) :: [term()]

Finds all nodes of a specific type.

Examples

iex> Nasty.Utils.Query.find_all(document, :noun_phrase)
[%Nasty.AST.NounPhrase{}, ...]

iex> Nasty.Utils.Query.find_all(document, :token)
[%Nasty.AST.Token{}, ...]
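
For readers curious about the traversal this helper saves you from writing, here is a minimal depth-first sketch over a toy tree. Everything in it is an assumption for illustration: `FindAllSketch` is a hypothetical module, and nodes are plain maps with `:type` and `:children` keys rather than Nasty's structs.

```elixir
defmodule FindAllSketch do
  # Hypothetical stand-in node shape: %{type: atom, children: [node]}.
  # Collects every node of the given type, depth first, in document order.
  def find_all(%{type: node_type} = node, type) do
    children = Map.get(node, :children, [])
    matches = Enum.flat_map(children, &find_all(&1, type))
    if node_type == type, do: [node | matches], else: matches
  end
end

tree = %{
  type: :sentence,
  children: [
    %{
      type: :noun_phrase,
      children: [%{type: :token, text: "the"}, %{type: :token, text: "cat"}]
    },
    %{type: :verb_phrase, children: [%{type: :token, text: "runs"}]}
  ]
}

texts = tree |> FindAllSketch.find_all(:token) |> Enum.map(& &1.text)
IO.inspect(texts)
# prints: ["the", "cat", "runs"]
```

Under the same assumptions, a count of nodes of one type is simply `length(FindAllSketch.find_all(tree, :token))`.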

find_by_lemma(node, lemma)

@spec find_by_lemma(term(), String.t()) :: [Nasty.AST.Token.t()]

Finds all tokens with a specific lemma.

Examples

iex> Nasty.Utils.Query.find_by_lemma(document, "run")
[%Nasty.AST.Token{text: "runs", lemma: "run"}, ...]

find_by_pos(node, pos_tag)

@spec find_by_pos(term(), atom()) :: [Nasty.AST.Token.t()]

Finds all tokens with a specific POS tag.

Examples

iex> Nasty.Utils.Query.find_by_pos(document, :noun)
[%Nasty.AST.Token{text: "cat", pos_tag: :noun}, ...]

iex> Nasty.Utils.Query.find_by_pos(document, :verb)
[%Nasty.AST.Token{text: "runs", pos_tag: :verb}, ...]

find_by_text(node, pattern)

@spec find_by_text(term(), String.t() | Regex.t()) :: [Nasty.AST.Token.t()]

Finds all tokens matching a text pattern.

Examples

iex> Nasty.Utils.Query.find_by_text(document, "cat")
[%Nasty.AST.Token{text: "cat"}, ...]

iex> Nasty.Utils.Query.find_by_text(document, ~r/^run/)
[%Nasty.AST.Token{text: "run"}, %Nasty.AST.Token{text: "runs"}, ...]
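
Since the pattern may be either a string or a regex, a function like this typically dispatches on the pattern's type. The sketch below shows one way to do that. `TextMatchSketch` is a hypothetical module, tokens are plain maps, and treating a string pattern as exact equality is an assumption: the documentation does not say whether Nasty matches strings exactly or as substrings.

```elixir
defmodule TextMatchSketch do
  # Assumption: a string pattern means exact equality with the token text.
  def matches?(text, pattern) when is_binary(pattern), do: text == pattern
  # A regex pattern is applied with Regex.match?/2.
  def matches?(text, %Regex{} = pattern), do: Regex.match?(pattern, text)

  def find_by_text(tokens, pattern) do
    Enum.filter(tokens, &matches?(&1.text, pattern))
  end
end

tokens = [%{text: "run"}, %{text: "runs"}, %{text: "cat"}, %{text: "rerun"}]

IO.inspect(TextMatchSketch.find_by_text(tokens, "cat"))
# prints: [%{text: "cat"}]
IO.inspect(TextMatchSketch.find_by_text(tokens, ~r/^run/))
# prints: [%{text: "run"}, %{text: "runs"}]
```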

find_main_verb(arg1)

Finds the main verb of a sentence or clause.

Returns the head verb token if present, otherwise nil.

Examples

iex> sentence = %Nasty.AST.Sentence{...}
iex> Nasty.Utils.Query.find_main_verb(sentence)
%Nasty.AST.Token{text: "runs", pos_tag: :verb}

find_objects(arg1)

@spec find_objects(
  Nasty.AST.VerbPhrase.t()
  | Nasty.AST.Clause.t()
  | Nasty.AST.Sentence.t()
) :: [term()]

Finds all objects (complements) of a verb phrase. Also accepts a clause or sentence.

Examples

iex> vp = %Nasty.AST.VerbPhrase{complements: [obj1, obj2]}
iex> Nasty.Utils.Query.find_objects(vp)
[obj1, obj2]

find_subject(arg1)

@spec find_subject(Nasty.AST.Sentence.t() | Nasty.AST.Clause.t()) ::
  Nasty.AST.NounPhrase.t() | nil

Finds the subject of a sentence or clause.

Returns the subject noun phrase if present, otherwise nil.

Examples

iex> sentence = %Nasty.AST.Sentence{...}
iex> Nasty.Utils.Query.find_subject(sentence)
%Nasty.AST.NounPhrase{head: %Nasty.AST.Token{text: "cat"}}

function_words(node)

@spec function_words(term()) :: [Nasty.AST.Token.t()]

Gets all function words (determiners, prepositions, conjunctions, etc.).

Examples

iex> Nasty.Utils.Query.function_words(document)
[%Nasty.AST.Token{text: "the", pos_tag: :det}, ...]

sentences(doc)

@spec sentences(Nasty.AST.Document.t()) :: [Nasty.AST.Sentence.t()]

Gets all sentences from a document.

Examples

iex> Nasty.Utils.Query.sentences(document)
[%Nasty.AST.Sentence{}, ...]

tokens(node)

@spec tokens(term()) :: [Nasty.AST.Token.t()]

Gets all tokens from any node.

Examples

iex> Nasty.Utils.Query.tokens(document)
[%Nasty.AST.Token{}, ...]