Nasty.Utils.Transform (Nasty v0.3.0)

AST transformation utilities for modifying tree structures.

Provides common transformations like normalization, simplification, and structural modifications.

Examples

# Lowercase all text
iex> Nasty.Utils.Transform.normalize_case(document, :lower)
%Nasty.AST.Document{...}

# Remove stop words
iex> Nasty.Utils.Transform.remove_stop_words(document)
%Nasty.AST.Document{...}

Summary

Functions

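filter_tokens(node, predicate)
Filters tokens in the tree based on a predicate.

flatten_to_tokens(node)
Flattens the tree to a sequence of tokens.

lemmatize(node)
Converts all tokens to their lemma forms.

merge_tokens(node, predicate)
Merges consecutive tokens matching a predicate.

normalize_case(node, case_type)
Normalizes text case for all tokens in the tree.

pipeline(node, transformations)
Applies a pipeline of transformations.

remove_punctuation(node)
Removes punctuation tokens from the tree.

remove_stop_words(node, stop_words \\ default_stop_words())
Removes stop words from the tree.

replace_tokens(node, predicate, replacer)
Replaces tokens matching a predicate with a new token.

round_trip_test(node, transform)
Validates that a transformation is reversible by round-tripping.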

Functions

filter_tokens(node, predicate)

@spec filter_tokens(term(), (Nasty.AST.Token.t() -> boolean())) :: term()

Filters tokens in the tree based on a predicate.

Tokens that don't match the predicate are removed from their parent structures.

Examples

iex> keep_nouns = fn token -> token.pos_tag == :noun end
iex> Nasty.Utils.Transform.filter_tokens(document, keep_nouns)
%Nasty.AST.Document{...}
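The sketch below shows the recursive filtering idea. It assumes a simplified tree model rather than Nasty's real AST structs: any map with a :children list is a container, and every other map is a token. The module name FilterSketch is hypothetical.

```elixir
defmodule FilterSketch do
  # Simplified stand-in for a Nasty AST (an assumption, not the real structs):
  # a container is any map with a :children list; every other map is a token.

  # Drop tokens the predicate rejects; keep containers and recurse into them.
  def filter_tokens(%{children: children} = node, predicate) do
    kept =
      children
      |> Enum.filter(fn child -> container?(child) or predicate.(child) end)
      |> Enum.map(&filter_tokens(&1, predicate))

    %{node | children: kept}
  end

  # A bare token passed in directly is returned unchanged.
  def filter_tokens(token, _predicate), do: token

  defp container?(%{children: _}), do: true
  defp container?(_), do: false
end

tree = %{
  children: [
    %{children: [%{text: "The", pos_tag: :det}, %{text: "cat", pos_tag: :noun}]},
    %{text: "sleeps", pos_tag: :verb}
  ]
}

keep_nouns = fn token -> token.pos_tag == :noun end
FilterSketch.filter_tokens(tree, keep_nouns)
# => %{children: [%{children: [%{text: "cat", pos_tag: :noun}]}]}
```

This sketch keeps containers even when the filter empties them. The documentation does not say whether Nasty prunes empty containers.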

flatten_to_tokens(node)

@spec flatten_to_tokens(term()) :: [Nasty.AST.Token.t()]

Flattens the tree to a sequence of tokens.

Examples

iex> Nasty.Utils.Transform.flatten_to_tokens(document)
[%Nasty.AST.Token{}, ...]
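Here is a depth-first sketch of flattening under the same assumed model, where containers carry :children and tokens are leaf maps. FlattenSketch is a hypothetical name. Left-to-right order is an assumption, chosen to match reading order.

```elixir
defmodule FlattenSketch do
  # Simplified tree model (assumption): containers are maps with a :children
  # list, and every other map is a token.

  # Depth-first, left-to-right traversal, so tokens come out in document order.
  def flatten_to_tokens(%{children: children}) do
    Enum.flat_map(children, &flatten_to_tokens/1)
  end

  def flatten_to_tokens(token), do: [token]
end

tree = %{
  children: [
    %{children: [%{text: "The"}, %{text: "cat"}]},
    %{children: [%{text: "sleeps"}, %{text: "."}]}
  ]
}

tree |> FlattenSketch.flatten_to_tokens() |> Enum.map(& &1.text)
# => ["The", "cat", "sleeps", "."]
```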

lemmatize(node)

@spec lemmatize(term()) :: term()

Converts all tokens to their lemma forms.

Examples

iex> Nasty.Utils.Transform.lemmatize(document)
%Nasty.AST.Document{...}
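Lemmatization can be sketched as a map over tokens. This version assumes each token may carry a :lemma field, and that tokens without a lemma keep their original text. LemmatizeSketch and the :lemma key are assumptions, not Nasty's documented fields.

```elixir
defmodule LemmatizeSketch do
  # Simplified model (assumption): tokens are maps with :text and an optional
  # :lemma; containers are maps with a :children list.

  def lemmatize(%{children: children} = node) do
    %{node | children: Enum.map(children, &lemmatize/1)}
  end

  # Swap the surface text for the lemma, falling back to the original text
  # when no lemma is present (missing key or nil).
  def lemmatize(%{text: text} = token) do
    %{token | text: token[:lemma] || text}
  end
end

tree = %{
  children: [
    %{text: "cats", lemma: "cat"},
    %{text: "were", lemma: "be"},
    %{text: "sleeping", lemma: "sleep"},
    %{text: "Zzz", lemma: nil}
  ]
}

LemmatizeSketch.lemmatize(tree).children |> Enum.map(& &1.text)
# => ["cat", "be", "sleep", "Zzz"]
```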

merge_tokens(node, predicate)

@spec merge_tokens(term(), (Nasty.AST.Token.t() -> boolean())) :: term()

Merges consecutive tokens matching a predicate.

Examples

iex> is_propn? = fn token -> token.pos_tag == :propn end
iex> Nasty.Utils.Transform.merge_tokens(document, is_propn?)
%Nasty.AST.Document{...}
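Below is one plausible reading of merge_tokens, sketched under three assumptions:

- only adjacent siblings merge;
- merged text is joined with a single space;
- the first token's other fields are kept.

The documentation confirms none of these choices, and MergeSketch is hypothetical.

```elixir
defmodule MergeSketch do
  # Simplified model (assumption): containers are maps with :children; tokens
  # are other maps with a :text field. Joining with a single space and keeping
  # the first token's other fields are illustrative choices only.

  def merge_tokens(%{children: children} = node, predicate) do
    merged =
      children
      |> Enum.map(&merge_tokens(&1, predicate))
      # Group adjacent siblings by whether they are mergeable tokens.
      |> Enum.chunk_by(&mergeable?(&1, predicate))
      |> Enum.flat_map(fn [first | _] = run ->
        if mergeable?(first, predicate) do
          [%{first | text: Enum.map_join(run, " ", & &1.text)}]
        else
          run
        end
      end)

    %{node | children: merged}
  end

  def merge_tokens(token, _predicate), do: token

  # Containers never merge; tokens merge when the predicate holds.
  defp mergeable?(%{children: _}, _predicate), do: false
  defp mergeable?(token, predicate), do: predicate.(token)
end

sentence = %{
  children: [
    %{text: "New", pos_tag: :propn},
    %{text: "York", pos_tag: :propn},
    %{text: "is", pos_tag: :aux},
    %{text: "big", pos_tag: :adj}
  ]
}

MergeSketch.merge_tokens(sentence, &(&1.pos_tag == :propn)).children |> Enum.map(& &1.text)
# => ["New York", "is", "big"]
```

Enum.chunk_by/2 only groups adjacent elements. As a result, matching tokens separated by a non-matching token stay separate.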

normalize_case(node, case_type)

@spec normalize_case(term(), :lower | :upper | :title) :: term()

Normalizes text case for all tokens in the tree.

Supported values for case_type:

  • :lower - Convert to lowercase
  • :upper - Convert to uppercase
  • :title - Convert to title case

Examples

iex> Nasty.Utils.Transform.normalize_case(document, :lower)
%Nasty.AST.Document{...}
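Here is a sketch of per-token case normalization over the assumed container/token maps, dispatching on the three documented case_type atoms. CaseSketch is a hypothetical name. Its :title branch applies String.capitalize/1 to each token, which may differ from how the library title-cases text.

```elixir
defmodule CaseSketch do
  # Simplified model (assumption): containers are maps with :children,
  # tokens are maps with :text.

  def normalize_case(%{children: children} = node, case_type) do
    %{node | children: Enum.map(children, &normalize_case(&1, case_type))}
  end

  def normalize_case(%{text: text} = token, case_type) do
    %{token | text: convert(text, case_type)}
  end

  defp convert(text, :lower), do: String.downcase(text)
  defp convert(text, :upper), do: String.upcase(text)
  # String.capitalize/1 upcases the first grapheme and downcases the rest,
  # so each token is title-cased independently.
  defp convert(text, :title), do: String.capitalize(text)
end

tree = %{children: [%{text: "hELLO"}, %{children: [%{text: "wORLD"}]}]}

CaseSketch.normalize_case(tree, :title)
# => %{children: [%{text: "Hello"}, %{children: [%{text: "World"}]}]}
```

Passing any atom other than :lower, :upper or :title raises FunctionClauseError here.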

pipeline(node, transformations)

@spec pipeline(term(), [(term() -> term())]) :: term()

Applies a list of transformations to the node in order, passing the result of each function to the next.

Examples

iex> pipeline = [
...>   &Nasty.Utils.Transform.normalize_case(&1, :lower),
...>   &Nasty.Utils.Transform.remove_punctuation/1,
...>   &Nasty.Utils.Transform.remove_stop_words/1
...> ]
iex> Nasty.Utils.Transform.pipeline(document, pipeline)
%Nasty.AST.Document{...}
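A pipeline of unary functions amounts to a left fold, and this sketch shows that reading of the spec. PipelineSketch is a hypothetical name. Strings stand in for AST nodes, since the spec only requires term().

```elixir
defmodule PipelineSketch do
  # Apply each transformation in list order; the output of one step
  # becomes the input of the next. An empty list returns node unchanged.
  def pipeline(node, transformations) do
    Enum.reduce(transformations, node, fn transform, acc -> transform.(acc) end)
  end
end

PipelineSketch.pipeline("  Hello, World  ", [
  &String.trim/1,
  &String.downcase/1,
  &String.replace(&1, ",", "")
])
# => "hello world"
```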

remove_punctuation(node)

@spec remove_punctuation(term()) :: term()

Removes punctuation tokens from the tree.

Examples

iex> Nasty.Utils.Transform.remove_punctuation(document)
%Nasty.AST.Document{...}
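Removing punctuation can be sketched as a filter with a fixed predicate. This sketch assumes punctuation tokens are tagged pos_tag: :punct, which is an illustrative choice rather than Nasty's confirmed tag. PunctSketch is a hypothetical name.

```elixir
defmodule PunctSketch do
  # Simplified model (assumption): containers are maps with :children;
  # punctuation tokens carry pos_tag: :punct (a Universal Dependencies-style
  # tag chosen for illustration, not confirmed for Nasty).

  def remove_punctuation(%{children: children} = node) do
    kept =
      children
      |> Enum.reject(&punctuation?/1)
      |> Enum.map(&remove_punctuation/1)

    %{node | children: kept}
  end

  def remove_punctuation(token), do: token

  defp punctuation?(%{children: _}), do: false
  defp punctuation?(%{pos_tag: :punct}), do: true
  defp punctuation?(_), do: false
end

tree = %{
  children: [
    %{text: "Hi", pos_tag: :intj},
    %{text: ",", pos_tag: :punct},
    %{text: "there", pos_tag: :adv},
    %{text: "!", pos_tag: :punct}
  ]
}

PunctSketch.remove_punctuation(tree).children |> Enum.map(& &1.text)
# => ["Hi", "there"]
```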

remove_stop_words(node, stop_words \\ default_stop_words())

@spec remove_stop_words(term(), [String.t()]) :: term()

Removes stop words from the tree. When stop_words is not given, the list returned by default_stop_words() is used.

Examples

iex> stop_words = ["the", "a", "an", "is", "are"]
iex> Nasty.Utils.Transform.remove_stop_words(document, stop_words)
%Nasty.AST.Document{...}
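Here is a sketch of stop-word removal that takes an optional list, mirroring the documented default argument. Several parts are assumptions for illustration:

- the default list;
- case-insensitive matching;
- the name StopWordSketch.

The real default_stop_words() contents are not shown in this documentation.

```elixir
defmodule StopWordSketch do
  # Hypothetical default list; the library's actual default_stop_words()
  # contents are not shown in this documentation.
  def default_stop_words, do: ["the", "a", "an", "is", "are"]

  # Matching is case-insensitive in this sketch (an assumption).
  def remove_stop_words(node, stop_words \\ default_stop_words()) do
    do_remove(node, MapSet.new(stop_words, &String.downcase/1))
  end

  defp do_remove(%{children: children} = node, set) do
    kept =
      children
      |> Enum.reject(&stop_word?(&1, set))
      |> Enum.map(&do_remove(&1, set))

    %{node | children: kept}
  end

  defp do_remove(token, _set), do: token

  defp stop_word?(%{children: _}, _set), do: false
  defp stop_word?(%{text: text}, set), do: MapSet.member?(set, String.downcase(text))
end

tree = %{children: [%{text: "The"}, %{text: "cat"}, %{text: "is"}, %{text: "here"}]}

StopWordSketch.remove_stop_words(tree).children |> Enum.map(& &1.text)
# => ["cat", "here"]

StopWordSketch.remove_stop_words(tree, ["cat"]).children |> Enum.map(& &1.text)
# => ["The", "is", "here"]
```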

replace_tokens(node, predicate, replacer)

@spec replace_tokens(
  term(),
  (Nasty.AST.Token.t() -> boolean()),
  (Nasty.AST.Token.t() -> Nasty.AST.Token.t())
) :: term()

Replaces each token for which predicate returns true with the token returned by replacer.

Examples

iex> replacer = fn token -> %{token | text: "[MASK]"} end
iex> predicate = fn token -> token.pos_tag == :propn end
iex> Nasty.Utils.Transform.replace_tokens(document, predicate, replacer)
%Nasty.AST.Document{...}
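The sketch below shows predicate-driven replacement over the assumed container/token maps. Matching tokens are swapped for whatever replacer returns, and everything else is left untouched. ReplaceSketch is a hypothetical name.

```elixir
defmodule ReplaceSketch do
  # Simplified model (assumption): containers are maps with :children,
  # tokens are other maps.

  def replace_tokens(%{children: children} = node, predicate, replacer) do
    %{node | children: Enum.map(children, &replace_tokens(&1, predicate, replacer))}
  end

  # Only tokens reach this clause; the replacer's return value takes the token's place.
  def replace_tokens(token, predicate, replacer) do
    if predicate.(token), do: replacer.(token), else: token
  end
end

tree = %{children: [%{text: "Alice", pos_tag: :propn}, %{text: "runs", pos_tag: :verb}]}
replacer = fn token -> %{token | text: "[MASK]"} end
predicate = fn token -> token.pos_tag == :propn end

ReplaceSketch.replace_tokens(tree, predicate, replacer).children |> Enum.map(& &1.text)
# => ["[MASK]", "runs"]
```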

round_trip_test(node, transform)

@spec round_trip_test(term(), (term() -> term())) ::
  {:ok, term()} | {:error, String.t()}

Validates that a transformation is reversible by round-tripping. Returns {:ok, transformed} on success or {:error, reason} otherwise.

Examples

iex> Nasty.Utils.Transform.round_trip_test(document, &Nasty.Utils.Transform.normalize_case(&1, :lower))
{:ok, transformed}