Nasty.Utils.Transform (Nasty v0.3.0)
AST transformation utilities for modifying tree structures.
Provides common transformations like normalization, simplification, and structural modifications.
Examples
# Lowercase all text
iex> Nasty.Utils.Transform.normalize_case(document, :lower)
%Nasty.AST.Document{...}
# Remove stop words
iex> Nasty.Utils.Transform.remove_stop_words(document)
%Nasty.AST.Document{...}
Summary
Functions
filter_tokens: Filters tokens in the tree based on a predicate.
flatten_to_tokens: Flattens the tree to a sequence of tokens.
lemmatize: Converts all tokens to their lemma forms.
merge_tokens: Merges consecutive tokens matching a predicate.
normalize_case: Normalizes text case for all tokens in the tree.
pipeline: Applies a pipeline of transformations.
remove_punctuation: Removes punctuation tokens from the tree.
remove_stop_words: Removes stop words from the tree.
replace_tokens: Replaces tokens matching a predicate with a new token.
round_trip_test: Validates that a transformation is reversible by round-tripping.
Functions
@spec filter_tokens(term(), (Nasty.AST.Token.t() -> boolean())) :: term()
Filters tokens in the tree based on a predicate.
Tokens that don't match the predicate are removed from their parent structures.
Examples
iex> keep_nouns = fn token -> token.pos_tag == :noun end
iex> Nasty.Utils.Transform.filter_tokens(document, keep_nouns)
%Nasty.AST.Document{...}
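The removal behaviour described above can be sketched on plain data. This is a minimal illustration, not the library's implementation: the module name FilterSketch is hypothetical, and the tree is assumed to be a list of sentences, each a list of token maps with :text and :pos_tag keys.

```elixir
defmodule FilterSketch do
  # Hypothetical stand-in for a tree: a list of sentences,
  # each sentence being a list of token maps.
  # Tokens for which the predicate returns a falsy value are dropped
  # from their parent list; the sentence structure itself is kept.
  def filter_tokens(sentences, predicate) when is_function(predicate, 1) do
    Enum.map(sentences, fn tokens -> Enum.filter(tokens, predicate) end)
  end
end

sentences = [
  [%{text: "Cats", pos_tag: :noun}, %{text: "sleep", pos_tag: :verb}],
  [%{text: "Dogs", pos_tag: :noun}, %{text: "bark", pos_tag: :verb}]
]

keep_nouns = fn token -> token.pos_tag == :noun end

FilterSketch.filter_tokens(sentences, keep_nouns)
# => [[%{pos_tag: :noun, text: "Cats"}], [%{pos_tag: :noun, text: "Dogs"}]]
```

Note that each parent list survives even when some of its tokens are removed, which mirrors the "removed from their parent structures" wording above.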
@spec flatten_to_tokens(term()) :: [Nasty.AST.Token.t()]
Flattens the tree to a sequence of tokens.
Examples
iex> Nasty.Utils.Transform.flatten_to_tokens(document)
[%Nasty.AST.Token{}, ...]
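Flattening can be pictured as a depth-first walk that collects leaves in reading order. The sketch below assumes a hypothetical tree shape in which inner nodes are maps with a :children key and every other value is a token; the real Nasty.AST structs may be organised differently.

```elixir
defmodule FlattenSketch do
  # Inner node (assumed shape): recurse into each child and
  # concatenate the results, preserving left-to-right order.
  def flatten_to_tokens(%{children: children}) do
    Enum.flat_map(children, &flatten_to_tokens/1)
  end

  # Anything without a :children key is treated as a token (leaf).
  def flatten_to_tokens(token), do: [token]
end

tree = %{
  children: [
    %{children: [%{text: "Hello"}, %{text: ","}]},
    %{children: [%{text: "world"}]}
  ]
}

tree |> FlattenSketch.flatten_to_tokens() |> Enum.map(& &1.text)
# => ["Hello", ",", "world"]
```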
Converts all tokens to their lemma forms.
Examples
iex> Nasty.Utils.Transform.lemmatize(document)
%Nasty.AST.Document{...}
@spec merge_tokens(term(), (Nasty.AST.Token.t() -> boolean())) :: term()
Merges consecutive tokens matching a predicate.
Examples
iex> is_propn? = fn token -> token.pos_tag == :propn end
iex> Nasty.Utils.Transform.merge_tokens(document, is_propn?)
%Nasty.AST.Document{...}
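One way to read "merges consecutive tokens" is to group adjacent matching tokens into runs and collapse each run into a single token. The sketch below works on a flat list of token maps. The module name MergeSketch is hypothetical, and joining the texts with a single space while keeping the first token's other fields is an assumption, not documented behaviour.

```elixir
defmodule MergeSketch do
  def merge_tokens(tokens, predicate) when is_function(predicate, 1) do
    tokens
    # Split into runs of adjacent tokens that agree on the predicate result.
    |> Enum.chunk_by(predicate)
    |> Enum.flat_map(fn [first | _] = run ->
      if predicate.(first) do
        # Matching run: collapse into one token (assumed: texts joined by a space).
        [%{first | text: Enum.map_join(run, " ", & &1.text)}]
      else
        # Non-matching run: pass through unchanged.
        run
      end
    end)
  end
end

tokens = [
  %{text: "New", pos_tag: :propn},
  %{text: "York", pos_tag: :propn},
  %{text: "is", pos_tag: :verb},
  %{text: "big", pos_tag: :adj}
]

is_propn = fn token -> token.pos_tag == :propn end

tokens |> MergeSketch.merge_tokens(is_propn) |> Enum.map(& &1.text)
# => ["New York", "is", "big"]
```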
Normalizes text case for all tokens in the tree.
Options:
:lower - Convert to lowercase
:upper - Convert to uppercase
:title - Convert to title case
Examples
iex> Nasty.Utils.Transform.normalize_case(document, :lower)
%Nasty.AST.Document{...}
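The three modes map naturally onto functions from Elixir's String module. The sketch below applies them to a flat list of token maps. CaseSketch is a hypothetical name, and treating :title as "capitalize each token's text" is an assumption about what title case means here.

```elixir
defmodule CaseSketch do
  def normalize_case(tokens, mode) when mode in [:lower, :upper, :title] do
    Enum.map(tokens, fn token -> %{token | text: convert(token.text, mode)} end)
  end

  defp convert(text, :lower), do: String.downcase(text)
  defp convert(text, :upper), do: String.upcase(text)
  # String.capitalize/1 upcases the first character and downcases the rest.
  defp convert(text, :title), do: String.capitalize(text)
end

tokens = [%{text: "hELLO"}, %{text: "World"}]

tokens |> CaseSketch.normalize_case(:title) |> Enum.map(& &1.text)
# => ["Hello", "World"]
```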
Applies a pipeline of transformations.
Examples
iex> pipeline = [
...> &Nasty.Utils.Transform.normalize_case(&1, :lower),
...> &Nasty.Utils.Transform.remove_punctuation/1,
...> &Nasty.Utils.Transform.remove_stop_words/1
...> ]
iex> Nasty.Utils.Transform.pipeline(document, pipeline)
%Nasty.AST.Document{...}
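A pipeline of one-argument functions is commonly applied as a left fold, feeding each step's output into the next. The sketch below shows that idea on a plain list of strings. PipelineSketch is a hypothetical module, and whether the library implements pipeline/2 this way is an assumption.

```elixir
defmodule PipelineSketch do
  # Applies each transformation in order, threading the result through.
  def pipeline(value, transforms) when is_list(transforms) do
    Enum.reduce(transforms, value, fn transform, acc -> transform.(acc) end)
  end
end

steps = [
  fn words -> Enum.map(words, &String.downcase/1) end,
  fn words -> Enum.reject(words, &(&1 in ["the", "a"])) end
]

PipelineSketch.pipeline(["The", "Cat"], steps)
# => ["cat"]
```

Order matters in this sketch: running the stop-word step before lowercasing would leave "The" in place, because the comparison is case-sensitive.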
Removes punctuation tokens from the tree.
Examples
iex> Nasty.Utils.Transform.remove_punctuation(document)
%Nasty.AST.Document{...}
Removes stop words from the tree.
Examples
iex> stop_words = ["the", "a", "an", "is", "are"]
iex> Nasty.Utils.Transform.remove_stop_words(document, stop_words)
%Nasty.AST.Document{...}
@spec replace_tokens( term(), (Nasty.AST.Token.t() -> boolean()), (Nasty.AST.Token.t() -> Nasty.AST.Token.t()) ) :: term()
Replaces tokens matching a predicate with a new token.
Examples
iex> replacer = fn token -> %{token | text: "[MASK]"} end
iex> predicate = fn token -> token.pos_tag == :propn end
iex> Nasty.Utils.Transform.replace_tokens(document, predicate, replacer)
%Nasty.AST.Document{...}
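The predicate and replacer pairing can be sketched as a map over tokens in which only matching tokens pass through the replacer. The example uses a flat list of token maps and a hypothetical ReplaceSketch module; the real function operates on the AST and may traverse it differently.

```elixir
defmodule ReplaceSketch do
  def replace_tokens(tokens, predicate, replacer)
      when is_function(predicate, 1) and is_function(replacer, 1) do
    Enum.map(tokens, fn token ->
      # Only tokens that satisfy the predicate are rewritten.
      if predicate.(token), do: replacer.(token), else: token
    end)
  end
end

tokens = [%{text: "Alice", pos_tag: :propn}, %{text: "runs", pos_tag: :verb}]
mask = fn token -> %{token | text: "[MASK]"} end
is_propn = fn token -> token.pos_tag == :propn end

ReplaceSketch.replace_tokens(tokens, is_propn, mask)
# => [%{pos_tag: :propn, text: "[MASK]"}, %{pos_tag: :verb, text: "runs"}]
```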
Validates that a transformation is reversible by round-tripping.
Examples
iex> Nasty.Utils.Transform.round_trip_test(document, &Nasty.Utils.Transform.normalize_case(&1, :lower))
{:ok, transformed}