ExNlp.Filter.Transform (ex_nlp v0.1.0)


Token transformation filters.

Provides pattern replacement and deduplication utilities.

Summary

Types

pattern()

Pattern for regex replacement

token()

A token struct

Functions

pattern_replace(tokens, pattern, replacement \\ "")

Replaces text matching a pattern in each token's text.

unique(tokens)

Removes duplicate tokens while preserving position information.

Types

pattern()

@type pattern() :: Regex.t() | String.t()

Pattern to match against token text: a compiled regex or a literal string

token()

@type token() :: ExNlp.Token.t()

A token struct

Functions

pattern_replace(tokens, pattern, replacement \\ "")

@spec pattern_replace([token()], pattern(), String.t()) :: [token()]

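Replaces text matching a pattern in each token's text.

The pattern may be a regex or a literal string. The replacement defaults to "", which deletes the matched text.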

Examples

iex> tokens = [%ExNlp.Token{text: "word123"}, %ExNlp.Token{text: "test456"}]
iex> ExNlp.Filter.Transform.pattern_replace(tokens, ~r/\d+/, "")
[%ExNlp.Token{text: "word"}, %ExNlp.Token{text: "test"}]
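
For readers who want to see the idea in isolation, here is a minimal sketch of how this kind of filter could be written. It is not the library's implementation: `PatternReplaceSketch` is a hypothetical module, and plain maps with a `:text` key stand in for `ExNlp.Token` structs. It relies only on `String.replace/3`, which accepts either a regex or a literal string as the pattern.

```elixir
defmodule PatternReplaceSketch do
  # Hypothetical stand-in for a pattern-replace filter.
  # Works on any map (or struct) that has a :text key.
  def pattern_replace(tokens, pattern, replacement \\ "") do
    Enum.map(tokens, fn %{text: text} = token ->
      # String.replace/3 replaces all matches by default.
      %{token | text: String.replace(text, pattern, replacement)}
    end)
  end
end

tokens = [%{text: "word123"}, %{text: "test456"}]

# Regex pattern with the default replacement: strips runs of digits.
PatternReplaceSketch.pattern_replace(tokens, ~r/\d+/)
# => [%{text: "word"}, %{text: "test"}]

# Literal string pattern with an explicit replacement.
PatternReplaceSketch.pattern_replace([%{text: "a-b"}], "-", "_")
# => [%{text: "a_b"}]
```

In this sketch the number of tokens never changes. A token whose text becomes empty after replacement is kept as is.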

unique(tokens)

@spec unique([token() | binary()]) :: [token() | binary()]

Removes duplicate tokens while preserving position information.

Keeps the first occurrence of each unique token text.

Examples

iex> tokens = [%ExNlp.Token{text: "the", position: 0}, %ExNlp.Token{text: "quick", position: 1}, %ExNlp.Token{text: "the", position: 2}]
iex> ExNlp.Filter.Transform.unique(tokens)
[%ExNlp.Token{text: "the", position: 0}, %ExNlp.Token{text: "quick", position: 1}]
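
The first-occurrence behaviour described above can be sketched with `Enum.uniq_by/2`. This is an illustration under stated assumptions, not the library's code: `UniqueSketch` is a hypothetical module, plain maps stand in for `ExNlp.Token` structs, and treating a bare string as equal to a token with the same text is a choice made for this sketch only.

```elixir
defmodule UniqueSketch do
  # Hypothetical stand-in for a deduplication filter.
  # Accepts maps with a :text key, plain strings, or a mix of both.
  def unique(items) do
    # Enum.uniq_by/2 keeps the first element seen for each key.
    Enum.uniq_by(items, fn
      %{text: text} -> text
      text when is_binary(text) -> text
    end)
  end
end

tokens = [
  %{text: "the", position: 0},
  %{text: "quick", position: 1},
  %{text: "the", position: 2}
]

UniqueSketch.unique(tokens)
# => [%{text: "the", position: 0}, %{text: "quick", position: 1}]

UniqueSketch.unique(["a", "b", "a"])
# => ["a", "b"]
```

Because surviving elements are returned untouched, each kept token retains its original `position` value; the positions are not renumbered.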