ExNlp.Filter (ex_nlp v0.1.0)

Unified API for token filtering operations.

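This module provides convenient access to the filter submodules, including ExNlp.Filter.Case, ExNlp.Filter.Length, ExNlp.Filter.Stopwords, and ExNlp.Filter.Transform.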

Examples

# Filter pipeline
iex> tokens = [%ExNlp.Token{text: "The"}, %ExNlp.Token{text: "quick"}]
iex> tokens
...> |> ExNlp.Filter.lowercase()
...> |> ExNlp.Filter.stop_words(:english)
...> |> ExNlp.Filter.min_length(3)
[%ExNlp.Token{text: "quick"}]

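Summary

Types

language()

Supported language atoms

pattern()

Pattern for regex replacement

token()

A token struct

Functions

lowercase(tokens)

Converts tokens to lowercase. Delegates to ExNlp.Filter.Case.lowercase/1.

max_length(tokens, max_len \\ 50)

Filters tokens by maximum length. Delegates to ExNlp.Filter.Length.maximum/2.

min_length(tokens, min_len \\ 2)

Filters tokens by minimum length. Delegates to ExNlp.Filter.Length.minimum/2.

pattern_replace(tokens, pattern, replacement \\ "")

Replaces patterns in token text. Delegates to ExNlp.Filter.Transform.pattern_replace/3.

stop_words(tokens, lang \\ :english)

Removes stop words from tokens. Delegates to ExNlp.Filter.Stopwords.filter/2.

unique(tokens)

Removes duplicate tokens. Delegates to ExNlp.Filter.Transform.unique/1.

uppercase(tokens)

Converts tokens to uppercase. Delegates to ExNlp.Filter.Case.uppercase/1.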

Types

language()

@type language() :: atom()

An atom identifying a supported language, for example :english.

pattern()

@type pattern() :: Regex.t() | String.t()

A pattern to match in token text, given either as a compiled Regex or as a plain string. Used by pattern_replace/3.

token()

@type token() :: ExNlp.Token.t()

A token struct

Functions

lowercase(tokens)

@spec lowercase([token()]) :: [token()]

Converts tokens to lowercase. Delegates to ExNlp.Filter.Case.lowercase/1.
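As a rough illustration of what a case filter of this kind does, the sketch below lowercases the text of each token and leaves the rest of the token untouched. It does not call ex_nlp: plain maps with a :text key stand in for ExNlp.Token structs, and the CaseSketch module is hypothetical.

```elixir
defmodule CaseSketch do
  # Hypothetical stand-in for a lowercase filter; not ex_nlp's implementation.
  # Tokens are modelled as plain maps with a :text key (an assumption).
  def lowercase(tokens) do
    Enum.map(tokens, fn token ->
      # Map update syntax keeps every other key of the token as it was.
      %{token | text: String.downcase(token.text)}
    end)
  end
end

tokens = [%{text: "The"}, %{text: "QUICK"}]
IO.inspect(CaseSketch.lowercase(tokens))
```

An uppercase filter would presumably follow the same shape with String.upcase/1 in place of String.downcase/1.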

max_length(tokens, max_len \\ 50)

@spec max_length([token()], non_neg_integer()) :: [token()]

Removes tokens longer than max_len (default: 50). Delegates to ExNlp.Filter.Length.maximum/2.

min_length(tokens, min_len \\ 2)

@spec min_length([token()], non_neg_integer()) :: [token()]

Removes tokens shorter than min_len (default: 2). Delegates to ExNlp.Filter.Length.minimum/2.
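The two length filters can be pictured as a pair of Enum.filter/2 calls. The sketch below is only an approximation under stated assumptions: length is counted in graphemes via String.length/1, both bounds are inclusive, and tokens are plain maps with a :text key rather than ExNlp.Token structs. The LengthSketch module is hypothetical, and ex_nlp may count length or treat boundaries differently.

```elixir
defmodule LengthSketch do
  # Hypothetical stand-in; not ex_nlp's implementation.
  # Assumes grapheme-based length and inclusive bounds.
  def min_length(tokens, min_len \\ 2) do
    Enum.filter(tokens, fn token -> String.length(token.text) >= min_len end)
  end

  def max_length(tokens, max_len \\ 50) do
    Enum.filter(tokens, fn token -> String.length(token.text) <= max_len end)
  end
end

tokens = [%{text: "a"}, %{text: "fox"}, %{text: "jumps"}]

# Keep tokens whose text is between 3 and 4 graphemes long.
kept = tokens |> LengthSketch.min_length(3) |> LengthSketch.max_length(4)
IO.inspect(kept)
```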

pattern_replace(tokens, pattern, replacement \\ "")

@spec pattern_replace([token()], pattern(), String.t()) :: [token()]

Replaces matches of pattern in each token's text with replacement (default: "", which removes the matched text). Delegates to ExNlp.Filter.Transform.pattern_replace/3.
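Because pattern() is either a Regex or a plain string, a filter like this can lean on String.replace/3, which accepts both. The sketch below shows that idea with plain maps standing in for ExNlp.Token structs. The ReplaceSketch module is hypothetical, and whether ex_nlp keeps or drops tokens whose text becomes empty is not stated in this page, so the sketch simply keeps them.

```elixir
defmodule ReplaceSketch do
  # Hypothetical stand-in; not ex_nlp's implementation.
  # String.replace/3 accepts a string or a Regex as the pattern.
  def pattern_replace(tokens, pattern, replacement \\ "") do
    Enum.map(tokens, fn token ->
      %{token | text: String.replace(token.text, pattern, replacement)}
    end)
  end
end

tokens = [%{text: "don't"}, %{text: "e-mail"}]

# Regex pattern with the default replacement: strip anything outside a-z.
IO.inspect(ReplaceSketch.pattern_replace(tokens, ~r/[^a-z]/))

# String pattern with an explicit replacement.
IO.inspect(ReplaceSketch.pattern_replace(tokens, "-", "_"))
```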

stop_words(tokens, lang \\ :english)

@spec stop_words([token()], language()) :: [token()]

Removes stop words for the given language (default: :english) from the token list. Delegates to ExNlp.Filter.Stopwords.filter/2.
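Stop-word removal is commonly implemented as a set-membership check per token. The sketch below assumes that approach. The word list is a tiny illustrative sample, not the list shipped with ex_nlp, the StopWordsSketch module is hypothetical, and tokens are plain maps with a :text key.

```elixir
defmodule StopWordsSketch do
  # Hypothetical stand-in; not ex_nlp's implementation.
  def stop_words(tokens, lang \\ :english) do
    words = stop_list(lang)
    Enum.reject(tokens, fn token -> MapSet.member?(words, token.text) end)
  end

  # Illustrative sample only; unknown languages get an empty list here.
  defp stop_list(:english), do: MapSet.new(["the", "a", "an", "and", "of"])
  defp stop_list(_other), do: MapSet.new()
end

tokens = [%{text: "The"}, %{text: "the"}, %{text: "fox"}]
IO.inspect(StopWordsSketch.stop_words(tokens))
```

In this sketch the comparison is case-sensitive, so "The" survives while "the" is removed. If the real filter behaves similarly, that would be one reason the pipeline example runs lowercase before stop_words.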

unique(tokens)

@spec unique([token()]) :: [token()]

Removes duplicate tokens. Delegates to ExNlp.Filter.Transform.unique/1.
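This page does not say how duplicates are identified. One plausible reading, sketched below, treats two tokens as duplicates when their text is equal and keeps the first occurrence, using Enum.uniq_by/2. The UniqueSketch module is hypothetical, and tokens are plain maps with a :text key rather than ExNlp.Token structs.

```elixir
defmodule UniqueSketch do
  # Hypothetical stand-in; not ex_nlp's implementation.
  # Assumes duplicates are compared by text and the first occurrence wins.
  def unique(tokens) do
    Enum.uniq_by(tokens, fn token -> token.text end)
  end
end

tokens = [%{text: "fox"}, %{text: "dog"}, %{text: "fox"}]
IO.inspect(UniqueSketch.unique(tokens))
```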

uppercase(tokens)

@spec uppercase([token()]) :: [token()]

Converts tokens to uppercase. Delegates to ExNlp.Filter.Case.uppercase/1.