ExNlp.Filter (ex_nlp v0.1.0)

Unified API for token filtering operations.

This module provides convenient access to all filter submodules:

ExNlp.Filter.Stopwords - Stop word removal
ExNlp.Filter.Length - Length-based filtering
ExNlp.Filter.Case - Case conversion
ExNlp.Filter.Transform - Pattern replacement and deduplication
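
The function summaries on this page describe each function as delegating to a submodule. One common way to build such a facade in Elixir is `defdelegate`; whether ExNlp uses it is not stated here. The sketch below uses hypothetical `FacadeSketch` modules and models tokens as plain maps with a `:text` key, standing in for `%ExNlp.Token{}`.

```elixir
defmodule FacadeSketch.Case do
  # Hypothetical submodule: lowercases the :text field of each token.
  def lowercase(tokens) do
    Enum.map(tokens, fn token -> %{token | text: String.downcase(token.text)} end)
  end
end

defmodule FacadeSketch.Length do
  # Hypothetical submodule: keeps tokens with at least min_len characters.
  def minimum(tokens, min_len) do
    Enum.filter(tokens, fn token -> String.length(token.text) >= min_len end)
  end
end

defmodule FacadeSketch do
  # defdelegate forwards the call unchanged to the target module.
  defdelegate lowercase(tokens), to: FacadeSketch.Case

  # `as:` maps the facade name onto a differently named target function.
  defdelegate min_length(tokens, min_len), to: FacadeSketch.Length, as: :minimum
end

[%{text: "The"}, %{text: "quick"}]
|> FacadeSketch.lowercase()
|> FacadeSketch.min_length(4)
|> IO.inspect()
```

With a facade like this, callers need only one module name in a pipeline, while each submodule can still be used and tested on its own.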

Examples

# Filter pipeline
iex> tokens = [%ExNlp.Token{text: "The"}, %ExNlp.Token{text: "quick"}]
iex> tokens
...> |> ExNlp.Filter.lowercase()
...> |> ExNlp.Filter.stop_words(:english)
...> |> ExNlp.Filter.min_length(3)
[%ExNlp.Token{text: "quick"}]

Summary

Types

language()

Supported language atoms

pattern()

Pattern for regex replacement

token()

A token struct

Functions

lowercase(tokens)

Converts tokens to lowercase. Delegates to ExNlp.Filter.Case.lowercase/1.

max_length(tokens, max_len \\ 50)

Filters tokens by maximum length. Delegates to ExNlp.Filter.Length.maximum/2.

min_length(tokens, min_len \\ 2)

Filters tokens by minimum length. Delegates to ExNlp.Filter.Length.minimum/2.
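
To make the behavior of `max_length` and `min_length` concrete, here is a minimal sketch. It is not the ExNlp implementation: `LengthSketch` is a hypothetical module, tokens are plain maps with a `:text` key, and it assumes that length is counted in graphemes and that both bounds are inclusive. The summaries above do not state either point. The defaults (2 and 50) are taken from the signatures.

```elixir
defmodule LengthSketch do
  # Assumption: a token is kept when its length equals the bound.

  def min_length(tokens, min_len \\ 2) do
    Enum.filter(tokens, fn token -> String.length(token.text) >= min_len end)
  end

  def max_length(tokens, max_len \\ 50) do
    Enum.filter(tokens, fn token -> String.length(token.text) <= max_len end)
  end
end

tokens = [%{text: "a"}, %{text: "of"}, %{text: "quick"}]

# Default minimum of 2 keeps "of" and "quick".
tokens |> LengthSketch.min_length() |> IO.inspect()

# A maximum of 2 keeps "a" and "of".
tokens |> LengthSketch.max_length(2) |> IO.inspect()
```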

pattern_replace(tokens, pattern, replacement \\ "")

Replaces patterns in token text. Delegates to ExNlp.Filter.Transform.pattern_replace/3.
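
A rough illustration of pattern replacement over token text, assuming the pattern may be either a string or a regex. `ReplaceSketch` is a hypothetical module built on the standard `String.replace/3`, with tokens modeled as plain maps. It keeps tokens whose text becomes empty; whether ExNlp does the same is not stated here.

```elixir
defmodule ReplaceSketch do
  # String.replace/3 accepts a binary or a Regex as the pattern and
  # replaces every match. The default replacement "" deletes matches.
  def pattern_replace(tokens, pattern, replacement \\ "") do
    Enum.map(tokens, fn token ->
      %{token | text: String.replace(token.text, pattern, replacement)}
    end)
  end
end

tokens = [%{text: "hello!!"}, %{text: "e-mail"}]

# Regex pattern with the default replacement: strips non-word characters.
tokens |> ReplaceSketch.pattern_replace(~r/[^\w]/u) |> IO.inspect()

# String pattern with an explicit replacement.
tokens |> ReplaceSketch.pattern_replace("-", "_") |> IO.inspect()
```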

stop_words(tokens, lang \\ :english)

Removes stop words from tokens. Delegates to ExNlp.Filter.Stopwords.filter/2.
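
Stop word removal can be sketched as a set-membership check. The word list below is a deliberately tiny, hypothetical stand-in, `StopwordSketch` is not part of ExNlp, and tokens are plain maps. In this sketch matching is case-sensitive, which is one reason a pipeline might lowercase tokens before filtering; the real library may behave differently.

```elixir
defmodule StopwordSketch do
  # Hypothetical word list, far shorter than a real stop word list.
  @stopwords %{
    english: MapSet.new(~w(the a an and of to in is))
  }

  def stop_words(tokens, lang \\ :english) do
    # Map.fetch!/2 raises KeyError for a language with no list.
    stopwords = Map.fetch!(@stopwords, lang)
    Enum.reject(tokens, fn token -> MapSet.member?(stopwords, token.text) end)
  end
end

# "the" is removed; "The" survives because the check is case-sensitive here.
[%{text: "the"}, %{text: "The"}, %{text: "quick"}]
|> StopwordSketch.stop_words()
|> IO.inspect()
```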

unique(tokens)

Removes duplicate tokens. Delegates to ExNlp.Filter.Transform.unique/1.
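
A possible reading of deduplication, assuming two tokens count as duplicates when their text is equal and that the first occurrence is the one kept. Neither detail is specified above. `UniqueSketch` is a hypothetical module and tokens are plain maps.

```elixir
defmodule UniqueSketch do
  # Enum.uniq_by/2 keeps the first element for each key and preserves order.
  def unique(tokens) do
    Enum.uniq_by(tokens, fn token -> token.text end)
  end
end

# The second "fox" is dropped; the order of the remaining tokens is unchanged.
[%{text: "fox"}, %{text: "dog"}, %{text: "fox"}]
|> UniqueSketch.unique()
|> IO.inspect()
```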

uppercase(tokens)

Converts tokens to uppercase. Delegates to ExNlp.Filter.Case.uppercase/1.
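
Both case conversions amount to mapping a string function over each token's text. The sketch below assumes that shape, using the standard `String.downcase/1` and `String.upcase/1`. `CaseSketch` is a hypothetical module and tokens are plain maps standing in for `%ExNlp.Token{}`.

```elixir
defmodule CaseSketch do
  def lowercase(tokens), do: map_text(tokens, &String.downcase/1)
  def uppercase(tokens), do: map_text(tokens, &String.upcase/1)

  # Applies fun to the :text field and leaves any other fields untouched.
  defp map_text(tokens, fun) do
    Enum.map(tokens, fn token -> %{token | text: fun.(token.text)} end)
  end
end

tokens = [%{text: "The"}, %{text: "quick"}]

tokens |> CaseSketch.lowercase() |> IO.inspect()
tokens |> CaseSketch.uppercase() |> IO.inspect()
```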