ExNlp.Filter (ex_nlp v0.1.0)
Unified API for token filtering operations.
This module provides convenient access to all filter submodules:
- ExNlp.Filter.Stopwords - Stopword removal
- ExNlp.Filter.Length - Length-based filtering
- ExNlp.Filter.Case - Case conversion
- ExNlp.Filter.Transform - Pattern replacement and deduplication
Examples
# Filter pipeline
iex> tokens = [%ExNlp.Token{text: "The"}, %ExNlp.Token{text: "quick"}]
...> tokens
...> |> ExNlp.Filter.lowercase()
...> |> ExNlp.Filter.stop_words(:english)
...> |> ExNlp.Filter.min_length(3)
[%ExNlp.Token{text: "quick"}]
Summary
Functions
Converts tokens to lowercase. Delegates to ExNlp.Filter.Case.lowercase/1.
Filters tokens by maximum length. Delegates to ExNlp.Filter.Length.maximum/2.
Filters tokens by minimum length. Delegates to ExNlp.Filter.Length.minimum/2.
Replaces patterns in token text. Delegates to ExNlp.Filter.Transform.pattern_replace/3.
Removes stop words from tokens. Delegates to ExNlp.Filter.Stopwords.filter/2.
Removes duplicate tokens. Delegates to ExNlp.Filter.Transform.unique/1.
Converts tokens to uppercase. Delegates to ExNlp.Filter.Case.uppercase/1.
Types
@type language() :: atom()
Atoms identifying supported languages, such as :english.
Pattern for regex replacement
@type token() :: ExNlp.Token.t()
A token struct
Functions
Converts tokens to lowercase. Delegates to ExNlp.Filter.Case.lowercase/1.
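To illustrate what case conversion over a token list might involve, here is a minimal sketch. It is not ExNlp's implementation. CaseSketch is a hypothetical module, and plain maps with a :text key stand in for %ExNlp.Token{}. The map-update syntax also works on structs, so the same functions should accept any struct that has a :text field. The sketch assumes that only the text changes and that all other token fields are preserved.

```elixir
defmodule CaseSketch do
  # Hypothetical sketch, not ExNlp code: rewrites only the :text field
  # of each token and leaves any other fields untouched.
  def lowercase(tokens) do
    Enum.map(tokens, fn token -> %{token | text: String.downcase(token.text)} end)
  end

  # Same idea, using String.upcase/1 (cf. ExNlp.Filter.uppercase/1).
  def uppercase(tokens) do
    Enum.map(tokens, fn token -> %{token | text: String.upcase(token.text)} end)
  end
end

tokens = [%{text: "The"}, %{text: "Quick"}]
CaseSketch.lowercase(tokens)
# => [%{text: "the"}, %{text: "quick"}]
```

String.downcase/1 and String.upcase/1 are Unicode-aware, so non-ASCII letters such as "É" are converted as well.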
@spec max_length([token()], non_neg_integer()) :: [token()]
Filters tokens by maximum length. Delegates to ExNlp.Filter.Length.maximum/2.
@spec min_length([token()], non_neg_integer()) :: [token()]
Filters tokens by minimum length. Delegates to ExNlp.Filter.Length.minimum/2.
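The following sketch shows one plausible reading of max_length/2 and min_length/2. It is an illustration only and rests on assumptions this page does not state. First, length is counted in graphemes with String.length/1 rather than bytes. Second, both bounds are inclusive. LengthSketch is a hypothetical module, and the guards mirror the non_neg_integer() argument in the specs above.

```elixir
defmodule LengthSketch do
  # Hypothetical sketch: keeps tokens with at least `min` graphemes
  # (inclusive lower bound is an assumption).
  def min_length(tokens, min) when is_integer(min) and min >= 0 do
    Enum.filter(tokens, fn %{text: text} -> String.length(text) >= min end)
  end

  # Keeps tokens with at most `max` graphemes (inclusive upper bound assumed).
  def max_length(tokens, max) when is_integer(max) and max >= 0 do
    Enum.filter(tokens, fn %{text: text} -> String.length(text) <= max end)
  end
end

tokens = [%{text: "a"}, %{text: "fox"}, %{text: "quick"}]
LengthSketch.min_length(tokens, 3)
# => [%{text: "fox"}, %{text: "quick"}]
LengthSketch.max_length(tokens, 3)
# => [%{text: "a"}, %{text: "fox"}]
```

Counting graphemes instead of bytes matters for non-ASCII text. For example, String.length("café") is 4, while byte_size("café") is 5.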
Replaces patterns in token text. Delegates to ExNlp.Filter.Transform.pattern_replace/3.
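Pattern replacement over tokens can be sketched with Regex.replace/3 from the Elixir standard library. This is a sketch under assumptions. The argument order (tokens, pattern, replacement) is a guess based on the /3 arity. The sketch accepts only compiled %Regex{} patterns, although the real pattern type may allow more. PatternSketch is a hypothetical module.

```elixir
defmodule PatternSketch do
  # Hypothetical sketch: replaces every match of `pattern` in each token's
  # text with `replacement`. Argument order is an assumption.
  def pattern_replace(tokens, %Regex{} = pattern, replacement)
      when is_binary(replacement) do
    Enum.map(tokens, fn token ->
      %{token | text: Regex.replace(pattern, token.text, replacement)}
    end)
  end
end

tokens = [%{text: "room42"}, %{text: "7th"}]
PatternSketch.pattern_replace(tokens, ~r/\d+/, "#")
# => [%{text: "room#"}, %{text: "#th"}]
```

Regex.replace/3 replaces all matches by default. A pattern that matches a token's entire text leaves a token with empty text, and this sketch does not remove it.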
Removes stop words from tokens. Delegates to ExNlp.Filter.Stopwords.filter/2.
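A stop-word filter is typically a set-membership test. The sketch below uses a MapSet per language and a tiny illustrative English list. This list is not ExNlp's actual stop-word list. The sketch also assumes exact, case-sensitive matching. The pipeline example above lowercases tokens before calling stop_words/2, which hints at this behavior without confirming it. StopwordsSketch is a hypothetical module.

```elixir
defmodule StopwordsSketch do
  # Tiny illustrative list only; ExNlp's real lists are not reproduced here.
  @stop_words %{english: MapSet.new(~w(a an and of the to))}

  # Hypothetical sketch: drops tokens whose text is in the set for
  # `language`. Raises KeyError for a language not in the map.
  def stop_words(tokens, language) when is_atom(language) do
    stops = Map.fetch!(@stop_words, language)
    Enum.reject(tokens, fn %{text: text} -> MapSet.member?(stops, text) end)
  end
end

tokens = [%{text: "the"}, %{text: "quick"}, %{text: "The"}]
StopwordsSketch.stop_words(tokens, :english)
# => [%{text: "quick"}, %{text: "The"}]
```

Because matching is exact in this sketch, "The" survives. Lowercasing first, as the pipeline example does, avoids that.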
Removes duplicate tokens. Delegates to ExNlp.Filter.Transform.unique/1.
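One way to deduplicate tokens is to compare them by text and keep the first occurrence. This sketch does that with Enum.uniq_by/2. It assumes duplicates are defined by text alone. If tokens carry positional data, whole-struct comparison would rarely find duplicates. UniqueSketch and the :start field are hypothetical and are not part of the documented ExNlp.Token.

```elixir
defmodule UniqueSketch do
  # Hypothetical sketch: keeps the first token for each distinct text,
  # preserving the original order.
  def unique(tokens), do: Enum.uniq_by(tokens, fn %{text: text} -> text end)
end

tokens = [%{start: 0, text: "to"}, %{start: 3, text: "be"}, %{start: 9, text: "to"}]
UniqueSketch.unique(tokens)
# => [%{start: 0, text: "to"}, %{start: 3, text: "be"}]
```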
Converts tokens to uppercase. Delegates to ExNlp.Filter.Case.uppercase/1.