ExNlp.Stopwords (ex_nlp v0.1.0)
Stopword detection and management for multiple languages.
This module provides functions to check, list, and manage stopwords in several languages.
Stopwords are loaded from files in priv/stopwords/ when a language is first requested and are then cached in memory in an ETS table, so later lookups do not read the files again.
New stopwords can be added at runtime using add_stop_word/2.
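The caching strategy described above can be sketched with a named ETS table that is filled lazily on the first lookup for a language. This is a minimal sketch under stated assumptions, not ExNlp's implementation: StopwordCacheSketch and its get/2 function are hypothetical, and the loader function stands in for whatever reads the files in priv/stopwords/.

```elixir
defmodule StopwordCacheSketch do
  # Hypothetical module: a sketch of a lazily filled ETS cache,
  # not ExNlp's actual code.
  @table :stopword_cache_sketch

  # Create the named table on first use. The calling process becomes its owner.
  defp ensure_table do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    end

    :ok
  end

  # Returns the cached MapSet for `language`. `loader` (standing in for a
  # file reader) is only called on a cache miss, and its result is stored.
  def get(language, loader) when is_atom(language) and is_function(loader, 1) do
    ensure_table()

    case :ets.lookup(@table, language) do
      [{^language, words}] ->
        words

      [] ->
        words = MapSet.new(loader.(language))
        :ets.insert(@table, {language, words})
        words
    end
  end
end
```

In a real application the table would usually be created by a long-lived process (for example a GenServer under a supervisor), because an ETS table is destroyed when its owning process exits.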
Summary
Functions
add_stop_word/2 - Adds a new stopword to the given language's stopword set.
clear_cache/0 - Clears the stopword cache, forcing a reload from files.
get_stop_words/1 - Returns a MapSet of stopwords for the given language.
is_stopword?/2 - Checks whether a word is a stopword in the given language.
list/1 - Gets the complete list of stopwords for a language.
remove/2 - Removes stopwords from a list of words or tokens.
supported_languages/0 - Returns the list of supported languages.
Types
@type language() :: atom()
Supported language atoms
@type token() :: ExNlp.Token.t()
A token struct
Functions
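add_stop_word/2
Adds a new stopword to the given language's stopword set.
Only the in-memory cache is updated; the word is not written to the stopword file, so it is lost when the cache is cleared or the application restarts.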
Examples
iex> ExNlp.Stopwords.add_stop_word(:english, "custom")
:ok
iex> ExNlp.Stopwords.get_stop_words(:english) |> MapSet.member?("custom")
true
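The runtime update described for add_stop_word/2 amounts to a read-modify-write of the cached set. The sketch below is hypothetical (AddStopwordSketch, add/2 and member?/2 are illustrative names, not ExNlp functions) and only touches ETS, in line with the documented behaviour that nothing is persisted to file.

```elixir
defmodule AddStopwordSketch do
  # Hypothetical sketch: update the cached MapSet in ETS only;
  # nothing is written back to disk.
  @table :add_stopword_sketch

  defp ensure_table do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :set, :public])
    end

    :ok
  end

  def add(language, word) when is_atom(language) and is_binary(word) do
    ensure_table()

    current =
      case :ets.lookup(@table, language) do
        [{^language, words}] -> words
        # Simplification: a real cache would load the file-based set here
        # first, so those words are kept alongside the new one.
        [] -> MapSet.new()
      end

    # Read-modify-write: the lookup and insert are separate ETS calls.
    :ets.insert(@table, {language, MapSet.put(current, word)})
    :ok
  end

  def member?(language, word) do
    ensure_table()

    case :ets.lookup(@table, language) do
      [{^language, words}] -> MapSet.member?(words, word)
      [] -> false
    end
  end
end
```

Because the lookup and the insert are separate calls, two processes adding words to the same language at the same moment could overwrite each other's change; serialising writes through a single process is a common way to avoid that.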
@spec clear_cache() :: :ok
Clears the stopword cache so that the next lookup reloads stopwords from the files; words added with add_stop_word/2 are discarded.
Examples
iex> ExNlp.Stopwords.clear_cache()
:ok
get_stop_words/1
Returns a MapSet of stopwords for the given language.
If the language is already cached, the cached set is returned; otherwise it is loaded from the language's stopword file and cached.
Examples
iex> stopwords = ExNlp.Stopwords.get_stop_words(:english)
iex> MapSet.member?(stopwords, "the")
true
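Loading a language from disk can be as simple as streaming a word list into a MapSet. The file layout assumed here (one stopword per line, blank lines ignored) and the module name StopwordFileSketch are assumptions; the actual format of the files in priv/stopwords/ is not described on this page.

```elixir
defmodule StopwordFileSketch do
  # Hypothetical loader: assumes one stopword per line in a plain-text file.
  def load(path) do
    path
    |> File.stream!()
    |> Stream.map(&String.trim/1)
    |> Stream.reject(&(&1 == ""))
    # Assumption: entries are lowercased so they match the lowercased
    # input that is_stopword?/2 checks.
    |> Stream.map(&String.downcase/1)
    |> MapSet.new()
  end
end
```

Inside a Mix project, a library usually locates its priv directory at runtime with Application.app_dir/2 (for example Application.app_dir(:my_app, "priv/stopwords"), where :my_app is the OTP application name) rather than with a path relative to the current working directory.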
is_stopword?/2
Checks whether a word is a stopword in the given language.
Arguments
word - The word to check (lowercased before lookup)
language - The language atom
Returns
true if the word is a stopword, false otherwise.
Examples
iex> ExNlp.Stopwords.is_stopword?("the", :english)
true
iex> ExNlp.Stopwords.is_stopword?("running", :english)
false
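The documented lowercasing step in is_stopword?/2 can be illustrated with plain MapSet calls. StopwordCheckSketch.stopword?/2 is a hypothetical helper, and the set passed in stands in for the one returned by get_stop_words/1.

```elixir
defmodule StopwordCheckSketch do
  # Hypothetical helper: lowercase the word, then test membership in a
  # set of (already lowercased) stopwords.
  def stopword?(word, %MapSet{} = stopwords) when is_binary(word) do
    MapSet.member?(stopwords, String.downcase(word))
  end
end
```

String.downcase/1 applies Unicode case mapping by default, so a word such as "ÉTÉ" becomes "été", which matters for the non-English stopword lists.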
list/1
Gets the complete list of stopwords for a language.
Examples
iex> stopwords = ExNlp.Stopwords.list(:english)
iex> Enum.member?(stopwords, "the")
true
remove/2
Removes stopwords from a list of words or tokens.
Accepts either a list of strings or a list of ExNlp.Token structs; for tokens, the token's text field is checked.
Examples
# With strings
iex> words = ["the", "quick", "brown", "fox", "the"]
iex> ExNlp.Stopwords.remove(words, :english)
["quick", "brown", "fox"]
# With tokens
iex> tokens = [%ExNlp.Token{text: "the"}, %ExNlp.Token{text: "quick"}]
iex> ExNlp.Stopwords.remove(tokens, :english)
[%ExNlp.Token{text: "quick"}]
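Handling both strings and tokens in one function fits naturally with multi-clause pattern matching. In this sketch TokenSketch is a hypothetical stand-in for ExNlp.Token with only a :text field, and the stopword set is passed in directly rather than looked up by language.

```elixir
defmodule TokenSketch do
  # Hypothetical stand-in for ExNlp.Token; only the :text field is modelled.
  defstruct [:text]
end

defmodule StopwordRemoveSketch do
  # Keep only elements whose text is not in `stopwords`. Works on plain
  # strings and on token structs. Text is compared as-is in this sketch.
  def remove(items, %MapSet{} = stopwords) when is_list(items) do
    Enum.reject(items, fn item -> MapSet.member?(stopwords, text_of(item)) end)
  end

  defp text_of(%TokenSketch{text: text}), do: text
  defp text_of(word) when is_binary(word), do: word
end
```

Enum.reject/2 preserves the order of the remaining elements and drops every occurrence of a stopword, which matches the string example above where both "the" entries disappear.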
@spec supported_languages() :: [language()]
Returns the list of supported languages.
Examples
iex> languages = ExNlp.Stopwords.supported_languages()
iex> :english in languages
true