ExNlp.Stopwords (ex_nlp v0.1.0)


Stopword detection and management for multiple languages.

This module provides functionality to check, list, and manage stopwords in various languages. Stopwords are loaded from files in priv/stopwords/ and cached in-memory using ETS for efficient access.

New stopwords can be added at runtime using add_stop_word/2.
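The load-once, cache-in-ETS behaviour described above can be pictured with a minimal sketch. It is not ExNlp's actual implementation: the module name `StopwordCacheSketch`, the table name, and the `loader` function (a stand-in for reading a file from priv/stopwords/) are all assumptions made for illustration.

```elixir
defmodule StopwordCacheSketch do
  # Hypothetical module: shows lazy ETS caching, not ExNlp's real code.
  @table :stopword_cache_sketch

  # Create the named table on first use; later calls leave it alone.
  defp ensure_table do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    end

    :ok
  end

  # Return the cached set, or build it with `loader` and cache it.
  def get(lang, loader) when is_function(loader, 1) do
    ensure_table()

    case :ets.lookup(@table, lang) do
      [{^lang, set}] ->
        set

      [] ->
        set = MapSet.new(loader.(lang))
        :ets.insert(@table, {lang, set})
        set
    end
  end

  # Add a word to the cached set only; nothing is written to disk.
  def add(lang, word, loader) do
    set = lang |> get(loader) |> MapSet.put(word)
    :ets.insert(@table, {lang, set})
    :ok
  end

  # Drop every cached entry so the next get/2 calls the loader again.
  def clear do
    ensure_table()
    :ets.delete_all_objects(@table)
    :ok
  end
end

# Stand-in for reading a word list from a file.
loader = fn :english -> ["the", "a", "an"] end

StopwordCacheSketch.add(:english, "custom", loader)
MapSet.member?(StopwordCacheSketch.get(:english, loader), "custom")
# => true

StopwordCacheSketch.clear()
MapSet.member?(StopwordCacheSketch.get(:english, loader), "custom")
# => false
```

In this sketch a runtime addition disappears once the cache is cleared, because the set is rebuilt from the loader. The table is also owned by whichever process first creates it, so a long-running application would probably have a supervised process create it instead.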

Summary

Types

language()

Supported language atoms

token()

A token struct

Functions

add_stop_word(lang, word)

Adds a new stop word to the specified language's stop words set.

clear_cache()

Clears the stop words cache, forcing reload from files.

get_stop_words(lang)

Returns a MapSet of stop words for the specified language.

is_stopword?(word, language)

Checks if a word is a stopword in the given language.

list(language)

Gets the complete list of stopwords for a language.

remove(items, language)

Removes stopwords from a list of words or tokens.

supported_languages()

Returns the list of supported languages.

Types

language()

@type language() :: atom()

An atom naming a language, such as :english. Use supported_languages/0 to get the values that are available.

token()

@type token() :: ExNlp.Token.t()

An ExNlp.Token struct, as accepted by remove/2.

Functions

add_stop_word(lang, word)

@spec add_stop_word(language(), String.t()) :: :ok

Adds a new stop word to the specified language's stop words set.

Only the in-memory cache is updated. The word is not written to file, so the addition does not outlive the running VM.

Examples

iex> ExNlp.Stopwords.add_stop_word(:english, "custom")
:ok
iex> ExNlp.Stopwords.get_stop_words(:english) |> MapSet.member?("custom")
true

clear_cache()

@spec clear_cache() :: :ok

Clears the stop words cache, forcing reload from files.

Examples

iex> ExNlp.Stopwords.clear_cache()
:ok

get_stop_words(lang)

@spec get_stop_words(language()) :: MapSet.t(String.t())

Returns a MapSet of stop words for the specified language.

Returns the cached set if one exists; otherwise loads the words from file and caches them.

Examples

iex> stopwords = ExNlp.Stopwords.get_stop_words(:english)
iex> MapSet.member?(stopwords, "the")
true

is_stopword?(word, language)

@spec is_stopword?(String.t(), language()) :: boolean()

Checks if a word is a stopword in the given language.

Arguments

  • word - The word to check. It is lowercased before the lookup, so the check is case-insensitive.
  • language - The language atom

Returns

true if the word is a stopword, false otherwise.

Examples

iex> ExNlp.Stopwords.is_stopword?("the", :english)
true

iex> ExNlp.Stopwords.is_stopword?("running", :english)
false
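The lowercasing step can be illustrated without the library. The helper below is a hypothetical sketch (the module name `StopwordCheckSketch` and the tiny word set are assumptions), showing why differently cased input gives the same answer when the stored words are lowercase.

```elixir
defmodule StopwordCheckSketch do
  # Hypothetical helper: lowercase the input, then test set membership.
  def stopword?(word, %MapSet{} = stopwords) when is_binary(word) do
    MapSet.member?(stopwords, String.downcase(word))
  end
end

# Tiny stand-in for a real stop word list.
stopwords = MapSet.new(["the", "a", "an"])

StopwordCheckSketch.stopword?("The", stopwords)
# => true
StopwordCheckSketch.stopword?("running", stopwords)
# => false
```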

list(language)

@spec list(language()) :: [String.t()]

Gets the complete list of stopwords for a language.

Examples

iex> stopwords = ExNlp.Stopwords.list(:english)
iex> Enum.member?(stopwords, "the")
true

remove(items, language)

@spec remove([String.t() | token()], language()) :: [String.t() | token()]

Removes stopwords from a list of words or tokens.

Accepts lists of strings and lists of ExNlp.Token structs. Items that are not stopwords are returned unchanged and in their original order.

Examples

# With strings
iex> words = ["the", "quick", "brown", "fox", "the"]
iex> ExNlp.Stopwords.remove(words, :english)
["quick", "brown", "fox"]

# With tokens
iex> tokens = [%ExNlp.Token{text: "the"}, %ExNlp.Token{text: "quick"}]
iex> ExNlp.Stopwords.remove(tokens, :english)
[%ExNlp.Token{text: "quick"}]
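One way to handle strings and token structs in a single function is to extract the text by pattern matching and filter on it. The sketch below is an illustration under assumptions, not the library's code: `SketchToken` is a hypothetical stand-in for ExNlp.Token with only a `:text` field, and the comparison is exact, since the documentation does not say whether remove/2 lowercases its input.

```elixir
defmodule SketchToken do
  # Hypothetical stand-in for ExNlp.Token; only the :text field is assumed.
  defstruct [:text]
end

defmodule StopwordRemoveSketch do
  # Keep the items whose text is not in the stop word set.
  def remove(items, %MapSet{} = stopwords) when is_list(items) do
    Enum.reject(items, fn item -> MapSet.member?(stopwords, text_of(item)) end)
  end

  # Strings are compared as they are; tokens are compared by their :text field.
  defp text_of(word) when is_binary(word), do: word
  defp text_of(%SketchToken{text: text}), do: text
end

# Tiny stand-in for a real stop word list.
stopwords = MapSet.new(["the", "a"])

StopwordRemoveSketch.remove(["the", "quick", "brown", "fox", "the"], stopwords)
# => ["quick", "brown", "fox"]

StopwordRemoveSketch.remove([%SketchToken{text: "the"}, %SketchToken{text: "quick"}], stopwords)
# => [%SketchToken{text: "quick"}]
```

Because Enum.reject/2 returns the original elements, token structs come back with all of their fields intact rather than being reduced to plain strings.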

supported_languages()

@spec supported_languages() :: [language()]

Returns the list of supported languages.

Examples

iex> languages = ExNlp.Stopwords.supported_languages()
iex> :english in languages
true