scrape v3.1.0 Scrape.Tools.Word

Algorithms to extract information from single words.

Summary

Functions

Determine if a given word might be relevant for analytical purposes.

Check if a given word is a stopword against the provided language lists.

Extract the stem of a given word.

Functions

is_meaningful?(word, language \\ :en)
is_meaningful?(String.t(), :de | :en) :: boolean()

Determine if a given word might be relevant for analytical purposes.

Uses a simple heuristic and checks for stopword matches.

Examples

iex> Word.is_meaningful?("a", :en)
false

iex> Word.is_meaningful?("apple", :en)
true
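
The exact heuristic is not spelled out here. As a rough sketch only, a check of this kind could combine a minimum-length rule with a stopword lookup. The module name `MeaningfulSketch`, the three-character threshold, and the tiny stopword lists below are all hypothetical and are not taken from the scrape source.

```elixir
defmodule MeaningfulSketch do
  # Hypothetical stand-in for the library's per-language stopword lists.
  @stopwords %{
    en: MapSet.new(~w(a an and the of when)),
    de: MapSet.new(~w(ein eine und der die das))
  }

  # Assumed threshold; the real heuristic may differ.
  @min_length 3

  @spec meaningful?(String.t(), :de | :en) :: boolean()
  def meaningful?(word, language \\ :en) do
    # A word counts as meaningful if it is long enough and not a stopword.
    String.length(word) >= @min_length and
      not MapSet.member?(Map.fetch!(@stopwords, language), String.downcase(word))
  end
end

MeaningfulSketch.meaningful?("a", :en)     # => false
MeaningfulSketch.meaningful?("apple", :en) # => true
```

The sketch reproduces the two documented results, but that alone does not show that the library applies the same rules.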

is_stopword?(word, language \\ :en)
is_stopword?(String.t(), :de | :en) :: boolean()

Check if a given word is a stopword against the provided language lists.

Note: all words in the provided language lists are lowercase.

Examples

iex> Word.is_stopword?("when", :en)
true

iex> Word.is_stopword?("linux", :en)
false

iex> Word.is_stopword?("ein", :de)
true

iex> Word.is_stopword?("elixir", :de)
false
false
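
Because the lists are lowercase, an exact-match lookup would presumably miss capitalized input such as "When". Whether the library normalizes input itself is not stated here, so the following is only a sketch of the difference, using a hypothetical `StopwordSketch` module and a made-up miniature list.

```elixir
defmodule StopwordSketch do
  # Hypothetical miniature list; the library ships its own lists per language.
  @english MapSet.new(~w(when the and a))

  # Exact-match lookup, which is case-sensitive against a lowercase list.
  @spec raw_match?(String.t()) :: boolean()
  def raw_match?(word), do: MapSet.member?(@english, word)

  # Trims and downcases the input first so "When" and " when " are also found.
  @spec normalized_match?(String.t()) :: boolean()
  def normalized_match?(word) do
    word |> String.trim() |> String.downcase() |> raw_match?()
  end
end

StopwordSketch.raw_match?("When")        # => false
StopwordSketch.normalized_match?("When") # => true
```

When in doubt, passing the word through `String.downcase/1` before the stopword check is a cheap safeguard.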

stem(word, language \\ :en)
stem(String.t(), :de | :en) :: String.t()

Extract the stem of a given word.

Uses the Snowball algorithm under the hood via the Stemex library, which in turn uses NIFs for raw speed. Currently, only German and English are supported.

Examples

iex> Word.stem("beautiful", :en)
"beauti"

iex> Word.stem("derbsten", :de)
"derb"
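
One common use of stems is grouping inflected forms of a word together. The sketch below counts words by stem with a stemming function passed in as an argument. `StemCountSketch` and `toy_stem` are hypothetical names, and the toy stemmer only strips a trailing "s"; it is not the Snowball algorithm.

```elixir
defmodule StemCountSketch do
  # Counts how often each stem occurs. `stem_fun` is any one-argument
  # function that maps a word to its stem.
  @spec count([String.t()], (String.t() -> String.t())) :: %{String.t() => pos_integer()}
  def count(words, stem_fun) do
    Enum.frequencies_by(words, stem_fun)
  end
end

# Toy stemmer for demonstration only: strips a trailing "s". NOT Snowball.
toy_stem = fn word -> String.replace_suffix(word, "s", "") end

StemCountSketch.count(["apple", "apples", "pear"], toy_stem)
# => %{"apple" => 2, "pear" => 1}
```

With the library available, a capture such as `&Word.stem(&1, :en)` could be passed in place of the toy function.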