lancaster_stemmer

Types

pub opaque type Rule
pub type Rules =
  dict.Dict(String, List(Rule))

Values

pub fn default_rules() -> dict.Dict(String, List(Rule))

Constructs the default ruleset

pub fn load_rules(
  filename: String,
) -> Result(dict.Dict(String, List(Rule)), Nil)

Constructs a ruleset from the specified file

The file format is as follows: each line contains one rule, and the order of the lines matters. A rule is a string made up of the following parts, in this order:

| Rule part | Description |
| --- | --- |
| suffix | The reverse of the required suffix. For example, the suffix of winning is ing, so it would be specified as gni. |
| * (optional) | Add an asterisk if the rule should only be used when no previous rule has been applied. For example, ht*2. only applies if th is the final suffix of the original word, so the stem of breath would be brea, but the stem of breathe would be breath because the suffix e has already been removed. |
| number of chars to remove | The number of characters to remove once the suffix has been matched. For example, psychoanalytic has the suffix ytic, of which 3 characters should be removed to retain psychoanaly; this would be ‘city3’. This can be 0. |
| append string (optional) | The characters that are appended after the match and the removal of characters. |
| > or . | If >, stemming continues after this rule has been applied; if ., stemming stops. |
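To make the rule parts concrete, here is a minimal sketch that applies a single rule by hand. It is an illustration of the format only, not the library's implementation: `apply_rule` and its parameters are hypothetical names, and the asterisk and the `>` / `.` continuation flag are deliberately left out.

```gleam
import gleam/io
import gleam/string

/// Hypothetical helper (not part of lancaster_stemmer): applies one rule
/// that has already been split into its parts.
pub fn apply_rule(
  word: String,
  reversed_suffix: String,
  remove: Int,
  append: String,
) -> String {
  // Rules store the suffix reversed, so "city" matches words ending in "ytic".
  let suffix = string.reverse(reversed_suffix)
  case string.ends_with(word, suffix) {
    True -> {
      // Drop `remove` characters from the end, then add the append string.
      let keep = string.length(word) - remove
      string.slice(from: word, at_index: 0, length: keep) <> append
    }
    // The suffix does not match, so the word is left unchanged.
    False -> word
  }
}

pub fn main() {
  // Suffix "city", remove 3, nothing appended.
  io.println(apply_rule("psychoanalytic", "city", 3, ""))
  // -> psychoanaly

  // Same suffix and removal count, with "s" appended.
  io.println(apply_rule("psychoanalytic", "city", 3, "s"))
  // -> psychoanalys
}
```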

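So, for example, to stem psychoanalytic to psychoanalys, the rule would be ‘city3s.’: the reversed suffix city, 3 characters to remove, the append string s, and a final . to stop stemming.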
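A short usage sketch for `load_rules` may help here. It assumes a hypothetical file name, `my_rules.txt`, and falls back to the default ruleset when loading fails; the fallback is a design choice made for this example, not something the library prescribes. The file contents shown in the comment are illustrative and built from the rules discussed above.

```gleam
import gleam/io
import lancaster_stemmer

// Hypothetical contents of my_rules.txt, one rule per line:
//
//   city3s.
//   ht*2.

/// Hypothetical helper: loads a custom ruleset, or uses the defaults
/// if load_rules returns an error.
pub fn load_or_default(path: String) -> lancaster_stemmer.Rules {
  case lancaster_stemmer.load_rules(path) {
    Ok(custom_rules) -> custom_rules
    // load_rules reports failure as Error(Nil), so no reason is available.
    Error(Nil) -> lancaster_stemmer.default_rules()
  }
}

pub fn main() {
  let rules = load_or_default("my_rules.txt")
  io.println(lancaster_stemmer.stem("psychoanalytic", rules))
}
```

Because the error carries no detail, a caller that needs to tell a missing file from a malformed rule would have to check the file separately.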

pub fn stem(
  word: String,
  rules: dict.Dict(String, List(Rule)),
) -> String

Lancaster (Paice-Husk) stemming algorithm

Example

lancaster_stemmer.stem("Gleam", lancaster_stemmer.default_rules())
// -> gleam
lancaster_stemmer.stem("fancy", lancaster_stemmer.default_rules())
// -> fant
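When stemming many words, it may be worth building the ruleset once and reusing it. The sketch below assumes words are separated by single spaces; `stem_sentence` is a hypothetical helper, not part of the library.

```gleam
import gleam/io
import gleam/list
import gleam/string
import lancaster_stemmer

/// Hypothetical helper: stems every space-separated word in a sentence.
pub fn stem_sentence(sentence: String) -> String {
  // Build the ruleset once and share it across all words.
  let rules = lancaster_stemmer.default_rules()
  sentence
  |> string.split(on: " ")
  |> list.map(fn(word) { lancaster_stemmer.stem(word, rules) })
  |> string.join(with: " ")
}

pub fn main() {
  io.println(stem_sentence("Gleam fancy"))
}
```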