lancaster_stemmer
Types
Values
pub fn load_rules(
  filename: String,
) -> Result(dict.Dict(String, List(Rule)), Nil)
Constructs a ruleset from the specified file.
The file format is as follows: each line contains one rule, and the order of the rules matters. A rule is a string made up of the following parts:
| Rule part | Description |
|---|---|
| suffix | the required suffix written in reverse, e.g. the suffix ing of winning is specified as gni |
| * (optional) | add an asterisk if the rule should only be used when no previous rule has been applied. For example, ht*2. only applies if th is the final suffix of the unmodified word, so the stem of breath is brea, but the stem of breathe is breath because the suffix e has already been removed |
| number of chars to remove | the number of characters to remove once the suffix has been matched. For example, psychoanalytic has the suffix ytic, of which 3 characters are removed to leave psychoanaly; this would be ‘city3’. The number can be 0 |
| append string (optional) | the characters to append after the suffix has been matched and the characters removed |
| > or . | if >, stemming may continue after this rule; if ., stemming stops |
So, for example, to stem psychoanalytic to psychoanalys, the rule would be city3s. (the reversed suffix city, remove 3 characters, append s, then stop, as signalled by the final .).
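
A minimal usage sketch follows, based only on the signature shown above. The path "rules.txt" is a hypothetical file assumed to contain one rule per line in the format just described (for instance a line such as ht*2.), and the printed messages are illustrative rather than part of the library.

```gleam
import gleam/dict
import gleam/int
import gleam/io
import lancaster_stemmer

pub fn main() {
  // "rules.txt" is a hypothetical path used only for illustration.
  case lancaster_stemmer.load_rules("rules.txt") {
    // On success we get a Dict(String, List(Rule)); here we only report its size.
    Ok(rules) ->
      io.println(
        "Loaded " <> int.to_string(dict.size(rules)) <> " ruleset entries",
      )
    // The error value is Nil, so no further detail about the failure is available.
    Error(Nil) -> io.println("Could not load rules from rules.txt")
  }
}
```

Because the error type is Nil, a caller that needs to tell a missing file from a malformed rule would have to check the file separately before calling load_rules.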