Spell correction.
Uses Peter Norvig's classic edit-distance approach: enumerate
every word within edit distance 1 (or 2) of the query, intersect
with a known-words dictionary, and rank by corpus frequency. The
dictionary and frequencies come from Text.WordFreq, so adding
a language for spell correction is the same as adding it to
Text.WordFreq.
This is appropriate for one-shot correction of user input. For high-throughput use cases, a SymSpell-style precomputed deletion index would be faster; that may arrive in a future release.
Edit operations
Distance-1 edits cover the four single-character mistakes that dominate real typing errors:
- Deletion (`hellp` → `help`).
- Transposition (`recieve` → `receive`).
- Replacement (`speel` → `spell`).
- Insertion (`spel` → `spell`).
Distance-2 edits are the same operations applied twice. They produce many more candidates and are therefore slower, but they catch words that contain two typos.
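Each of the four operations can be expressed over the split points of a word. What follows is a minimal sketch in the style of Norvig's original. `EditsSketch` is a hypothetical module, not `Text.Spell`. The sketch assumes a plain lowercase `a`-`z` alphabet, whereas the library may choose its alphabet per language.

```elixir
defmodule EditsSketch do
  @moduledoc false

  # Assumption: a plain lowercase a-z alphabet. A real implementation may
  # derive its alphabet from the language's dictionary instead.
  @alphabet String.graphemes("abcdefghijklmnopqrstuvwxyz")

  # All strings one deletion, transposition, replacement or insertion away.
  def edits1(word) do
    chars = String.graphemes(word)
    # Every split point {left, right} where left ++ right == chars.
    splits = for i <- 0..length(chars), do: Enum.split(chars, i)

    # Generators with a list pattern silently skip splits that don't match.
    deletes = for {l, [_ | r]} <- splits, do: l ++ r
    transposes = for {l, [a, b | r]} <- splits, do: l ++ [b, a | r]
    replaces = for {l, [_ | r]} <- splits, c <- @alphabet, do: l ++ [c | r]
    inserts = for {l, r} <- splits, c <- @alphabet, do: l ++ [c | r]

    (deletes ++ transposes ++ replaces ++ inserts)
    |> Enum.map(&Enum.join/1)
    |> Enum.uniq()
  end

  # Distance-2 edits: edits1 applied to every distance-1 edit.
  def edits2(word) do
    word
    |> edits1()
    |> Enum.flat_map(&edits1/1)
    |> Enum.uniq()
  end
end
```

For example, `EditsSketch.edits1("spel")` contains `"spell"`. `EditsSketch.edits2("spe")` also contains `"spell"`, which no single edit of `"spe"` can reach. Because `edits2` runs `edits1` on every distance-1 result, its output is roughly quadratic in size. This is the cost the paragraph above describes.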
Summary
Functions
- `candidates/2`: Returns the ranked list of correction candidates for a word.
- `correct/2`: Returns the most likely correction for a word.
- `edits1/1`: Returns the set of all distance-1 edits of a word.
- `known?/2`: Returns `true` if the word is in the bundled dictionary.
Functions
candidates/2
Returns the ranked list of correction candidates for a word.
Candidates are returned in descending order of corpus frequency.
Arguments
`word` is a single word as a string.
Options
- `:language` is the language atom passed through to `Text.WordFreq`. Default: `:en`.
- `:max_edit_distance` is `1` or `2`. Default: `2`.
- `:limit` caps the number of results. Default: no limit.
Returns
- A list of candidate strings, ordered most-likely first. Empty when the input has no known-word candidates at any considered distance.
Examples
iex> "speling" |> Text.Spell.candidates(limit: 3) |> Enum.take(1)
["spelling"]
iex> Text.Spell.candidates("xqzwt")
[]
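The filter-and-rank step can be sketched separately from edit generation. The hypothetical `CandidatesSketch.candidates/3` below takes three inputs: the raw edit sets for distances 1 and 2, already computed (for example by an `edits1`-style function); a word-to-count map that stands in for the `Text.WordFreq` data; and the `:max_edit_distance` and `:limit` options. It assumes Norvig's original tie-breaking, in which any known distance-1 candidate outranks every distance-2 candidate. The library may instead merge both distances before sorting.

```elixir
defmodule CandidatesSketch do
  @moduledoc false

  # Hypothetical: `tiers` is [distance_1_edits, distance_2_edits] and
  # `freqs` maps known words to corpus counts (stand-in for Text.WordFreq).
  def candidates(tiers, freqs, opts \\ []) do
    max_distance = Keyword.get(opts, :max_edit_distance, 2)
    limit = Keyword.get(opts, :limit)

    ranked =
      tiers
      # Only consider distances up to :max_edit_distance.
      |> Enum.take(max_distance)
      # Intersect each tier with the dictionary.
      |> Enum.map(fn edits ->
        edits |> Enum.uniq() |> Enum.filter(&Map.has_key?(freqs, &1))
      end)
      # Norvig-style: the first distance with any known word wins.
      |> Enum.find([], &(&1 != []))
      # Most frequent first.
      |> Enum.sort_by(&Map.fetch!(freqs, &1), :desc)

    if limit, do: Enum.take(ranked, limit), else: ranked
  end
end
```

With `max_edit_distance: 1`, the second tier is never consulted. An implementation that generates tiers lazily could therefore skip building the expensive distance-2 set altogether.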
correct/2
Returns the most likely correction for a word.
Arguments
`word` is a single word as a string. Lookup against the dictionary is case-insensitive; the original casing is preserved when no correction is needed.
Options
- `:language` is the language atom passed through to `Text.WordFreq`. Default: `:en`.
- `:max_edit_distance` is `1` or `2`. Default: `2`.
Returns
- The best candidate string. If the input is already a known word, it is returned unchanged. If no candidate at any considered distance is in the dictionary, the original word is returned.
Examples
iex> Text.Spell.correct("speling")
"spelling"
iex> Text.Spell.correct("the")
"the"
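The three return cases above reduce to a small decision. The sketch below is minimal and makes two assumptions: the word-to-count map is keyed by lowercase words, and the candidate list has already been ranked most-likely first. `CorrectSketch` is hypothetical and is not the library's code.

```elixir
defmodule CorrectSketch do
  @moduledoc false

  # Hypothetical: `freqs` maps lowercase known words to corpus counts and
  # `ranked` is the candidate list for `word`, most likely first.
  def correct(word, freqs, ranked) do
    cond do
      # Case-insensitive dictionary lookup; a known word keeps its casing.
      Map.has_key?(freqs, String.downcase(word)) -> word
      # Otherwise take the top-ranked candidate, if there is one.
      ranked != [] -> hd(ranked)
      # No known candidate at any considered distance: return the input.
      true -> word
    end
  end
end
```

Checking the input word first means that correctly spelled words never pay the cost of edit generation. In a full implementation, `ranked` would only need to be computed in the second and third branches.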
edits1/1
Returns the set of all distance-1 edits of a word.
Useful for inspection or building custom correction logic.
Arguments
`word` is a string.
Returns
- A list of unique strings, each one edit operation away from the input.
Examples
iex> "spel" |> Text.Spell.edits1() |> Enum.member?("spell")
true
known?/2
Returns `true` if the word is in the bundled dictionary.
Arguments
`word` is a single word as a string.
Options
- `:language` is the language atom passed through to `Text.WordFreq`. Default: `:en`.
Returns
`true` if the word is known, `false` otherwise.
Examples
iex> Text.Spell.known?("hello")
true
iex> Text.Spell.known?("xqzwt")
false
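Below is a sketch of the lookup with the `:language` option. It assumes the same case-insensitive matching that `Text.Spell.correct` documents. The tiny word lists are hypothetical stand-ins for the `Text.WordFreq` data, and `KnownSketch` is not part of the library.

```elixir
defmodule KnownSketch do
  @moduledoc false

  # Hypothetical stand-in for the per-language word lists that
  # Text.WordFreq provides; real dictionaries are far larger.
  @dictionaries %{
    en: MapSet.new(["hello", "the", "spell"]),
    fr: MapSet.new(["bonjour", "le"])
  }

  def known?(word, opts \\ []) do
    language = Keyword.get(opts, :language, :en)
    # An unknown language behaves like an empty dictionary here.
    dictionary = Map.get(@dictionaries, language, MapSet.new())
    # Assumed case-insensitive: compare against the lowercased word.
    MapSet.member?(dictionary, String.downcase(word))
  end
end
```

A `MapSet` gives effectively constant-time membership checks. This matters because the correction pipeline runs the same test against every generated edit, not just the input word.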