Text.Spell (Text v0.5.0)


Spell correction.

Uses Peter Norvig's classic edit-distance approach: enumerate every word within edit distance 1 (or 2) of the query, intersect with a known-words dictionary, and rank by corpus frequency. The dictionary and frequencies come from Text.WordFreq, so adding a language for spell correction is the same as adding it to Text.WordFreq.
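The enumerate, intersect, and rank steps described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the `NorvigSketch` module, its toy frequency table, and the ASCII-only alphabet are all assumptions made for the example, whereas Text.Spell takes its dictionary and frequencies from Text.WordFreq.

```elixir
defmodule NorvigSketch do
  # Hypothetical toy frequency table standing in for Text.WordFreq data.
  @freq %{"spelling" => 120, "spewing" => 8, "help" => 900, "the" => 5000}

  # Assumption: lowercase ASCII alphabet only.
  @letters for c <- ?a..?z, do: <<c>>

  # Every string one edit away from `word` (assumes single-byte characters).
  def edits1(word) do
    splits = for i <- 0..String.length(word), do: String.split_at(word, i)

    # Generators whose pattern does not match (empty right half) are skipped.
    deletes = for {l, <<_, r::binary>>} <- splits, do: l <> r
    transposes = for {l, <<a, b, r::binary>>} <- splits, do: l <> <<b, a>> <> r
    replaces = for {l, <<_, r::binary>>} <- splits, c <- @letters, do: l <> c <> r
    inserts = for {l, r} <- splits, c <- @letters, do: l <> c <> r

    Enum.uniq(deletes ++ transposes ++ replaces ++ inserts)
  end

  # Enumerate edits, keep only known words, rank by frequency (highest first).
  def candidates(word, max_edit_distance \\ 2) do
    d1 = edits1(word)
    pool = if max_edit_distance == 2, do: d1 ++ Enum.flat_map(d1, &edits1/1), else: d1

    pool
    |> Enum.filter(&Map.has_key?(@freq, &1))
    |> Enum.uniq()
    |> Enum.sort_by(&Map.fetch!(@freq, &1), :desc)
  end
end

NorvigSketch.candidates("speling")
# => ["spelling", "spewing"]
```

With the toy table, both "spelling" (one insertion) and "spewing" (one replacement) are reachable from "speling", and the higher-frequency word is ranked first.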

Because every query enumerates its edits at lookup time, this approach is best suited to one-shot correction of user input. For high-throughput use cases, a SymSpell-style precomputed deletion index would be faster; that may arrive in a future release.
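For readers unfamiliar with the idea, a deletion index can be sketched as follows. This is only an illustration of the general SymSpell technique, limited to single deletions; `DeleteIndexSketch` and its functions are hypothetical names and nothing here reflects a planned Text.Spell API. The index maps every dictionary word, and every string obtained by deleting one character from it, back to the word. A query is then matched by looking up the query itself and its own single-character deletions.

```elixir
defmodule DeleteIndexSketch do
  # All strings obtained by deleting exactly one character from `word`.
  def deletes1(word) do
    graphemes = String.graphemes(word)

    for i <- 0..(length(graphemes) - 1)//1, uniq: true do
      graphemes |> List.delete_at(i) |> Enum.join()
    end
  end

  # Build once: map each word and each of its deletions back to the word.
  def build(words) do
    Enum.reduce(words, %{}, fn word, index ->
      Enum.reduce([word | deletes1(word)], index, fn key, acc ->
        Map.update(acc, key, MapSet.new([word]), &MapSet.put(&1, word))
      end)
    end)
  end

  # Query: only map lookups, no alphabet-sized enumeration.
  def lookup(index, query) do
    [query | deletes1(query)]
    |> Enum.flat_map(fn key -> index |> Map.get(key, MapSet.new()) |> MapSet.to_list() end)
    |> Enum.uniq()
    |> Enum.sort()
  end
end

index = DeleteIndexSketch.build(["spelling", "spell", "help"])
DeleteIndexSketch.lookup(index, "speel")
# => ["spell"]
```

The lookup cost depends on the length of the query rather than the size of the alphabet, which is where the speed-up comes from. A complete implementation would also verify each hit with a true edit-distance check and rank the survivors by frequency.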

Edit operations

Distance-1 edits cover the four single-character mistakes that dominate real typing errors:

  • Deletion (hellp → help).
  • Transposition (recieve → receive).
  • Replacement (speel → spell).
  • Insertion (spel → spell).

Distance-2 edits apply the same operations twice. They produce far more candidates and are slower to enumerate, but they catch double-typo cases.
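A rough count shows where the extra cost comes from. Assume a word of n characters and an alphabet of a letters (the alphabet Text.Spell uses for each language is not stated here, so a is a placeholder). Counting raw strings in a Norvig-style enumeration, before de-duplication and with replacements allowed to reproduce the original letter:

```latex
\begin{aligned}
\text{deletions}      &= n \\
\text{transpositions} &= n - 1 \\
\text{replacements}   &= a\,n \\
\text{insertions}     &= a\,(n + 1) \\
|\text{edits}_1|      &= (2a + 2)\,n + a - 1
\end{aligned}
```

For a = 26 and n = 7 this gives 54 · 7 + 25 = 403 strings at distance 1. Distance 2 expands each of those strings again, so the raw count grows roughly as the square of that figure, on the order of 10^5 strings for the same word.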

Summary

Functions

candidates(word, options \\ [])

Returns the ranked list of correction candidates for a word.

correct(word, options \\ [])

Returns the most likely correction for a word.

edits1(word)

Returns the set of all distance-1 edits of a word.

known?(word, options \\ [])

Returns true if the word is in the bundled dictionary.

Functions

candidates(word, options \\ [])

@spec candidates(
  String.t(),
  keyword()
) :: [String.t()]

Returns the ranked list of correction candidates for a word.

Candidates are returned in descending order of corpus frequency.

Arguments

  • word is a single word as a string.

Options

  • :language is the language atom passed through to Text.WordFreq. Default :en.

  • :max_edit_distance is 1 or 2. Default 2.

  • :limit caps the number of results. Default is no limit.

Returns

  • A list of candidate strings, ordered most-likely first. Empty when the input has no known-word candidates at any considered distance.

Examples

iex> "speling" |> Text.Spell.candidates(limit: 3) |> Enum.take(1)
["spelling"]

iex> Text.Spell.candidates("xqzwt")
[]

correct(word, options \\ [])

@spec correct(
  String.t(),
  keyword()
) :: String.t()

Returns the most likely correction for a word.

Arguments

  • word is a single word as a string. Dictionary lookup is case-insensitive; the original casing is preserved when no correction is needed.

Options

  • :language is the language atom passed through to Text.WordFreq. Default :en.

  • :max_edit_distance is 1 or 2. Default 2.

Returns

  • The best candidate string. If the input is already a known word, it is returned unchanged. If no candidate at any considered distance is in the dictionary, the original word is returned.

Examples

iex> Text.Spell.correct("speling")
"spelling"

iex> Text.Spell.correct("the")
"the"
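The return rules above can be read as a small decision procedure. The sketch below is one plausible reading of that contract, not the library's source: `CorrectSketch` is a hypothetical module, and `known_fun` and `candidates_fun` are injected stand-ins for Text.Spell.known?/2 and Text.Spell.candidates/2 so that the example runs without the dictionary.

```elixir
defmodule CorrectSketch do
  # `known_fun` and `candidates_fun` are hypothetical stand-ins for
  # Text.Spell.known?/2 and Text.Spell.candidates/2.
  def correct(word, known_fun, candidates_fun) do
    if known_fun.(String.downcase(word)) do
      # Already a known word: return it unchanged, original casing included.
      word
    else
      case candidates_fun.(word) do
        # Best-ranked candidate wins.
        [best | _] -> best
        # Nothing in the dictionary at any considered distance.
        [] -> word
      end
    end
  end
end

# Stub dictionary and candidate source, for illustration only.
known = fn w -> w in ["the", "spelling"] end
cands = fn
  "speling" -> ["spelling"]
  _ -> []
end

CorrectSketch.correct("The", known, cands)
# => "The"
CorrectSketch.correct("xqzwt", known, cands)
# => "xqzwt"
```

Because unknown input falls through to the original word, the function always returns a string, and callers that need to know whether a correction happened can compare the result with the input.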

edits1(word)

@spec edits1(String.t()) :: [String.t()]

Returns the set of all distance-1 edits of a word.

Useful for inspection or building custom correction logic.

Arguments

  • word is a string.

Returns

  • A list of unique strings, each one edit operation away from the input.

Examples

iex> "spel" |> Text.Spell.edits1() |> Enum.member?("spell")
true
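As an example of custom correction logic, the raw edits can be matched against a domain-specific vocabulary instead of the bundled dictionary. The sketch below is hypothetical: `CustomVocab` is not part of the library, and the `edits` list is a hand-written stand-in for the output of Text.Spell.edits1/1 so that the snippet runs on its own.

```elixir
defmodule CustomVocab do
  # `edits` is expected to be the output of Text.Spell.edits1/1;
  # `vocab` is a MapSet of domain terms the bundled dictionary may lack.
  def suggest(edits, vocab) do
    edits
    |> Enum.filter(&MapSet.member?(vocab, &1))
    |> Enum.sort()
  end
end

vocab = MapSet.new(["genserver", "supervisor"])

# In real use: edits = Text.Spell.edits1("genservr")
edits = ["genserver", "genservr", "gensevr"]

CustomVocab.suggest(edits, vocab)
# => ["genserver"]
```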

known?(word, options \\ [])

@spec known?(
  String.t(),
  keyword()
) :: boolean()

Returns true if the word is in the bundled dictionary.

Arguments

  • word is a single word as a string.

Options

  • :language is the language atom passed through to Text.WordFreq. Default :en.

Returns

  • true if the word is known, false otherwise.

Examples

iex> Text.Spell.known?("hello")
true

iex> Text.Spell.known?("xqzwt")
false