Text.Phonetic.Soundex (Text v0.5.0)


Soundex phonetic encoding (Russell-Odell, 1918).

Encodes a word as a four-character code so that names with roughly the same English pronunciation share a key. The code was used to index US census records from 1880 onward; the original use case was finding surnames despite spelling variations on hand-filled forms ("Smith" vs "Smyth", "Robert" vs "Roberts").

The encoding is deliberately lossy:

  • Only the first letter is preserved verbatim.

  • H, W, and the vowels A E I O U Y are dropped after the first position.

  • The remaining consonants are mapped to one of six numeric classes based on phonetic similarity (B F P V → 1; C G J K Q S X Z → 2; etc.).

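  • Adjacent letters of the same class are collapsed to one digit. This includes a letter that shares the first letter's class ("Pfister" encodes as P236) and letters separated only by H or W ("Ashcraft" encodes as A261). Letters of the same class separated by a vowel are coded separately ("Tymczak" encodes as T522).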

  • The result is padded or truncated to four characters: one letter followed by three digits.
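The rules above can be sketched as a small standalone module. This is a minimal illustration, not this library's implementation: the module name SoundexSketch, the helper digits/3, and the choice to upcase the input before encoding are all assumptions made for the example. The class table follows the commonly published Soundex classes.

```elixir
defmodule SoundexSketch do
  # Hypothetical illustration module; not part of the Text library.

  # Consonant classes. Letters absent from this map (vowels, H, W, Y) carry no digit.
  @codes %{
    ?B => ?1, ?F => ?1, ?P => ?1, ?V => ?1,
    ?C => ?2, ?G => ?2, ?J => ?2, ?K => ?2,
    ?Q => ?2, ?S => ?2, ?X => ?2, ?Z => ?2,
    ?D => ?3, ?T => ?3,
    ?L => ?4,
    ?M => ?5, ?N => ?5,
    ?R => ?6
  }

  def encode(word) when is_binary(word) do
    # Assumption: upcase the input and keep only ASCII letters.
    letters =
      word
      |> String.upcase()
      |> String.to_charlist()
      |> Enum.filter(&(&1 in ?A..?Z))

    case letters do
      [] ->
        ""

      [first | rest] ->
        # Seed the "previous class" with the first letter's class so that
        # a same-class neighbour (the F in "Pfister") is collapsed.
        digits = digits(rest, Map.get(@codes, first), [])

        # Pad with zeros, then truncate to one letter plus three digits.
        String.slice(List.to_string([first | digits]) <> "000", 0, 4)
    end
  end

  defp digits([], _prev, acc), do: Enum.reverse(acc)

  # H and W are skipped without resetting the previous class.
  defp digits([c | rest], prev, acc) when c in [?H, ?W], do: digits(rest, prev, acc)

  defp digits([c | rest], prev, acc) do
    case Map.get(@codes, c) do
      # Vowel (or Y): emits nothing, but resets the previous class.
      nil -> digits(rest, nil, acc)
      # Same class as the previous coded letter: collapse.
      ^prev -> digits(rest, prev, acc)
      # New class: emit its digit.
      code -> digits(rest, code, [code | acc])
    end
  end
end
```

Under these assumptions, SoundexSketch.encode("Ashcraft") walks S (2), skips H, collapses C into the preceding 2, then emits 6 and 1, and truncates to "A261".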

When to use

Soundex is primarily useful for English surname matching — the domain it was designed for. It is well-known and widely implemented, which makes it a useful interchange format with legacy systems (Oracle, MySQL, and many genealogy tools all expose it).

For modern fuzzy-name matching, consider Metaphone or Double Metaphone instead — both produce more discriminating codes and handle non-Anglo-Saxon names better. This module ships Soundex primarily for compatibility with those legacy systems and as a baseline reference.

Algorithm reference

The implementation follows the variant codified by the U.S. National Archives at https://www.archives.gov/research/census/soundex.html, which is the de facto standard.

Summary

Functions

encode(word)

Returns the Soundex code for an English word.

match?(name_a, name_b)

Returns true if name_a and name_b produce the same Soundex code (and both produce a non-empty code).

Functions

encode(word)

@spec encode(String.t()) :: String.t()

Returns the Soundex code for an English word.

Arguments

  • word is a string. Non-letter characters are ignored. The result begins with the case-folded first letter of the input.

Returns

  • A four-character string of the form <letter><digit><digit><digit>, e.g. "R163". Returns "" for an empty or letter-free input.

Examples

iex> Text.Phonetic.Soundex.encode("Robert")
"R163"

iex> Text.Phonetic.Soundex.encode("Rupert")
"R163"

iex> Text.Phonetic.Soundex.encode("Rubin")
"R150"

iex> Text.Phonetic.Soundex.encode("Ashcraft")
"A261"

iex> Text.Phonetic.Soundex.encode("Tymczak")
"T522"

iex> Text.Phonetic.Soundex.encode("Pfister")
"P236"

iex> Text.Phonetic.Soundex.encode("Smith")
"S530"

iex> Text.Phonetic.Soundex.encode("Smyth")
"S530"

iex> Text.Phonetic.Soundex.encode("")
""

match?(name_a, name_b)

@spec match?(String.t(), String.t()) :: boolean()

Returns true if name_a and name_b produce the same Soundex code (and both produce a non-empty code).

Arguments

  • name_a is a string.

  • name_b is a string.

Returns

  • true when both inputs produce a non-empty Soundex code and the codes are equal.

  • false otherwise (including when either input is empty or contains no letters).
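The return rules above amount to "encode both names, reject empty codes, compare". A minimal sketch of that logic follows. The module MatchSketch and the function same_code?/3 are hypothetical names for illustration, and the encoder is passed in as an argument so the block stays self-contained; it is not a description of how match?/2 is actually implemented.

```elixir
defmodule MatchSketch do
  # Hypothetical illustration module; not part of the Text library.

  # `encoder` is assumed to be a one-argument function that returns a
  # code string, or "" when the input contains nothing encodable.
  def same_code?(encoder, name_a, name_b) when is_function(encoder, 1) do
    code_a = encoder.(name_a)

    # An empty code never matches anything, including another empty code.
    code_a != "" and code_a == encoder.(name_b)
  end
end
```

Passing &Text.Phonetic.Soundex.encode/1 as the encoder should reproduce the documented behaviour; in particular, two empty strings compare as false rather than true.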

Examples

iex> Text.Phonetic.Soundex.match?("Robert", "Rupert")
true

iex> Text.Phonetic.Soundex.match?("Smith", "Schmidt")
true

iex> Text.Phonetic.Soundex.match?("Roberts", "Doberts")
false

iex> Text.Phonetic.Soundex.match?("anything", "")
false