Soundex phonetic encoding (Russell-Odell, 1918).
Encodes a word as a four-character code that groups names sharing rough English pronunciation under the same key. It was adopted for indexing English-language US census records from 1880 onward; the original use case was finding surnames despite spelling variations on hand-filled forms ("Smith" vs "Smyth", "Robert" vs "Roberts").
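The encoding is deliberately lossy:
- Only the first letter is preserved verbatim.
- H, W, and the vowels A E I O U Y are dropped after the first position.
- The remaining consonants are mapped to one of six numeric classes based on phonetic similarity (B F P V → 1, C G J K Q S X Z → 2, etc.).
- Adjacent duplicates of the same class are collapsed to one digit. In the National Archives variant, letters of the same class separated only by H or W also count as adjacent, while a vowel between them keeps both digits ("Ashcraft" → "A261", "Tymczak" → "T522"). The class of the first letter counts as well ("Pfister" → "P236").
- The result is padded with zeros or truncated to four characters: one letter followed by three digits.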
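The steps above can be sketched in a few lines of Elixir. This is a minimal illustration, not this module's actual source: the module name `SoundexSketch` is hypothetical, only ASCII letters are treated as letters, and Y is assumed to behave like a vowel when it sits between two consonants of the same class.

```elixir
defmodule SoundexSketch do
  # Standard Soundex consonant classes (letter => digit character).
  @codes %{
    ?B => ?1, ?F => ?1, ?P => ?1, ?V => ?1,
    ?C => ?2, ?G => ?2, ?J => ?2, ?K => ?2,
    ?Q => ?2, ?S => ?2, ?X => ?2, ?Z => ?2,
    ?D => ?3, ?T => ?3,
    ?L => ?4,
    ?M => ?5, ?N => ?5,
    ?R => ?6
  }

  def encode(word) when is_binary(word) do
    # Assumption: only ASCII letters count; everything else is ignored.
    letters =
      word
      |> String.upcase()
      |> String.to_charlist()
      |> Enum.filter(&(&1 in ?A..?Z))

    case letters do
      [] ->
        ""

      [first | rest] ->
        # `prev` is the class of the last coded consonant (nil after a vowel).
        {digits, _prev} =
          Enum.reduce(rest, {[], Map.get(@codes, first)}, fn ch, {acc, prev} ->
            code = Map.get(@codes, ch)

            cond do
              # H and W are dropped and do not separate duplicates.
              ch in [?H, ?W] -> {acc, prev}
              # Vowels (and Y, by assumption) are dropped but reset the class.
              is_nil(code) -> {acc, nil}
              # Same class as the previous coded consonant: collapse.
              code == prev -> {acc, prev}
              true -> {[code | acc], code}
            end
          end)

        # Keep at most three digits, then pad with zeros to four characters.
        tail = digits |> Enum.reverse() |> Enum.take(3)
        padding = List.duplicate(?0, 3 - length(tail))
        List.to_string([first | tail] ++ padding)
    end
  end
end
```

With this sketch, `SoundexSketch.encode("Ashcraft")` walks S (2), skips H, drops the duplicate C (2), then codes R (6) and F (1), giving `"A261"`.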
When to use
Soundex is primarily useful for English surname matching — the domain it was designed for. It is well-known and widely implemented, which makes it a useful interchange format with legacy systems (Oracle, MySQL, and many genealogy tools all expose it).
For modern fuzzy-name matching, consider Metaphone or Double Metaphone instead — both produce more discriminating codes and handle non-Anglo-Saxon names better. This module ships Soundex primarily for compatibility with those legacy systems and as a baseline reference.
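For the surname-matching use case, a common pattern is to bucket candidate names by their Soundex key and compare only within a bucket. The following is a rough sketch under assumptions: `BucketSketch` and its `key_fun` parameter are hypothetical names, and `key_fun` stands in for any function that returns a Soundex code, or `""` when the input has no letters.

```elixir
defmodule BucketSketch do
  # Groups names by the key returned from `key_fun` (hypothetical parameter).
  # Names whose key is "" (no letters) are left out of the result.
  def bucket(names, key_fun) when is_list(names) and is_function(key_fun, 1) do
    names
    |> Enum.group_by(key_fun)
    |> Map.delete("")
  end
end
```

Passing `&Text.Phonetic.Soundex.encode/1` as `key_fun` would place "Robert" and "Rupert" in the same `"R163"` bucket and "Rubin" in `"R150"`.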
Algorithm reference
The implementation follows the variant codified by the U.S. National Archives at https://www.archives.gov/research/census/soundex.html, which is the de facto standard.
Summary
Functions
encode(word)
Returns the Soundex code for an English word.
match?(name_a, name_b)
Returns true if name_a and name_b produce the same Soundex code (and both produce a non-empty code).
Functions
encode(word)
Returns the Soundex code for an English word.
Arguments
- word is a string. Non-letter characters are ignored. The first character of the result is the first letter of the input, case-folded to uppercase.
Returns
- A four-character string of the form <letter><digit><digit><digit>, e.g. "R163". Returns "" for an empty or letter-free input.
Examples
iex> Text.Phonetic.Soundex.encode("Robert")
"R163"
iex> Text.Phonetic.Soundex.encode("Rupert")
"R163"
iex> Text.Phonetic.Soundex.encode("Rubin")
"R150"
iex> Text.Phonetic.Soundex.encode("Ashcraft")
"A261"
iex> Text.Phonetic.Soundex.encode("Tymczak")
"T522"
iex> Text.Phonetic.Soundex.encode("Pfister")
"P236"
iex> Text.Phonetic.Soundex.encode("Smith")
"S530"
iex> Text.Phonetic.Soundex.encode("Smyth")
"S530"
iex> Text.Phonetic.Soundex.encode("")
""
match?(name_a, name_b)
Returns true if name_a and name_b produce the same Soundex code (and both produce a non-empty code).
Arguments
- name_a is a string.
- name_b is a string.
Returns
- true when both inputs produce a non-empty Soundex code and the codes are equal.
- false otherwise (including when either input is empty or contains no letters).
Examples
iex> Text.Phonetic.Soundex.match?("Robert", "Rupert")
true
iex> Text.Phonetic.Soundex.match?("Smith", "Schmidt")
true
iex> Text.Phonetic.Soundex.match?("Roberts", "Doberts")
false
iex> Text.Phonetic.Soundex.match?("anything", "")
false
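The comparison rule documented above (equal codes, and neither code empty) could be expressed roughly as follows. This is only a sketch: `MatchSketch.same_code?/3` and its `encode` parameter are hypothetical, with `encode` standing in for a function such as `Text.Phonetic.Soundex.encode/1`.

```elixir
defmodule MatchSketch do
  # `encode` is a hypothetical parameter: a function returning a Soundex
  # code, or "" when the input is empty or contains no letters.
  def same_code?(name_a, name_b, encode) when is_function(encode, 1) do
    code_a = encode.(name_a)

    # An empty code never matches, so two letter-free inputs compare as false.
    code_a != "" and code_a == encode.(name_b)
  end
end
```

Checking for the empty code first is what makes two empty inputs compare as `false`, even though their codes are technically equal.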