Double Metaphone phonetic encoding (Lawrence Philips, 2000).
Double Metaphone is the de-facto standard for fuzzy matching of English names with non-English origins. Unlike single Metaphone, it returns two codes per input — a primary and an alternate — reflecting the fact that the same Anglicised name may be pronounced differently depending on the speaker's expectations.
Two names are considered a match when any of the four (primary_a, alternate_a) × (primary_b, alternate_b) combinations agree.
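The four-way comparison can be sketched independently of the encoder. The module and function names below (PairMatchSketch, pairs_match?/2) are hypothetical and are not part of the library; the sketch assumes each name has already been encoded to a {primary, alternate} pair and that empty codes never count as a match.

```elixir
defmodule PairMatchSketch do
  # Hypothetical helper for illustration only; not the library's implementation.
  # Takes two {primary, alternate} code pairs and reports whether any of the
  # four primary/alternate combinations agree.
  def pairs_match?({primary_a, alternate_a}, {primary_b, alternate_b}) do
    # Assumption: an empty code means "no encoding" and must not match anything.
    codes_a = Enum.reject([primary_a, alternate_a], &(&1 == ""))
    codes_b = Enum.reject([primary_b, alternate_b], &(&1 == ""))

    # True when at least one code of A equals at least one code of B.
    Enum.any?(codes_a, fn code -> code in codes_b end)
  end
end

# Code pairs taken from the "Smith" and "Schmidt" examples in this page.
PairMatchSketch.pairs_match?({"SM0", "XMT"}, {"XMT", "SMT"})
# => true (the alternate of the first pair equals the primary of the second)

PairMatchSketch.pairs_match?({"", ""}, {"", ""})
# => false (empty codes are ignored)
```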
When to use
Double Metaphone is the strongest of the four phonetic encodings
shipped with text for English-language name matching, and it
handles non-Anglo-Saxon names (Slavic, Italian, Spanish, French,
German, Greek, …) noticeably better than Soundex,
Metaphone, or NYSIIS. It is the algorithm Apache Lucene's
DoubleMetaphoneFilter uses, and the algorithm Python's
metaphone package exposes as doublemetaphone.
Use Soundex only for compatibility with legacy systems that
expose Soundex codes (databases, government records).
Use Cologne for German-only corpora — it outperforms Double
Metaphone there.
Reference
Philips, L. (2000). The Double Metaphone Search Algorithm. C/C++ Users Journal, 18(6), 38–43.
This implementation is a port of the canonical algorithm and validates against the test vectors published with the original paper.
Functions
encode/2
Returns the Double Metaphone code pair {primary, alternate} for
name.
When the name is unambiguous, primary and alternate are the
same string.
Arguments
- name is a string. Diacritics are folded via Text.Clean.unaccent/1 before encoding; non-Latin letters are discarded.
Options
- :max_length truncates both codes to this many characters. Defaults to 4 (the canonical Philips length); pass nil to skip truncation.
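The effect of :max_length can be pictured as a final truncation step applied to both codes. The sketch below is an assumption about how that step might look, not the library's code; TruncateSketch and truncate/2 are hypothetical names, and the input pair is an illustrative five-character value rather than output produced by the library.

```elixir
defmodule TruncateSketch do
  # Hypothetical helper for illustration only.
  # nil means "do not truncate": return the pair unchanged.
  def truncate({primary, alternate}, nil), do: {primary, alternate}

  # Otherwise keep at most max_length characters of each code.
  def truncate({primary, alternate}, max_length)
      when is_integer(max_length) and max_length >= 0 do
    {String.slice(primary, 0, max_length), String.slice(alternate, 0, max_length)}
  end
end

# Illustrative untruncated pair (assumed, not taken from the library).
TruncateSketch.truncate({"TMPSN", "TMPSN"}, 4)
# => {"TMPS", "TMPS"}

TruncateSketch.truncate({"TMPSN", "TMPSN"}, nil)
# => {"TMPSN", "TMPSN"}
```

Truncating both codes to the same length keeps the primary and alternate comparable, so a longer limit makes matching stricter and a shorter one makes it more permissive.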
Returns
- A 2-tuple {primary, alternate} of uppercase ASCII strings. Returns {"", ""} for empty input or input containing no Latin letters.
Examples
iex> Text.Phonetic.DoubleMetaphone.encode("Smith")
{"SM0", "XMT"}
iex> Text.Phonetic.DoubleMetaphone.encode("Schmidt")
{"XMT", "SMT"}
iex> Text.Phonetic.DoubleMetaphone.encode("Thompson")
{"TMPS", "TMPS"}
match?/3
Returns true if name_a and name_b share at least one of the
four primary/alternate code combinations (and both produce non-empty
codes).
Arguments
- name_a is a string.
- name_b is a string.
Options
Same as encode/2. Both inputs are encoded with the same options.
Returns
- true when both inputs produce a non-empty code pair and at least one of the four combinations (primary_a/primary_b, primary_a/alternate_b, alternate_a/primary_b, alternate_a/alternate_b) matches.
- false otherwise.
Examples
iex> Text.Phonetic.DoubleMetaphone.match?("Smith", "Schmidt")
true
iex> Text.Phonetic.DoubleMetaphone.match?("Smith", "Brown")
false