Akin.Metaphone.Double (Akin v0.2.0)

The original Metaphone algorithm was published in 1990 as an improvement over the Soundex algorithm. Like Soundex, it was limited to English-only use. The Metaphone algorithm does not produce an exact phonetic representation of an input word or name; rather, its output is an intentionally approximate one. The approximation is necessary to account for the way speakers vary their pronunciations and misspell or otherwise vary the words and names they are trying to spell.

The Double Metaphone phonetic encoding algorithm is the second generation of the Metaphone algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal. It makes a number of fundamental design improvements over the original Metaphone algorithm.

It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name "Smith" yields a primary code of SM0 and a secondary code of XMT, while the name "Schmidt" yields a primary code of XMT and a secondary code of SMT. Both names have XMT in common.
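The overlap between the two names can be checked mechanically. The following is a minimal sketch that uses only the codes quoted above; it does not call the Akin library, and the shared_codes helper is a hypothetical name introduced for illustration.

```elixir
# Codes copied from the Smith/Schmidt example; each tuple is {primary, secondary}.
smith = {"SM0", "XMT"}
schmidt = {"XMT", "SMT"}

# Hypothetical helper: list the codes that two {primary, secondary} tuples share.
shared_codes = fn {p1, s1}, {p2, s2} ->
  Enum.filter(Enum.uniq([p1, s1]), &(&1 in [p2, s2]))
end

IO.inspect(shared_codes.(smith, schmidt))
# prints ["XMT"]
```

A non-empty result means the two spellings would be treated as candidates for the same name under the loosest matching rule.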

Double Metaphone tries to account for myriad irregularities in English of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origin. Thus it uses a much more complex ruleset for coding than its predecessor; for example, it tests for approximately 100 different contexts of the use of the letter C alone.

This script implements the Double Metaphone algorithm (c) 1998, 1999 originally implemented by Lawrence Philips in C++. It was further modified in C++ by Kevin Atkinson (http://aspell.net/metaphone/). It was translated to C by Maurice Aubrey (maurice@hevanet.com) for use in a Perl extension. A Python version was created by Andrew Collins on January 12, 2007, using the C source (http://www.atomodo.com/code/double-metaphone/metaphone.py/view). This version is based on the Python version.

The struct's next key is set to a tuple that holds the next characters to add to the primary and secondary codes and indicates how many characters to move forward in the string. The secondary code letter is given only when it differs from the primary. This is an effort to make the code easier to write and read. The default action is to add nothing and move to the next character.
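One way to picture this is a small struct whose pending next tuple is applied after each letter is handled. The sketch below is an assumption about the shape, not the module's actual struct: the NextSketch module, its fields other than next, and the two-element and three-element tuple layouts are all hypothetical.

```elixir
defmodule NextSketch do
  # Hypothetical struct. The default `next` adds nothing and advances one character.
  defstruct primary: "", secondary: "", position: 0, next: {nil, 1}

  def new, do: %__MODULE__{}

  # Record what the current letter should contribute.
  def put_next(%__MODULE__{} = m, next), do: %{m | next: next}

  # Append the pending characters, advance the position, reset `next` to the default.
  def apply_next(%__MODULE__{next: next} = m) do
    {p, s, step} =
      case next do
        # Nothing to add: only move forward.
        {nil, step} -> {"", "", step}
        # Secondary omitted: it is the same as the primary.
        {p, step} -> {p, p, step}
        # Secondary given because it differs from the primary.
        {p, s, step} -> {p, s, step}
      end

    %{
      m
      | primary: m.primary <> p,
        secondary: m.secondary <> s,
        position: m.position + step,
        next: {nil, 1}
    }
  end
end

# The leading S of "Smith": primary "S", secondary "X", advance one character.
m = NextSketch.new() |> NextSketch.put_next({"S", "X", 1}) |> NextSketch.apply_next()
IO.inspect({m.primary, m.secondary, m.position})
# prints {"S", "X", 1}
```

Omitting the secondary letter when it matches the primary keeps most rule clauses down to a two-element tuple, which appears to be the readability benefit the description refers to.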

Functions

check_word_start(metaphone)

Skip silent letters at the start of a word, or replace an initial X with an S (as in Xavier).
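The rule can be sketched as follows. This is a standalone illustration, not the body of check_word_start/1: the WordStartSketch module and its return shape are hypothetical, and the list of silent prefixes (GN, KN, PN, WR, PS) is taken from the published Double Metaphone algorithm rather than from this module's source.

```elixir
defmodule WordStartSketch do
  # Two-letter starts whose first letter is silent in Double Metaphone.
  @silent_starts ["GN", "KN", "PN", "WR", "PS"]

  # Takes an upcased word and returns {code_so_far, index_to_resume_from}.
  def check(word) do
    cond do
      # Silent first letter: emit nothing and skip it.
      String.starts_with?(word, @silent_starts) -> {"", 1}
      # Initial X sounds like S, as in Xavier.
      String.starts_with?(word, "X") -> {"S", 1}
      # Otherwise start encoding at the first letter.
      true -> {"", 0}
    end
  end
end

IO.inspect(WordStartSketch.check("KNIGHT"))
# prints {"", 1}
IO.inspect(WordStartSketch.check("XAVIER"))
# prints {"S", 1}
```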

compare(left, right, level \\ "normal")

Compare two strings, returning the outcome of the comparison using the strictness of the level.

  • "strict": both encodings for each string must match
  • "strong": the primary encoding for each string must match
  • "normal": the primary encoding of one string must match either encoding of the other string (default)
  • "weak": either the primary or the secondary encoding of one string must match either encoding of the other string
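The four levels can be read as predicates over two {primary, secondary} pairs. The sketch below is one interpretation of the list above and works on already-encoded tuples; the CompareSketch module and matches?/3 are hypothetical names and do not reproduce compare/3 itself.

```elixir
defmodule CompareSketch do
  # Each argument is a {primary, secondary} tuple of already-computed codes.
  def matches?({p1, s1}, {p2, s2}, level \\ "normal") do
    case level do
      # Both encodings must agree.
      "strict" -> p1 == p2 and s1 == s2
      # Only the primary encodings must agree.
      "strong" -> p1 == p2
      # The primary of one side appears among the other side's encodings.
      "normal" -> p1 in [p2, s2] or p2 in [p1, s1]
      # Any encoding of one side appears among the other side's encodings.
      "weak" -> Enum.any?([p1, s1], &(&1 in [p2, s2]))
    end
  end
end

smith = {"SM0", "XMT"}
schmidt = {"XMT", "SMT"}

for level <- ["strict", "strong", "normal", "weak"] do
  IO.inspect({level, CompareSketch.matches?(smith, schmidt, level)})
end
# prints {"strict", false}, {"strong", false}, {"normal", true}, {"weak", true}
```

With the Smith and Schmidt codes, the match first succeeds at "normal" because Schmidt's primary code XMT equals Smith's secondary code.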

Initialize the struct

letter_at_position(metaphone, start_position)

letter_at_position(metaphone, start_position, close_position)

Iterate input characters

parse(metaphone, position, end_index, character)

process(metaphone, character)

Handle conditional cases for different letters. Update phonemes in the next param of the metaphone struct and return the struct.

process_initial_vowels(metaphone, position)

All initial vowels map to "A"
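A minimal sketch of that rule is shown below. The InitialVowelSketch module and code_at/2 are hypothetical names; the vowel set, including Y, and the convention that vowels contribute nothing after the first position follow the published Double Metaphone algorithm and are assumptions with respect to this module.

```elixir
defmodule InitialVowelSketch do
  @vowels ["A", "E", "I", "O", "U", "Y"]

  # Returns the code contributed by the letter at `position` in an upcased word:
  # "A" for a vowel in first position, "" for anything else handled here.
  def code_at(word, position) do
    letter = String.at(word, position)
    if position == 0 and letter in @vowels, do: "A", else: ""
  end
end

IO.inspect(InitialVowelSketch.code_at("ELLIS", 0))
# prints "A"
IO.inspect(InitialVowelSketch.code_at("SMITH", 2))
# prints ""
```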

substring_compare(left, right, opts)

Accept two lists and loop through the cartesian product of the two. Using a reducer, iterate over the levels, comparing the item sets at each level with compare/3. The first level, if any, at which compare/3 returns true stops the reducer, and the percentage of true values found is returned. Otherwise the reducer continues. 0 is returned if no comparison returns true at any level.

  • "strict": both encodings for each string must match
  • "strong": the primary encoding for each string must match
  • "normal": the primary encoding of one string must match either encoding of the other string (default)
  • "weak": either the primary or the secondary encoding of one string must match either encoding of the other string
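The reducer described above might look roughly like the sketch below. Several details are assumptions rather than facts about substring_compare/3: the level order (strictest first), the reading of "percentage" as matching pairs divided by total pairs, the SubstringSketch module name, and passing the comparison in as a function instead of calling compare/3 directly.

```elixir
defmodule SubstringSketch do
  # Assumed order: try the strictest level first.
  @levels ["strict", "strong", "normal", "weak"]

  # `compare` is any function (left, right, level) -> boolean.
  def score(left, right, compare, levels \\ @levels) do
    # Cartesian product of the two lists.
    pairs = for l <- left, r <- right, do: {l, r}

    Enum.reduce_while(levels, 0, fn level, acc ->
      hits = Enum.count(pairs, fn {l, r} -> compare.(l, r, level) end)

      # Stop at the first level with any match; otherwise keep the 0 accumulator.
      if hits > 0, do: {:halt, hits / length(pairs)}, else: {:cont, acc}
    end)
  end
end

# Toy comparison for illustration only: exact equality, relaxed to
# "same first letter" at the weak level.
toy = fn l, r, level ->
  case level do
    "weak" -> String.first(l) == String.first(r)
    _ -> l == r
  end
end

IO.inspect(SubstringSketch.score(["SM0", "XMT"], ["XMT", "SMT"], toy))
# prints 0.25
IO.inspect(SubstringSketch.score(["A"], ["B"], toy))
# prints 0
```

In the first call one of the four pairs matches at the "strict" level, so the reducer halts there with 1/4.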