Text.Phonetic.NYSIIS (Text v0.5.0)


New York State Identification and Intelligence System (NYSIIS) phonetic encoding (Robert L. Taft, 1970).

NYSIIS was designed as a Soundex successor for English personal-name matching. Compared to Soundex it:

  • keeps letters rather than digits, so the codes are pronounceable;
  • is more discriminating in practice (Roberts and Doberts get different codes);
  • handles common English-name patterns natively (MAC → MCC, KN → NN, PH/PF → FF, SCH → SSS, etc.).
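The name-pattern handling mentioned in the last bullet can be pictured as a prefix rewrite applied to the uppercased name before the rest of the encoding runs. The following is a minimal sketch of that one step only, assuming the four prefix rules named above; `PrefixSketch` is a hypothetical module for illustration and is not how Text.Phonetic.NYSIIS is necessarily implemented.

```elixir
defmodule PrefixSketch do
  # Hypothetical illustration of the NYSIIS prefix step only.
  # Expects an already uppercased ASCII name.
  def transcode_prefix("MAC" <> rest), do: "MCC" <> rest
  def transcode_prefix("KN" <> rest), do: "NN" <> rest
  def transcode_prefix("PH" <> rest), do: "FF" <> rest
  def transcode_prefix("PF" <> rest), do: "FF" <> rest
  def transcode_prefix("SCH" <> rest), do: "SSS" <> rest
  # Any other name passes through unchanged.
  def transcode_prefix(name), do: name
end

PrefixSketch.transcode_prefix("MACDONALD") # "MCCDONALD"
PrefixSketch.transcode_prefix("KNIGHT")    # "NNIGHT"
PrefixSketch.transcode_prefix("SMITH")     # "SMITH"
```

The result is an intermediate string, not a finished NYSIIS code; later steps of the algorithm still rewrite vowels, collapse repeats, and trim suffixes.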

This module implements Taft's original algorithm. The 6-character truncation that the 1970 specification mandated is optional: omit :max_length (it defaults to nil) to skip truncation, or pass max_length: 6 for the classical code of at most six characters.

When to use

NYSIIS is a strong default for fuzzy English name matching when you want the matching key to remain readable. For maximum discrimination on multi-cultural names, prefer Text.Phonetic.DoubleMetaphone.
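One way to put a readable matching key to work is blocking: group candidate names by their code, then compare only names that share a group. The sketch below assumes a hypothetical `NameBlocks` module (not part of Text) and takes the encoder as a function argument, so `&Text.Phonetic.NYSIIS.encode/1` can be passed in where the library is available.

```elixir
defmodule NameBlocks do
  # Hypothetical helper, not part of the Text library.
  # Groups names by the key returned from `encoder` and drops the
  # group for the empty key (names that produced no code at all).
  def group(names, encoder) when is_function(encoder, 1) do
    names
    |> Enum.group_by(encoder)
    |> Map.delete("")
  end
end

# Intended usage (requires the Text library):
#   NameBlocks.group(["MacDonald", "McDonald", "Watkins"], &Text.Phonetic.NYSIIS.encode/1)
```

Because the keys are letter strings rather than digit codes, the resulting map can be inspected by eye when tuning a matching pipeline.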

References

Taft, R. L. (1970). Name Search Techniques. New York State Identification and Intelligence System.

https://www.archives.gov/research/census/soundex/ describes the Soundex / NYSIIS lineage.

Summary

Functions

encode(name, options \\ [])

Returns the NYSIIS code for name.

match?(name_a, name_b, options \\ [])

Returns true if name_a and name_b produce the same NYSIIS code.

Functions

encode(name, options \\ [])

@spec encode(
  String.t(),
  keyword()
) :: String.t()

Returns the NYSIIS code for name.

Arguments

  • name is a string. Non-Latin letters and diacritics are folded to ASCII via Text.Clean.unaccent/1 before encoding.

Options

  • :max_length — truncate the resulting code to this length. Pass 6 for the classical Taft NYSIIS. Defaults to nil (no truncation).

Returns

  • The NYSIIS code as an uppercase ASCII string. Returns "" for empty input or input with no Latin letters.

Examples

iex> Text.Phonetic.NYSIIS.encode("Watkins")
"WATCAN"

iex> Text.Phonetic.NYSIIS.encode("MacDonald")
"MCDANALD"

iex> Text.Phonetic.NYSIIS.encode("MacDonald", max_length: 6)
"MCDANA"
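The effect of :max_length in the last two examples can be pictured as a prefix cut applied to the full code. The `TruncateSketch.truncate/2` helper below is a hypothetical illustration of that documented behavior, not the library's own implementation.

```elixir
defmodule TruncateSketch do
  # nil means "no truncation", mirroring the documented default.
  def truncate(code, nil), do: code

  # Keep at most `max_length` leading characters of the code.
  def truncate(code, max_length) when is_integer(max_length) and max_length >= 0 do
    String.slice(code, 0, max_length)
  end
end

TruncateSketch.truncate("MCDANALD", 6)   # "MCDANA"
TruncateSketch.truncate("MCDANALD", nil) # "MCDANALD"
TruncateSketch.truncate("WATCAN", 8)     # "WATCAN" (shorter codes are not padded)
```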

match?(name_a, name_b, options \\ [])

@spec match?(String.t(), String.t(), keyword()) :: boolean()

Returns true if name_a and name_b produce the same NYSIIS code.

Arguments

  • name_a is a string.

  • name_b is a string.

Options

Same as encode/2. The same options are applied to both inputs.

Returns

  • true when both inputs produce a non-empty NYSIIS code and the codes are equal.

  • false otherwise (including when either input is empty or contains no Latin letters).

Examples

iex> Text.Phonetic.NYSIIS.match?("MacDonald", "McDonald")
true

iex> Text.Phonetic.NYSIIS.match?("Smith", "Schmidt")
false
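The return rules above (equal codes, but never a match on empty codes) can be sketched in terms of an encoder function. `MatchSketch` is a hypothetical module for illustration; passing the encoder as an argument is an assumption made so the sketch runs without the Text library.

```elixir
defmodule MatchSketch do
  # Hypothetical helper: true only when both codes are non-empty and equal.
  def same_code?(name_a, name_b, encoder) when is_function(encoder, 1) do
    code_a = encoder.(name_a)
    code_b = encoder.(name_b)
    # The empty-code check keeps two unencodable inputs from "matching".
    code_a != "" and code_a == code_b
  end
end

# Intended usage (requires the Text library):
#   MatchSketch.same_code?("MacDonald", "McDonald", &Text.Phonetic.NYSIIS.encode/1)
```

The guard on the empty code matters in practice: without it, any two inputs with no Latin letters would compare equal.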