Akin v0.2.0

Akin

A collection of string comparison algorithms for Elixir, with functions for comparing two strings for similarity. Algorithms can be called individually or all together; calling them together returns a map of metrics.

Options

Options are passed as a keyword list (e.g. [ngram_size: 3]).

  1. algorithms: algorithms to use in comparison. Accepts the name or a keyword list. Default is algorithms/0.
    1. metric: algorithm metric. Default is both.
      • "string": uses string algorithms
      • "phonetic": uses phonetic algorithms
    2. unit: algorithm unit. Default is both.
      • "whole": uses algorithms best suited for whole string comparison (distance)
      • "partial": uses algorithms best suited for partial string comparison (substring)
  2. level: level for double phonetic matching. Default is "normal".
    • "strict": both encodings for each string must match
    • "strong": the primary encoding for each string must match
    • "normal": the primary encoding of one string must match either encoding of the other string (default)
    • "weak": either the primary or secondary encoding of one string must match one encoding of the other string
  3. match_at: an algorithm score equal to or above this value is considered a match. Default is 0.9.
  4. ngram_size: number of contiguous letters to split strings into. Default is 2.
  5. short_length: string length that qualifies as "short" and receives a shortness boost. Used by Name Metric. Default is 8.
  6. stem: boolean indicating whether to compare the stemmed versions of the strings; uses Stemmer. Default is false.
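
To make three of the options above concrete, here is a minimal sketch of what ngram_size, match_at, and level describe. It is not Akin's implementation: the module OptionsSketch and its function names are hypothetical, the n-grams are assumed to overlap, and each string's phonetic encodings are assumed to arrive as a {primary, secondary} tuple of non-empty strings.

```elixir
defmodule OptionsSketch do
  # Hypothetical helper module for illustration only; not part of Akin.

  # ngram_size: split a string into overlapping runs of `size` letters.
  def ngrams(string, size \\ 2) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(size, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # match_at: a score equal to or above the threshold counts as a match.
  def meets_threshold?(score, match_at \\ 0.9), do: score >= match_at

  # level: one reading of the four levels, applied to two
  # {primary, secondary} encoding pairs (assumed non-empty strings).
  def phonetic_match?({p1, s1}, {p2, s2}, level \\ "normal") do
    case level do
      "strict" -> p1 == p2 and s1 == s2
      "strong" -> p1 == p2
      "normal" -> p1 in [p2, s2] or p2 in [p1, s1]
      "weak" -> Enum.any?([p1, s1], &(&1 in [p2, s2]))
    end
  end
end

IO.inspect(OptionsSketch.ngrams("akin"))
# prints ["ak", "ki", "in"]
```

In this sketch each level accepts everything the stricter levels accept, so the pair {"A", "B"} and {"C", "B"} matches only at "weak", where the shared secondary encoding is enough.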

Summary

Functions

compare/3
Compare two strings. Return map of algorithm metrics.

match_name_metrics/3
Compare a string to a string with logic specific to names. Matches are determined by algorithm metrics equal to or higher than the match_at option. Return a map with both strings, a match flag (0 or 1), and the algorithm metrics.

match_names/3
Compare a string against a list of strings. Matches are determined by algorithm metrics equal to or higher than the match_at option. Return a list of strings that are a likely match.

match_names_metrics/3
Compare a string against a list of strings. Matches are determined by algorithm metrics equal to or higher than the match_at option. Return a list of strings that are a likely match and their algorithm metrics.

phonemes/1
Returns list of unique phonetic encodings produced by the single and double metaphone algorithms.

Functions

compare(left, right, opts \\ default_opts())

@spec compare(
  binary()
  | %Akin.Corpus{
      list: term(),
      original: term(),
      set: term(),
      stems: term(),
      string: term()
    },
  binary()
  | %Akin.Corpus{
      list: term(),
      original: term(),
      set: term(),
      stems: term(),
      string: term()
    },
  keyword()
) :: map()

Compare two strings. Return map of algorithm metrics.

Options accepted as a keyword list. If no options are given, default values will be used.
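
As a usage sketch, the script below calls compare/3 once with the defaults and once with a single option overridden. It assumes Akin is published on Hex under the package name :akin and that this version can be fetched with Mix.install (Elixir 1.12 or later); the input strings are arbitrary. No output is shown because the exact keys and scores depend on the algorithms selected.

```elixir
# Assumption: the package is available on Hex as :akin.
Mix.install([{:akin, "~> 0.2.0"}])

# No options given, so default values are used.
default_metrics = Akin.compare("weird", "wierd")

# Options are passed as a keyword list in the third argument.
trigram_metrics = Akin.compare("weird", "wierd", ngram_size: 3)

IO.inspect(default_metrics, label: "defaults")
IO.inspect(trigram_metrics, label: "ngram_size: 3")
```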

match_name_metrics(left, rights, opts \\ default_opts())

@spec match_name_metrics(binary(), binary(), Keyword.t()) :: %{
  left: binary(),
  match: 0 | 1,
  metrics: [any()],
  right: binary()
}

Compare a string to a string with logic specific to names. Matches are determined by algorithm metrics equal to or higher than the match_at option. Return a map with both strings, a match flag (0 or 1), and the algorithm metrics.

match_names(left, rights, opts \\ default_opts())

@spec match_names(
  binary()
  | %Akin.Corpus{
      list: term(),
      original: term(),
      set: term(),
      stems: term(),
      string: term()
    },
  binary()
  | %Akin.Corpus{
      list: term(),
      original: term(),
      set: term(),
      stems: term(),
      string: term()
    }
  | list(),
  keyword()
) :: float()

Compare a string against a list of strings. Matches are determined by algorithm metrics equal to or higher than the match_at option. Return a list of strings that are a likely match.
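
The thresholding idea can be sketched without Akin itself. The hypothetical module below keeps each candidate whose score against the left string is equal to or above match_at. It uses String.jaro_distance/2 from Elixir's standard library as a stand-in score; Akin's name matching combines its own algorithms, so treat this only as an illustration of the filter step.

```elixir
defmodule MatchAtSketch do
  # Hypothetical module for illustration only; not Akin's implementation.

  # Keep each candidate whose stand-in score is >= match_at.
  def likely_matches(left, rights, match_at \\ 0.9) do
    Enum.filter(rights, fn right ->
      String.jaro_distance(left, right) >= match_at
    end)
  end
end

IO.inspect(MatchAtSketch.likely_matches("alice", ["alice", "xyz"]))
# prints ["alice"]
```

An identical string scores 1.0 and a string with no characters in common scores 0.0, so only the first candidate clears the default threshold of 0.9.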

Future Plans

  • if the name part is an initial, give the initials score its weight, otherwise reduce it
  • if the initials score is significantly higher than the average of the others, reduce the initials score to the average of the others
  • add options
    • "use_average", "top_three", and/or "average_of_top_three"
    • "group" to group results into strong matches and weak matches
    • "details" to include the scores in the result list

match_names_metrics(left, rights, opts \\ default_opts())

@spec match_names_metrics(binary(), list(), keyword()) :: list()

Compare a string against a list of strings. Matches are determined by algorithm metrics equal to or higher than the match_at option. Return a list of strings that are a likely match and their algorithm metrics.

@spec phonemes(
  binary()
  | %Akin.Corpus{
      list: term(),
      original: term(),
      set: term(),
      stems: term(),
      string: term()
    }
) :: list()

Returns list of unique phonetic encodings produced by the single and double metaphone algorithms.
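
A short usage sketch, again assuming Akin can be installed from Hex as :akin. The input word is arbitrary, and the encodings are printed rather than listed here because they depend on the metaphone implementations.

```elixir
# Assumption: the package is available on Hex as :akin.
Mix.install([{:akin, "~> 0.2.0"}])

# Per the spec, the argument is a binary (or an Akin.Corpus struct)
# and the result is a list.
encodings = Akin.phonemes("virginia")

IO.inspect(encodings, label: "phonetic encodings")
```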