Metastatic.Analysis.Duplication.Similarity (Metastatic v0.10.3)

Similarity calculation between ASTs for Type III clone detection.

Implements multiple similarity metrics:

  • Structural similarity (tree-based matching)
  • Token-based similarity (Jaccard coefficient)
  • Combined similarity score

Usage

alias Metastatic.Analysis.Duplication.Similarity

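# ASTs use the {type, metadata, children_or_value} shape, as in the doctests below
ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 10}]}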

Similarity.calculate(ast1, ast2)
# => 0.8 (80% similar)

Examples

# Identical ASTs
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
1.0

# Partially similar ASTs
iex> ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 10}]}
iex> score = Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
iex> score > 0.5
true

Summary

Functions

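calculate(ast1, ast2, opts \\ [])
Calculates overall similarity between two ASTs.

similar?(ast1, ast2, threshold \\ 0.8, opts \\ [])
Checks if two ASTs are similar above a threshold.

structural_similarity(ast1, ast2)
Calculates structural similarity between two ASTs.

token_similarity(ast1, ast2)
Calculates token-based similarity using the Jaccard coefficient.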

Functions

calculate(ast1, ast2, opts \\ [])

Calculates overall similarity between two ASTs.

Returns a float between 0.0 (completely different) and 1.0 (identical). With the default :combined method, the score combines structural and token-based similarity according to :weights.

Options

  • :method - Similarity method (:structural, :token, :combined) (default: :combined)
  • :weights - Weights for combined method {structural_weight, token_weight} (default: {0.6, 0.4})
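To make the :weights option more concrete, here is a minimal sketch of how two scores could be blended. It assumes the combined score is a weighted average of the structural and token scores, which the documentation does not spell out. CombinedSketch and combine/3 are hypothetical names for illustration and are not part of Metastatic.

```elixir
defmodule CombinedSketch do
  # Hypothetical helper, not part of Metastatic.
  # Assumption: the combined score is a weighted average of the two scores.
  def combine(structural, token, weights \\ {0.6, 0.4}) do
    {structural_weight, token_weight} = weights
    total = structural_weight + token_weight

    # Dividing by the total keeps the result in 0.0..1.0 even when the
    # weights do not sum to 1 (this normalization is also an assumption).
    (structural_weight * structural + token_weight * token) / total
  end
end

# A perfect structural match blended with a token score of 0.5
score = CombinedSketch.combine(1.0, 0.5)
IO.inspect(score, label: "combined")
```

With the default weights, a structural score of 1.0 and a token score of 0.5 blend to roughly 0.8. Raising the structural weight makes the score less sensitive to renamed variables and changed literals.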

Examples

iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
1.0

iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
0.0

similar?(ast1, ast2, threshold \\ 0.8, opts \\ [])

Checks if two ASTs are similar above a threshold (default: 0.8).

Examples

iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.similar?(ast1, ast2, 0.8)
true

iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.similar?(ast1, ast2, 0.8)
false

structural_similarity(ast1, ast2)

@spec structural_similarity(Metastatic.AST.meta_ast(), Metastatic.AST.meta_ast()) ::
  float()

Calculates structural similarity between two ASTs.

Compares the two tree structures by counting matching nodes. Returns the ratio of matching nodes to total nodes.
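The node-counting idea can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: nodes are {type, metadata, children_or_value} tuples, two nodes match when their types are equal, children are paired by position, and the total is the node count of the larger tree. StructuralSketch is a hypothetical module name.

```elixir
defmodule StructuralSketch do
  # Hypothetical sketch, not Metastatic's implementation.
  # Assumptions: nodes match on type only, children are compared
  # position by position, and "total" is the size of the larger tree.
  def similarity(ast1, ast2) do
    total = max(count(ast1), count(ast2))
    if total == 0, do: 1.0, else: matching(ast1, ast2) / total
  end

  # Same type in both heads: count this node, then descend into children.
  defp matching({type, _, children1}, {type, _, children2}) when is_atom(type) do
    1 + matching_children(children1, children2)
  end

  defp matching(_, _), do: 0

  defp matching_children(c1, c2) when is_list(c1) and is_list(c2) do
    Enum.zip(c1, c2)
    |> Enum.map(fn {a, b} -> matching(a, b) end)
    |> Enum.sum()
  end

  # Leaf values such as "x" or 42 are not counted as nodes.
  defp matching_children(_, _), do: 0

  defp count({type, _meta, children}) when is_atom(type), do: 1 + count_children(children)
  defp count(_), do: 0

  defp count_children(children) when is_list(children) do
    children |> Enum.map(&count/1) |> Enum.sum()
  end

  defp count_children(_), do: 0
end

a = {:binary_op, [category: :arithmetic, operator: :+],
     [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
b = {:binary_op, [category: :arithmetic, operator: :+],
     [{:variable, [], "x"}, {:variable, [], "y"}]}

# Two of the three nodes line up (the root and the first child).
IO.inspect(StructuralSketch.similarity(a, b), label: "structural")
```

Because this sketch ignores leaf values, two variables with different names still count as a match, which is consistent with the doctest below where {:variable, [], "x"} and {:variable, [], "y"} score above 0.0.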

Examples

iex> ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> Metastatic.Analysis.Duplication.Similarity.structural_similarity(ast1, ast2)
1.0

iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:variable, [], "y"}
iex> score = Metastatic.Analysis.Duplication.Similarity.structural_similarity(ast1, ast2)
iex> score > 0.0
true

token_similarity(ast1, ast2)

@spec token_similarity(Metastatic.AST.meta_ast(), Metastatic.AST.meta_ast()) ::
  float()

Calculates token-based similarity using the Jaccard coefficient.

Compares token sets extracted from ASTs. Returns |intersection| / |union|.
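The Jaccard coefficient itself can be sketched on plain token lists. How Metastatic extracts tokens from an AST is not described here, so the example below takes already-extracted tokens as input. JaccardSketch is a hypothetical module name, and treating two empty token sets as identical (1.0) is an assumption.

```elixir
defmodule JaccardSketch do
  # Hypothetical helper, not part of Metastatic.
  # Computes |intersection| / |union| over two collections of tokens.
  def jaccard(tokens1, tokens2) do
    set1 = MapSet.new(tokens1)
    set2 = MapSet.new(tokens2)
    union_size = MapSet.size(MapSet.union(set1, set2))

    if union_size == 0 do
      # Assumption: two empty token sets are considered identical.
      1.0
    else
      MapSet.size(MapSet.intersection(set1, set2)) / union_size
    end
  end
end

# Two shared tokens (:b, :c) out of four distinct tokens overall
IO.inspect(JaccardSketch.jaccard([:a, :b, :c], [:b, :c, :d]), label: "jaccard")
```

Because the inputs are converted to sets, repeated tokens count once, so this measure reflects which tokens appear rather than how often they appear.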

Examples

iex> ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "y"}, {:literal, [subtype: :integer], 10}]}
iex> score = Metastatic.Analysis.Duplication.Similarity.token_similarity(ast1, ast2)
iex> score > 0.5
true