Metastatic.Analysis.Duplication.Similarity
(Metastatic v0.10.3)
Similarity calculation between ASTs for Type III clone detection.
Implements multiple similarity metrics:
- Structural similarity (tree-based matching)
- Token-based similarity (Jaccard coefficient)
- Combined similarity score
Usage
alias Metastatic.Analysis.Duplication.Similarity
ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 10}]}
Similarity.calculate(ast1, ast2)
# => 0.8 (80% similar)
Examples
# Identical ASTs
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
1.0
# Partially similar ASTs
iex> ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 10}]}
iex> score = Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
iex> score > 0.5
true
Summary
Functions
calculate/3: Calculates overall similarity between two ASTs.
similar?/4: Checks if two ASTs are similar above a threshold.
structural_similarity/2: Calculates structural similarity between two ASTs.
token_similarity/2: Calculates token-based similarity using Jaccard coefficient.
Functions
@spec calculate(Metastatic.AST.meta_ast(), Metastatic.AST.meta_ast(), keyword()) :: float()
Calculates overall similarity between two ASTs.
Returns a float between 0.0 (completely different) and 1.0 (identical). With the default :combined method, the score combines structural and token-based similarity according to the configured weights.
Options
- :method - Similarity method (:structural, :token, :combined) (default: :combined)
- :weights - Weights for combined method {structural_weight, token_weight} (default: {0.6, 0.4})
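How the :weights option could interact with the two metric scores can be illustrated with a small standalone sketch. It assumes the combined score is a weighted average of the structural and token scores, which the option description suggests but does not spell out. CombinedSketch and combine/3 are hypothetical names and are not part of Metastatic.

```elixir
defmodule CombinedSketch do
  # Hypothetical helper, not Metastatic's implementation.
  # Assumption: the combined score is a weighted average of the two metric scores.
  def combine(structural, token, weights \\ {0.6, 0.4}) do
    {structural_weight, token_weight} = weights
    total = structural_weight + token_weight
    (structural * structural_weight + token * token_weight) / total
  end
end

# A structural score of 1.0 and a token score of 0.5 with the default weights.
score = CombinedSketch.combine(1.0, 0.5)
IO.inspect(Float.round(score, 2)) # prints 0.8
```

With the library loaded, the documented options would be passed as a keyword list, for example Similarity.calculate(ast1, ast2, method: :combined, weights: {0.6, 0.4}).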
Examples
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
1.0
iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.calculate(ast1, ast2)
0.0
@spec similar?(Metastatic.AST.meta_ast(), Metastatic.AST.meta_ast(), float(), keyword()) :: boolean()
Checks if two ASTs are similar above a threshold.
Examples
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.similar?(ast1, ast2, 0.8)
true
iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.Similarity.similar?(ast1, ast2, 0.8)
false
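A threshold check like this is typically applied across many candidate fragments. The sketch below shows one way that might look: it collects every pair of fragments whose score reaches the threshold. ClonePairsSketch and similar_pairs/3 are hypothetical, the scoring function is passed in so the block runs without the library, and treating a score equal to the threshold as similar is an assumption.

```elixir
defmodule ClonePairsSketch do
  # Hypothetical helper, not part of Metastatic.
  # Returns the index pairs {i, j} with i < j whose score reaches the threshold.
  # Assumption: a score equal to the threshold counts as similar.
  def similar_pairs(asts, threshold, score_fun) do
    indexed = Enum.with_index(asts)

    for {a, i} <- indexed,
        {b, j} <- indexed,
        i < j,
        score_fun.(a, b) >= threshold,
        do: {i, j}
  end
end

# Stand-in scorer for illustration: 1.0 for equal terms, 0.0 otherwise.
exact = fn a, b -> if a == b, do: 1.0, else: 0.0 end

fragments = [
  {:literal, [subtype: :integer], 42},
  {:variable, [], "x"},
  {:literal, [subtype: :integer], 42}
]

IO.inspect(ClonePairsSketch.similar_pairs(fragments, 0.8, exact)) # prints [{0, 2}]
```

With the library loaded, the stand-in scorer could be replaced by &Metastatic.Analysis.Duplication.Similarity.calculate/2.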
@spec structural_similarity(Metastatic.AST.meta_ast(), Metastatic.AST.meta_ast()) :: float()
Calculates structural similarity between two ASTs.
Compares the tree structure by counting matching nodes. Returns the ratio of matching nodes to total nodes.
Examples
iex> ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> Metastatic.Analysis.Duplication.Similarity.structural_similarity(ast1, ast2)
1.0
iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:variable, [], "y"}
iex> score = Metastatic.Analysis.Duplication.Similarity.structural_similarity(ast1, ast2)
iex> score > 0.0
true
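The "matching nodes over total nodes" idea behind structural similarity can be sketched in a few lines. This is an illustrative approximation only, not the library's algorithm: it assumes every node is a {type, meta, children_or_value} tuple as in the examples, treats two nodes as matching when their types are equal, compares children position by position, and divides by the size of the larger tree. StructuralSketch is a hypothetical module.

```elixir
defmodule StructuralSketch do
  # Hypothetical helper, not Metastatic's implementation.

  # Number of nodes in a tree; anything without a child list counts as one node.
  def size({_type, _meta, children}) when is_list(children) do
    1 + Enum.sum(Enum.map(children, &size/1))
  end

  def size(_leaf), do: 1

  # Counts positions where both trees hold a node of the same type.
  def matches({type, _m1, c1}, {type, _m2, c2}) when is_list(c1) and is_list(c2) do
    child_matches =
      c1
      |> Enum.zip(c2)
      |> Enum.map(fn {x, y} -> matches(x, y) end)
      |> Enum.sum()

    1 + child_matches
  end

  def matches({type, _m1, _v1}, {type, _m2, _v2}), do: 1
  def matches(_a, _b), do: 0

  # Assumption: "total nodes" is the node count of the larger tree.
  def similarity(a, b), do: matches(a, b) / max(size(a), size(b))
end

sum = {:binary_op, [category: :arithmetic, operator: :+],
       [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

IO.inspect(StructuralSketch.similarity(sum, sum))                  # prints 1.0
IO.inspect(StructuralSketch.similarity(sum, {:variable, [], "x"})) # prints 0.0
```

Because this sketch compares node types only, two variables with different names count as a full match. The library's own score for that case is documented only as greater than 0.0.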
@spec token_similarity(Metastatic.AST.meta_ast(), Metastatic.AST.meta_ast()) :: float()
Calculates token-based similarity using Jaccard coefficient.
Compares token sets extracted from ASTs. Returns |intersection| / |union|.
Examples
iex> ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
iex> ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "y"}, {:literal, [subtype: :integer], 10}]}
iex> score = Metastatic.Analysis.Duplication.Similarity.token_similarity(ast1, ast2)
iex> score > 0.5
true
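The Jaccard computation can be made concrete with a standalone sketch. How the library extracts tokens is not documented here, so the extraction below is an assumption: each node contributes its type, its metadata values, and any leaf value. JaccardSketch is a hypothetical module, and it assumes metadata is a keyword list as in the examples.

```elixir
defmodule JaccardSketch do
  # Hypothetical token extraction, not Metastatic's implementation.
  # Assumption: tokens are node types, metadata values, and leaf values.
  def tokens({type, meta, children}) when is_list(meta) and is_list(children) do
    [type | Keyword.values(meta)] ++ Enum.flat_map(children, &tokens/1)
  end

  def tokens({type, meta, value}) when is_list(meta) do
    [type | Keyword.values(meta)] ++ [value]
  end

  def tokens(other), do: [other]

  # |intersection| / |union| over the two token sets.
  # tokens/1 always returns at least one token, so the union is never empty.
  def similarity(ast1, ast2) do
    set1 = MapSet.new(tokens(ast1))
    set2 = MapSet.new(tokens(ast2))
    MapSet.size(MapSet.intersection(set1, set2)) / MapSet.size(MapSet.union(set1, set2))
  end
end

ast1 = {:binary_op, [category: :arithmetic, operator: :+],
        [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
ast2 = {:binary_op, [category: :arithmetic, operator: :+],
        [{:variable, [], "y"}, {:literal, [subtype: :integer], 10}]}

# 6 shared tokens out of 10 distinct tokens overall.
IO.inspect(JaccardSketch.similarity(ast1, ast2)) # prints 0.6
```

Under this extraction the two trees share :binary_op, :arithmetic, :+, :variable, :literal, and :integer, and differ only in the variable names and literal values.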