Metastatic.Analysis.Duplication
(Metastatic v0.10.4)
Code duplication detection at the MetaAST level.
Detects code clones across the same or different programming languages by operating on the unified MetaAST representation. Supports four types of clones:
- Type I: Exact clones (identical AST)
- Type II: Renamed clones (identical structure, different identifiers)
- Type III: Near-miss clones (similar structure with modifications)
- Type IV: Semantic clones (different syntax, same behavior)
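As a rough illustration of the difference between the first two clone types, the sketch below classifies a pair of MetaAST-style tuples by first comparing them directly and then comparing them with variable names blanked out. The `CloneSketch` module and its functions are hypothetical and written only for this example; they are not part of Metastatic and do not describe how the library decides clone types.

```elixir
defmodule CloneSketch do
  # Hypothetical helper, not part of Metastatic.

  # Replace every variable name with a placeholder so only structure remains.
  def normalize({:variable, meta, _name}), do: {:variable, meta, :_}

  # Recurse into nodes whose third element is a list of child nodes.
  def normalize({tag, meta, children}) when is_list(children),
    do: {tag, meta, Enum.map(children, &normalize/1)}

  # Leaves such as literals are kept as they are.
  def normalize(other), do: other

  def classify(a, b) do
    cond do
      a == b -> :type_i
      normalize(a) == normalize(b) -> :type_ii
      true -> :none
    end
  end
end

sum_x =
  {:binary_op, [category: :arithmetic, operator: :+],
   [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

sum_y =
  {:binary_op, [category: :arithmetic, operator: :+],
   [{:variable, [], "y"}, {:literal, [subtype: :integer], 5}]}

CloneSketch.classify(sum_x, sum_x) # => :type_i
CloneSketch.classify(sum_x, sum_y) # => :type_ii
```

Type III and Type IV clones need a similarity measure or semantic reasoning, which this sketch deliberately leaves out.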
Usage
alias Metastatic.{Document, Analysis.Duplication}
# Create two documents
ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
doc1 = Document.new(ast1, :elixir)
doc2 = Document.new(ast2, :elixir)
# Detect duplication
{:ok, result} = Duplication.detect(doc1, doc2)
result.duplicate? # => true
result.clone_type # => :type_i
result.similarity_score # => 1.0
Examples
# Type I: Exact clone
iex> ast = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast, :elixir)
iex> doc2 = Metastatic.Document.new(ast, :python)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
true
iex> result.clone_type
:type_i
# No duplication
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :string], "hello"}
iex> doc1 = Metastatic.Document.new(ast1, :elixir)
iex> doc2 = Metastatic.Document.new(ast2, :elixir)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
false
Summary
Types
detect_opts/0: Options for duplication detection.
Functions
detect/3: Detects duplication between two documents.
detect!/3: Detects duplication between two documents, raising on error.
detect_in_list/2: Detects duplicates across multiple documents.
detect_in_list!/2: Detects duplicates across multiple documents, raising on error.
fingerprint/1: Generates a structural fingerprint for an AST.
similarity/2: Calculates similarity score between two ASTs.
Types
@type detect_opts() :: [ threshold: float(), min_tokens: non_neg_integer(), ignore_literals: boolean(), ignore_variables: boolean(), cross_language: boolean(), clone_types: [atom()] ]
Options for duplication detection.
- :threshold - Similarity threshold (0.0-1.0) for Type III detection (default: 0.8)
- :min_tokens - Minimum tokens for detection (default: 5)
- :ignore_literals - Ignore literal values in comparison (default: false)
- :ignore_variables - Ignore variable names in comparison (default: false)
- :cross_language - Enable cross-language detection (default: true)
- :clone_types - List of clone types to detect (default: all types)
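The options are passed as a keyword list in the last argument. The following is a minimal usage sketch, assuming Metastatic is available as a dependency; the option values are arbitrary choices for illustration, and no output is shown because the result depends on the library's internals.

```elixir
alias Metastatic.{Document, Analysis.Duplication}

# Two additions that differ only in the variable name.
ast1 =
  {:binary_op, [category: :arithmetic, operator: :+],
   [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

ast2 =
  {:binary_op, [category: :arithmetic, operator: :+],
   [{:variable, [], "y"}, {:literal, [subtype: :integer], 5}]}

doc1 = Document.new(ast1, :elixir)
doc2 = Document.new(ast2, :python)

# Any option left out keeps the default listed above.
opts = [threshold: 0.9, min_tokens: 1, ignore_variables: true]

{:ok, result} = Duplication.detect(doc1, doc2, opts)
IO.inspect({result.duplicate?, result.clone_type, result.similarity_score})
```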
Functions
@spec detect(Metastatic.Document.t(), Metastatic.Document.t(), detect_opts()) :: {:ok, Metastatic.Analysis.Duplication.Result.t()}
Detects duplication between two documents.
Compares two MetaAST documents and returns a result indicating whether they are duplicates, the type of clone, and similarity score.
Options
See detect_opts/0 for available options.
Examples
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast1, :elixir)
iex> doc2 = Metastatic.Document.new(ast2, :elixir)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
true
iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast1, :elixir)
iex> doc2 = Metastatic.Document.new(ast2, :elixir)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
false
@spec detect!(Metastatic.Document.t(), Metastatic.Document.t(), detect_opts()) :: Metastatic.Analysis.Duplication.Result.t()
Detects duplication between two documents, raising on error.
Examples
iex> ast = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast, :elixir)
iex> doc2 = Metastatic.Document.new(ast, :elixir)
iex> result = Metastatic.Analysis.Duplication.detect!(doc1, doc2)
iex> result.duplicate?
true
@spec detect_in_list([Metastatic.Document.t()], detect_opts()) :: {:ok, [map()]}
Detects duplicates across multiple documents.
Returns a list of clone groups, where each group contains documents that are duplicates of each other.
Options
See detect_opts/0 for available options.
Examples
iex> ast = {:literal, [subtype: :integer], 42}
iex> docs = [
...> Metastatic.Document.new(ast, :elixir),
...> Metastatic.Document.new(ast, :python),
...> Metastatic.Document.new({:literal, [subtype: :string], "hello"}, :elixir)
...> ]
iex> {:ok, groups} = Metastatic.Analysis.Duplication.detect_in_list(docs)
iex> length(groups) > 0
true
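To make the idea of clone groups concrete, here is a small self-contained sketch that buckets entries by their AST and keeps only buckets with more than one member. `GroupSketch` and the `{id, ast}` entry shape are assumptions made for this example; the maps returned by `detect_in_list/2` may be structured differently, and a real implementation would more likely bucket by fingerprint and similarity rather than by exact equality.

```elixir
defmodule GroupSketch do
  # Hypothetical helper, not Metastatic's implementation.
  # Takes a list of {id, ast} pairs and returns lists of ids sharing an AST.
  def clone_groups(entries) do
    entries
    |> Enum.group_by(fn {_id, ast} -> ast end, fn {id, _ast} -> id end)
    |> Map.values()
    |> Enum.filter(&(length(&1) > 1))
  end
end

entries = [
  {:a, {:literal, [subtype: :integer], 42}},
  {:b, {:literal, [subtype: :integer], 42}},
  {:c, {:literal, [subtype: :string], "hello"}}
]

GroupSketch.clone_groups(entries) # => [[:a, :b]]
```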
@spec detect_in_list!([Metastatic.Document.t()], detect_opts()) :: [map()]
Detects duplicates across multiple documents, raising on error.
Examples
iex> ast = {:literal, [subtype: :integer], 42}
iex> docs = [Metastatic.Document.new(ast, :elixir), Metastatic.Document.new(ast, :python)]
iex> groups = Metastatic.Analysis.Duplication.detect_in_list!(docs)
iex> is_list(groups)
true
@spec fingerprint(Metastatic.AST.meta_ast()) :: String.t()
Generates a structural fingerprint for an AST.
Returns a hash string derived from the structure of the AST. Identical ASTs always produce identical fingerprints.
Examples
iex> ast = {:literal, [subtype: :integer], 42}
iex> fp = Metastatic.Analysis.Duplication.fingerprint(ast)
iex> is_binary(fp) and String.length(fp) > 0
true
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.fingerprint(ast1) == Metastatic.Analysis.Duplication.fingerprint(ast2)
true
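One plausible way to build such a fingerprint is to serialise the term and hash the bytes. The sketch below does this with SHA-256 from Erlang's `:crypto` module; `FingerprintSketch` is a hypothetical module, and the choice of serialisation and hash function is an assumption rather than a description of what `fingerprint/1` does internally.

```elixir
defmodule FingerprintSketch do
  # Hypothetical, not Metastatic's algorithm: serialise the term, hash it
  # with SHA-256, and hex-encode the digest.
  def fingerprint(ast) do
    :crypto.hash(:sha256, :erlang.term_to_binary(ast))
    |> Base.encode16(case: :lower)
  end
end

fp1 = FingerprintSketch.fingerprint({:literal, [subtype: :integer], 42})
fp2 = FingerprintSketch.fingerprint({:literal, [subtype: :integer], 42})
fp3 = FingerprintSketch.fingerprint({:literal, [subtype: :string], "hello"})

fp1 == fp2 # => true
fp1 == fp3 # => false
```

A fixed-length digest makes fingerprints cheap to store and compare, at the cost of a theoretical (though negligible for SHA-256) chance of collision.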
@spec similarity(Metastatic.AST.meta_ast(), Metastatic.AST.meta_ast()) :: float()
Calculates similarity score between two ASTs.
Returns a float between 0.0 (completely different) and 1.0 (identical).
Examples
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.similarity(ast1, ast2)
1.0
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :string], "hello"}
iex> score = Metastatic.Analysis.Duplication.similarity(ast1, ast2)
iex> score > 0.0 and score < 0.5
true
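For intuition about how a score in the 0.0 to 1.0 range can arise, the sketch below computes a Jaccard similarity over the sets of `{tag, metadata}` labels found in each tree. This is a swapped-in, simplified metric in a hypothetical `SimilaritySketch` module; it is not Metastatic's metric and will not reproduce the scores shown in the examples above.

```elixir
defmodule SimilaritySketch do
  # Hypothetical, not Metastatic's metric.

  # Collect the {tag, meta} label of a node and of all its descendants.
  def labels({tag, meta, children}) when is_list(children) do
    Enum.reduce(children, MapSet.new([{tag, meta}]), &MapSet.union(labels(&1), &2))
  end

  # Leaf nodes contribute only their own label; the value is ignored.
  def labels({tag, meta, _value}), do: MapSet.new([{tag, meta}])
  def labels(_other), do: MapSet.new()

  # Jaccard index: size of the intersection divided by size of the union.
  def similarity(a, b) do
    {la, lb} = {labels(a), labels(b)}
    union = MapSet.size(MapSet.union(la, lb))

    if union == 0 do
      1.0
    else
      MapSet.size(MapSet.intersection(la, lb)) / union
    end
  end
end

a =
  {:binary_op, [category: :arithmetic, operator: :+],
   [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

b =
  {:binary_op, [category: :arithmetic, operator: :+],
   [{:variable, [], "x"}, {:literal, [subtype: :string], "hello"}]}

SimilaritySketch.similarity(a, a) # => 1.0
SimilaritySketch.similarity(a, b) # => 0.5
```

The pair `a` and `b` shares two of four distinct labels, hence 0.5. A set-based measure like this ignores node order and repetition, which is one reason a production metric would likely be more elaborate.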