Metastatic.Analysis.Duplication (Metastatic v0.10.4)

Code duplication detection at the MetaAST level.

Detects code clones within a single language or across languages by operating on the unified MetaAST representation. Supports four types of clones:

  • Type I: Exact clones (identical AST)
  • Type II: Renamed clones (identical structure, different identifiers)
  • Type III: Near-miss clones (similar structure with modifications)
  • Type IV: Semantic clones (different syntax, same behavior)
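The difference between Type I and Type II clones can be illustrated with a small standalone sketch. It does not use Metastatic's own detection logic: `CloneSketch`, `normalize/1`, and `classify/2` are hypothetical names, and the `:type_ii` atom is assumed by analogy with the `:type_i` value shown in the usage example.

```elixir
defmodule CloneSketch do
  # Hypothetical helper, not part of Metastatic: rewrites every variable
  # name to a fixed placeholder so that renamed clones compare equal.
  def normalize({:variable, meta, _name}), do: {:variable, meta, "_"}

  def normalize({type, meta, children}) when is_list(children),
    do: {type, meta, Enum.map(children, &normalize/1)}

  def normalize(other), do: other

  # Classifies a pair of ASTs as an exact clone, a renamed clone, or neither.
  # :type_ii is an assumed atom name; the source only shows :type_i.
  def classify(a, b) do
    cond do
      a == b -> :type_i
      normalize(a) == normalize(b) -> :type_ii
      true -> :none
    end
  end
end

# Builds `name + 5` as a MetaAST tuple, in the shape used in the usage example.
plus = fn name ->
  {:binary_op, [category: :arithmetic, operator: :+],
   [{:variable, [], name}, {:literal, [subtype: :integer], 5}]}
end

CloneSketch.classify(plus.("x"), plus.("x"))  # => :type_i
CloneSketch.classify(plus.("x"), plus.("y"))  # => :type_ii
```

Type III and Type IV clones need a similarity measure or semantic analysis and are outside the scope of this sketch.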

Usage

alias Metastatic.{Document, Analysis.Duplication}

# Build two identical ASTs and wrap each in a document
ast1 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
ast2 = {:binary_op, [category: :arithmetic, operator: :+], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
doc1 = Document.new(ast1, :elixir)
doc2 = Document.new(ast2, :elixir)

# Detect duplication
{:ok, result} = Duplication.detect(doc1, doc2)
result.duplicate?      # => true
result.clone_type      # => :type_i
result.similarity_score  # => 1.0

Examples

# Type I: Exact clone
iex> ast = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast, :elixir)
iex> doc2 = Metastatic.Document.new(ast, :python)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
true
iex> result.clone_type
:type_i

# No duplication
iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :string], "hello"}
iex> doc1 = Metastatic.Document.new(ast1, :elixir)
iex> doc2 = Metastatic.Document.new(ast2, :elixir)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
false

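Summary

Types

detect_opts()
Options for duplication detection.

Functions

detect(doc1, doc2, opts \\ [])
Detects duplication between two documents.

detect!(doc1, doc2, opts \\ [])
Detects duplication between two documents, raising on error.

detect_in_list(documents, opts \\ [])
Detects duplicates across multiple documents.

detect_in_list!(documents, opts \\ [])
Detects duplicates across multiple documents, raising on error.

fingerprint(ast)
Generates a structural fingerprint for an AST.

similarity(ast1, ast2)
Calculates similarity score between two ASTs.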

Types

detect_opts()

@type detect_opts() :: [
  threshold: float(),
  min_tokens: non_neg_integer(),
  ignore_literals: boolean(),
  ignore_variables: boolean(),
  cross_language: boolean(),
  clone_types: [atom()]
]

Options for duplication detection.

  • :threshold - Similarity threshold (0.0-1.0) for Type III detection (default: 0.8)
  • :min_tokens - Minimum tokens for detection (default: 5)
  • :ignore_literals - Ignore literal values in comparison (default: false)
  • :ignore_variables - Ignore variable names in comparison (default: false)
  • :cross_language - Enable cross-language detection (default: true)
  • :clone_types - List of clone types to detect (default: all types)
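As a rough illustration of how these options combine with their documented defaults, the sketch below merges caller-supplied options over a default keyword list. `OptsSketch` and `resolve/1` are hypothetical names, not Metastatic functions, and the clone type atoms other than `:type_i` are assumed.

```elixir
defmodule OptsSketch do
  # Defaults as documented above. The :clone_types atoms other than
  # :type_i are assumed names; the source only shows :type_i.
  @defaults [
    threshold: 0.8,
    min_tokens: 5,
    ignore_literals: false,
    ignore_variables: false,
    cross_language: true,
    clone_types: [:type_i, :type_ii, :type_iii, :type_iv]
  ]

  # Caller-supplied options override the defaults key by key.
  def resolve(opts \\ []), do: Keyword.merge(@defaults, opts)
end

opts = OptsSketch.resolve(threshold: 0.9, ignore_variables: true)
opts[:threshold]         # => 0.9
opts[:min_tokens]        # => 5
opts[:ignore_variables]  # => true
```

With the library itself, the same keyword list would be passed as the third argument, for example `Duplication.detect(doc1, doc2, threshold: 0.9)`.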

Functions

detect(doc1, doc2, opts \\ [])

Detects duplication between two documents.

Compares two MetaAST documents and returns {:ok, result}, where result reports whether the documents are duplicates (duplicate?), the type of clone (clone_type), and the similarity score (similarity_score).

Options

See detect_opts/0 for available options.

Examples

iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast1, :elixir)
iex> doc2 = Metastatic.Document.new(ast2, :elixir)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
true

iex> ast1 = {:variable, [], "x"}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast1, :elixir)
iex> doc2 = Metastatic.Document.new(ast2, :elixir)
iex> {:ok, result} = Metastatic.Analysis.Duplication.detect(doc1, doc2)
iex> result.duplicate?
false

detect!(doc1, doc2, opts \\ [])

Detects duplication between two documents, raising on error.

Examples

iex> ast = {:literal, [subtype: :integer], 42}
iex> doc1 = Metastatic.Document.new(ast, :elixir)
iex> doc2 = Metastatic.Document.new(ast, :elixir)
iex> result = Metastatic.Analysis.Duplication.detect!(doc1, doc2)
iex> result.duplicate?
true
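The relationship between a tagged-tuple function and its raising counterpart follows a common Elixir convention, sketched below on a stand-in function. `BangSketch`, `check/2`, and `check!/2` are hypothetical; the `{:error, reason}` shape and the `ArgumentError` exception are assumptions, since the source does not say what `detect!/3` raises.

```elixir
defmodule BangSketch do
  # Stand-in for a detect-style function that returns a tagged tuple.
  def check(a, b) when is_tuple(a) and is_tuple(b), do: {:ok, %{duplicate?: a == b}}
  def check(_a, _b), do: {:error, :invalid_ast}

  # Bang variant: unwraps {:ok, result} or raises on {:error, reason}.
  # The exception type is an assumption made for this sketch.
  def check!(a, b) do
    case check(a, b) do
      {:ok, result} -> result
      {:error, reason} -> raise ArgumentError, "detection failed: #{inspect(reason)}"
    end
  end
end

ast = {:literal, [subtype: :integer], 42}
BangSketch.check!(ast, ast).duplicate?  # => true
```

The bang form suits pipelines and scripts where a failure should stop execution; the tuple form suits callers that want to handle errors explicitly.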

detect_in_list(documents, opts \\ [])

@spec detect_in_list([Metastatic.Document.t()], detect_opts()) :: {:ok, [map()]}

Detects duplicates across multiple documents.

Returns a list of clone groups, where each group contains documents that are duplicates of each other.

Options

See detect_opts/0 for available options.

Examples

iex> ast = {:literal, [subtype: :integer], 42}
iex> docs = [
...>   Metastatic.Document.new(ast, :elixir),
...>   Metastatic.Document.new(ast, :python),
...>   Metastatic.Document.new({:literal, [subtype: :string], "hello"}, :elixir)
...> ]
iex> {:ok, groups} = Metastatic.Analysis.Duplication.detect_in_list(docs)
iex> length(groups) > 0
true
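One way to picture clone groups is to bucket documents by their AST and keep only buckets with more than one member. This is a minimal sketch for exact clones only; `GroupSketch` and `clone_groups/1` are hypothetical, documents are modeled as plain maps rather than Metastatic.Document structs, and the actual shape of the maps returned by detect_in_list/2 is not specified in this page.

```elixir
defmodule GroupSketch do
  # Hypothetical grouping: documents whose ASTs are identical fall in one group.
  # Groups with a single member are dropped, since they contain no clone.
  def clone_groups(docs) do
    docs
    |> Enum.group_by(& &1.ast)
    |> Map.values()
    |> Enum.filter(&(length(&1) > 1))
  end
end

ast = {:literal, [subtype: :integer], 42}

docs = [
  %{ast: ast, language: :elixir},
  %{ast: ast, language: :python},
  %{ast: {:literal, [subtype: :string], "hello"}, language: :elixir}
]

groups = GroupSketch.clone_groups(docs)
length(groups)  # => 1
```

The single group holds the Elixir and Python documents that share the integer literal; the string literal has no partner and is left out.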

detect_in_list!(documents, opts \\ [])

@spec detect_in_list!([Metastatic.Document.t()], detect_opts()) :: [map()]

Detects duplicates across multiple documents, raising on error.

Examples

iex> ast = {:literal, [subtype: :integer], 42}
iex> docs = [Metastatic.Document.new(ast, :elixir), Metastatic.Document.new(ast, :python)]
iex> groups = Metastatic.Analysis.Duplication.detect_in_list!(docs)
iex> is_list(groups)
true

fingerprint(ast)

@spec fingerprint(Metastatic.AST.meta_ast()) :: String.t()

Generates a structural fingerprint for an AST.

Returns a hash string that identifies the structure of the AST. Identical ASTs produce identical fingerprints.

Examples

iex> ast = {:literal, [subtype: :integer], 42}
iex> fp = Metastatic.Analysis.Duplication.fingerprint(ast)
iex> is_binary(fp) and String.length(fp) > 0
true

iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.fingerprint(ast1) == Metastatic.Analysis.Duplication.fingerprint(ast2)
true
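To make the idea of a structural fingerprint concrete, here is a minimal sketch that hashes the serialized AST term. The hashing scheme (MD5 over the Erlang external term format, hex encoded) is an assumption chosen for illustration; this page does not document which algorithm fingerprint/1 uses, and `FingerprintSketch` is a hypothetical module.

```elixir
defmodule FingerprintSketch do
  # Assumed scheme: serialize the AST term, hash it, and hex-encode the digest.
  # Equal terms serialize identically here, so equal ASTs get equal fingerprints.
  def fingerprint(ast) do
    ast
    |> :erlang.term_to_binary()
    |> :erlang.md5()
    |> Base.encode16(case: :lower)
  end
end

a = {:literal, [subtype: :integer], 42}
b = {:literal, [subtype: :integer], 42}
c = {:literal, [subtype: :string], "hello"}

FingerprintSketch.fingerprint(a) == FingerprintSketch.fingerprint(b)  # => true
FingerprintSketch.fingerprint(a) == FingerprintSketch.fingerprint(c)  # => false
```

Because the sketch hashes the whole term, it also distinguishes ASTs that differ only in identifiers or literal values; a fingerprint meant to catch renamed clones would have to normalize those parts first.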

similarity(ast1, ast2)

Calculates similarity score between two ASTs.

Returns a float between 0.0 (completely different) and 1.0 (identical).

Examples

iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :integer], 42}
iex> Metastatic.Analysis.Duplication.similarity(ast1, ast2)
1.0

iex> ast1 = {:literal, [subtype: :integer], 42}
iex> ast2 = {:literal, [subtype: :string], "hello"}
iex> score = Metastatic.Analysis.Duplication.similarity(ast1, ast2)
iex> score > 0.0 and score < 0.5
true
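A score in the 0.0 to 1.0 range can be obtained in many ways. The sketch below uses a Jaccard index over the set of tokens found in each AST, which is an assumed measure picked for simplicity and not necessarily what similarity/2 computes. `SimilaritySketch` and `tokens/1` are hypothetical names.

```elixir
defmodule SimilaritySketch do
  # Flattens an AST into a list of tokens: node types, metadata values,
  # and leaf values. Metadata is assumed to be a keyword list.
  def tokens({type, meta, children}) when is_atom(type) and is_list(meta) do
    child_tokens =
      children
      |> List.wrap()
      |> Enum.flat_map(&tokens/1)

    [type | Keyword.values(meta)] ++ child_tokens
  end

  def tokens(leaf), do: [leaf]

  # Jaccard index: size of the intersection divided by size of the union.
  def similarity(a, b) do
    set_a = MapSet.new(tokens(a))
    set_b = MapSet.new(tokens(b))
    common = MapSet.size(MapSet.intersection(set_a, set_b))
    union = MapSet.size(MapSet.union(set_a, set_b))
    common / union
  end
end

int = {:literal, [subtype: :integer], 42}
str = {:literal, [subtype: :string], "hello"}

SimilaritySketch.similarity(int, int)  # => 1.0
SimilaritySketch.similarity(int, str)  # => 0.2
```

In the second call the two ASTs share only the `:literal` token out of five distinct tokens, hence 1/5. A set-based measure ignores node order and repetition, so a production-quality metric would more likely compare tree structure directly.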