Reach (Reach v2.2.0)

Copy Markdown View Source

Program Dependence Graph for Elixir and Erlang.

Reach analyzes Elixir and Erlang source code and builds a graph that captures what depends on what: which expressions produce values consumed by others (data dependence), and which expressions control whether others execute (control dependence).

Building a graph

# Elixir (default)
{:ok, graph} = Reach.string_to_graph("""
def foo(x) do
  y = x + 1
  if y > 10, do: :big, else: :small
end
""")

# Erlang
{:ok, graph} = Reach.string_to_graph(source, language: :erlang)

# Auto-detected from file extension
{:ok, graph} = Reach.file_to_graph("lib/my_module.ex")
{:ok, graph} = Reach.file_to_graph("src/my_module.erl")

Querying

Reach.backward_slice(graph, node_id)
Reach.forward_slice(graph, node_id)
Reach.independent?(graph, id_a, id_b)
Reach.nodes(graph, type: :call, module: Enum)
Reach.nodes(graph, type: :call, module: Enum, function: :map, arity: 2)
Reach.data_flows?(graph, source_id, sink_id)

Inspecting nodes

node = Reach.node(graph, some_id)
node.type       #=> :call
node.meta       #=> %{module: Enum, function: :map, arity: 2}
node.source_span #=> %{file: "lib/foo.ex", start_line: 5, ...}

Reach.pure?(node)  #=> true

Summary

Types

A program dependence graph. Built by string_to_graph/2, file_to_graph/2, etc.

Functions

Builds a graph from an already-parsed Elixir AST.

Returns all node IDs that affect the given node (backward slice).

Returns the call graph as a Graph.t().

Returns the children of a block in canonical order.

Returns node IDs on all paths from source to sink.

Returns the effect classification of a node.

Compiles an Elixir source string and builds a graph from the expanded bytecode.

Performs a context-sensitive backward slice through call boundaries.

Returns the control dependencies of a node.

Returns true if controller has a control-dependence edge to target.

Returns the data dependencies of a node.

Returns true if there's a data-dependence path from source to sink.

Returns nodes whose values are never used and have no side effects.

Returns true if there's any dependence path between two nodes.

Returns all dependence edges in the graph.

Reads a source file and builds a program dependence graph.

Same as file_to_graph/2 but raises on error.

Returns all node IDs affected by the given node (forward slice).

Returns the per-function PDG for a {module, function, arity} tuple.

Returns true if the node has data dependents (its value is used elsewhere).

Returns true if two expressions are independent.

Builds a graph from a compiled module (loaded in the VM).

Returns node IDs directly connected to node_id.

Returns the IR node for a given ID, or nil.

Returns all IR nodes, optionally filtered.

Returns true if the path from source to sink passes through any node matching predicate.

Returns true if the node is pure (no side effects).

Parses a source string and builds a program dependence graph.

Same as string_to_graph/2 but raises on parse error.

Finds data flow paths from taint sources to dangerous sinks.

Exports the graph to DOT format for Graphviz visualization.

Returns the underlying Graph.t() (libgraph) for direct traversal.

Types

graph()

@type graph() :: struct()

A program dependence graph. Built by string_to_graph/2, file_to_graph/2, etc.

node_id()

@type node_id() :: Reach.IR.Node.id()

Functions

ast_to_graph(ast, opts \\ [])

@spec ast_to_graph(
  Macro.t(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Builds a graph from an already-parsed Elixir AST.

Useful when you already have the AST (e.g. from Credo or ExDNA) and don't want to re-parse source.

backward_slice(system_dependence, node_id)

@spec backward_slice(graph(), node_id()) :: [term()]

Returns all node IDs that affect the given node (backward slice).

The backward slice answers: "what does this expression depend on?"

call_graph(system_dependence)

@spec call_graph(graph()) :: Graph.t()

Returns the call graph as a Graph.t().

Vertices are {module, function, arity} tuples.

canonical_order(graph, block_node_id)

@spec canonical_order(graph(), Reach.IR.Node.id()) :: [
  {Reach.IR.Node.id(), Reach.IR.Node.t()}
]

Returns the children of a block in canonical order.

Independent sibling expressions are sorted by structural hash so that reordered-but-equivalent blocks produce the same sequence. Dependent expressions preserve their relative order.

Returns a list of {node_id, ir_node} pairs.

Example

# These two blocks produce the same canonical order:
#   a = 1; b = 2; c = a + b
#   b = 2; a = 1; c = a + b
# Because a=1 and b=2 are independent, they get sorted,
# while c=a+b stays last (depends on both).

chop(graph, source, sink)

@spec chop(graph(), node_id(), node_id()) :: [node_id()]

Returns node IDs on all paths from source to sink.

The chop answers: "how does A influence B?"

classify_effect(node)

@spec classify_effect(Reach.IR.Node.t()) :: Reach.Effects.effect()

Returns the effect classification of a node.

Possible values: :pure, :read, :write, :io, :send, :receive, :exception, :nif, :unknown.

compiled_to_graph(source_or_modules, opts \\ [])

@spec compiled_to_graph(
  String.t() | [{module(), binary()}],
  keyword()
) :: {:ok, graph()} | {:error, term()}

Compiles an Elixir source string and builds a graph from the expanded bytecode.

Unlike string_to_graph/2, this compiles the code first, so macro-expanded constructs (try/rescue inside macros, use callbacks, etc.) are visible.

The source must define complete modules.

context_sensitive_slice(graph, node_id)

@spec context_sensitive_slice(graph(), Reach.IR.Node.id()) :: [Reach.IR.Node.id()]

Performs a context-sensitive backward slice through call boundaries.

Uses the Horwitz-Reps-Binkley two-phase algorithm to avoid impossible paths through call sites.

control_deps(system_dependence, node_id)

@spec control_deps(graph(), Reach.IR.Node.id()) :: [{Reach.IR.Node.id(), term()}]

Returns the control dependencies of a node.

Each entry is {controller_node_id, label}.

controls?(graph, controller_id, target_id)

@spec controls?(graph(), node_id(), node_id()) :: boolean()

Returns true if controller has a control-dependence edge to target.

data_deps(system_dependence, node_id)

@spec data_deps(graph(), Reach.IR.Node.id()) :: [{Reach.IR.Node.id(), atom()}]

Returns the data dependencies of a node.

Each entry is {source_node_id, variable_name}.

data_flows?(graph, source_id, sink_id)

Returns true if there's a data-dependence path from source to sink.

dead_code(graph)

@spec dead_code(graph()) :: [Reach.IR.Node.t()]

Returns nodes whose values are never used and have no side effects.

A node is dead if:

  1. It is pure (no side effects)
  2. No observable output depends on it (return values or effectful calls)

depends?(graph, id_a, id_b)

@spec depends?(graph(), node_id(), node_id()) :: boolean()

Returns true if there's any dependence path between two nodes.

edges(arg1)

@spec edges(graph()) :: [Graph.Edge.t()]

Returns all dependence edges in the graph.

file_to_graph(path, opts \\ [])

@spec file_to_graph(
  Path.t(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Reads a source file and builds a program dependence graph.

The language is auto-detected from the file extension (.ex / .exs for Elixir, .erl / .hrl for Erlang), or can be set explicitly via the :language option.

Returns {:ok, graph} or {:error, reason}.

file_to_graph!(path, opts \\ [])

@spec file_to_graph!(
  Path.t(),
  keyword()
) :: graph()

Same as file_to_graph/2 but raises on error.

forward_slice(system_dependence, node_id)

@spec forward_slice(graph(), node_id()) :: [term()]

Returns all node IDs affected by the given node (forward slice).

The forward slice answers: "what does this expression influence?"

function_graph(graph, function_id)

@spec function_graph(graph(), {module() | nil, atom(), non_neg_integer()}) ::
  graph() | nil

Returns the per-function PDG for a {module, function, arity} tuple.

has_dependents?(graph, node_id)

@spec has_dependents?(graph(), node_id()) :: boolean()

Returns true if the node has data dependents (its value is used elsewhere).

independent?(sdg, id_x, id_y)

@spec independent?(graph(), node_id(), node_id()) :: boolean()

Returns true if two expressions are independent.

Two expressions are independent when:

  1. No data flows between them in either direction
  2. They execute under the same conditions (same control dependencies)
  3. Their side effects don't conflict

Independent expressions can be safely reordered.

module_to_graph(module, opts \\ [])

@spec module_to_graph(
  module(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Builds a graph from a compiled module (loaded in the VM).

Analyzes the macro-expanded Erlang abstract forms from the BEAM bytecode. This captures code injected by use, defmacro, and other macros that the source-level frontend misses.

Requires the module to be compiled with debug info (the default).

neighbors(graph, node_id, label \\ nil)

@spec neighbors(graph(), Reach.IR.Node.id(), term() | nil) :: [Reach.IR.Node.id()]

Returns node IDs directly connected to node_id.

With no label filter, returns all neighbors (both incoming and outgoing). With a label, returns only neighbors connected by edges with that label.

# All direct neighbors
Reach.neighbors(graph, node_id)

# Only nodes connected by :state_read edges
Reach.neighbors(graph, node_id, :state_read)

# Only data dependencies
Reach.neighbors(graph, node_id, {:data, :x})

node(system_dependence, id)

@spec node(graph(), Reach.IR.Node.id()) :: Reach.IR.Node.t() | nil

Returns the IR node for a given ID, or nil.

nodes(graph, opts \\ [])

@spec nodes(
  graph(),
  keyword()
) :: [Reach.IR.Node.t()]

Returns all IR nodes, optionally filtered.

Options

  • :type — filter by node type (:call, :match, :var, etc.)
  • :module — filter calls by module
  • :function — filter calls by function name
  • :arity — filter by arity

Examples

Reach.nodes(graph, type: :call)
Reach.nodes(graph, type: :call, module: Enum)
Reach.nodes(graph, type: :call, module: Enum, function: :map, arity: 2)

passes_through?(graph, source_id, sink_id, predicate)

@spec passes_through?(graph(), node_id(), node_id(), (Reach.IR.Node.t() -> boolean())) ::
  boolean()

Returns true if the path from source to sink passes through any node matching predicate.

Useful for taint analysis — check if sanitization occurs between a source and sink.

pure?(node)

@spec pure?(Reach.IR.Node.t()) :: boolean()

Returns true if the node is pure (no side effects).

string_to_graph(source, opts \\ [])

@spec string_to_graph(
  String.t(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Parses a source string and builds a program dependence graph.

Returns {:ok, graph} or {:error, reason}.

Options

  • :language:elixir (default) or :erlang
  • :file — filename for source locations (default: "nofile")
  • :module — module name for call graph resolution

string_to_graph!(source, opts \\ [])

@spec string_to_graph!(
  String.t(),
  keyword()
) :: graph()

Same as string_to_graph/2 but raises on parse error.

taint_analysis(graph, opts)

@spec taint_analysis(
  graph(),
  keyword()
) :: [map()]

Finds data flow paths from taint sources to dangerous sinks.

Returns a list of %{source: node, sink: node, path: [node_id], sanitized: boolean} for each source→sink pair where data flows.

Sources, sinks, and sanitizers can be specified as keyword filters (same format as nodes/2) or as predicate functions.

Options

  • :sources — keyword filter or predicate identifying taint sources
  • :sinks — keyword filter or predicate identifying dangerous sinks
  • :sanitizers — keyword filter or predicate identifying sanitization points (optional)

Examples

Reach.taint_analysis(graph,
  sources: [type: :call, function: :get_param],
  sinks: [type: :call, module: System, function: :cmd, arity: 2],
  sanitizers: [type: :call, function: :sanitize]
)

# Predicates also work
Reach.taint_analysis(graph,
  sources: &(&1.meta[:function] in [:params, :body_params]),
  sinks: [type: :call, module: Ecto.Adapters.SQL]
)

to_dot(arg1)

@spec to_dot(graph()) :: {:ok, String.t()}

Exports the graph to DOT format for Graphviz visualization.

to_graph(arg1)

@spec to_graph(graph()) :: Graph.t()

Returns the underlying Graph.t() (libgraph) for direct traversal.

Use this when you need graph operations that Reach doesn't expose — path finding, subgraphs, BFS/DFS, topological sort, etc.

raw = Reach.to_graph(graph)
Graph.vertices(raw) |> length()
Graph.get_shortest_path(raw, id_a, id_b)