Reach (Reach v2.2.0)

Program Dependence Graph for Elixir and Erlang.

Reach analyzes Elixir and Erlang source code and builds a graph that captures what depends on what: which expressions produce values consumed by others (data dependence), and which expressions control whether others execute (control dependence).

Building a graph

# Elixir (default)
{:ok, graph} = Reach.string_to_graph("""
def foo(x) do
  y = x + 1
  if y > 10, do: :big, else: :small
end
""")

# Erlang
{:ok, graph} = Reach.string_to_graph(source, language: :erlang)

# Auto-detected from file extension
{:ok, graph} = Reach.file_to_graph("lib/my_module.ex")
{:ok, graph} = Reach.file_to_graph("src/my_module.erl")

Querying

Reach.backward_slice(graph, node_id)
Reach.forward_slice(graph, node_id)
Reach.independent?(graph, id_a, id_b)
Reach.nodes(graph, type: :call, module: Enum)
Reach.nodes(graph, type: :call, module: Enum, function: :map, arity: 2)
Reach.data_flows?(graph, source_id, sink_id)

Inspecting nodes

node = Reach.node(graph, some_id)
node.type       #=> :call
node.meta       #=> %{module: Enum, function: :map, arity: 2}
node.source_span #=> %{file: "lib/foo.ex", start_line: 5, ...}

Reach.pure?(node)  #=> true

Summary

Types

graph()

A program dependence graph. Built by string_to_graph/2, file_to_graph/2, etc.

node_id()

Functions

ast_to_graph(ast, opts \\ [])

Builds a graph from an already-parsed Elixir AST.

backward_slice(system_dependence, node_id)

Returns all node IDs that affect the given node (backward slice).

call_graph(system_dependence)

Returns the call graph as a Graph.t().

canonical_order(graph, block_node_id)

Returns the children of a block in canonical order.

chop(graph, source, sink)

Returns node IDs on all paths from source to sink.

classify_effect(node)

Returns the effect classification of a node.

compiled_to_graph(source_or_modules, opts \\ [])

Compiles an Elixir source string and builds a graph from the expanded bytecode.

context_sensitive_slice(graph, node_id)

Performs a context-sensitive backward slice through call boundaries.

control_deps(system_dependence, node_id)

Returns the control dependencies of a node.

controls?(graph, controller_id, target_id)

Returns true if controller has a control-dependence edge to target.

data_deps(system_dependence, node_id)

Returns the data dependencies of a node.

data_flows?(graph, source_id, sink_id)

Returns true if there's a data-dependence path from source to sink.

dead_code(graph)

Returns nodes whose values are never used and have no side effects.

depends?(graph, id_a, id_b)

Returns true if there's any dependence path between two nodes.

edges(arg1)

Returns all dependence edges in the graph.

file_to_graph(path, opts \\ [])

Reads a source file and builds a program dependence graph.

file_to_graph!(path, opts \\ [])

Same as file_to_graph/2 but raises on error.

forward_slice(system_dependence, node_id)

Returns all node IDs affected by the given node (forward slice).

function_graph(graph, function_id)

Returns the per-function PDG for a {module, function, arity} tuple.

has_dependents?(graph, node_id)

Returns true if the node has data dependents (its value is used elsewhere).

independent?(sdg, id_x, id_y)

Returns true if two expressions are independent.

module_to_graph(module, opts \\ [])

Builds a graph from a compiled module (loaded in the VM).

neighbors(graph, node_id, label \\ nil)

Returns node IDs directly connected to node_id.

node(system_dependence, id)

Returns the IR node for a given ID, or nil.

nodes(graph, opts \\ [])

Returns all IR nodes, optionally filtered.

passes_through?(graph, source_id, sink_id, predicate)

Returns true if the path from source to sink passes through any node matching predicate.

pure?(node)

Returns true if the node is pure (no side effects).

string_to_graph(source, opts \\ [])

Parses a source string and builds a program dependence graph.

string_to_graph!(source, opts \\ [])

Same as string_to_graph/2 but raises on parse error.

taint_analysis(graph, opts)

Finds data flow paths from taint sources to dangerous sinks.

to_dot(arg1)

Exports the graph to DOT format for Graphviz visualization.

to_graph(arg1)

Returns the underlying Graph.t() (libgraph) for direct traversal.

Types

graph()

@type graph() :: struct()

A program dependence graph. Built by string_to_graph/2, file_to_graph/2, etc.

node_id()

@type node_id() :: Reach.IR.Node.id()

Functions

ast_to_graph(ast, opts \\ [])

@spec ast_to_graph(
  Macro.t(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Builds a graph from an already-parsed Elixir AST.

Useful when you already have the AST (e.g. from Credo or ExDNA) and don't want to re-parse source.

backward_slice(system_dependence, node_id)

@spec backward_slice(graph(), node_id()) :: [term()]

Returns all node IDs that affect the given node (backward slice).

The backward slice answers: "what does this expression depend on?"

call_graph(system_dependence)

@spec call_graph(graph()) :: Graph.t()

Returns the call graph as a Graph.t().

Vertices are {module, function, arity} tuples.

canonical_order(graph, block_node_id)

@spec canonical_order(graph(), Reach.IR.Node.id()) :: [
  {Reach.IR.Node.id(), Reach.IR.Node.t()}
]

Returns the children of a block in canonical order.

Independent sibling expressions are sorted by structural hash so that reordered-but-equivalent blocks produce the same sequence. Dependent expressions preserve their relative order.

Returns a list of {node_id, ir_node} pairs.

Example

# These two blocks produce the same canonical order:
#   a = 1; b = 2; c = a + b
#   b = 2; a = 1; c = a + b
# Because a=1 and b=2 are independent, they get sorted,
# while c=a+b stays last (depends on both).

chop(graph, source, sink)

@spec chop(graph(), node_id(), node_id()) :: [node_id()]

Returns node IDs on all paths from source to sink.

The chop answers: "how does A influence B?"

classify_effect(node)

@spec classify_effect(Reach.IR.Node.t()) :: Reach.Effects.effect()

Returns the effect classification of a node.

Possible values: :pure, :read, :write, :io, :send, :receive, :exception, :nif, :unknown.

compiled_to_graph(source_or_modules, opts \\ [])

@spec compiled_to_graph(
  String.t() | [{module(), binary()}],
  keyword()
) :: {:ok, graph()} | {:error, term()}

Compiles an Elixir source string and builds a graph from the expanded bytecode.

Unlike string_to_graph/2, this compiles the code first, so macro-expanded constructs (try/rescue inside macros, use callbacks, etc.) are visible.

The source must define complete modules.

context_sensitive_slice(graph, node_id)

@spec context_sensitive_slice(graph(), Reach.IR.Node.id()) :: [Reach.IR.Node.id()]

Performs a context-sensitive backward slice through call boundaries.

Uses the Horwitz-Reps-Binkley two-phase algorithm to avoid impossible paths through call sites.

control_deps(system_dependence, node_id)

@spec control_deps(graph(), Reach.IR.Node.id()) :: [{Reach.IR.Node.id(), term()}]

Returns the control dependencies of a node.

Each entry is {controller_node_id, label}.

controls?(graph, controller_id, target_id)

@spec controls?(graph(), node_id(), node_id()) :: boolean()

Returns true if controller has a control-dependence edge to target.

data_deps(system_dependence, node_id)

@spec data_deps(graph(), Reach.IR.Node.id()) :: [{Reach.IR.Node.id(), atom()}]

Returns the data dependencies of a node.

Each entry is {source_node_id, variable_name}.

data_flows?(graph, source_id, sink_id)

Returns true if there's a data-dependence path from source to sink.

dead_code(graph)

@spec dead_code(graph()) :: [Reach.IR.Node.t()]

Returns nodes whose values are never used and have no side effects.

A node is dead if:

It is pure (no side effects)
No observable output depends on it (return values or effectful calls)

depends?(graph, id_a, id_b)

@spec depends?(graph(), node_id(), node_id()) :: boolean()

Returns true if there's any dependence path between two nodes.

edges(arg1)

@spec edges(graph()) :: [Graph.Edge.t()]

Returns all dependence edges in the graph.

file_to_graph(path, opts \\ [])

@spec file_to_graph(
  Path.t(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Reads a source file and builds a program dependence graph.

The language is auto-detected from the file extension (.ex / .exs for Elixir, .erl / .hrl for Erlang), or can be set explicitly via the :language option.

Returns {:ok, graph} or {:error, reason}.

file_to_graph!(path, opts \\ [])

@spec file_to_graph!(
  Path.t(),
  keyword()
) :: graph()

Same as file_to_graph/2 but raises on error.

forward_slice(system_dependence, node_id)

@spec forward_slice(graph(), node_id()) :: [term()]

Returns all node IDs affected by the given node (forward slice).

The forward slice answers: "what does this expression influence?"

function_graph(graph, function_id)

@spec function_graph(graph(), {module() | nil, atom(), non_neg_integer()}) ::
  graph() | nil

Returns the per-function PDG for a {module, function, arity} tuple.

has_dependents?(graph, node_id)

@spec has_dependents?(graph(), node_id()) :: boolean()

Returns true if the node has data dependents (its value is used elsewhere).

independent?(sdg, id_x, id_y)

@spec independent?(graph(), node_id(), node_id()) :: boolean()

Returns true if two expressions are independent.

Two expressions are independent when:

No data flows between them in either direction
They execute under the same conditions (same control dependencies)
Their side effects don't conflict

Independent expressions can be safely reordered.

module_to_graph(module, opts \\ [])

@spec module_to_graph(
  module(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Builds a graph from a compiled module (loaded in the VM).

Analyzes the macro-expanded Erlang abstract forms from the BEAM bytecode. This captures code injected by use, defmacro, and other macros that the source-level frontend misses.

Requires the module to be compiled with debug info (the default).

neighbors(graph, node_id, label \\ nil)

@spec neighbors(graph(), Reach.IR.Node.id(), term() | nil) :: [Reach.IR.Node.id()]

Returns node IDs directly connected to node_id.

With no label filter, returns all neighbors (both incoming and outgoing). With a label, returns only neighbors connected by edges with that label.

# All direct neighbors
Reach.neighbors(graph, node_id)

# Only nodes connected by :state_read edges
Reach.neighbors(graph, node_id, :state_read)

# Only data dependencies
Reach.neighbors(graph, node_id, {:data, :x})

node(system_dependence, id)

@spec node(graph(), Reach.IR.Node.id()) :: Reach.IR.Node.t() | nil

Returns the IR node for a given ID, or nil.

nodes(graph, opts \\ [])

@spec nodes(
  graph(),
  keyword()
) :: [Reach.IR.Node.t()]

Returns all IR nodes, optionally filtered.

Options

:type — filter by node type (:call, :match, :var, etc.)
:module — filter calls by module
:function — filter calls by function name
:arity — filter by arity

Examples

Reach.nodes(graph, type: :call)
Reach.nodes(graph, type: :call, module: Enum)
Reach.nodes(graph, type: :call, module: Enum, function: :map, arity: 2)

passes_through?(graph, source_id, sink_id, predicate)

@spec passes_through?(graph(), node_id(), node_id(), (Reach.IR.Node.t() -> boolean())) ::
  boolean()

Returns true if the path from source to sink passes through any node matching predicate.

Useful for taint analysis — check if sanitization occurs between a source and sink.

pure?(node)

@spec pure?(Reach.IR.Node.t()) :: boolean()

Returns true if the node is pure (no side effects).

string_to_graph(source, opts \\ [])

@spec string_to_graph(
  String.t(),
  keyword()
) :: {:ok, graph()} | {:error, term()}

Parses a source string and builds a program dependence graph.

Returns {:ok, graph} or {:error, reason}.

Options

:language — :elixir (default) or :erlang
:file — filename for source locations (default: "nofile")
:module — module name for call graph resolution

string_to_graph!(source, opts \\ [])

@spec string_to_graph!(
  String.t(),
  keyword()
) :: graph()

Same as string_to_graph/2 but raises on parse error.

taint_analysis(graph, opts)

@spec taint_analysis(
  graph(),
  keyword()
) :: [map()]

Finds data flow paths from taint sources to dangerous sinks.

Returns a list of %{source: node, sink: node, path: [node_id], sanitized: boolean} for each source→sink pair where data flows.

Sources, sinks, and sanitizers can be specified as keyword filters (same format as nodes/2) or as predicate functions.

Options

:sources — keyword filter or predicate identifying taint sources
:sinks — keyword filter or predicate identifying dangerous sinks
:sanitizers — keyword filter or predicate identifying sanitization points (optional)

Examples

Reach.taint_analysis(graph,
  sources: [type: :call, function: :get_param],
  sinks: [type: :call, module: System, function: :cmd, arity: 2],
  sanitizers: [type: :call, function: :sanitize]
)

# Predicates also work
Reach.taint_analysis(graph,
  sources: &(&1.meta[:function] in [:params, :body_params]),
  sinks: [type: :call, module: Ecto.Adapters.SQL]
)

to_dot(arg1)

@spec to_dot(graph()) :: {:ok, String.t()}

Exports the graph to DOT format for Graphviz visualization.

to_graph(arg1)

@spec to_graph(graph()) :: Graph.t()

Returns the underlying Graph.t() (libgraph) for direct traversal.

Use this when you need graph operations that Reach doesn't expose — path finding, subgraphs, BFS/DFS, topological sort, etc.

raw = Reach.to_graph(graph)
Graph.vertices(raw) |> length()
Graph.get_shortest_path(raw, id_a, id_b)