Dux.Graph (Dux v0.2.0)

Copy Markdown View Source

Graph analytics on Dux dataframes.

A graph is two Dux structs — vertices and edges. Every graph algorithm reduces to joins, aggregations, and iterations on those tables. DuckDB provides native acceleration via recursive CTEs for path algorithms.

Creating a graph

vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
graph = Dux.Graph.new(vertices: vertices, edges: edges)

Algorithms

All algorithms are compositions of Dux verbs — they return %Dux{} structs that you can pipe into further operations.

graph
|> Dux.Graph.pagerank()
|> Dux.sort_by(desc: :rank)
|> Dux.head(10)
|> Dux.to_rows()

Summary

Functions

Compute approximate betweenness centrality via Brandes' algorithm.

Find connected components using iterative label propagation.

Compute the total degree (in-degree + out-degree) of each vertex.

Mark a graph for distributed execution across the given workers.

Return the number of edges in the graph. Triggers computation.

Create a graph from an edge list, inferring vertices from unique source and destination nodes.

Compute the in-degree of each vertex.

Create a new graph from vertex and edge dataframes.

Compute the out-degree of each vertex.

Compute PageRank scores for all vertices.

Find the shortest path distance from a source vertex to all reachable vertices.

Count the number of triangles in the graph.

Return the number of vertices in the graph. Triggers computation.

algorithms

Detect communities using label propagation.

Types

t()

@type t() :: %Dux.Graph{
  edge_dst: String.t(),
  edge_src: String.t(),
  edges: Dux.t(),
  vertex_id: String.t(),
  vertices: Dux.t(),
  workers: term()
}

Functions

betweenness_centrality(graph, opts \\ [])

Compute approximate betweenness centrality via Brandes' algorithm.

Betweenness centrality measures how often a vertex lies on shortest paths between other vertices. High-BC vertices are "bridges" in the network.

Uses a sample of source vertices for efficiency. The result is normalized by 2 / ((n-1)(n-2)) for undirected graphs so values fall in [0, 1].

Returns a %Dux{} with columns [vertex_id, bc].

Options

  • :sample — number of source vertices to sample (default: min(n, 100))
  • :seed — random seed for reproducible sampling (default: random)
  • :normalize — whether to normalize scores (default: true)

Examples

vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
edges = Dux.from_list([
  %{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 1},
  %{"src" => 2, "dst" => 3}, %{"src" => 3, "dst" => 2}
])
graph = Dux.Graph.new(vertices: vertices, edges: edges)
Dux.Graph.betweenness_centrality(graph) |> Dux.sort_by(desc: :bc) |> Dux.to_rows()

connected_components(graph, opts \\ [])

Find connected components using iterative label propagation.

Each vertex is assigned a component ID (the minimum vertex ID in its component). Returns a %Dux{} with columns [vertex_id, component].

Options

  • :max_iterations - maximum propagation iterations (default: 100)
  • :workers - list of worker PIDs for distributed execution (default: nil for local). When provided, uses the broadcast-iterate pattern: broadcasts labels and edges to workers each iteration.

Examples

iex> edges = Dux.from_list([
...>   %{"src" => 1, "dst" => 2},
...>   %{"src" => 2, "dst" => 1},
...>   %{"src" => 3, "dst" => 4},
...>   %{"src" => 4, "dst" => 3}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}, %{"id" => 4}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> result = Dux.Graph.connected_components(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
iex> result["component"]
[1, 1, 3, 3]

degree(graph)

Compute the total degree (in-degree + out-degree) of each vertex.

Returns a %Dux{} with columns [vertex_id, degree]. Each edge contributes 1 to both the source vertex's degree and the destination vertex's degree.

Examples

iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}]), edges: edges)
iex> Dux.Graph.degree(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
%{"degree" => [1, 2, 1], "id" => [1, 2, 3]}

distribute(graph, workers)

Mark a graph for distributed execution across the given workers.

All subsequent algorithms will automatically use the distributed path.

graph = Dux.Graph.new(vertices: v, edges: e)
        |> Dux.Graph.distribute(workers)

graph |> Dux.Graph.pagerank()  # automatically distributed

edge_count(graph)

Return the number of edges in the graph. Triggers computation.

Examples

iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}]), edges: Dux.from_list([%{"src" => 1, "dst" => 2}]))
iex> Dux.Graph.edge_count(graph)
1

from_edgelist(edges, opts \\ [])

Create a graph from an edge list, inferring vertices from unique source and destination nodes.

This is a convenience constructor when you only have edges. Vertices are derived by taking the union of all distinct src and dst values.

Options

  • :edge_src - column name for edge source (default: :src)
  • :edge_dst - column name for edge destination (default: :dst)
  • :vertex_id - column name for the inferred vertex ID (default: :id)

Examples

iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.from_edgelist(edges)
iex> Dux.Graph.vertex_count(graph)
3

in_degree(graph)

Compute the in-degree of each vertex.

Returns a %Dux{} with columns [vertex_id, in_degree].

Examples

iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 1, "dst" => 3}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}]), edges: edges)
iex> Dux.Graph.in_degree(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
%{"id" => [2, 3], "in_degree" => [1, 2]}

new(opts)

Create a new graph from vertex and edge dataframes.

Options

  • :vertex_id - column name for vertex IDs (default: :id)
  • :edge_src - column name for edge source (default: :src)
  • :edge_dst - column name for edge destination (default: :dst)

Examples

vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
graph = Dux.Graph.new(vertices: vertices, edges: edges)

out_degree(graph)

Compute the out-degree of each vertex.

Returns a %Dux{} with columns [vertex_id, out_degree].

Examples

iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 1, "dst" => 3}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}]), edges: edges)
iex> Dux.Graph.out_degree(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
%{"id" => [1, 2], "out_degree" => [2, 1]}

pagerank(graph, opts \\ [])

Compute PageRank scores for all vertices.

Returns a %Dux{} with columns [vertex_id, rank].

Options

  • :damping - damping factor (default: 0.85)
  • :iterations - number of iterations (default: 20)
  • :workers - list of worker PIDs for distributed execution (default: nil for local)

When workers is provided, uses the broadcast-iterate pattern: each iteration broadcasts current ranks to all workers, workers compute local contributions, coordinator merges.

Examples

iex> edges = Dux.from_list([
...>   %{"src" => 1, "dst" => 2},
...>   %{"src" => 2, "dst" => 3},
...>   %{"src" => 3, "dst" => 1},
...>   %{"src" => 3, "dst" => 2}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> result = Dux.Graph.pagerank(graph) |> Dux.sort_by(:id) |> Dux.to_rows()
iex> length(result) == 3
true
iex> Enum.all?(result, fn row -> row["rank"] > 0 end)
true

shortest_paths(graph, from_vertex, opts \\ [])

Find the shortest path distance from a source vertex to all reachable vertices.

Uses DuckDB's USING KEY recursive CTEs for efficient graph traversal with automatic deduplication — only the shortest distance per node is kept. Returns a %Dux{} with columns [node, dist].

Options

  • :max_depth — maximum traversal depth (default: 1000)
  • :weight — edge column to use as weight (default: nil for unweighted BFS). When set, computes weighted shortest paths (Bellman-Ford via SQL).

Examples

iex> edges = Dux.from_list([
...>   %{"src" => 1, "dst" => 2},
...>   %{"src" => 2, "dst" => 3},
...>   %{"src" => 1, "dst" => 3}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> Dux.Graph.shortest_paths(graph, 1) |> Dux.sort_by(:node) |> Dux.to_columns()
%{"dist" => [0, 1, 1], "node" => [1, 2, 3]}

triangle_count(graph)

Count the number of triangles in the graph.

A triangle is a set of three vertices where each pair is connected by an edge. Edges must be bidirectional for triangle detection (i.e., if vertex A connects to B, there must also be an edge from B to A). Returns an integer count.

When the graph has workers set via distribute/2, the computation runs on a remote worker node. The full edge set is broadcast to one worker and the triple self-join executes there, keeping the coordinator free.

Examples

iex> edges = Dux.from_list([
...>   %{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 1},
...>   %{"src" => 2, "dst" => 3}, %{"src" => 3, "dst" => 2},
...>   %{"src" => 1, "dst" => 3}, %{"src" => 3, "dst" => 1}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> Dux.Graph.triangle_count(graph)
1

vertex_count(graph)

Return the number of vertices in the graph. Triggers computation.

Examples

iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}]), edges: Dux.from_list([%{"src" => 1, "dst" => 2}]))
iex> Dux.Graph.vertex_count(graph)
2

algorithms

communities(graph, opts \\ [])

Detect communities using label propagation.

Each vertex starts with its own label. In each iteration, vertices adopt the most frequent label among their neighbors (tie-broken by smallest label). Converges when labels stabilize.

Returns a %Dux{} with columns [vertex_id, community].

Options

  • :max_iterations — maximum iterations (default: 20)

Examples

iex> edges = Dux.from_list([
...>   %{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 1},
...>   %{"src" => 2, "dst" => 3}, %{"src" => 3, "dst" => 2},
...>   %{"src" => 4, "dst" => 5}, %{"src" => 5, "dst" => 4}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}, %{"id" => 4}, %{"id" => 5}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> result = Dux.Graph.communities(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
iex> [c1, c2, c3, c4, c5] = result["community"]
iex> c1 == c2 and c2 == c3
true
iex> c4 == c5
true
iex> c1 != c4
true