Graph analytics on Dux dataframes.
A graph is two Dux structs — vertices and edges. Every graph algorithm reduces to joins, aggregations, and iterations on those tables. DuckDB provides native acceleration via recursive CTEs for path algorithms.
Creating a graph
vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
graph = Dux.Graph.new(vertices: vertices, edges: edges)Algorithms
All algorithms are compositions of Dux verbs — they return %Dux{} structs
that you can pipe into further operations.
graph
|> Dux.Graph.pagerank()
|> Dux.sort_by(desc: :rank)
|> Dux.head(10)
|> Dux.to_rows()
Summary
Functions
Compute approximate betweenness centrality via Brandes' algorithm.
Find connected components using iterative label propagation.
Compute the total degree (in-degree + out-degree) of each vertex.
Mark a graph for distributed execution across the given workers.
Return the number of edges in the graph. Triggers computation.
Create a graph from an edge list, inferring vertices from unique source and destination nodes.
Compute the in-degree of each vertex.
Create a new graph from vertex and edge dataframes.
Compute the out-degree of each vertex.
Compute PageRank scores for all vertices.
Find the shortest path distance from a source vertex to all reachable vertices.
Count the number of triangles in the graph.
Return the number of vertices in the graph. Triggers computation.
algorithms
Detect communities using label propagation.
Types
Functions
Compute approximate betweenness centrality via Brandes' algorithm.
Betweenness centrality measures how often a vertex lies on shortest paths between other vertices. High-BC vertices are "bridges" in the network.
Uses a sample of source vertices for efficiency. The result is normalized
by 2 / ((n-1)(n-2)) for undirected graphs so values fall in [0, 1].
Returns a %Dux{} with columns [vertex_id, bc].
Options
:sample— number of source vertices to sample (default:min(n, 100)):seed— random seed for reproducible sampling (default: random):normalize— whether to normalize scores (default:true)
Examples
vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
edges = Dux.from_list([
%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 1},
%{"src" => 2, "dst" => 3}, %{"src" => 3, "dst" => 2}
])
graph = Dux.Graph.new(vertices: vertices, edges: edges)
Dux.Graph.betweenness_centrality(graph) |> Dux.sort_by(desc: :bc) |> Dux.to_rows()
Find connected components using iterative label propagation.
Each vertex is assigned a component ID (the minimum vertex ID in its component).
Returns a %Dux{} with columns [vertex_id, component].
Options
:max_iterations- maximum propagation iterations (default:100):workers- list of worker PIDs for distributed execution (default:nilfor local). When provided, uses the broadcast-iterate pattern: broadcasts labels and edges to workers each iteration.
Examples
iex> edges = Dux.from_list([
...> %{"src" => 1, "dst" => 2},
...> %{"src" => 2, "dst" => 1},
...> %{"src" => 3, "dst" => 4},
...> %{"src" => 4, "dst" => 3}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}, %{"id" => 4}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> result = Dux.Graph.connected_components(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
iex> result["component"]
[1, 1, 3, 3]
Compute the total degree (in-degree + out-degree) of each vertex.
Returns a %Dux{} with columns [vertex_id, degree]. Each edge contributes
1 to both the source vertex's degree and the destination vertex's degree.
Examples
iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}]), edges: edges)
iex> Dux.Graph.degree(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
%{"degree" => [1, 2, 1], "id" => [1, 2, 3]}
Mark a graph for distributed execution across the given workers.
All subsequent algorithms will automatically use the distributed path.
graph = Dux.Graph.new(vertices: v, edges: e)
|> Dux.Graph.distribute(workers)
graph |> Dux.Graph.pagerank() # automatically distributed
Return the number of edges in the graph. Triggers computation.
Examples
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}]), edges: Dux.from_list([%{"src" => 1, "dst" => 2}]))
iex> Dux.Graph.edge_count(graph)
1
Create a graph from an edge list, inferring vertices from unique source and destination nodes.
This is a convenience constructor when you only have edges. Vertices are
derived by taking the union of all distinct src and dst values.
Options
:edge_src- column name for edge source (default::src):edge_dst- column name for edge destination (default::dst):vertex_id- column name for the inferred vertex ID (default::id)
Examples
iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.from_edgelist(edges)
iex> Dux.Graph.vertex_count(graph)
3
Compute the in-degree of each vertex.
Returns a %Dux{} with columns [vertex_id, in_degree].
Examples
iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 1, "dst" => 3}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}]), edges: edges)
iex> Dux.Graph.in_degree(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
%{"id" => [2, 3], "in_degree" => [1, 2]}
Create a new graph from vertex and edge dataframes.
Options
:vertex_id- column name for vertex IDs (default::id):edge_src- column name for edge source (default::src):edge_dst- column name for edge destination (default::dst)
Examples
vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 3}])
graph = Dux.Graph.new(vertices: vertices, edges: edges)
Compute the out-degree of each vertex.
Returns a %Dux{} with columns [vertex_id, out_degree].
Examples
iex> edges = Dux.from_list([%{"src" => 1, "dst" => 2}, %{"src" => 1, "dst" => 3}, %{"src" => 2, "dst" => 3}])
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}]), edges: edges)
iex> Dux.Graph.out_degree(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
%{"id" => [1, 2], "out_degree" => [2, 1]}
Compute PageRank scores for all vertices.
Returns a %Dux{} with columns [vertex_id, rank].
Options
:damping- damping factor (default:0.85):iterations- number of iterations (default:20):workers- list of worker PIDs for distributed execution (default:nilfor local)
When workers is provided, uses the broadcast-iterate pattern: each iteration
broadcasts current ranks to all workers, workers compute local contributions,
coordinator merges.
Examples
iex> edges = Dux.from_list([
...> %{"src" => 1, "dst" => 2},
...> %{"src" => 2, "dst" => 3},
...> %{"src" => 3, "dst" => 1},
...> %{"src" => 3, "dst" => 2}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> result = Dux.Graph.pagerank(graph) |> Dux.sort_by(:id) |> Dux.to_rows()
iex> length(result) == 3
true
iex> Enum.all?(result, fn row -> row["rank"] > 0 end)
true
Find the shortest path distance from a source vertex to all reachable vertices.
Uses DuckDB's USING KEY recursive CTEs for efficient graph traversal with
automatic deduplication — only the shortest distance per node is kept. Returns
a %Dux{} with columns [node, dist].
Options
:max_depth— maximum traversal depth (default:1000):weight— edge column to use as weight (default:nilfor unweighted BFS). When set, computes weighted shortest paths (Bellman-Ford via SQL).
Examples
iex> edges = Dux.from_list([
...> %{"src" => 1, "dst" => 2},
...> %{"src" => 2, "dst" => 3},
...> %{"src" => 1, "dst" => 3}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> Dux.Graph.shortest_paths(graph, 1) |> Dux.sort_by(:node) |> Dux.to_columns()
%{"dist" => [0, 1, 1], "node" => [1, 2, 3]}
Count the number of triangles in the graph.
A triangle is a set of three vertices where each pair is connected by an edge. Edges must be bidirectional for triangle detection (i.e., if vertex A connects to B, there must also be an edge from B to A). Returns an integer count.
When the graph has workers set via distribute/2, the computation runs on a
remote worker node. The full edge set is broadcast to one worker and the
triple self-join executes there, keeping the coordinator free.
Examples
iex> edges = Dux.from_list([
...> %{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 1},
...> %{"src" => 2, "dst" => 3}, %{"src" => 3, "dst" => 2},
...> %{"src" => 1, "dst" => 3}, %{"src" => 3, "dst" => 1}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> Dux.Graph.triangle_count(graph)
1
Return the number of vertices in the graph. Triggers computation.
Examples
iex> graph = Dux.Graph.new(vertices: Dux.from_list([%{"id" => 1}, %{"id" => 2}]), edges: Dux.from_list([%{"src" => 1, "dst" => 2}]))
iex> Dux.Graph.vertex_count(graph)
2
algorithms
Detect communities using label propagation.
Each vertex starts with its own label. In each iteration, vertices adopt the most frequent label among their neighbors (tie-broken by smallest label). Converges when labels stabilize.
Returns a %Dux{} with columns [vertex_id, community].
Options
:max_iterations— maximum iterations (default:20)
Examples
iex> edges = Dux.from_list([
...> %{"src" => 1, "dst" => 2}, %{"src" => 2, "dst" => 1},
...> %{"src" => 2, "dst" => 3}, %{"src" => 3, "dst" => 2},
...> %{"src" => 4, "dst" => 5}, %{"src" => 5, "dst" => 4}
...> ])
iex> vertices = Dux.from_list([%{"id" => 1}, %{"id" => 2}, %{"id" => 3}, %{"id" => 4}, %{"id" => 5}])
iex> graph = Dux.Graph.new(vertices: vertices, edges: edges)
iex> result = Dux.Graph.communities(graph) |> Dux.sort_by(:id) |> Dux.to_columns()
iex> [c1, c2, c3, c4, c5] = result["community"]
iex> c1 == c2 and c2 == c3
true
iex> c4 == c5
true
iex> c1 != c4
true