Ragex Graph Algorithms
This document covers the graph algorithms available in Ragex for analyzing code structure, relationships, and importance metrics.
Table of Contents
- Overview
- PageRank
- Path Finding
- Centrality Metrics
- Graph Statistics
- Usage Examples
- Performance Characteristics
Overview
Ragex implements several graph algorithms for analyzing code relationships:
| Algorithm | Purpose | Use Cases |
|---|---|---|
| PageRank | Importance scoring | Finding central/critical functions |
| Path Finding | Dependency chains | Understanding call flows, impact analysis |
| Degree Centrality | Connection metrics | Identifying highly-coupled code |
| Graph Stats | Overall analysis | Codebase health, complexity assessment |
All algorithms operate on the call graph built from code analysis.
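The examples in this document refer to graph nodes by plain tuple identifiers; for functions these take the shape below (module, function name, arity). This is only the notation used in the examples that follow, not an additional API:
```elixir
# A function node as it appears in algorithm arguments and results:
# {:function, module, name, arity}
node_id = {:function, :Utils, :parse_json, 1}
```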
PageRank
What It Does
PageRank measures the importance of functions and modules based on who calls them. Functions called by many other functions score higher.
Algorithm
Based on Google's PageRank algorithm:
- Iterative computation with damping factor
- Higher scores = more important nodes
- Considers both incoming edges and importance of callers
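For intuition, the update rule can be sketched as follows. This is a minimal, illustrative implementation of the textbook iteration (it skips dangling-node handling and convergence checks) and is not the Ragex code; `adjacency` is assumed to map each node to the list of nodes it calls:
```elixir
defmodule PageRankSketch do
  # One iteration of: score(v) = (1 - d) / N + d * sum over callers u of score(u) / out_degree(u)
  def iterate(scores, adjacency, d \\ 0.85) do
    n = map_size(scores)
    base = (1 - d) / n

    # Each node shares its current score equally among its callees.
    contributions =
      Enum.reduce(adjacency, %{}, fn {node, callees}, acc ->
        share = Map.get(scores, node, 0.0) / max(length(callees), 1)

        Enum.reduce(callees, acc, fn callee, acc ->
          Map.update(acc, callee, share, &(&1 + share))
        end)
      end)

    Map.new(scores, fn {node, _} -> {node, base + d * Map.get(contributions, node, 0.0)} end)
  end
end
```
Repeating this until the largest per-node change falls below a threshold mirrors how the `tolerance` option described below is used.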
Usage
```elixir
alias Ragex.Graph.Algorithms
# Basic usage (with defaults)
scores = Algorithms.pagerank()
# Returns: %{
# {:function, :ModuleA, :foo, 0} => 0.234,
# {:function, :ModuleB, :bar, 1} => 0.189,
# ...
# }
```
Options
```elixir
Algorithms.pagerank(
damping_factor: 0.85, # Probability of following edges (default: 0.85)
max_iterations: 100, # Maximum iterations (default: 100)
tolerance: 0.0001 # Convergence threshold (default: 0.0001)
)
```
Parameters Explained
damping_factor (0.85):
- Probability a random walker follows an edge vs. teleporting
- Higher = more weight on graph structure
- Lower = more equal distribution
- Range: 0.0 to 1.0
- Typical values: 0.85 (Google), 0.5 (more distributed)
max_iterations (100):
- Maximum number of iterations before stopping
- Prevents infinite loops
- Usually converges in 10-50 iterations
- Increase for very large graphs
tolerance (0.0001):
- Convergence threshold
- Stops when maximum score change < tolerance
- Lower = more precise, slower
- Higher = less precise, faster
Interpretation
High PageRank (>0.5):
- Critical functions called by many others
- Central to the codebase
- Changes here have wide impact
- Examples: utility functions, core APIs
Medium PageRank (0.1-0.5):
- Moderately important functions
- Called by several others
- Standard application logic
Low PageRank (<0.1):
- Leaf functions (few callers)
- Application entry points
- Specialized utilities
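These bands are rules of thumb rather than values enforced by the API; if useful, scores can be bucketed with a small helper like this (thresholds taken from the ranges above):
```elixir
classify = fn score ->
  cond do
    score > 0.5 -> :high     # critical, widely-called code
    score >= 0.1 -> :medium  # standard application logic
    true -> :low             # leaf functions, entry points, niche utilities
  end
end

Algorithms.pagerank()
|> Enum.group_by(fn {_id, score} -> classify.(score) end)
```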
Example Results
```elixir
scores = Algorithms.pagerank()
# Get top 10 most important functions
top = scores
|> Enum.sort_by(fn {_id, score} -> -score end)
|> Enum.take(10)
# Results might look like:
# [
# {{:function, :Utils, :parse_json, 1}, 0.456}, # Called everywhere
# {{:function, :Core, :process, 2}, 0.389}, # Central processor
# {{:function, :DB, :query, 1}, 0.234}, # Database access
# ...
# ]
```
Performance
- Time Complexity: O(iterations × edges)
- Space Complexity: O(nodes)
- Typical Runtime: <100ms for 1,000 nodes
Path Finding
What It Does
Finds all paths between two nodes in the call graph. Useful for understanding how one function reaches another through calls.
Algorithm
Depth-first search (DFS) with:
- Cycle detection (via visited set)
- Depth limiting
- Path count limiting (Phase 4D)
- Early stopping when limits reached
- Dense graph warnings
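As a rough sketch of this strategy (not the Ragex implementation, and without the dense-graph warnings), the bounded DFS can be pictured like this; `neighbors` is assumed to be a 1-arity function you supply that returns a node's outgoing call edges:
```elixir
defmodule PathSketch do
  # Bounded depth-first search: collects up to max_paths paths of at most
  # max_depth edges, skipping nodes already on the current path (cycle check).
  def find_paths(neighbors, from, to, max_depth, max_paths) do
    from
    |> dfs(to, [from], MapSet.new([from]), neighbors, max_depth, max_paths, [])
    |> Enum.reverse()
  end

  # Stop expanding once enough paths have been collected.
  defp dfs(_node, _to, _path, _seen, _neighbors, _depth, max_paths, acc)
       when length(acc) >= max_paths,
       do: acc

  # Reached the target: record the path (it was built in reverse).
  defp dfs(to, to, path, _seen, _neighbors, _depth, _max_paths, acc),
    do: [Enum.reverse(path) | acc]

  # Depth budget exhausted.
  defp dfs(_node, _to, _path, _seen, _neighbors, 0, _max_paths, acc), do: acc

  defp dfs(node, to, path, seen, neighbors, depth, max_paths, acc) do
    node
    |> neighbors.()
    |> Enum.reject(&MapSet.member?(seen, &1))
    |> Enum.reduce(acc, fn next, acc ->
      dfs(next, to, [next | path], MapSet.put(seen, next), neighbors, depth - 1, max_paths, acc)
    end)
  end
end
```
In practice you call `Algorithms.find_paths/3` as shown below, which looks up edges from the stored call graph and applies these limits for you.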
Basic Usage
```elixir
alias Ragex.Graph.Algorithms
# Find paths from function A to function B
paths = Algorithms.find_paths(
{:function, :ModuleA, :foo, 0},
{:function, :ModuleC, :baz, 0}
)
# Returns: [
# [{:function, :ModuleA, :foo, 0}, {:function, :ModuleB, :bar, 0}, {:function, :ModuleC, :baz, 0}],
# [{:function, :ModuleA, :foo, 0}, {:function, :ModuleD, :qux, 0}, {:function, :ModuleC, :baz, 0}]
# ]
```
Options
```elixir
Algorithms.find_paths(from, to,
max_depth: 10, # Maximum path length in edges (default: 10)
max_paths: 100, # Maximum paths to return (default: 100)
warn_dense: true # Emit warnings for dense graphs (default: true)
)
```
Parameters Explained
max_depth (10):
- Maximum path length measured in edges (hops)
- Path with 3 nodes has 2 edges
- Prevents searching too deep
- Typical values:
- 5: Direct dependencies only
- 10: Standard (covers most practical cases)
- 20: Deep analysis (may be slow)
max_paths (100):
- Maximum number of paths to return
- Prevents exponential explosion in dense graphs
- Stops DFS early when reached
- Typical values:
- 10: Quick analysis
- 100: Standard (Phase 4D default)
- 1000: Exhaustive (may hang on dense graphs)
warn_dense (true):
- Automatically warn about dense graphs
- Checks starting node's out-degree
- Helpful for understanding performance
- Set to `false` in automated systems
Dense Graph Warnings
The system automatically detects and warns about potentially slow operations:
INFO Level (≥10 edges):
```
Moderately connected node: {:function, :ModuleA, :foo, 0} has 12 outgoing edges.
Path finding may take some time.
```
WARNING Level (≥20 edges):
```
Dense graph detected: Node {:function, :HubModule, :central, 0} has 25 outgoing edges.
Path finding may be slow or return partial results. Consider reducing max_depth or max_paths.
```
Examples
1. Finding Direct Dependencies
```elixir
# Find immediate calls (depth 1)
paths = Algorithms.find_paths(from, to, max_depth: 1)
```
2. Quick Analysis (Limited Paths)
```elixir
# Get just a few example paths
paths = Algorithms.find_paths(from, to, max_paths: 10)
```
3. Deep Dependency Analysis
```elixir
# Explore deeper relationships (careful with dense graphs!)
paths = Algorithms.find_paths(from, to, max_depth: 15, max_paths: 200)
```
4. Silent Operation (No Warnings)
```elixir
# For automated tools
paths = Algorithms.find_paths(from, to, warn_dense: false)
```
5. Checking if Path Exists
```elixir
# Quick check
case Algorithms.find_paths(from, to, max_paths: 1) do
  [] -> :no_path
  [_path] -> :has_path
end
```
Interpreting Results
Empty List []:
- No path exists from source to target
- Functions are independent
- Target not reachable through calls
Single Path:
- Direct or unique dependency chain
- Clear relationship
Multiple Paths:
- Function is reachable through different routes
- May indicate coupling or complex dependencies
- Consider refactoring if count is very high
Truncated Results (= max_paths):
- Many more paths likely exist
- Dense graph structure
- Consider:
- Reducing max_depth
- Refactoring to reduce coupling
- Increasing max_paths if needed
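There is currently no explicit truncation flag (see Known Limitations below), so one practical heuristic is to compare the number of returned paths with the limit you passed:
```elixir
max_paths = 100
paths = Algorithms.find_paths(from, to, max_paths: max_paths)

if length(paths) == max_paths do
  IO.puts("Hit the max_paths limit - more paths likely exist")
end
```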
Performance
| Scenario | Complexity | Typical Time |
|---|---|---|
| Sparse graph | O(V + E) | <10ms |
| Moderate graph | O(V × D) | 10-100ms |
| Dense graph (no limit) | O(V^D) | Hang risk! |
| Dense graph (with limit) | O(max_paths × D) | <200ms |
V = vertices (nodes), E = edges, D = max_depth
Known Limitations
- Non-deterministic order: Path order may vary between runs (DFS traversal order)
- No truncation indicator: Can't tell if results are complete or truncated
- Path quality: All paths treated equally (no shortest-path preference)
- Memory usage: Storing many long paths can consume significant memory
Centrality Metrics
What It Does
Computes connection-based metrics for all nodes in the graph.
Metrics
In-Degree:
- Number of incoming edges (callers)
- Functions with high in-degree are called by many others
- Indicates reusability or coupling
Out-Degree:
- Number of outgoing edges (callees)
- Functions with high out-degree call many others
- Indicates complexity or coordination role
Total Degree:
- Sum of in-degree and out-degree
- Overall connectivity metric
Usage
```elixir
centrality = Algorithms.degree_centrality()
# Returns: %{
# {:function, :ModuleA, :foo, 0} => %{
# in_degree: 0, # Not called by anyone
# out_degree: 5, # Calls 5 other functions
# total_degree: 5
# },
# {:function, :Utils, :helper, 1} => %{
# in_degree: 12, # Called by 12 functions
# out_degree: 2, # Calls 2 functions
# total_degree: 14
# },
# ...
# }
```
Interpretation
High In-Degree (>10)
- Utility functions: Reused across codebase
- Core APIs: Central to application logic
- Potential issues: Single point of failure, change impact
High Out-Degree (>10)
- Coordinator functions: Orchestrate multiple operations
- Complex logic: May need refactoring
- Dense nodes: Path finding may be slow
High Total Degree (>20)
- Hub nodes: Central to the graph
- Critical code: Changes affect many paths
- Refactoring target: Consider splitting
Example Analysis
```elixir
centrality = Algorithms.degree_centrality()
# Find functions called by many (utilities)
utilities = centrality
|> Enum.filter(fn {_id, metrics} -> metrics.in_degree > 10 end)
|> Enum.sort_by(fn {_id, metrics} -> -metrics.in_degree end)
# Find complex coordinators
coordinators = centrality
|> Enum.filter(fn {_id, metrics} -> metrics.out_degree > 15 end)
|> Enum.sort_by(fn {_id, metrics} -> -metrics.out_degree end)
```
Graph Statistics
What It Does
Provides comprehensive overview of the entire codebase graph structure.
Usage
```elixir
stats = Algorithms.graph_stats()
# Returns: %{
# node_count: 1234,
# node_counts_by_type: %{
# function: 1000,
# module: 100,
# call: 3000
# },
# edge_count: 3000,
# average_degree: 4.86,
# density: 0.0024,
# top_nodes: [
# {{:function, :Utils, :parse, 1}, 0.234},
# {{:function, :Core, :run, 0}, 0.189},
# ...
# ]
# }
```
Metrics Explained
node_count:
- Total number of nodes (modules, functions, calls)
- Larger = bigger codebase
node_counts_by_type:
- Breakdown by entity type
- Useful for understanding code composition
edge_count:
- Total number of call relationships
- Indicates coupling level
average_degree:
- Average connections per node
- Higher = more coupled
- Typical values:
- 2-4: Well-structured, modular
- 5-10: Moderate coupling
- \>10: High coupling, consider refactoring
density:
- Ratio of actual to possible edges
- Range: 0.0 (no edges) to 1.0 (fully connected)
- Typical values:
- <0.01: Sparse, well-structured
- 0.01-0.05: Moderate
- \>0.05: Dense, potential issues
top_nodes:
- Top 10 functions by PageRank
- Most important/central functions
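For reference, the average_degree and density values above can be recomputed from node_count and edge_count. The sketch below uses the usual definitions (average degree counts each edge at both endpoints; density divides by the number of possible directed edges), though the exact convention is defined by the implementation:
```elixir
stats = Algorithms.graph_stats()
v = stats.node_count
e = stats.edge_count

average_degree = 2 * e / v    # each edge contributes to two nodes' degrees
density = e / (v * (v - 1))   # fraction of possible directed edges present
```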
Example Interpretation
```elixir
stats = Algorithms.graph_stats()

IO.puts("Codebase Analysis:")
IO.puts(" Total functions: #{stats.node_counts_by_type[:function]}")
IO.puts(" Total calls: #{stats.edge_count}")
IO.puts(" Coupling level: #{stats.average_degree}")

# Health check
cond do
  stats.average_degree < 4 ->
    IO.puts("✓ Well-structured codebase")

  stats.average_degree < 8 ->
    IO.puts("⚠ Moderate coupling, consider modularization")

  true ->
    IO.puts("⚠ High coupling, refactoring recommended")
end
```
Usage Examples
1. Find Critical Functions
```elixir
# Combine PageRank and centrality
scores = Algorithms.pagerank()
centrality = Algorithms.degree_centrality()
critical_functions = scores
|> Enum.filter(fn {id, score} ->
metrics = Map.get(centrality, id, %{in_degree: 0})
score > 0.2 and metrics.in_degree > 10
end)
|> Enum.sort_by(fn {_id, score} -> -score end)
IO.puts("Critical functions that need tests:")
for {id, score} <- critical_functions do
IO.puts(" #{inspect(id)} - score: #{Float.round(score, 3)}")
end
```
2. Impact Analysis
```elixir
# Find all functions affected by a change
changed_function = {:function, :Core, :process, 2}
# Get reverse dependencies (who calls this?)
centrality = Algorithms.degree_centrality()
{:ok, metrics} = Map.fetch(centrality, changed_function)
IO.puts("Direct impact: #{metrics.in_degree} callers")
# Check transitivity by finding paths to entry points
entry_points = find_entry_points() # Your function
for entry <- entry_points do
paths = Algorithms.find_paths(changed_function, entry, max_paths: 10)
unless Enum.empty?(paths) do
IO.puts(" Affects: #{inspect(entry)} (#{length(paths)} paths)")
end
end
```
3. Detect Code Smells
```elixir
centrality = Algorithms.degree_centrality()
# Find "God functions" (too many responsibilities)
god_functions = centrality
|> Enum.filter(fn {_id, m} -> m.out_degree > 20 end)
# Find "hub functions" (too many dependencies)
hub_functions = centrality
|> Enum.filter(fn {_id, m} -> m.in_degree > 30 end)
# Find isolated modules
isolated = centrality
|> Enum.filter(fn {_id, m} -> m.total_degree == 0 end)
IO.puts("Code health report:")
IO.puts(" God functions: #{length(god_functions)}")
IO.puts(" Hub functions: #{length(hub_functions)}")
IO.puts(" Isolated entities: #{length(isolated)}")
```
4. Dependency Chain Visualization
```elixir
# Find all paths and visualize
from = {:function, :API, :handle_request, 1}
to = {:function, :DB, :query, 2}

paths = Algorithms.find_paths(from, to, max_depth: 5)

# Formatting helper (an anonymous fn so the snippet runs outside a module)
format_node = fn {:function, module, name, arity} ->
  "#{module}.#{name}/#{arity}"
end

IO.puts("Request flows from API to Database:")

for path <- paths do
  IO.puts("\n " <> Enum.map_join(path, " -> ", format_node))
end
```
5. Codebase Evolution Tracking
```elixir
# Compare stats over time
stats_before = Algorithms.graph_stats()
# ... make changes ...
stats_after = Algorithms.graph_stats()
IO.puts("Changes:")
IO.puts(" Functions: #{stats_before.node_counts_by_type[:function]} → #{stats_after.node_counts_by_type[:function]}")
IO.puts(" Coupling: #{stats_before.average_degree} → #{stats_after.average_degree}")
IO.puts(" Density: #{stats_before.density} → #{stats_after.density}")
if stats_after.average_degree < stats_before.average_degree do
IO.puts("✓ Coupling reduced - good refactoring!")
else
IO.puts("⚠ Coupling increased - review changes")
end
```
Performance Characteristics
Computational Complexity
| Algorithm | Time Complexity | Space Complexity | Typical Runtime (1K nodes) |
|---|---|---|---|
| PageRank | O(I × E) | O(V) | 50-100ms |
| Path Finding (limited) | O(max_paths × D) | O(max_paths × D) | 10-200ms |
| Path Finding (unlimited) | O(V^D) | O(V^D) | Risk of hang! |
| Degree Centrality | O(V + E) | O(V) | <10ms |
| Graph Stats | O(V + E + I × E) | O(V) | 100-150ms |
Notation:
- V = vertices (nodes)
- E = edges
- I = PageRank iterations
- D = max_depth
Memory Usage
| Operation | Memory Footprint | Notes |
|---|---|---|
| PageRank | ~100 bytes/node | Score storage |
| Path Finding | ~1KB/path | Path list storage |
| Centrality | ~200 bytes/node | Three metrics per node |
| Graph Stats | ~500 bytes | Aggregated results |
Optimization Tips
For Large Graphs (>10K nodes):
Use limited path finding:
```elixir
paths = Algorithms.find_paths(from, to, max_paths: 50, max_depth: 8)
```
Cache PageRank results:
```elixir
scores = Algorithms.pagerank()  # Store in ETS or process state for reuse
```
Filter before computing:
```elixir
# Only compute for functions (not all nodes)
functions = Store.list_nodes(:function)
# Then run algorithms on subset
```
Use lower precision:
```elixir
Algorithms.pagerank(tolerance: 0.001)  # Faster convergence
```
For Dense Graphs (high degree):
Always use max_paths limit:
```elixir
paths = Algorithms.find_paths(from, to, max_paths: 100)  # Never unlimited!
```
Reduce depth:
```elixir
paths = Algorithms.find_paths(from, to, max_depth: 5)
```
Disable warnings in loops:
```elixir
for target <- targets do
  Algorithms.find_paths(from, target, warn_dense: false)
end
```
Best Practices
1. Always Use Limits
❌ Don't:
```elixir
# Dangerous on dense graphs!
paths = Algorithms.find_paths(from, to, max_paths: 10_000)
```
✅ Do:
```elixir
# Safe default limits
paths = Algorithms.find_paths(from, to)  # Uses max_paths: 100
```
2. Check Graph Health First
```elixir
stats = Algorithms.graph_stats()

# Bind opts from the whole expression so it stays in scope afterwards
opts =
  if stats.average_degree > 15 do
    IO.puts("Warning: Dense graph detected. Using conservative limits.")
    [max_paths: 50, max_depth: 5]
  else
    []  # Use defaults
  end

paths = Algorithms.find_paths(from, to, opts)
```
3. Interpret Results in Context
```elixir
# Don't just look at numbers - understand the domain
scores = Algorithms.pagerank()
for {{:function, module, name, _arity}, score} <- scores do
if score > 0.3 and module != :Utils do
# High score but not a utility - may indicate tight coupling
IO.puts("Review: #{module}.#{name} has high PageRank but isn't a utility")
end
end
```
4. Combine Multiple Metrics
```elixir
# Use multiple algorithms for better insights
scores = Algorithms.pagerank()
centrality = Algorithms.degree_centrality()
for {{:function, _m, _n, _a} = id, score} <- scores do
metrics = Map.get(centrality, id)
# High PageRank + Low in-degree = Entry point
if score > 0.2 and metrics.in_degree < 2 do
IO.puts("Entry point: #{inspect(id)}")
end
# High PageRank + High in-degree = Critical utility
if score > 0.3 and metrics.in_degree > 10 do
IO.puts("Critical utility: #{inspect(id)}")
end
end
```
References
Related Documentation:
- README.md - Project overview
- PERSISTENCE.md - Caching and persistence
- CONFIGURATION.md - Model and cache configuration