View Source Distributed Execution
This guide explains how to execute Handoff DAGs across multiple Elixir nodes for improved performance and scalability.
prerequisites
Prerequisites
- A working Elixir cluster with connected nodes
- Handoff installed on all nodes
- Basic understanding of Handoff DAGs (see Getting Started)
setting-up-nodes
Setting Up Nodes
Before executing a DAG in a distributed environment, you need to set up your nodes:
# On each node, start Handoff
Handoff.start()
# Register the local node with its capabilities
Handoff.register_local_node(%{
cpu: 8, # 8 CPU cores
memory: 16000, # 16GB memory
gpu: 1 # 1 GPU unit
})
node-discovery
Node Discovery
Handoff provides automatic node discovery within a connected Erlang cluster:
# Discover all nodes in the cluster and register their capabilities
{:ok, discovered_nodes} = Handoff.discover_nodes()
# See what nodes are available and their resources
IO.inspect(discovered_nodes)
# Example output:
# %{
# :"node1@192.168.1.100" => %{cpu: 8, memory: 16000, gpu: 1},
# :"node2@192.168.1.101" => %{cpu: 4, memory: 8000},
# :"node3@192.168.1.102" => %{cpu: 16, memory: 32000, gpu: 2}
# }
resource-aware-functions
Resource-Aware Functions
To take advantage of distributed execution, define functions with resource requirements:
alias Handoff.Function
# CPU-intensive function
cpu_heavy_fn = %Function{
id: :cpu_intensive,
args: [:input_data],
code: &SomeModule.heavy_computation/1,
cost: %{cpu: 4, memory: 2000} # Requires 4 CPU cores, 2GB memory
}
# GPU-accelerated function
gpu_fn = %Function{
id: :gpu_task,
args: [:preprocessed_data],
code: &SomeModule.gpu_computation/1,
cost: %{gpu: 1, memory: 4000} # Requires 1 GPU, 4GB memory
}
distributed-execution
Distributed Execution
Execute your DAG across the cluster using execute_distributed
:
# Build and validate your DAG
dag =
Handoff.new()
|> Handoff.DAG.add_function(source_fn)
|> Handoff.DAG.add_function(cpu_heavy_fn)
|> Handoff.DAG.add_function(gpu_fn)
|> Handoff.DAG.add_function(aggregation_fn)
:ok = Handoff.DAG.validate(dag)
# Execute the DAG across the cluster
{:ok, results} = Handoff.execute_distributed(dag, max_retries: 3)
fault-tolerance
Fault Tolerance
Distributed execution in Handoff includes automatic fault tolerance:
- Task retry on failure (configurable with
max_retries
) - Node failure detection
- Result synchronization across nodes
If a node fails during execution, its tasks will be reassigned to other suitable nodes.