Distributed Execution

This guide explains how to execute Handoff DAGs across multiple Elixir nodes for improved performance and scalability.

Prerequisites

  • A working Elixir cluster with connected nodes (see the connection sketch after this list)
  • Handoff installed on all nodes
  • Basic understanding of Handoff DAGs (see Getting Started)
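
If you do not already have a connected cluster, a minimal sketch of wiring two nodes together looks like this (the node names and cookie are placeholders for illustration):

# Start each node with a name and a shared cookie, for example:
#   iex --name node1@192.168.1.100 --cookie my_cookie -S mix
#   iex --name node2@192.168.1.101 --cookie my_cookie -S mix

# From node1, connect to node2
Node.connect(:"node2@192.168.1.101")
# => true

# Verify the connection
IO.inspect(Node.list())
# => [:"node2@192.168.1.101"]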

Setting Up Nodes

Before executing a DAG in a distributed environment, you need to set up your nodes:

# On each node, start Handoff
Handoff.start()

# Register the local node with its capabilities
Handoff.register_node(Node.self(), %{
  cpu: 8,        # 8 CPU cores
  memory: 16000, # 16GB memory
  gpu: 1         # 1 GPU unit
})

Node Discovery

Handoff provides automatic node discovery within a connected Erlang cluster:

# Discover all nodes in the cluster and register their capabilities
{:ok, discovered_nodes} = Handoff.discover_nodes()

# See what nodes are available and their resources
IO.inspect(discovered_nodes)
# Example output:
# %{
#   :"node1@192.168.1.100" => %{cpu: 8, memory: 16000, gpu: 1},
#   :"node2@192.168.1.101" => %{cpu: 4, memory: 8000},
#   :"node3@192.168.1.102" => %{cpu: 16, memory: 32000, gpu: 2}
# }
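
Because the discovered nodes are returned as a plain map of node names to capability maps, you can select nodes for particular workloads with ordinary Enum functions. A small sketch, assuming the map shown above:

# Find nodes that advertise at least one GPU
gpu_nodes =
  discovered_nodes
  |> Enum.filter(fn {_node, caps} -> Map.get(caps, :gpu, 0) >= 1 end)
  |> Enum.map(fn {node, _caps} -> node end)

IO.inspect(gpu_nodes)
# => [:"node1@192.168.1.100", :"node3@192.168.1.102"]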

Resource-Aware Functions

To take advantage of distributed execution, define functions with resource requirements:

alias Handoff.Function

# CPU-intensive function
cpu_heavy_fn = %Function{
  id: :cpu_intensive,
  args: [:input_data],
  code: &SomeModule.heavy_computation/1,
  cost: %{cpu: 4, memory: 2000}  # Requires 4 CPU cores, 2GB memory
}

# GPU-accelerated function
gpu_fn = %Function{
  id: :gpu_task,
  args: [:preprocessed_data],
  code: &SomeModule.gpu_computation/1,
  cost: %{gpu: 1, memory: 4000}  # Requires 1 GPU, 4GB memory
}
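
The distributed execution example in the next section builds a DAG from these two functions plus a source_fn and an aggregation_fn that are not shown above. A minimal sketch of what they might look like (the :input_data and :aggregate ids, SomeModule.aggregate/2, the literal input list, and the cost values are placeholders for illustration):

# Source function that injects a literal value into the DAG
source_fn = %Function{
  id: :input_data,
  args: [],
  code: &Elixir.Function.identity/1,
  extra_args: [[1, 2, 3]]
}

# Aggregation function that combines the CPU and GPU results
aggregation_fn = %Function{
  id: :aggregate,
  args: [:cpu_intensive, :gpu_task],
  code: &SomeModule.aggregate/2,
  cost: %{cpu: 1, memory: 500}
}

Note that gpu_fn above declares a :preprocessed_data dependency, so a complete DAG would also need a function with that id (or gpu_fn's args changed to an id that exists in the DAG).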

Distributed Execution

Execute your DAG across the cluster using Handoff.execute_distributed/2:

# Build and validate your DAG
dag =
  Handoff.new()
  |> Handoff.DAG.add_function(source_fn)
  |> Handoff.DAG.add_function(cpu_heavy_fn)
  |> Handoff.DAG.add_function(gpu_fn)
  |> Handoff.DAG.add_function(aggregation_fn)

:ok = Handoff.DAG.validate(dag)

# Execute the DAG across the cluster
{:ok, results} = Handoff.execute_distributed(dag, max_retries: 3)
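
To handle failures explicitly instead of pattern matching on the success tuple, you can branch on the return value. A sketch, assuming results is a map keyed by function id and that exhausted retries surface as an error tuple:

case Handoff.execute_distributed(dag, max_retries: 3) do
  {:ok, results} ->
    # Look up an individual function's result by its id
    IO.inspect(results[:gpu_task])

  {:error, reason} ->
    # Retries exhausted or the DAG could not be scheduled
    IO.inspect(reason, label: "distributed execution failed")
end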

Fault Tolerance

Distributed execution in Handoff includes automatic fault tolerance:

  • Task retry on failure (configurable with max_retries)
  • Node failure detection
  • Result synchronization across nodes

If a node fails during execution, its tasks will be reassigned to other suitable nodes.
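
Handoff handles this detection and reassignment for you, but if you also want to observe node failures in your own code (for logging or alerting), standard OTP node monitoring works alongside it. A minimal sketch using :net_kernel.monitor_nodes/1:

# Subscribe the calling process to :nodeup/:nodedown messages
:ok = :net_kernel.monitor_nodes(true)

receive do
  {:nodedown, node} -> IO.puts("Lost node: #{node}")
  {:nodeup, node} -> IO.puts("Node joined: #{node}")
end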

Example: Branching DAG for Distributed Execution

You can execute more complex, tree-like DAGs across your cluster. For example, suppose you want to preprocess data in two different ways before aggregating the results:

# Install dependency in your mix.exs:
#   {:handoff, "~> 0.1"}

# Handoff requires fully qualified function captures for :code and extra_args.
defmodule Transformations do
  def inc(x), do: x + 1
  def double(x), do: x * 2
  def sum_two_lists(a, b), do: Enum.sum(a) + Enum.sum(b)
end

alias Handoff.Function

dag = Handoff.new()

source_fn = %Function{
  id: :input_data,
  args: [],
  code: &Elixir.Function.identity/1,
  extra_args: [[10, 20, 30]]
}

preprocess_a = %Function{
  id: :left,
  args: [:input_data],
  code: &Enum.map/2,
  extra_args: [&Transformations.inc/1],
  cost: %{cpu: 2}
}

preprocess_b = %Function{
  id: :right,
  args: [:input_data],
  code: &Enum.map/2,
  extra_args: [&Transformations.double/1],
  cost: %{cpu: 2}
}

aggregate = %Function{
  id: :agg,
  args: [:left, :right],
  code: &Transformations.sum_two_lists/2,
  cost: %{cpu: 1}
}

dag =
  dag
  |> Handoff.DAG.add_function(source_fn)
  |> Handoff.DAG.add_function(preprocess_a)
  |> Handoff.DAG.add_function(preprocess_b)
  |> Handoff.DAG.add_function(aggregate)

:ok = Handoff.DAG.validate(dag)

Note: Handoff requires all function references in :code and extra_args to be fully qualified (e.g., &Module.function/arity), not anonymous functions.

This DAG structure looks like:

graph TD;
  input_data --> left;
  input_data --> right;
  left --> agg;
  right --> agg;
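
Executing this DAG works the same way as before. As a sketch (assuming results is a map keyed by function id): the :left branch increments each element of [10, 20, 30] (sum 63), the :right branch doubles each element (sum 120), so :agg should yield 183.

{:ok, results} = Handoff.execute_distributed(dag, max_retries: 3)

IO.inspect(results[:agg])
# => 183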