Broadcast join support for distributed queries.
When joining a large partitioned dataset with a small dimension table, the dimension table is serialized as Arrow IPC, broadcast to all workers, and registered as a temp table. Each worker then joins its data partition against the local copy of the dimension table.
This is the star-schema pattern — the most common distributed join.
Usage
# Automatic: coordinator detects small right side
Dux.Remote.Coordinator.execute(
fact_table
|> Dux.join(dim_table, on: :id),
workers: workers
)
# Explicit: force broadcast of the right side
Dux.Remote.Broadcast.execute(
fact_pipeline,
dim_dux,
on: :region_id,
workers: workers
)
Summary
Functions
Execute a broadcast join: broadcast the small (right) table to all workers, then each worker joins its partition of the large (left) table locally.
Check if a table is small enough to broadcast.
Functions
Execute a broadcast join: broadcast the small (right) table to all workers, then each worker joins its partition of the large (left) table locally.
Options
:workers— list of worker PIDs (default: all from:pg):on— join column(s) (required):how— join type (default::inner):broadcast_name— name for the broadcast table on workers (auto-generated if nil):timeout— per-worker timeout (default::infinity)
Returns a %Dux{} with the merged join result.
Check if a table is small enough to broadcast.
Computes the table and checks its serialized Arrow IPC size against the threshold.