Dux.Remote.Broadcast (Dux v0.2.0)

Broadcast join support for distributed queries.

When joining a large partitioned dataset with a small dimension table, the dimension table is serialized as Arrow IPC, broadcast to all workers, and registered as a temp table. Each worker then joins its data partition against the local copy of the dimension table.

This is the star-schema join pattern, the most common kind of distributed join.
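The idea can be illustrated without Dux at all. The following is a conceptual sketch only: plain lists of maps stand in for tables, `Task` processes stand in for remote workers, and the module and function names (`BroadcastJoinSketch`, `join_partition/3`, `broadcast_join/3`) are hypothetical. It does not show how Dux serializes or registers the broadcast table.

```elixir
# Conceptual sketch: plain Elixir data stands in for Dux tables,
# and local Task processes stand in for remote workers.
defmodule BroadcastJoinSketch do
  # Inner-join one partition of fact rows against the worker's local copy
  # of the dimension table, matching rows on `key`.
  def join_partition(fact_rows, dim_rows, key) do
    # Index the small table once so each fact row needs a single map lookup.
    dim_by_key = Map.new(dim_rows, fn row -> {Map.fetch!(row, key), row} end)

    # The `dim = ...` clause acts as a filter: fact rows with no matching
    # dimension row yield nil and are dropped (inner-join semantics).
    for fact <- fact_rows,
        dim = Map.get(dim_by_key, Map.fetch!(fact, key)),
        do: Map.merge(dim, fact)
  end

  # Hand every partition the same dimension rows, join each partition in its
  # own task, then concatenate the results in partition order.
  def broadcast_join(partitions, dim_rows, key) do
    partitions
    |> Enum.map(fn part -> Task.async(fn -> join_partition(part, dim_rows, key) end) end)
    |> Task.await_many()
    |> Enum.concat()
  end
end

partitions = [
  [%{order_id: 1, region_id: 10}, %{order_id: 2, region_id: 20}],
  [%{order_id: 3, region_id: 10}, %{order_id: 4, region_id: 99}]
]

regions = [%{region_id: 10, name: "north"}, %{region_id: 20, name: "south"}]

joined = BroadcastJoinSketch.broadcast_join(partitions, regions, :region_id)
IO.inspect(joined)
```

Only the small table is copied to every task; the large side never moves between partitions, which is what makes the pattern cheap when the dimension table is small.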

Usage

# Automatic: coordinator detects small right side
Dux.Remote.Coordinator.execute(
  fact_table
  |> Dux.join(dim_table, on: :id),
  workers: workers
)

# Explicit: force broadcast of the right side
Dux.Remote.Broadcast.execute(
  fact_pipeline,
  dim_dux,
  on: :region_id,
  workers: workers
)

Summary

Functions

execute(left, right, opts \\ [])

Execute a broadcast join: broadcast the small (right) table to all workers, then each worker joins its partition of the large (left) table locally.

should_broadcast?(dux, threshold \\ 268_435_456)

Check if a table is small enough to broadcast.

Functions

execute(left, right, opts \\ [])

Execute a broadcast join: broadcast the small (right) table to all workers, then each worker joins its partition of the large (left) table locally.

Options

  • :workers — list of worker PIDs (default: all from :pg)
  • :on — join column(s) (required)
  • :how — join type (default: :inner)
  • :broadcast_name — name for the broadcast table on workers (auto-generated if nil)
  • :timeout — per-worker timeout (default: :infinity)

Returns a %Dux{} with the merged join result.
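As a rough illustration, the options can be spelled out explicitly in a call. This is a sketch under assumptions: the `RegionJoin` module is hypothetical, `fact_pipeline`, `dim_dux`, and `workers` are expected to be built as in the Usage section, and the option values are illustrative. In particular, passing the broadcast name as a string and the timeout in milliseconds are assumptions, not documented behavior.

```elixir
# Hypothetical wrapper around Dux.Remote.Broadcast.execute/3.
defmodule RegionJoin do
  # Option keys come from the Options list; the values are illustrative.
  def options(workers) do
    [
      on: :region_id,
      # Documented default, written out for clarity.
      how: :inner,
      workers: workers,
      # Assumption: a string is an acceptable table name.
      broadcast_name: "dim_regions",
      # Assumption: the timeout is given in milliseconds.
      timeout: 30_000
    ]
  end

  # Broadcast `dim_dux` to the workers and join it against `fact_pipeline`.
  def run(fact_pipeline, dim_dux, workers) do
    Dux.Remote.Broadcast.execute(fact_pipeline, dim_dux, options(workers))
  end
end
```

Setting a finite `:timeout` may be worth considering in production, since the documented default of `:infinity` waits indefinitely on a stalled worker.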

should_broadcast?(dux, threshold \\ 268_435_456)

Check if a table is small enough to broadcast.

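Computes (materializes) the table, then compares its serialized Arrow IPC size with the threshold. The default threshold, 268_435_456, equals 2^28, which is 256 MiB if the size is measured in bytes. Because the table is computed as part of the check, the call costs as much as evaluating the table itself.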
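One way to use the check is to pick a join strategy by hand. The sketch below assumes that `should_broadcast?/2` returns a boolean and that the threshold is a byte count; the `JoinStrategy` module, its 64 MiB limit, and the `:region_id` join column are illustrative choices rather than part of Dux.

```elixir
# Hypothetical helper that chooses between an explicit broadcast join and
# the coordinator's automatic planning.
defmodule JoinStrategy do
  # 64 MiB: an illustrative limit, lower than the 268_435_456 default.
  @threshold 64 * 1024 * 1024

  def threshold, do: @threshold

  def join(fact_pipeline, dim_dux, workers) do
    # Assumption: should_broadcast?/2 returns true or false.
    if Dux.Remote.Broadcast.should_broadcast?(dim_dux, @threshold) do
      # Small enough: force a broadcast of the right side.
      Dux.Remote.Broadcast.execute(fact_pipeline, dim_dux,
        on: :region_id,
        workers: workers
      )
    else
      # Too large for this limit: hand the join to the coordinator.
      Dux.Remote.Coordinator.execute(
        fact_pipeline |> Dux.join(dim_dux, on: :region_id),
        workers: workers
      )
    end
  end
end
```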