Bridge to ExCubecl for GPU execution via Burn's CubeCL backend.
CubeCL (Compute Unified Backend for Compute Language) is Burn's GPU compute abstraction layer that supports:
- CUDA (NVIDIA GPUs)
- Metal (Apple GPUs — iOS, macOS)
- Vulkan (Android, Linux, Windows)
- WebGPU (Browser-based GPU)
- ROCm (AMD GPUs)
This module delegates all GPU operations to the ExCubecl library (v0.4.0). GPU memory is represented throughout by ExCubecl buffers, which are opaque references.
Usage
    # Check if a GPU is available
    if ExBurn.CubeclBridge.available?() do
      # Initialize the GPU context
      {:ok, ctx} = ExBurn.CubeclBridge.init(:metal)

      # Check device capabilities
      caps = ExBurn.CubeclBridge.device_capabilities(ctx)

      # Allocate GPU buffer and run a kernel
      {:ok, buf} = ExBurn.CubeclBridge.allocate_gpu(ctx, [4, 4], :f32)
      {:ok, result} = ExBurn.CubeclBridge.execute(ctx, :add, [buf, buf])

      # Pipeline for multi-kernel execution
      {:ok, pid} = ExBurn.CubeclBridge.pipeline()
      :ok = ExBurn.CubeclBridge.pipeline_add(pid, :add, [buf, buf], buf)
      :ok = ExBurn.CubeclBridge.pipeline_add(pid, :relu, [buf], buf)
      {:ok, commands} = ExBurn.CubeclBridge.pipeline_run(pid)
      :ok = ExBurn.CubeclBridge.pipeline_free(pid)
    end
Types
@type backend() :: :cuda | :metal | :vulkan | :wgpu | :rocm
@type buffer() :: reference()
@type command_id() :: non_neg_integer()
@type context() :: reference()
@type kernel() :: atom()
@type pipeline_id() :: non_neg_integer()
Functions
@spec allocate_gpu(context(), [non_neg_integer()], atom()) :: {:ok, buffer()} | {:error, String.t()}
Allocates a GPU buffer with the given shape and type.
Returns an ExCubecl buffer reference.
@spec async_poll(command_id()) :: {:ok, :pending | :running | :completed | :failed} | {:error, term()}
Polls the status of an asynchronous command.
Returns {:ok, status}, where status is :pending, :running, :completed, or :failed, or {:error, reason} if the status cannot be read.
@spec async_submit(ExCubecl.Command.t()) :: {:ok, command_id()} | {:error, term()}
Submits a command for asynchronous execution.
Returns a command ID that can be polled or waited on.
@spec async_wait(command_id()) :: :ok | {:error, term()}
Blocks until the given command completes.
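When a bounded wait is preferable to blocking in async_wait/1, the documented return values of async_poll/1 are enough to build a polling loop. The following is a minimal sketch, not part of ExBurn: the module name AsyncPollSketch, the timeout and interval defaults, and the injectable `bridge` argument (any module exposing async_poll/1) are assumptions made for illustration.

```elixir
defmodule AsyncPollSketch do
  @moduledoc "Hypothetical helper: polls a command until it finishes or a deadline passes."

  # `bridge` is assumed to expose async_poll/1 with the return shape documented above.
  def await(command_id, timeout_ms \\ 5_000, interval_ms \\ 10, bridge \\ ExBurn.CubeclBridge) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    loop(bridge, command_id, deadline, interval_ms)
  end

  defp loop(bridge, command_id, deadline, interval_ms) do
    case bridge.async_poll(command_id) do
      {:ok, :completed} ->
        :ok

      {:ok, :failed} ->
        {:error, :failed}

      {:ok, status} when status in [:pending, :running] ->
        # Still in flight: give up at the deadline, otherwise sleep and poll again.
        if System.monotonic_time(:millisecond) >= deadline do
          {:error, :timeout}
        else
          Process.sleep(interval_ms)
          loop(bridge, command_id, deadline, interval_ms)
        end

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

A caller would typically pass the ID returned by async_submit/1, for example `AsyncPollSketch.await(command_id, 1_000)`.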
@spec available?() :: boolean()
Checks whether a GPU device is available via ExCubecl.
@spec available_backends() :: [backend()]
Returns a list of available GPU backends on this system.
Delegates to ExCubecl availability checks and platform detection.
Returns the data type of a GPU buffer.
Reads the raw binary data from a GPU buffer.
Reads the raw binary data from a GPU buffer, raising on error.
@spec buffer_shape(buffer()) :: {:ok, [non_neg_integer()]} | {:error, term()}
Returns the shape of a GPU buffer.
@spec buffer_size(buffer()) :: {:ok, non_neg_integer()} | {:error, term()}
Returns the size of a GPU buffer in bytes.
Compiles a compute kernel for the given backend.
Kernels are managed by ExCubecl; this function verifies the kernel is available and returns a reference.
@spec destroy(context()) :: :ok
Destroys the GPU context and frees all associated resources.
Returns the capabilities of the GPU device.
@spec device_count() :: {:ok, non_neg_integer()} | {:error, term()}
Returns the number of GPU devices available.
@spec device_to_host(context(), buffer()) :: {:ok, Nx.Tensor.t()} | {:error, String.t()}
Copies data from a GPU buffer to host (CPU) as an Nx tensor.
Parameters
- ctx: The GPU context
- buffer: An ExCubecl buffer reference
Returns
{:ok, Nx.Tensor.t()} on success, {:error, reason} on failure.
Executes a compute kernel on the GPU.
Parameters
- ctx: The GPU context
- kernel: The kernel to execute (atom)
- args: List of ExCubecl buffer references
- opts: Options (currently unused, reserved for future use)
Returns
{:ok, result_buffer} on success, {:error, reason} on failure.
Frees a GPU buffer.
Note: ExCubecl buffers are garbage-collected when their reference goes out of scope. This function is a no-op kept for API compatibility.
@spec host_to_device(context(), Nx.Tensor.t()) :: {:ok, buffer()} | {:error, String.t()}
Copies data from host (CPU) to a new GPU buffer.
Parameters
- ctx: The GPU context
- tensor: An Nx tensor to copy to the GPU
Returns
{:ok, buffer} on success, {:error, reason} on failure.
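The transfer functions and execute combine into a host, device, host round trip. The sketch below assumes a :relu kernel is available (it appears in the pipeline usage example) and that it accepts a single input buffer; the module name RoundTripSketch and the injectable `bridge` argument are hypothetical and exist only so the flow can be exercised without a GPU.

```elixir
defmodule RoundTripSketch do
  @moduledoc "Hypothetical helper: copy a tensor to the GPU, apply :relu, copy the result back."

  # `bridge` is assumed to expose host_to_device/2, execute/3 and device_to_host/2
  # with the documented {:ok, value} | {:error, reason} return shapes.
  def relu(ctx, tensor, bridge \\ ExBurn.CubeclBridge) do
    with {:ok, input} <- bridge.host_to_device(ctx, tensor),
         {:ok, output} <- bridge.execute(ctx, :relu, [input]) do
      # Any {:error, reason} from the steps above is returned unchanged by `with`.
      bridge.device_to_host(ctx, output)
    end
  end
end
```

Because every step returns a tagged tuple, `with` stops at the first failure and hands the original error back to the caller.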
Initializes a GPU compute context for the given backend.
Parameters
- backend: The GPU backend to use (:cuda, :metal, :vulkan, :wgpu, :rocm)
- opts: Options (currently unused, reserved for future use)
Returns
{:ok, context} on success, {:error, reason} on failure.
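Since the available backend differs per platform, one plausible pattern is to intersect a preference list with available_backends/0 and initialize the first backend that succeeds. This is a sketch under assumptions: the module name BackendFallbackSketch, the preference order, the {:ok, backend, ctx} return shape, and the injectable `bridge` argument are all illustrative and not part of the library.

```elixir
defmodule BackendFallbackSketch do
  @moduledoc "Hypothetical helper: initialize the first preferred backend that works."

  # `bridge` is assumed to expose available_backends/0 and init/1 as documented.
  def init_first(
        candidates \\ [:cuda, :metal, :vulkan, :wgpu, :rocm],
        bridge \\ ExBurn.CubeclBridge
      ) do
    available = bridge.available_backends()

    candidates
    |> Enum.filter(&(&1 in available))
    |> Enum.reduce_while({:error, :no_backend}, fn backend, _last_error ->
      case bridge.init(backend) do
        # Stop at the first backend that yields a context.
        {:ok, ctx} -> {:halt, {:ok, backend, ctx}}
        # Remember the most recent failure and try the next candidate.
        {:error, reason} -> {:cont, {:error, reason}}
      end
    end)
  end
end
```

If no candidate is available the sketch returns {:error, :no_backend}; if candidates exist but all fail, it returns the last error reported by init.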
Returns the list of kernel names supported by ExCubecl.
@spec memory_total(context()) :: non_neg_integer()
Returns the total available GPU memory (in bytes).
Note: ExCubecl does not currently expose memory statistics, so this function always returns 0.
@spec memory_used(context()) :: non_neg_integer()
Returns the amount of GPU memory currently in use (in bytes).
Note: ExCubecl does not currently expose memory usage statistics, so this function always returns 0.
@spec pipeline() :: {:ok, pipeline_id()} | {:error, term()}
Creates a new pipeline for multi-kernel execution.
Adds a kernel command to a pipeline.
Parameters
- pipeline_id: The pipeline to add to
- kernel: Kernel name (atom)
- inputs: List of input buffer references
- output: Output buffer reference
- params: Additional parameters (optional, default: %{})
@spec pipeline_add_struct(pipeline_id(), ExCubecl.Command.t()) :: :ok | {:error, term()}
Adds a pre-built %ExCubecl.Command{} struct to a pipeline.
@spec pipeline_free(pipeline_id()) :: :ok | {:error, term()}
Frees a pipeline and its associated resources.
@spec pipeline_run(pipeline_id()) :: {:ok, [command_id()]} | {:error, term()}
Executes all commands in the pipeline and returns their command IDs.
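A pipeline holds resources until pipeline_free/1 is called, so it can help to wrap its lifecycle in try/after. The sketch below is illustrative only: the module name PipelineSketch, the {kernel, inputs, output} step tuples, and the injectable `bridge` argument are hypothetical. It also assumes pipeline_add/4 returns {:error, reason} on failure (only the :ok case appears in the usage example), and it waits on each returned command ID because the documentation does not state whether pipeline_run/1 returns before the commands finish.

```elixir
defmodule PipelineSketch do
  @moduledoc "Hypothetical helper: build, run, await, and always free a pipeline."

  # `steps` is a list of {kernel, inputs, output} tuples.
  def run(steps, bridge \\ ExBurn.CubeclBridge) do
    with {:ok, pipeline_id} <- bridge.pipeline() do
      try do
        with :ok <- add_all(bridge, pipeline_id, steps),
             {:ok, command_ids} <- bridge.pipeline_run(pipeline_id),
             :ok <- wait_all(bridge, command_ids) do
          {:ok, command_ids}
        end
      after
        # Runs on success, on error tuples, and on raised exceptions.
        bridge.pipeline_free(pipeline_id)
      end
    end
  end

  # Adds steps in order, stopping at the first error.
  defp add_all(bridge, pipeline_id, steps) do
    Enum.reduce_while(steps, :ok, fn {kernel, inputs, output}, :ok ->
      case bridge.pipeline_add(pipeline_id, kernel, inputs, output) do
        :ok -> {:cont, :ok}
        {:error, _reason} = error -> {:halt, error}
      end
    end)
  end

  # Blocks on each command ID in turn, stopping at the first error.
  defp wait_all(bridge, command_ids) do
    Enum.reduce_while(command_ids, :ok, fn command_id, :ok ->
      case bridge.async_wait(command_id) do
        :ok -> {:cont, :ok}
        {:error, _reason} = error -> {:halt, error}
      end
    end)
  end
end
```

With real buffers this would be called as `PipelineSketch.run([{:add, [buf, buf], buf}, {:relu, [buf], buf}])`, mirroring the usage example at the top of the page.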
@spec supported_dtypes() :: [atom()]
Returns the list of supported data types.
@spec synchronize(context()) :: :ok
Synchronizes the GPU context, blocking until all queued operations complete.
Note: ExCubecl does not expose a global synchronization primitive.
Use async_wait/1 on specific command IDs for fine-grained control.
@spec version() :: String.t()
Returns the ExCubecl library version string.