Nx.Vulkan.Backend (nx_vulkan v0.1.0)

Nx.Backend implementation on top of the Vulkan compute primitives.

Tensors are represented by:

%Nx.Vulkan.Backend{ref: <ResourceArc<VulkanTensor>>, shape: tuple, type: {:f, 32}}

Storage lives on the GPU; an Elixir reference to the ResourceArc keeps the GPU buffer alive. When the Elixir reference is GC'd, the Rust Drop impl frees the buffer.

Status — v0.0.3 baseline

The backend implements only the operators wired in earlier iterations:

from_binary/3, to_binary/2, backend_copy/3, backend_transfer/3
elementwise binary: add, subtract, multiply, divide, pow, max, min
elementwise unary: exp, log, sqrt, abs, negate, sigmoid, tanh, tanh, relu-via-clamp-to-0…
reductions: sum, reduce_max, reduce_min (all-axis only)
linear algebra: dot/6 (rank-2 × rank-2 matmul)

Anything else falls back to Nx.BinaryBackend automatically via Nx.backend_transfer/2.

Limitations

Compute is f32 only. Storage round-trips any type (f32/f64/s8..s64/u8..u64) via from_binary/to_binary/as_type. Per-element ops still dispatch f32 shaders. Use as_type to cast f64 accumulators to f32 before computing.
No autograd. defn grad won't work end-to-end. The forward path is what this backend proves.
Per-axis reductions are host-materialized. Full-axis reductions hit the GPU; per-axis go through download/walk/upload.
No broadcasting in this backend. Broadcasting happens in Nx's frontend before dispatch; we receive already-broadcast tensors. (Will swap to the broadcast shader later.)

Summary

Functions

fast_inv_mass_apply(out, p, inv_mass, opts)

fast_kinetic_energy(out, p, inv_mass, opts)

fast_leapfrog_momentum_half(out, p, half_eps, grad, opts)

fast_leapfrog_position(out, q, eps, p, opts)

fast_momentum_step(out, p, eps, grad, opts)

fast_normal_logpdf(out, x, mu, sigma, opts)