0.1.0 (2026-05-20)
First Hex release.
Added — Nx.Vulkan.VulkanoBackend (pure-Rust path)
A new Nx.Backend implementation built on the vulkano
Rust wrapper around Vulkan compute. Sibling to the existing
Nx.Vulkan.Backend (C++ spirit-backed); they share the SPV
catalog under priv/shaders/ and the chain-shader synthesis
pipeline.
Why a second backend. A use-after-free in the C++ FFI layer
crashed the live trader three minutes after every restart —
Nx.Vulkan.Native.byte_size raising :badarg on a stale
VkBuf* pointer that had outlived its referent. Vulkano's
Arc<Buffer> ownership makes that bug class structurally
impossible: a Subbuffer<[u8]> cannot outlive its parent at the
Rust type level.
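The ownership argument can be illustrated with a minimal, std-only sketch. The `Buffer` and `Subbuffer` types below are hypothetical stand-ins written for this note, not vulkano's actual types, and the `freed` counter exists only to make the drop order observable. The assumption is simply that a view holds an `Arc` to its parent, so the parent's `Drop` (modelling the destroy-and-free step) cannot run while any view is alive.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical parent allocation (NOT vulkano's Buffer).
/// Dropping it stands in for the destroy-buffer / free-memory step.
struct Buffer {
    bytes: Vec<u8>,
    freed: Arc<AtomicUsize>, // counts how many times the parent was released
}

impl Drop for Buffer {
    fn drop(&mut self) {
        self.freed.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical view into a parent buffer (NOT vulkano's Subbuffer).
/// Holding an Arc<Buffer> keeps the parent alive for as long as the view exists.
struct Subbuffer {
    parent: Arc<Buffer>,
    offset: usize,
    len: usize,
}

impl Subbuffer {
    fn byte_size(&self) -> usize {
        self.len
    }

    fn as_slice(&self) -> &[u8] {
        &self.parent.bytes[self.offset..self.offset + self.len]
    }
}

/// Returns (frees observed while the view is alive, view size, frees after the view is dropped).
fn demo() -> (usize, usize, usize) {
    let freed = Arc::new(AtomicUsize::new(0));
    let parent = Arc::new(Buffer { bytes: vec![7u8; 16], freed: Arc::clone(&freed) });
    let view = Subbuffer { parent: Arc::clone(&parent), offset: 4, len: 8 };

    drop(parent); // the original handle goes away first
    let freed_while_viewed = freed.load(Ordering::SeqCst);

    // The view is still valid: the allocation has not been released.
    let size = view.byte_size();
    assert!(view.as_slice().iter().all(|&b| b == 7));

    drop(view); // last owner gone, so the parent is released exactly once
    let freed_after = freed.load(Ordering::SeqCst);
    (freed_while_viewed, size, freed_after)
}
```

Calling `demo()` shows the parent is released only after the last view is dropped, which is the property a raw `VkBuf*` handle cannot guarantee.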
What it ships.
- Buffer lifecycle: buf_upload, buf_alloc, buf_download, buf_byte_size,
  buf_upload_into. Each wraps a vulkano Subbuffer<[u8]> in a Rustler
  resource; the BEAM GC's drop triggers vulkano's
  vkDestroyBuffer + vkFreeMemory chain.
- Compute ops (24 native through specialised SPVs):
  - Elementwise binary (f32 + f64): add, subtract, multiply, divide, pow, max, min.
  - Elementwise unary (f32 + f64): exp, log, sqrt, abs, negate, sigmoid, tanh, floor, ceil, sign.
  - Reductions (f32 + f64): sum, reduce_max, reduce_min; all-axes, leading-axis, trailing-axis.
  - Shape / movement: reshape (zero-copy), squeeze (zero-copy), 2D transpose.
  - Matmul: rank-2 × rank-2, f32 only.
- Host-fallback callbacks (correctness first; perf-native
  shaders pending): slice, as_type, comparison ops (equal,
  not_equal, less, less_equal, greater, greater_equal), select,
  all, any, dot (non-standard axis configs), block/4 (routes
  Nx.Block.LinAlg.SVD/QR/Cholesky/solve through BinaryBackend).
- Pipeline cache keyed by (spv_path, op_code). First call builds the
  layout + pipeline; subsequent calls reuse them. Required for
  long-running workloads (without it, vulkano's
  StandardDescriptorSetAllocator creates a fresh DescriptorPool per
  unique layout identity, eventually exhausting driver limits on FreeBSD).
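The pipeline cache described in the last bullet can be sketched roughly as follows. This is a std-only illustration, not the backend's actual code: `Pipeline` and `PipelineCache` are hypothetical names, the `builds` counter is added only so the reuse is observable, and a real entry would hold the vulkano layout and pipeline objects rather than a copy of the key.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a built layout + pipeline pair.
#[derive(Debug)]
struct Pipeline {
    spv_path: String,
    op_code: u32,
}

/// Cache keyed by (spv_path, op_code): the first request builds, later requests reuse.
struct PipelineCache {
    entries: HashMap<(String, u32), Arc<Pipeline>>,
    builds: usize, // how many times the expensive build path ran
}

impl PipelineCache {
    fn new() -> Self {
        PipelineCache { entries: HashMap::new(), builds: 0 }
    }

    fn get_or_build(&mut self, spv_path: &str, op_code: u32) -> Arc<Pipeline> {
        let key = (spv_path.to_string(), op_code);
        if let Some(existing) = self.entries.get(&key) {
            // Cache hit: hand back the same pipeline, so no new layout identity is created.
            return Arc::clone(existing);
        }
        // Cache miss: this is where layout and pipeline creation would happen.
        self.builds += 1;
        let built = Arc::new(Pipeline { spv_path: key.0.clone(), op_code });
        self.entries.insert(key, Arc::clone(&built));
        built
    }
}
```

Two requests with the same shader path and op code return the same `Arc`, while a different op code on the same shader triggers a second build. The shader file name used when trying this out is up to the caller; nothing here reads from priv/shaders/.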
Validated workloads.
- Axon training step: Dense → sigmoid → Dense + MSE +
  Nx.Defn.value_and_grad. Forward loss matches BinaryBackend
  byte-identical; gradient sum agrees to 1e-8. 100-step SGD trajectory
  matches at every step within 2e-6; final loss agrees to 4e-7 with
  both backends converging by 350×.
- eXMC regime model log-posterior: 8 free RVs, softmax-mixture
  custom likelihood over 200 observations. Matches BinaryBackend to
  1e-7 at f64 precision. Roughly 2× faster than the C++ path on the
  bench target (GT 650M, FreeBSD 15.0).
- Scholar linear regression (normal equation + SVD): coefficients
  match BinaryBackend to 2e-6 on synthetic regression. SVD via
  host-fallback block/4.
Autograd. No backward callbacks were written. Nx.Defn.grad
is a graph transformation that expresses backward ops in terms
of forward ops — forward op coverage is therefore gradient
coverage when running through Nx.Defn.Evaluator. Validated
end-to-end via the Axon training step.
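To make the "forward coverage is gradient coverage" point concrete, here is a small scalar sketch. It is not how Nx.Defn.grad is implemented; the `ForwardOps` trait and `Recording` backend are hypothetical and exist only to show that the derivative of sigmoid, s * (1 - s), can be evaluated using nothing but forward ops a backend already provides.

```rust
/// Hypothetical forward-only op set (scalars here for brevity; NOT the Nx.Backend behaviour).
trait ForwardOps {
    fn multiply(&mut self, a: f64, b: f64) -> f64;
    fn subtract(&mut self, a: f64, b: f64) -> f64;
    fn sigmoid(&mut self, x: f64) -> f64;
}

/// Reference backend that records which forward ops were invoked.
struct Recording {
    calls: Vec<&'static str>,
}

impl ForwardOps for Recording {
    fn multiply(&mut self, a: f64, b: f64) -> f64 {
        self.calls.push("multiply");
        a * b
    }

    fn subtract(&mut self, a: f64, b: f64) -> f64 {
        self.calls.push("subtract");
        a - b
    }

    fn sigmoid(&mut self, x: f64) -> f64 {
        self.calls.push("sigmoid");
        1.0 / (1.0 + (-x).exp())
    }
}

/// d/dx sigmoid(x) = s * (1 - s), written only in terms of forward ops.
/// No dedicated "sigmoid_backward" callback is needed.
fn sigmoid_grad<B: ForwardOps>(backend: &mut B, x: f64) -> f64 {
    let s = backend.sigmoid(x);
    let one_minus_s = backend.subtract(1.0, s);
    backend.multiply(s, one_minus_s)
}
```

Evaluating `sigmoid_grad` on a `Recording` backend leaves a call log containing only sigmoid, subtract, and multiply, all of which appear in the native op list above.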
Added — Mission II chain-shader synthesis
Exmc.NUTS.CustomSynth-style runtime synthesis of multi-RV
HMC/NUTS chain shaders. Take a multi-RV IR with a Custom
likelihood, trace via Nx.Defn, emit GLSL, compile to SPIR-V,
content-address cache, dispatch. Validated on the regime model
(8 RVs + 200-obs softmax-mixture) on GT 650M at 60 ms per K=32
leapfrog dispatch — 8.3× under the 500 ms/sample budget.
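The content-address cache step in that pipeline can be sketched as below. This is an assumption-laden illustration: `SpirvCache` and `content_key` are hypothetical names, std's `DefaultHasher` stands in for whatever digest the real cache uses, and the compiler is passed in as a closure so that no actual GLSL toolchain is invoked.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Content address of a GLSL source string.
/// DefaultHasher is an assumption for this sketch; a production cache would
/// more likely use a cryptographic digest to rule out collisions.
fn content_key(glsl: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    glsl.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache from source content to compiled SPIR-V words.
struct SpirvCache {
    compiled: HashMap<u64, Vec<u32>>,
    compiles: usize, // how many times the compiler closure actually ran
}

impl SpirvCache {
    fn new() -> Self {
        SpirvCache { compiled: HashMap::new(), compiles: 0 }
    }

    /// Compile only when this exact source text has not been seen before.
    fn get_or_compile<F>(&mut self, glsl: &str, compile: F) -> &[u32]
    where
        F: FnOnce(&str) -> Vec<u32>,
    {
        let key = content_key(glsl);
        if !self.compiled.contains_key(&key) {
            self.compiles += 1;
            let words = compile(glsl);
            self.compiled.insert(key, words);
        }
        &self.compiled[&key]
    }
}
```

Because the key is derived from the emitted source rather than from a model name, two models that synthesize identical GLSL share one compiled shader, and any change to the source (even whitespace) produces a new entry.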
Existing — Nx.Vulkan.Backend (C++ spirit path)
The legacy backend stays in this release. It runs the chain-shader
synthesis pipeline and the Mission II dispatch. The stale-handle
bug class that motivated the migration is still present; the
recommended path forward is VulkanoBackend for general Nx
work plus the spirit-backed chain dispatch (or vulkano's
chain-shader dispatch via Nx.Vulkan.NativeV.leapfrog_chain_synth)
for HMC.
Build notes
- Rust 1.85 pinned via rust-toolchain.toml. See the comment in that
  file for the upstream rustler reason.
- Vulkan SDK + glslangValidator required:
  - Linux: apt install libvulkan-dev vulkan-tools glslang-tools
  - FreeBSD: pkg install vulkan-loader vulkan-headers vulkan-tools glslang shaderc
- vulkano 0.34 builds in ~30 s on Linux, ~3 min 18 s on FreeBSD 15.0.
What's missing (the honest queue)
- Persistent buffer pool (per-call allocation works but costs a millisecond per dispatch).
- f64 matmul shader (regime model's Nx.dot falls back to host).
- Native linalg shaders (SVD, QR, Cholesky, solve); Scholar currently
  routes these through host.
- Custom Nx.Defn compiler: today we run through Nx.Defn.Evaluator
  op-by-op; whole-graph optimisation is EXLA-style work.
- Convolutions, FFTs, sort, scatter: the long tail of Nx ops.
- R4 live-trader cutover: the production trader has not been
  switched to VulkanoBackend yet.
Links
- Blog: The Backend That Didn't Need to Know
- Roadmap: docs/VULKANO_BACKEND_ROADMAP.md
- 10-minute intro: livebooks/intro_10min.livemd
- Examples: examples/axon_training_loop.exs, examples/full_bench.exs