Nx.Vulkan.Native (nx_vulkan v0.1.0)


Rustler NIF bindings for the Vulkan compute backend.

All functions in this module are NIF stubs. At module-load time, Rustler replaces them with the real Rust implementations. If the native library wasn't compiled, the stubs stay in place and every call fails with :nif_not_loaded.
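The stub behaviour can be sketched in plain Elixir. The module name and the `safe_device_uuid/0` wrapper below are hypothetical, and the sketch deliberately leaves out the Rustler wiring so that it runs without the native library.

```elixir
defmodule StubSketch do
  # A NIF stub: until native code replaces it, every call raises an
  # ErlangError whose :original field is :nif_not_loaded.
  def device_uuid, do: :erlang.nif_error(:nif_not_loaded)

  # Hypothetical wrapper that turns the raise into a tagged tuple.
  def safe_device_uuid do
    {:ok, device_uuid()}
  rescue
    e in ErlangError -> {:error, e.original}
  end
end
```

Calling `StubSketch.safe_device_uuid()` here returns `{:error, :nif_not_loaded}`, because nothing ever replaces the stub.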

Don't call these directly from application code — use Nx.Vulkan or the Nx.Vulkan.Backend module instead. This module exists only to give Rustler a place to bind into.

Summary

Functions

device_uuid()

Phase 2 W5: read the device's pipelineCacheUUID as a 16-byte binary.

download_binary_batch4(t1, t2, t3, t4)

Batched download of 4 GPU tensors in a single submit/wait round-trip.

leapfrog_chain_synth(q, p, inv_mass, push, k, spv_path)

Generic K-step leapfrog chain for synthesized shaders.

pipeline_cache_load(path)

Phase 2 W5: load on-disk pipeline cache blob into the spirit context.

pipeline_cache_persist(path)

Phase 2 W5: atomically write spirit's current pipeline cache to disk.

timing_get()

H3 dispatch timing: read {count, dispatch_ns, submit_ns, wait_ns, record_ns}.

timing_reset()

H3 dispatch timing: reset accumulators.

upload_binary_into(tensor, data)

Upload a binary into an existing GPU buffer (no alloc).

upload_binary_into_batch2(t1, d1, t2, d2)

Batched upload of 2 binaries into 2 existing GPU buffers in one round-trip.

Functions

device_uuid()

Phase 2 W5 — read the device's pipelineCacheUUID as a 16-byte binary.
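One plausible use for this UUID is checking that an on-disk pipeline cache was produced by the same device and driver before loading it. The sketch below assumes the blob starts with the version-one pipeline cache header described in the Vulkan specification (four little-endian 32-bit fields followed by the 16-byte UUID). The module and function names are hypothetical, and whether nx_vulkan performs this check itself is not stated here.

```elixir
defmodule PipelineCacheHeaderSketch do
  # Header layout per the Vulkan spec (version one): header size, header
  # version, vendor ID, device ID (each 32-bit, least significant byte
  # first), then the 16-byte pipelineCacheUUID.
  def parse(
        <<header_size::unsigned-little-32, 1::unsigned-little-32,
          vendor_id::unsigned-little-32, device_id::unsigned-little-32,
          uuid::binary-size(16), _rest::binary>>
      )
      when header_size >= 32 do
    {:ok, %{header_size: header_size, vendor_id: vendor_id, device_id: device_id, uuid: uuid}}
  end

  def parse(_other), do: {:error, :bad_header}

  # True only when the blob parses and its UUID equals the device's UUID.
  def matches_device?(blob, device_uuid) when byte_size(device_uuid) == 16 do
    match?({:ok, %{uuid: ^device_uuid}}, parse(blob))
  end
end
```

In application code the second argument would be whatever `device_uuid()` returns, after unwrapping any result tuple the real function may use.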

download_binary_batch4(t1, t2, t3, t4)

Batched download of 4 GPU tensors in a single submit/wait round-trip.

leapfrog_chain_synth(q, p, inv_mass, push, k, spv_path)

Generic K-step leapfrog chain for synthesized shaders.
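For orientation, here is a CPU reference of what a K-step leapfrog chain computes, written over plain lists. It is only a sketch under stated assumptions: the target is a standard normal (so the gradient is `-q`), the step size `eps` is passed explicitly, and the half-step/full-step ordering is the textbook one. The synthesized shader's actual target, parameter passing, and ordering are not described here, and the real function presumably returns GPU tensors rather than lists.

```elixir
defmodule LeapfrogSketch do
  # Assumed target: standard normal, log p(q) = -0.5 * sum(q_i^2).
  def logp(q), do: -0.5 * Enum.sum(Enum.map(q, &(&1 * &1)))
  def grad(q), do: Enum.map(q, &(-&1))

  # One leapfrog step: half-step momentum, full-step position, half-step momentum.
  def step(q, p, inv_mass, eps) do
    p_half = Enum.zip_with(p, grad(q), fn p_i, g_i -> p_i + 0.5 * eps * g_i end)

    q_new =
      Enum.zip_with([q, p_half, inv_mass], fn [q_i, p_i, m_i] -> q_i + eps * m_i * p_i end)

    g_new = grad(q_new)
    p_new = Enum.zip_with(p_half, g_new, fn p_i, g_i -> p_i + 0.5 * eps * g_i end)
    {q_new, p_new, g_new, logp(q_new)}
  end

  # Runs k steps and returns the per-step states as four parallel lists.
  def chain(q, p, inv_mass, eps, k) when is_integer(k) and k >= 1 do
    {steps, _final} =
      Enum.map_reduce(1..k, {q, p}, fn _i, {q_i, p_i} ->
        {q1, p1, _g, _lp} = result = step(q_i, p_i, inv_mass, eps)
        {result, {q1, p1}}
      end)

    {:ok,
     {Enum.map(steps, &elem(&1, 0)), Enum.map(steps, &elem(&1, 1)),
      Enum.map(steps, &elem(&1, 2)), Enum.map(steps, &elem(&1, 3))}}
  end
end
```

The return shape mirrors the `{:ok, {q_chain, p_chain, grad_chain, logp_chain}}` tuple, with one entry per step in each chain.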

push is a raw binary assembled by the Elixir-side codegen (max 128 bytes). Returns {:ok, {q_chain, p_chain, grad_chain, logp_chain}}.
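A minimal sketch of assembling such a binary while enforcing the 128-byte limit follows. The field tags (`:f32`, `:u32`), the little-endian encoding, and the module name are assumptions for illustration; the real layout is whatever the codegen and the synthesized shader agree on. For context, 128 bytes is also the minimum push-constant size that the Vulkan specification guarantees, which may be where the limit comes from.

```elixir
defmodule PushPackSketch do
  @max_push_bytes 128

  # Packs a list of tagged scalars into one binary, in order.
  def pack(fields) when is_list(fields) do
    bin = fields |> Enum.map(&encode/1) |> IO.iodata_to_binary()

    if byte_size(bin) <= @max_push_bytes do
      {:ok, bin}
    else
      {:error, {:push_too_large, byte_size(bin)}}
    end
  end

  # Assumed encodings: 32-bit little-endian floats and unsigned integers.
  defp encode({:f32, v}) when is_number(v), do: <<v::float-little-32>>
  defp encode({:u32, v}) when is_integer(v) and v >= 0, do: <<v::unsigned-little-32>>
end
```

Checking the size on the Elixir side gives a readable error before the binary ever reaches the NIF.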

pipeline_cache_load(path)

Phase 2 W5 — load on-disk pipeline cache blob into the spirit context.

pipeline_cache_persist(path)

Phase 2 W5 — atomically write spirit's current pipeline cache to disk.
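How the NIF achieves atomicity is not described here. A common technique, sketched below in Elixir with hypothetical names, is to write the data to a temporary file in the same directory and then rename it over the destination, so readers see either the old file or the complete new one.

```elixir
defmodule AtomicWriteSketch do
  # Writes `data` to a sibling temp file, then renames it onto `path`.
  # On failure the temp file is removed and the error is returned.
  def write(path, data) when is_binary(path) and is_binary(data) do
    tmp = path <> ".tmp-" <> Integer.to_string(System.unique_integer([:positive]))

    with :ok <- File.write(tmp, data),
         :ok <- File.rename(tmp, path) do
      :ok
    else
      {:error, reason} ->
        _ = File.rm(tmp)
        {:error, reason}
    end
  end
end
```

Keeping the temporary file next to the destination matters: a rename is only atomic when both paths are on the same filesystem.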

timing_get()

H3 dispatch timing — read {count, dispatch_ns, submit_ns, wait_ns, record_ns}.
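The accumulators are easier to read as per-dispatch averages. The helper below is a hypothetical sketch that takes the five-element tuple directly. It assumes the four `_ns` fields are totals accumulated over `count` dispatches, and that any `{:ok, ...}` wrapper the real function may add has already been unwrapped.

```elixir
defmodule TimingReportSketch do
  # No dispatches recorded yet, so there is nothing to average.
  def averages({0, _dispatch_ns, _submit_ns, _wait_ns, _record_ns}), do: :no_dispatches

  # Converts accumulated nanoseconds into average microseconds per dispatch.
  def averages({count, dispatch_ns, submit_ns, wait_ns, record_ns})
      when is_integer(count) and count > 0 do
    %{
      dispatch_us: dispatch_ns / count / 1000,
      submit_us: submit_ns / count / 1000,
      wait_us: wait_ns / count / 1000,
      record_us: record_ns / count / 1000
    }
  end
end
```

A typical pattern would be to reset the accumulators, run the workload, and then pass the result of `timing_get()` to a helper like this.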

timing_reset()

H3 dispatch timing — reset accumulators.

upload_binary_into(tensor, data)

Upload a binary into an existing GPU buffer (no alloc).

upload_binary_into_batch2(t1, d1, t2, d2)

Batched upload of 2 binaries into 2 existing GPU buffers in one round-trip.