# `Nx.Vulkan.NativeV`
[🔗](https://github.com/borodark/nx_vulkan/blob/main/lib/nx_vulkan/native_v.ex#L1)

Rustler NIF for the pure-Rust (vulkano) compute backend.

Sibling of `Nx.Vulkan.Native` (the C++/spirit-backed NIF). Same
chain-shader dispatch contract, but resource lifetimes are
managed by Rust ownership (Arc<Buffer>) rather than opaque
`VkBuf*` pointers — so the stale-handle bug class that
surfaced as `ArgumentError` in `Nx.Vulkan.Backend.to_binary`
(Mission II R4) is structurally absent.

This module is the spike landing zone — for now it only
exposes `leapfrog_chain_synth/6`, taking bytes in and bytes
out (no persistent ResourceArc tensor handles). When the
vulkano backend gets feature-parity with the C++ path, this
expands to cover `apply_binary`, `reduce`, etc.

## Compatibility

- Loads any SPV the existing pipeline emits (verified
  byte-for-byte equivalence against `Nx.Vulkan.Native.leapfrog_chain_synth`
  on the regime-model R3 fixture; see
  `nx_vulkan/spike/vulkano_synth/README.md`).
- Builds on Linux + FreeBSD 15.0 with vulkano 0.34.

# `apply_binary`

Elementwise binary op. `op_code` selects which operation the
shader executes via a specialisation constant:

    0=add  1=mul  2=sub  3=div  4=pow  5=max  6=min

Buffers must all be the same byte size. Returns `:ok` or
`{:error, :size_mismatch}` / `{:error, :dispatch_failed, msg}`.

# `apply_unary`

Elementwise unary op. `op_code` selects:

    0=exp  1=log  2=sqrt  3=abs  4=neg  5=sigmoid  6=tanh  7=relu
    8=ceil  9=floor  10=sign  11=reciprocal  12=square

Buffers must be the same byte size.

# `buf_alloc`

Allocate a zero-initialised device buffer of `n_bytes`. Returns `{:ok, ref}`.

# `buf_byte_size`

Buffer size in bytes (returns integer, never crashes on a valid resource).

# `buf_download`

Read a device buffer back to a host binary. Returns `{:ok, binary}`.

# `buf_upload`

Allocate a device buffer + upload `data` to it. Returns
`{:ok, ref}`. The ref is a Rustler resource that owns the
underlying `Arc<Buffer>` — when the BEAM GCs it, vulkano's Drop
runs and the GPU memory is freed.

# `buf_upload_into`

Overwrite an existing device buffer with new host data.
Returns `:ok` or `{:error, :size_mismatch}` when sizes disagree.

# `leapfrog_chain_synth`

Dispatch a K-step leapfrog chain against the synthesised SPV.

All inputs are binaries:

- `q_init`, `p_init`: d * 4 bytes each (little-endian f32)
- `extras`: (n_obs + d) * 4 bytes — obs followed by inv_mass
  in the R2.2.3 packed layout
- `push`: 20–128 bytes, the synth template's push block
  (`K, n_obs, d, _pad, eps`)
- `k`: leapfrog steps per dispatch (must match push.K)
- `spv_path`: filesystem path to the cached SPV blob

Returns `{:ok, {q_chain_bin, p_chain_bin, grad_chain_bin,
logp_chain_bin}}` on success — same shape as the C++ NIF's
return after `download_binary_batch4/4`.

# `matmul`

2D matmul. C = A · B where A is M×K row-major, B is K×N row-major,
C is M×N row-major. All f32. Buffers: a (m*k*4), b (k*n*4), out
(m*n*4).

# `reduce_axis`

Per-axis reduction. `op_code`: 0=sum, 1=max, 2=min.

Input shape is interpreted as a virtual (outer, reduce_size, inner)
tensor; output shape is (outer, inner) — i.e. the reduction
collapses the middle axis. For full reductions use
`outer=1, reduce_size=n, inner=1`.

Buffers: out has `outer * inner` elements; a has
`outer * reduce_size * inner` elements.

# `transpose_2d`

2D transpose. Input A is M×N row-major; output is N×M row-major.
Buffers: a (`m*n*4` bytes), out (`m*n*4` bytes).

---

*Consult [api-reference.md](api-reference.md) for complete listing*
