# `Nx.Vulkan.Backend`
[🔗](https://github.com/borodark/nx_vulkan/blob/main/lib/nx_vulkan/backend.ex#L1)

`Nx.Backend` implementation on top of the Vulkan compute primitives.

Tensors are represented by:

    %Nx.Vulkan.Backend{ref: <ResourceArc<VulkanTensor>>, shape: tuple, type: {:f, 32}}

Storage lives on the GPU; an Elixir reference to the `ResourceArc`
keeps the GPU buffer alive. When the last Elixir reference is
garbage-collected, the Rust `Drop` impl frees the buffer.

## Status — v0.0.3 baseline

The backend implements only the operators wired in earlier
iterations:

  - `from_binary/3`, `to_binary/2`, `backend_copy/3`, `backend_transfer/3`
  - elementwise binary: `add`, `subtract`, `multiply`, `divide`, `pow`,
    `max`, `min`
  - elementwise unary:  `exp`, `log`, `sqrt`, `abs`, `negate`, `sigmoid`,
    `tanh`, `relu`-via-clamp-to-0…
  - reductions:         `sum`, `reduce_max`, `reduce_min` (all-axis only)
  - linear algebra:     `dot/6` (rank-2 × rank-2 matmul)

Anything else falls back to `Nx.BinaryBackend` automatically via
`Nx.backend_transfer/2`.
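
As an illustration of the operators listed above, a minimal usage sketch might look like the following. It assumes the `:nx` and `:nx_vulkan` packages are installed and a Vulkan device is usable; any device setup the package requires is not shown, and the variable names are illustrative only.

```elixir
# Assumption: :nx and :nx_vulkan are available and a Vulkan device is usable.
# Any initialization the package requires is omitted from this sketch.
a = Nx.tensor([[1.0, 2.0], [3.0, 4.0]], type: {:f, 32}, backend: Nx.Vulkan.Backend)
b = Nx.tensor([[5.0, 6.0], [7.0, 8.0]], type: {:f, 32}, backend: Nx.Vulkan.Backend)

# An elementwise binary op and a rank-2 x rank-2 matmul, both listed above.
sum = Nx.add(a, b)
product = Nx.dot(a, b)

# Move the results to the host backend so they can be inspected as lists.
sum_host = Nx.backend_transfer(sum, Nx.BinaryBackend)
product_host = Nx.backend_transfer(product, Nx.BinaryBackend)

IO.inspect(Nx.to_flat_list(sum_host), label: "a + b")
IO.inspect(Nx.to_flat_list(product_host), label: "a . b")
```

Passing `backend:` per tensor keeps the sketch explicit about where storage lives; `Nx.global_default_backend/1` is the usual alternative when every tensor should use the same backend.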

## Limitations

  - **Compute is f32 only.** Storage round-trips any type
    (f32/f64/s8..s64/u8..u64) via `from_binary`/`to_binary`/`as_type`.
    Per-element ops still dispatch f32 shaders. Use `as_type` to cast
    f64 accumulators to f32 before computing.
  - **No autograd.** `defn grad` won't work end-to-end. The forward
    path is what this backend proves.
  - **Per-axis reductions are host-materialized.** All-axis
    reductions run on the GPU; per-axis reductions download the
    tensor, reduce it on the host, and upload the result.
  - **No broadcasting in this backend.** Broadcasting happens in
    Nx's frontend before dispatch; we receive already-broadcast
    tensors. (Will swap to the broadcast shader later.)
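
To make the host-materialized path for per-axis reductions concrete, here is a rough sketch of the "walk" step in plain Elixir. It is not the backend's actual code: `HostReduceSketch` and `sum_axis/3` are hypothetical names, and the sketch assumes a row-major f32 binary of rank 2, as would be obtained after downloading a tensor.

```elixir
defmodule HostReduceSketch do
  @moduledoc false
  # Hypothetical helper, not part of Nx.Vulkan. Sums a row-major f32 binary
  # of shape {rows, cols} along `axis` and returns a new f32 binary that
  # could then be uploaded again.
  def sum_axis(binary, {rows, cols}, axis)
      when is_binary(binary) and cols > 0 and axis in [0, 1] do
    # "Download" is assumed to have happened already; decode the raw floats.
    values = for <<x::float-32-native <- binary>>, do: x
    row_lists = Enum.chunk_every(values, cols)

    # Sanity check that the binary matches the declared shape.
    true = length(row_lists) == rows

    sums =
      case axis do
        # Reducing axis 1 collapses each row to a single value.
        1 -> Enum.map(row_lists, &Enum.sum/1)
        # Reducing axis 0 collapses each column to a single value.
        0 -> Enum.zip_with(row_lists, &Enum.sum/1)
      end

    # Re-encode as f32 so the result is ready for upload.
    for x <- sums, into: <<>>, do: <<x::float-32-native>>
  end
end

data = for x <- [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], into: <<>>, do: <<x::float-32-native>>
row_sums = HostReduceSketch.sum_axis(data, {2, 3}, 1)
IO.inspect(for(<<x::float-32-native <- row_sums>>, do: x), label: "axis 1")
```

The round trip through a host binary is what makes this path slower than the all-axis GPU reductions: every call pays for a download and an upload in addition to the walk itself.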

# `fast_inv_mass_apply`

# `fast_kinetic_energy`

# `fast_leapfrog_momentum_half`

# `fast_leapfrog_position`

# `fast_momentum_step`

# `fast_normal_logpdf`

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
