viva_tensor/sparse

SparseTensor - 2:4 Sparsity

Uses NVIDIA cuSPARSELt to prune and compress weight matrices into the 2:4 structured-sparse format: in every 4-element block, 2 values are kept and the other 2 are pruned to zero.

Ideal for Large Language Model (LLM) weights.
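
As a minimal CPU-side sketch of what the 2:4 pattern means (the real pruning is done by cuSPARSELt on the GPU inside from_cuda16), the helper below zeroes the 2 smallest-magnitude values in a 4-element block; the function name, the magnitude criterion, and the tie handling are illustrative assumptions, not this library's implementation.

import gleam/float
import gleam/list
import gleam/result

// Illustrative only: zero the 2 smallest-magnitude values in a block of 4,
// mirroring the 2:4 pattern that the GPU pruning step enforces.
// Ties at the threshold are not resolved here.
fn prune_block_2_4(block: List(Float)) -> List(Float) {
  // The keep-threshold is the 2nd largest magnitude in the block.
  let threshold =
    block
    |> list.map(float.absolute_value)
    |> list.sort(by: fn(a, b) { float.compare(b, a) })
    |> list.take(2)
    |> list.last
    |> result.unwrap(0.0)

  list.map(block, fn(x) {
    case float.absolute_value(x) >=. threshold {
      True -> x
      False -> 0.0
    }
  })
}

For example, [0.5, -0.1, 0.02, 0.9] prunes to [0.5, 0.0, 0.0, 0.9]: half the values survive, which is where the roughly 2x weight compression comes from.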

Types

Reference to a 2:4 structured sparse tensor stored on the GPU

pub type SparseTensor =
  ffi.SparseTensorRef

Values

pub fn available() -> Bool

Check whether the cuSPARSELt library is available at runtime
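
For example, callers can gate the sparse path on this check (the messages and the fallback are illustrative):

import gleam/io
import viva_tensor/sparse

pub fn main() {
  case sparse.available() {
    True -> io.println("cuSPARSELt found: using 2:4 sparse kernels")
    False -> io.println("cuSPARSELt missing: falling back to dense matmul")
  }
}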

pub fn compression_ratio(
  tensor: ffi.SparseTensorRef,
) -> Result(Float, String)

Get the actual compression ratio achieved for this tensor (dense bytes / sparse bytes)
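
A small sketch of logging the achieved ratio; `weights` is assumed to be a SparseTensor produced by from_cuda16 below:

import gleam/float
import gleam/io
import viva_tensor/sparse

// Report how much smaller the compressed weights are than the dense copy.
fn report_compression(weights: sparse.SparseTensor) -> Nil {
  case sparse.compression_ratio(weights) {
    Ok(ratio) -> io.println("dense/sparse bytes: " <> float.to_string(ratio))
    Error(reason) -> io.println("could not query ratio: " <> reason)
  }
}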

pub fn from_cuda16(
  tensor: ffi.CudaTensor16Ref,
) -> Result(ffi.SparseTensorRef, String)

Create a SparseTensor from a CudaTensor16 (prune + compress)

This operation is destructive: it prunes the smallest 2 values in every 4-element block. The resulting sparse tensor is stored in a compressed format on the GPU.
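
A minimal sketch of the compression step; the parameter is left unannotated because the CudaTensor16 constructor lives outside this module, and the availability guard is an illustrative choice rather than something from_cuda16 requires:

import viva_tensor/sparse

// Prune + compress a dense FP16 weight tensor (a CudaTensor16 produced by
// another viva_tensor module), refusing up front when cuSPARSELt is missing.
fn to_sparse(dense_weights) -> Result(sparse.SparseTensor, String) {
  case sparse.available() {
    True -> sparse.from_cuda16(dense_weights)
    False -> Error("cuSPARSELt is not available")
  }
}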

pub fn matmul(
  a_sparse: ffi.SparseTensorRef,
  b_dense: ffi.CudaTensor16Ref,
  m: Int,
  n: Int,
  k: Int,
) -> Result(ffi.CudaTensor16Ref, String)

Sparse Matrix Multiplication (SpMM)

C = Sparse(A) @ Dense(B)

  • a_sparse: Compressed weight matrix (2:4 sparse)
  • b_dense: Dense activation matrix (FP16)

Returns the dense FP16 result matrix C.
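
A sketch of chaining two sparse linear layers, showing that the dense FP16 output of one SpMM can be passed straight back in as the next call's b_dense; the dimension names follow the usual GEMM convention (A is m x k, B is k x n) and are assumptions for illustration:

import gleam/result
import viva_tensor/sparse

// y = W2 @ (W1 @ x), where W1 and W2 are compressed 2:4 weight matrices
// and x is a dense FP16 activation tensor from elsewhere in the pipeline.
fn two_layers(
  w1: sparse.SparseTensor,
  w2: sparse.SparseTensor,
  x,
  d_in: Int,
  d_hidden: Int,
  d_out: Int,
  batch: Int,
) {
  // h[d_hidden, batch] = w1[d_hidden, d_in] @ x[d_in, batch]
  use h <- result.try(sparse.matmul(w1, x, d_hidden, batch, d_in))
  // y[d_out, batch] = w2[d_out, d_hidden] @ h[d_hidden, batch]
  sparse.matmul(w2, h, d_out, batch, d_hidden)
}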

pub fn shape(
  tensor: ffi.SparseTensorRef,
) -> Result(List(Int), String)

Get the shape of the original dense tensor as [rows, cols]
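
For instance, the original dimensions can be read back to supply m and k when calling matmul (assuming the usual convention that the weight matrix A is m x k; the helper name is illustrative):

import gleam/result
import viva_tensor/sparse

// Recover the original [rows, cols] of the weight matrix; under the usual
// GEMM convention these are the m and k arguments for matmul.
fn weight_dims(weights: sparse.SparseTensor) -> Result(#(Int, Int), String) {
  use dims <- result.try(sparse.shape(weights))
  case dims {
    [rows, cols] -> Ok(#(rows, cols))
    _ -> Error("expected a 2-D weight tensor")
  }
}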
