viva_tensor/cuda

CudaTensor - Persistent GPU Memory

Tensors that live on the GPU. Ideal for weights and heavy compute.

Data is uploaded once and stays on device until explicitly downloaded. Most operations are launched asynchronously.
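A minimal sketch of the upload/query/download round trip, using only the functions documented below (the `main` entry point and literal values are illustrative):

```gleam
import gleam/io
import viva_tensor/cuda

pub fn main() {
  // Upload once; the tensor stays resident in GPU memory.
  let assert Ok(t) = cuda.new([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])
  // Shape queries and downloads are explicit round trips to the host.
  let assert Ok(dims) = cuda.shape(t)
  let assert Ok(values) = cuda.to_list(t)
  io.debug(dims)
  io.debug(values)
}
```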

Types

Reference to a tensor stored in GPU memory (FP32)

pub type CudaTensor =
  ffi.CudaTensorRef

Reference to a tensor stored in GPU memory (FP16)

pub type CudaTensor16 =
  ffi.CudaTensor16Ref

Values

pub fn fp16_available() -> Bool

Check if FP16 Tensor Cores are available

pub fn matmul(
  a: ffi.CudaTensorRef,
  b: ffi.CudaTensorRef,
  m: Int,
  n: Int,
  k: Int,
) -> Result(ffi.CudaTensorRef, String)

Matrix Multiplication (FP32) C = A @ B
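A hedged example of the dimension arguments, assuming the usual GEMM convention for a C = A @ B signature (A is m×k, B is k×n, C is m×n, row-major — an assumption; verify against the implementation):

```gleam
import gleam/result
import viva_tensor/cuda

pub fn matmul_example() -> Result(List(Float), String) {
  // A: m x k = 2x3, B: k x n = 3x2, so C: m x n = 2x2.
  use a <- result.try(cuda.new([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3]))
  use b <- result.try(cuda.new([7.0, 8.0, 9.0, 10.0, 11.0, 12.0], [3, 2]))
  use c <- result.try(cuda.matmul(a, b, 2, 2, 3))
  // Download the 4-element result from the GPU.
  cuda.to_list(c)
}
```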

pub fn matmul16(
  a: ffi.CudaTensor16Ref,
  b: ffi.CudaTensor16Ref,
  m: Int,
  n: Int,
  k: Int,
) -> Result(ffi.CudaTensor16Ref, String)

Matrix Multiplication (FP16 Tensor Cores) C = A @ B

Uses HMMA (Half-precision Matrix Multiply-Accumulate) instructions. Expect large speedups (up to 330 TFLOPS on supported hardware) when m, n, and k are multiples of 16, the Tensor Core tile size.
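A sketch of capability-gated dispatch: take the Tensor Core path only when `fp16_available()` reports support, falling back to FP32 otherwise. The function name and the A: m×k / B: k×n layout are assumptions for illustration:

```gleam
import gleam/result
import viva_tensor/cuda

pub fn fast_matmul(
  a_data: List(Float),
  b_data: List(Float),
  m: Int,
  n: Int,
  k: Int,
) -> Result(List(Float), String) {
  case cuda.fp16_available() {
    True -> {
      // Tensor Core path: fastest when m, n, and k are multiples of 16.
      use a <- result.try(cuda.new16(a_data, [m, k]))
      use b <- result.try(cuda.new16(b_data, [k, n]))
      use c <- result.try(cuda.matmul16(a, b, m, n, k))
      cuda.to_list16(c)
    }
    False -> {
      // FP32 fallback on hardware without FP16 Tensor Cores.
      use a <- result.try(cuda.new(a_data, [m, k]))
      use b <- result.try(cuda.new(b_data, [k, n]))
      use c <- result.try(cuda.matmul(a, b, m, n, k))
      cuda.to_list(c)
    }
  }
}
```

Note the FP16 path trades precision for throughput; see `new16` below for the conversion behavior.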

pub fn new(
  data: List(Float),
  shape: List(Int),
) -> Result(ffi.CudaTensorRef, String)

Upload data to GPU (FP32)

pub fn new16(
  data: List(Float),
  shape: List(Int),
) -> Result(ffi.CudaTensor16Ref, String)

Upload data to GPU (converts f64 -> f16)
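Because FP16 has roughly 10 bits of mantissa and a maximum finite value of 65504, the f64 -> f16 conversion is lossy: a round trip returns the nearest representable half-precision values, not the exact inputs. A small illustration (the `main` wrapper is illustrative):

```gleam
import gleam/io
import viva_tensor/cuda

pub fn main() {
  // 0.1, 0.2, 0.3 are not exactly representable in FP16.
  let assert Ok(t) = cuda.new16([0.1, 0.2, 0.3], [3])
  let assert Ok(back) = cuda.to_list16(t)
  // `back` holds the nearest FP16 values, slightly off from the inputs.
  io.debug(back)
}
```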

pub fn shape(
  tensor: ffi.CudaTensorRef,
) -> Result(List(Int), String)

Get shape of tensor

pub fn shape16(
  tensor: ffi.CudaTensor16Ref,
) -> Result(List(Int), String)

Get shape of FP16 tensor

pub fn to_list(
  tensor: ffi.CudaTensorRef,
) -> Result(List(Float), String)

Download data from GPU (FP32)

pub fn to_list16(
  tensor: ffi.CudaTensor16Ref,
) -> Result(List(Float), String)

Download data from GPU (converts f16 -> f64)