

graph LR
    A["24 GB"] -->|"×8"| B["192 GB"]

Install

gleam add viva_tensor

Use

import viva_tensor/nf4

let small = nf4.quantize(big_tensor, nf4.default_config())
// about 8x less memory (7.5x per the table under Algorithms)
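
As a rough illustration of what a 4-bit NormalFloat quantizer does, the sketch below normalizes one block by its absolute maximum and maps each value to the nearest of 16 code-book levels. It is not viva_tensor's implementation: `levels`, `nearest_level`, and `quantize_block` are hypothetical names, and the code-book values are the NF4 levels published with QLoRA, rounded to four decimals.

```gleam
import gleam/float
import gleam/list

// Assumed NF4 code book: 16 levels in [-1, 1] (QLoRA values, rounded).
const levels = [
  -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0, 0.0796,
  0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.723, 1.0,
]

// Index (0..15) of the code-book level closest to `x`.
fn nearest_level(x: Float) -> Int {
  let #(best_index, _) =
    list.index_fold(levels, #(0, 10.0), fn(best, level, index) {
      let distance = float.absolute_value(x -. level)
      case distance <. best.1 {
        True -> #(index, distance)
        False -> best
      }
    })
  best_index
}

// Quantize one block: returns the 4-bit indices and the scale
// needed to map them back to approximate floats.
pub fn quantize_block(block: List(Float)) -> #(List(Int), Float) {
  let absmax =
    list.fold(block, 0.0, fn(acc, v) {
      float.max(acc, float.absolute_value(v))
    })
  // Avoid dividing by zero for an all-zero block.
  let scale = case absmax == 0.0 {
    True -> 1.0
    False -> absmax
  }
  #(list.map(block, fn(v) { nearest_level(v /. scale) }), scale)
}
```

Each index fits in 4 bits, so two values pack into one byte; the per-block scale is the only extra storage.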

Algorithms

flowchart LR
    T[Tensor] --> Q{Quantize}
    Q -->|4x| I[INT8]
    Q -->|8x| N[NF4]
    Q -->|8x| A[AWQ]

| Method | Compression | Efficiency |
|--------|-------------|------------|
| INT8   | 4x          | 40%        |
| NF4    | 7.5x        | 77%        |
| AWQ    | 7.7x        | 53%        |
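
One plausible reading of why the 4-bit rows land just under 8x is that block-wise quantizers also store a scale per block. The sketch below computes the effective ratio against 32-bit floats. The function name and its parameters are hypothetical, and the 32-bit baseline is an assumption inferred from the 4x figure for INT8.

```gleam
import gleam/int

// Effective compression versus 32-bit floats when every block of
// `block_size` values also stores one scale of `scale_bits` bits.
pub fn compression_ratio(
  bits_per_value: Int,
  block_size: Int,
  scale_bits: Int,
) -> Float {
  // Scale storage amortized over the values in one block.
  let overhead = int.to_float(scale_bits) /. int.to_float(block_size)
  32.0 /. { int.to_float(bits_per_value) +. overhead }
}
```

For example, 4-bit values in blocks of 64 with 16-bit scales give 32 / 4.25, about 7.5. Those settings are a guess that happens to land near the NF4 row, not a documented viva_tensor configuration.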

Build

make test
make bench

Docs

docs/ — PT-BR, EN, 中文

