API Reference emlx_axon v#0.3.0

Copy Markdown View Source

Modules

Axon model rewrites that swap supported nodes to EMLX.Fast Metal shaders.

Loads Qwen3 weights from an MLX-4bit safetensors checkpoint into Bumblebee Axon params format (BF16, Bumblebee {in, out} key convention).

Post-load param quantization for Bumblebee models.

Grouped-query attention (GQA) for Qwen3, with a preallocated KV cache.

Autoregressive token generation loop.

Stateless layer primitives: RMSNorm, SwiGLU.

Loads a lmstudio-community/Qwen3-*-MLX-4bit checkpoint from disk into an %EMLXAxon.Qwen3.Model.State{} struct.

Qwen3 quantized model state struct and forward pass.

Loaded model weights and config.

Three sampling strategies for autoregressive token generation.

A Nx.Serving-compatible wrapper around the native Qwen3 quantized model.