Modules
Loads Qwen3 weights from an MLX-4bit safetensors checkpoint into Bumblebee
Axon params format (BF16, Bumblebee {in, out} key convention).
Post-load param quantization for Bumblebee models.
Grouped-query attention (GQA) for Qwen3, with a preallocated KV cache.
Autoregressive token generation loop.
Stateless layer primitives: RMSNorm, SwiGLU.
Loads a lmstudio-community/Qwen3-*-MLX-4bit checkpoint from disk into
an %EMLXAxon.Qwen3.Model.State{} struct.
Qwen3 quantized model state struct and forward pass.
Loaded model weights and config.
Three sampling strategies for autoregressive token generation.
A Nx.Serving-compatible wrapper around the native Qwen3 quantized model.