# emlx_axon v0.3.0 - API Reference

## Modules

- [EMLXAxon](EMLXAxon.md): Axon model rewrites that swap supported nodes to `EMLX.Fast` Metal shaders.
- [EMLXAxon.MLX4BitParams](EMLXAxon.MLX4BitParams.md): Loads Qwen3 weights from an MLX-4bit safetensors checkpoint into Bumblebee Axon params format (BF16, Bumblebee `{in, out}` key convention).
- [EMLXAxon.QuantizeParams](EMLXAxon.QuantizeParams.md): Post-load param quantization for Bumblebee models.
- [EMLXAxon.Qwen3.Attention](EMLXAxon.Qwen3.Attention.md): Grouped-query attention (GQA) for Qwen3, with a preallocated KV cache.
- [EMLXAxon.Qwen3.Generate](EMLXAxon.Qwen3.Generate.md): Autoregressive token generation loop.
- [EMLXAxon.Qwen3.Layers](EMLXAxon.Qwen3.Layers.md): Stateless layer primitives: RMSNorm, SwiGLU.
- [EMLXAxon.Qwen3.Loader](EMLXAxon.Qwen3.Loader.md): Loads a `lmstudio-community/Qwen3-*-MLX-4bit` checkpoint from disk into an `%EMLXAxon.Qwen3.Model.State{}` struct.
- [EMLXAxon.Qwen3.Model](EMLXAxon.Qwen3.Model.md): Qwen3 quantized model state struct and forward pass.
- [EMLXAxon.Qwen3.Model.State](EMLXAxon.Qwen3.Model.State.md): Loaded model weights and config.
- [EMLXAxon.Qwen3.Sampler](EMLXAxon.Qwen3.Sampler.md): Three sampling strategies for autoregressive token generation.
- [EMLXAxon.TextGeneration](EMLXAxon.TextGeneration.md): An `Nx.Serving`-compatible wrapper around the native Qwen3 quantized model.

