emlx_axon provides Axon model rewrites that swap supported nodes for EMLX.Fast Metal shader implementations, accelerating inference on Apple Silicon.

It builds on emlx and is intended for use alongside Bumblebee to serve LLM workloads on MLX.

Usage

Add emlx_axon as a dependency in your mix.exs:

def deps do
  [
    {:emlx_axon, github: "elixir-nx/emlx", sparse: "emlx_axon", branch: "main"},
    {:emlx, github: "elixir-nx/emlx", branch: "main"}
  ]
end

Model download

The examples and tests that run inference require local model checkpoints downloaded from HuggingFace.

Install the HuggingFace CLI if you don't have it:

# Quote the extras specifier: zsh (the default macOS shell) treats [cli] as a glob pattern
pipx install 'huggingface_hub[cli]'

Download the model checkpoints:

# 0.6B — small, fast to iterate (~400 MB)
huggingface-cli download lmstudio-community/Qwen3-0.6B-MLX-4bit \
  --local-dir ~/models/Qwen3-0.6B-MLX-4bit

# 8B — headline size (~5 GB)
huggingface-cli download lmstudio-community/Qwen3-8B-MLX-4bit \
  --local-dir ~/models/Qwen3-8B-MLX-4bit

Export the path before running tests or benchmarks:

export EMLX_QWEN3_MODEL_PATH=~/models/Qwen3-0.6B-MLX-4bit
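
A missing or mistyped path is easier to catch before a run starts than partway through it. The sketch below shows one way to check the variable up front; check_model_path is a hypothetical helper, not part of emlx_axon, and it only verifies that the variable is set and names a directory.

```shell
# Hypothetical helper (not part of emlx_axon): fail early when the
# checkpoint path is unset or does not point at a directory.
check_model_path() {
  if [ -z "${EMLX_QWEN3_MODEL_PATH:-}" ]; then
    echo "EMLX_QWEN3_MODEL_PATH is not set" >&2
    return 1
  fi
  if [ ! -d "$EMLX_QWEN3_MODEL_PATH" ]; then
    echo "EMLX_QWEN3_MODEL_PATH is not a directory: $EMLX_QWEN3_MODEL_PATH" >&2
    return 1
  fi
  echo "Using checkpoint at $EMLX_QWEN3_MODEL_PATH"
}
```

Call check_model_path in the same shell session before the test or benchmark command, and only continue if it returns success.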

Pinning a model revision

For golden-token determinism, pin the checkpoint to a specific commit of the HuggingFace repository by passing --revision <commit_sha> to huggingface-cli download.
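
As an illustration, a pinned download could be wrapped as follows. This is a sketch under stated assumptions: download_pinned and the HF_CLI override are hypothetical names introduced here, and requiring a full 40-character hexadecimal SHA is a choice of this sketch (it rejects moving references such as main), not a rule imposed by the CLI.

```shell
# Hypothetical wrapper: download a repository at one pinned commit.
# Usage: download_pinned <repo_id> <commit_sha> <local_dir>
download_pinned() {
  # Reject empty values and anything that is not lowercase hex,
  # e.g. a branch name such as "main" that moves over time.
  case "$2" in
    "" | *[!0-9a-f]*)
      echo "revision must be a full commit SHA: $2" >&2
      return 1
      ;;
  esac
  if [ "${#2}" -ne 40 ]; then
    echo "revision must be 40 hex characters: $2" >&2
    return 1
  fi
  # HF_CLI is an assumed override for dry runs; defaults to the real CLI.
  "${HF_CLI:-huggingface-cli}" download "$1" --revision "$2" --local-dir "$3"
}
```

Setting HF_CLI=echo prints the command that would run instead of executing it, which is a cheap way to confirm the arguments before a multi-gigabyte download.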

CI note

Tests that require a local checkpoint are excluded from the default mix test run and from CI. Do not add a CI job that downloads the checkpoint: the 8B model is ~5 GB, and the tests require local Apple Silicon hardware.
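
Since the checkpoint tests need Apple Silicon, a local script can guard on the host before attempting them. A minimal sketch, assuming a native shell on macOS reports Darwin and arm64 from uname; is_apple_silicon is a hypothetical helper, and its optional arguments exist only so the check can be exercised on other machines.

```shell
# Hypothetical helper: succeed only on macOS running natively on arm64.
# Optional arguments override the detected OS and architecture.
# Note: a shell running under Rosetta reports x86_64 and will fail this check.
is_apple_silicon() {
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}
```

A local script can call is_apple_silicon with no arguments and skip the checkpoint tests when it returns a non-zero status.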