Mobile Deployment with ExBurn

Overview

ExBurn compiles trained models for mobile deployment through Burn's CubeCL backend, which compiles GPU kernels for the native graphics API of each target platform:

iOS: Metal via CubeCL
Android: Vulkan via CubeCL

ExBurn is designed as a library: it provides the Nx backend and GPU acceleration layer that other frameworks can build on.
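If an application is built for both platforms, the platform-to-API mapping above can be made explicit in code. The following is a minimal sketch using a hypothetical `TargetBackend` module; it is not part of ExBurn's API, and the atoms `:metal` and `:vulkan` are illustrative names.

```elixir
defmodule TargetBackend do
  # Hypothetical helper, not part of ExBurn: maps a deployment target
  # to the GPU API that CubeCL kernels run on for that platform.
  @spec for_target(atom()) :: {:ok, :metal | :vulkan} | {:error, term()}
  def for_target(:ios), do: {:ok, :metal}
  def for_target(:android), do: {:ok, :vulkan}
  def for_target(other), do: {:error, {:unsupported_target, other}}
end

{:ok, backend} = TargetBackend.for_target(:android)
IO.puts(backend)
# prints vulkan
```

Returning an error tuple for unknown targets keeps an unsupported platform from silently falling back to the wrong API.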

Compiling a Model

# Define a model with Axon
model =
  Axon.input("input", shape: {nil, 784})
  |> Axon.dense(128, activation: :relu)
  |> Axon.dropout(rate: 0.2)
  |> Axon.dense(10)

# Compile for training/inference
compiled = ExBurn.Model.compile(model,
  loss: :cross_entropy,
  optimizer: :adam,
  learning_rate: 0.001
)

# Run inference on one 784-feature input (all zeros, for illustration)
input_tensor = Nx.broadcast(0.0, {1, 784})
{:ok, output} = ExBurn.Model.predict(compiled, input_tensor)

# Save for deployment
ExBurn.Model.save(compiled, "model.bin")

# Load the saved model back from disk
{:ok, loaded} = ExBurn.Model.load(compiled, "model.bin")
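The model above includes `Axon.dropout(rate: 0.2)`. Dropout only changes values during training; at inference it passes inputs through unchanged (in Axon, dropout is only active when a model is built in training mode), so in an inference-only deployment it should reduce to a no-op. The sketch below shows inverted dropout on plain lists. The `DropoutSketch` module is hypothetical, and the random keep mask is passed in explicitly so the result is deterministic.

```elixir
defmodule DropoutSketch do
  # Inverted dropout (training): zero each value whose mask entry is false and
  # scale the survivors by 1 / (1 - rate), so the expected value is unchanged.
  # `keep_mask` stands in for a random Bernoulli(1 - rate) mask.
  def train(values, keep_mask, rate) do
    scale = 1.0 / (1.0 - rate)
    Enum.zip_with(values, keep_mask, fn v, keep -> if keep, do: v * scale, else: 0.0 end)
  end

  # Inference: dropout is the identity function
  def infer(values), do: values
end

IO.inspect(DropoutSketch.train([1.0, 2.0, 3.0, 4.0], [true, false, true, true], 0.5))
# prints [2.0, 0.0, 6.0, 8.0]
```

Scaling at training time rather than at inference time is what lets the inference path skip dropout entirely.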

Using ExCubecl for GPU Inference

ExBurn integrates with ExCubecl for GPU buffer management and kernel execution:

# Create GPU buffers via ExCubecl
{:ok, input_buf} = ExCubecl.buffer([1.0, 2.0, 3.0], [3], :f32)
{:ok, output_buf} = ExCubecl.buffer([0.0, 0.0, 0.0], [3], :f32)

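# Run a kernel: output_buf = input_buf + input_buf, element by element
ExCubecl.run_kernel("elementwise_add", [input_buf, input_buf], output_buf)

# Read results back; with the inputs above, data should hold [2.0, 4.0, 6.0]
{:ok, data} = ExCubecl.read(output_buf)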
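When bringing up a kernel on a new device, it helps to check its output against a CPU reference. A minimal sketch, assuming `"elementwise_add"` computes `out[i] = a[i] + b[i]`; the `CpuReference` module and its tolerance are hypothetical choices, not part of ExCubecl.

```elixir
defmodule CpuReference do
  # CPU version of the kernel, assuming "elementwise_add" computes out[i] = a[i] + b[i]
  def elementwise_add(a, b), do: Enum.zip_with(a, b, &+/2)

  # True when both lists have the same length and every pair differs by at most `tol`
  # (GPU float results can differ from the CPU in the last few bits)
  def close?(actual, expected, tol \\ 1.0e-5) do
    length(actual) == length(expected) and
      Enum.all?(Enum.zip(actual, expected), fn {x, y} -> abs(x - y) <= tol end)
  end
end

expected = CpuReference.elementwise_add([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# expected == [2.0, 4.0, 6.0]
```

With the buffers above, `CpuReference.close?(data, expected)` should return `true`, assuming `ExCubecl.read/1` returns the buffer contents as a flat list of floats.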

Using ExBurn.Serving for Batched Inference

For production inference, ExBurn.Serving wraps a compiled model in an Nx.Serving that can group concurrent requests into batches:

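# Build a serving from a compiled model (batch_timeout assumed in ms, as in Nx.Serving)
serving = ExBurn.Serving.build(compiled,
  batch_size: 32,
  batch_timeout: 50
)

# Run inference inline in the calling process (no batching across callers)
output = Nx.Serving.run(serving, input_tensor)

# To batch concurrent requests, start the serving as a named process
# (MyApp.Serving is a placeholder name) and call it with batched_run/2
children = [
  {Nx.Serving, serving: serving, name: MyApp.Serving, batch_size: 32, batch_timeout: 50}
]
{:ok, _pid} = Supervisor.start_link(children, strategy: :one_for_one)
output = Nx.Serving.batched_run(MyApp.Serving, input_tensor)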
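The batch_size / batch_timeout trade-off can be pictured with a simplified, pure-Elixir model of a size-or-timeout policy. This is an illustration only (the `BatchPolicy` module is hypothetical), not how Nx.Serving or ExBurn actually implement batching: a batch closes once it holds batch_size requests, or when batch_timeout ms have passed since its first request.

```elixir
defmodule BatchPolicy do
  # Simplified model of a size-or-timeout batching policy (illustration only).
  # A batch closes when it already holds `batch_size` requests, or when a new
  # request arrives `batch_timeout` ms or more after the batch's first request.
  def group(arrivals, batch_size, batch_timeout) do
    {done, current} =
      Enum.reduce(arrivals, {[], []}, fn t, {done, current} ->
        # `current` is stored newest-first, so its first arrival is the last element
        if current != [] and
             (length(current) == batch_size or t - List.last(current) >= batch_timeout) do
          {[Enum.reverse(current) | done], [t]}
        else
          {done, [t | current]}
        end
      end)

    # Close the batch still open at the end, then restore chronological order
    done = if current == [], do: done, else: [Enum.reverse(current) | done]
    Enum.reverse(done)
  end
end

# Arrival times in ms, with batch_size 3 and batch_timeout 50
batches = BatchPolicy.group([0, 10, 20, 30, 100, 110], 3, 50)
# batches == [[0, 10, 20], [30], [100, 110]]
```

A larger batch_size raises throughput when traffic is heavy, while batch_timeout caps how long the first request in a partially filled batch waits.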

Model Optimization Tips

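Use f16 (half-precision) weights: Halves weight memory relative to f32, usually with small accuracy loss; verify on your validation data
Keep models small: Aim for under roughly 10 MB to limit app download size and load time
Batch inference: Process multiple inputs together for better throughput, at the cost of added per-request latency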
Use ExCubecl pipelines: Chain multiple GPU kernels without CPU round-trips
Profile on device: Benchmark on the target hardware before deploying
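As a worked example for the first two tips, the parameter count and weight storage of the 784 → 128 → 10 model from "Compiling a Model" can be computed directly. The `ModelSize` module is hypothetical, and the figures count weights and biases only; activations, runtime buffers, and file-format overhead are not included.

```elixir
defmodule ModelSize do
  # Parameter count of a dense layer: weight matrix plus bias vector
  def dense_params(inputs, units), do: inputs * units + units

  # Bytes needed to store `params` values of the given float type
  def bytes(params, :f32), do: params * 4
  def bytes(params, :f16), do: params * 2
end

# Dense(784 -> 128) + Dense(128 -> 10); dropout has no parameters
params = ModelSize.dense_params(784, 128) + ModelSize.dense_params(128, 10)
f32 = ModelSize.bytes(params, :f32)
f16 = ModelSize.bytes(params, :f16)
IO.puts("params=#{params} f32=#{f32} bytes f16=#{f16} bytes")
# prints params=101770 f32=407080 bytes f16=203540 bytes
```

At about 400 KB in f32 and about 200 KB in f16, this model is far below the 10 MB guideline; the same arithmetic scales to larger layers.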

Supported Operations

Operation	iOS (Metal)	Android (Vulkan)
Dense	✅	✅
Conv2D	✅	✅
ReLU	✅	✅
Sigmoid	✅	✅
Softmax	✅	✅
Dropout	✅	✅
LayerNorm	✅	✅
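For reference, the math behind three of the listed operations can be written out on plain lists. This is a minimal sketch with a hypothetical `OpsReference` module, not ExBurn's implementation; dense weights are stored as a list of rows, one row per input feature.

```elixir
defmodule OpsReference do
  # Dense: y[j] = sum_i x[i] * w[i][j] + b[j]; `w` is a list of rows, one per input
  def dense(x, w, b) do
    zero = List.duplicate(0.0, length(b))

    x
    |> Enum.zip_reduce(w, zero, fn xi, row, acc ->
      # Add this input's contribution to every output unit
      Enum.zip_with(acc, row, fn a, wij -> a + xi * wij end)
    end)
    |> Enum.zip_with(b, &+/2)
  end

  # ReLU: max(x, 0) element-wise
  def relu(x), do: Enum.map(x, &max(&1, 0.0))

  # Softmax, shifted by the max for numerical stability
  def softmax(x) do
    m = Enum.max(x)
    exps = Enum.map(x, &:math.exp(&1 - m))
    sum = Enum.sum(exps)
    Enum.map(exps, &(&1 / sum))
  end
end

OpsReference.dense([1.0, 2.0], [[1.0, 0.0, 2.0], [0.5, -1.0, 1.0]], [0.0, 1.0, -4.0])
|> OpsReference.relu()
# => [2.0, 0.0, 0.0]
```

Subtracting the maximum before exponentiating leaves the softmax result unchanged but avoids overflow for large logits, which matters more in f16.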
