# Getting Started with ExBurn

## What is ExBurn?

ExBurn is a middle layer between [Nx](https://github.com/elixir-nx/nx) (Numerical Elixir) and [Burn](https://github.com/tracel-ai/burn) (a Rust deep learning framework). It lets you write ML code in Elixir that runs on the GPU — on NVIDIA cards (CUDA), Apple Silicon (Metal), or Android (Vulkan).

```
Elixir code → Nx.Defn → ExBurn → Burn/CubeCL → GPU
```

## Installation

Add to `mix.exs`:

```elixir
def deps do
  [
    {:ex_burn, "~> 0.2"},
    {:nx, ">= 0.12.0"},
    {:axon, "~> 0.8"},
    {:ex_cubecl, ">= 0.4.0"}
  ]
end
```

```bash
mix deps.get
mix compile
```

### Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| Elixir | ~> 1.18 | |
| OTP | 27+ | |
| Rust stable | any | Needed for NIF compilation (until v0.2.0 precompiled binaries) |
| GPU drivers | — | CUDA / Metal / Vulkan depending on platform |

For iOS: `rustup target add aarch64-apple-ios`
For Android: `rustup target add aarch64-linux-android`

## Basic Tensor Operations

```elixir
# Set ExBurn as the default backend — all Nx ops now go through Burn
Nx.default_backend(ExBurn.Backend)

# Or use the convenience function
ExBurn.configure!()

# Create tensors
a = Nx.tensor([1.0, 2.0, 3.0])
b = Nx.tensor([4.0, 5.0, 6.0])

# Element-wise operations
Nx.add(a, b)        # [5.0, 7.0, 9.0]
Nx.multiply(a, b)   # [4.0, 10.0, 18.0]

# Matrix operations
m = Nx.tensor([[1.0, 2.0], [3.0, 4.0]])
Nx.transpose(m)     # [[1.0, 3.0], [2.0, 4.0]]
```

## GPU-Accelerated Functions with `defn`

The `ExBurn.Defn.Compiler` implements the `Nx.Defn.Compiler` behaviour, letting you write GPU-accelerated numerical functions:

```elixir
# Set ExBurn as both backend and compiler
Nx.default_backend(ExBurn.Backend)
Nx.Defn.global_default_options(compiler: ExBurn.Defn.Compiler)

defmodule MyMath do
  import Nx.Defn

  defn add_and_scale(x, y, scale) do
    x |> Nx.add(y) |> Nx.multiply(scale)
  end

  defn dot_product(a, b) do
    a |> Nx.multiply(b) |> Nx.sum()
  end
end

# These execute on GPU via Burn
MyMath.add_and_scale(Nx.tensor([1.0, 2.0]), Nx.tensor([3.0, 4.0]), Nx.tensor(2.0))
#=> #Nx.Tensor<[8.0, 12.0]>
```

Per-function compiler override:

```elixir
defn my_fun(x, opts \\ []) do
  Nx.sin(x)
end
compiler: ExBurn.Defn.Compiler
```

## Checking GPU Availability

```elixir
# Quick check
ExBurn.default_device()    # :gpu or :cpu
ExBurn.device_name()       # e.g. "CUDA (NVIDIA RTX 4090)" or "Metal (Apple M4)"
ExBurn.device_info()       # full map with :device, :gpu_available, :backend, :available_backends
ExBurn.cuda_available?()   # true if NVIDIA GPU detected
```

## Using BurnBridge Directly

For performance-critical code, bypass the Nx layer and talk to Burn directly:

```elixir
# Create Burn tensors directly
t1 = ExBurn.BurnBridge.zeros([3, 3], :f32)
t2 = ExBurn.BurnBridge.ones([3, 3], :f32)

# Perform operations (single NIF call each)
t3 = ExBurn.BurnBridge.add(t1, t2)
t4 = ExBurn.BurnBridge.matmul(t1, t2)
t5 = ExBurn.BurnBridge.relu(t3)

# Convert back to Nx when needed
nx_tensor = ExBurn.BurnBridge.to_nx(t3)
```

## Using ExCubecl for GPU Buffers

ExCubecl provides low-level GPU buffer management:

```elixir
# Create GPU-resident buffers
{:ok, a} = ExCubecl.buffer([1.0, 2.0, 3.0], [3], :f32)
{:ok, b} = ExCubecl.buffer([4.0, 5.0, 6.0], [3], :f32)

# Inspect
{:ok, [3]} = ExCubecl.shape(a)
{:ok, 12} = ExCubecl.size(a)  # bytes

# Read data back
{:ok, data} = ExCubecl.read(a)

# Buffers are automatically freed when GC'd
```

## Project Structure

```
lib/ex_burn/
  ex_burn.ex            — Main API (version, configure!, device_info)
  defn_compiler.ex      — Nx.Defn.Compiler for GPU-accelerated defn
  backend.ex            — Nx.Backend implementation (delegates to Burn via NIF)
  nif.ex                — Rustler NIF stubs (40+ functions)
  nif_helper.ex         — Safe NIF wrappers ({:ok, result} tuples)
  tensor.ex             — Nx ↔ Burn tensor conversion utilities
  error.ex              — Structured error type (ExBurn.Error)
  burn_bridge.ex        — High-level Burn API (direct tensor ops)
  cubecl_bridge.ex      — GPU compute via ExCubecl (buffers, kernels, pipelines)
  model.ex              — Model definition, compilation, save/load
  training.ex           — Training loop (optimizers, LR schedules, callbacks)
  serving.ex            — Nx.Serving integration for batched inference
  serving/server.ex     — Serving server implementation

native/ex_burn_nif/
  src/lib.rs            — Rust NIF with real Burn Autodiff<CubeCL> operations
  Cargo.toml            — Burn + CubeCL + Autodiff dependencies
```

## Next Steps

- [Training Models](02_training.md) — Define, compile, and train neural networks
- [Mobile Deployment](03_mobile_deployment.md) — iOS/Android compilation and optimization
- [Architecture Deep-Dive](04_architecture.md) — How the pipeline works internally
- [Training Optimization Guide](05_training_optimization.md) — Best practices for fast, stable training