# `Dala.Gpu.Compute.Pipeline`
[🔗](https://github.com/manhvu/dala/blob/main/lib/dala/gpu/compute/pipeline.ex#L1)

Multi-stage GPU compute pipeline orchestration.

Pipelines chain multiple GPU operations (kernels, buffer copies, etc.)
into a single executable graph via EXCubeCL's native pipeline API.

## Example: Image processing pipeline

    # Each pipeline_add call returns the pipeline, so the chain is piped
    # straight into pipeline_run instead of discarding the result.
    Dala.Gpu.Compute.pipeline()
    |> Dala.Gpu.Compute.pipeline_add(%{
      op: :run_kernel,
      kernel: :blur,
      inputs: [input_buf],
      output: blurred_buf,
      params: %{radius: 3, sigma: 1.5}
    })
    |> Dala.Gpu.Compute.pipeline_add(%{
      op: :run_kernel,
      kernel: :sharpen,
      inputs: [blurred_buf],
      output: sharpened_buf,
      params: %{amount: 0.5}
    })
    |> Dala.Gpu.Compute.pipeline_add(%{
      op: :run_kernel,
      kernel: :grayscale,
      inputs: [sharpened_buf],
      output: output_buf,
      params: %{}
    })
    |> Dala.Gpu.Compute.pipeline_run()

## Example: AI inference pipeline

    # Preprocess the camera frame, run the model, then normalize the logits.
    # The chain is piped into pipeline_run so the added stages are not lost.
    Dala.Gpu.Compute.pipeline()
    |> Dala.Gpu.Compute.pipeline_add(%{
      op: :run_kernel,
      kernel: :preprocess,
      inputs: [camera_buf],
      output: preprocessed_buf,
      params: %{normalize: true, size: {224, 224}}
    })
    |> Dala.Gpu.Compute.pipeline_add(%{
      op: :run_kernel,
      kernel: :mobilenet_v2,
      inputs: [preprocessed_buf],
      output: logits_buf,
      params: %{}
    })
    |> Dala.Gpu.Compute.pipeline_add(%{
      op: :run_kernel,
      kernel: :softmax,
      inputs: [logits_buf],
      output: probs_buf,
      params: %{}
    })
    |> Dala.Gpu.Compute.pipeline_run()

## Stage Specs

Each stage is a map with:

- `:op` — operation type (`:run_kernel`, `:copy_buffer`, `:barrier`)
- `:kernel` — kernel atom (for `:run_kernel` ops)
- `:inputs` — list of input `Buffer` structs
- `:output` — output `Buffer` struct
- `:params` — map of kernel-specific parameters
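
The key list above can be checked before a stage is handed to a pipeline. The following is a minimal sketch only: `StageSketch` and its error atoms are hypothetical and not part of Dala, and it assumes that `:inputs` and `:params` may be omitted for ops that do not need them, which the documentation does not confirm. Buffers are stood in for by `make_ref()`.

```elixir
defmodule StageSketch do
  # Hypothetical helper (not part of Dala): sanity-checks a stage map
  # against the keys documented under "Stage Specs".
  @ops [:run_kernel, :copy_buffer, :barrier]

  def validate(%{op: op} = stage) when op in @ops do
    kernel = Map.get(stage, :kernel)

    cond do
      # nil is an atom in Elixir, so it has to be ruled out explicitly.
      op == :run_kernel and (is_nil(kernel) or not is_atom(kernel)) ->
        {:error, :missing_kernel}

      # Assumption: a missing :inputs key is treated as an empty list.
      not is_list(Map.get(stage, :inputs, [])) ->
        {:error, :inputs_not_a_list}

      # Assumption: a missing :params key is treated as an empty map.
      not is_map(Map.get(stage, :params, %{})) ->
        {:error, :params_not_a_map}

      true ->
        :ok
    end
  end

  # Anything without a recognized :op is rejected.
  def validate(_stage), do: {:error, :invalid_op}
end

blur = %{
  op: :run_kernel,
  kernel: :blur,
  inputs: [make_ref()],
  output: make_ref(),
  params: %{radius: 3, sigma: 1.5}
}

StageSketch.validate(blur)
```

Validating up front keeps malformed stages from reaching the native layer, where errors tend to be harder to diagnose.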

## EXCubeCL Backend

Under the hood, stages are compiled into an EXCubeCL native pipeline
(`ExCubecl.pipeline/0` + `ExCubecl.pipeline_add/5` + `ExCubecl.pipeline_run/1`)
for efficient batch submission. The pipeline is freed after execution.
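
The create, add, run, free flow described above might look roughly like the sketch below. It does not call the real `ExCubecl` module: `FakeBackend` is a stand-in that records stages in an `Agent` so the code runs without a GPU. The argument order given to `pipeline_add/5` and the name `pipeline_free/1` are assumptions made for illustration, not the library's documented API.

```elixir
defmodule FakeBackend do
  # Stand-in for the native pipeline API; nothing here touches a GPU.
  def pipeline do
    {:ok, pid} = Agent.start_link(fn -> [] end)
    pid
  end

  # Assumed argument order: pipeline, kernel, inputs, output, params.
  def pipeline_add(pid, kernel, inputs, output, params) do
    Agent.update(pid, fn calls -> [{kernel, inputs, output, params} | calls] end)
  end

  # The fake "executes" nothing and always reports success.
  def pipeline_run(_pid), do: :ok

  # Hypothetical name for releasing the native pipeline.
  def pipeline_free(pid), do: Agent.stop(pid)
end

defmodule CompileSketch do
  # Submits every stage to the backend, runs the batch once, and releases
  # the native pipeline even if adding or running raises.
  def run(stages, backend \\ FakeBackend) do
    native = backend.pipeline()

    try do
      Enum.each(stages, fn stage ->
        backend.pipeline_add(native, stage.kernel, stage.inputs, stage.output, stage.params)
      end)

      backend.pipeline_run(native)
    after
      backend.pipeline_free(native)
    end
  end
end

stages = [
  %{op: :run_kernel, kernel: :blur, inputs: [make_ref()], output: make_ref(), params: %{radius: 3}},
  %{op: :run_kernel, kernel: :grayscale, inputs: [make_ref()], output: make_ref(), params: %{}}
]

CompileSketch.run(stages)
```

Wrapping the submission in `try ... after` is one way to guarantee the free step happens exactly once per run, whether or not execution succeeds.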

# `stage`

```elixir
@type stage() :: %{
  op: atom(),
  kernel: atom() | nil,
  inputs: [reference()],
  output: reference(),
  params: map()
}
```

# `t`

```elixir
@type t() :: %Dala.Gpu.Compute.Pipeline{ref: reference() | nil, stages: [stage()]}
```

# `add`

```elixir
@spec add(t(), map()) :: t()
```

Add a stage to a pipeline. Returns the updated pipeline so calls can be chained with `|>`.

# `new`

```elixir
@spec new() :: t()
```

Create a new empty pipeline.

# `run`

```elixir
@spec run(t()) :: :ok | {:error, term()}
```

Execute all stages in the pipeline via the EXCubeCL native pipeline API. Returns `:ok` on success or `{:error, reason}` on failure.
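
Because the spec allows two result shapes, callers will usually want to match on both. A small sketch of that pattern follows; `RunResultSketch` is a hypothetical helper, and `:device_lost` is an invented error reason used only to exercise the failure branch.

```elixir
defmodule RunResultSketch do
  # Hypothetical helper: turns the value returned by run/1 into a message.
  # Per the spec the value is either :ok or {:error, reason}.
  def describe(:ok), do: "pipeline finished"
  def describe({:error, reason}), do: "pipeline failed: #{inspect(reason)}"
end

RunResultSketch.describe(:ok)
RunResultSketch.describe({:error, :device_lost})
```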

# `stage_count`

```elixir
@spec stage_count(t()) :: non_neg_integer()
```

Return the number of stages in the pipeline.

# `stages`

```elixir
@spec stages(t()) :: [stage()]
```

Return the list of stages in the pipeline.
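
To illustrate how `new/0`, `add/2`, `stage_count/1`, and `stages/1` fit together, here is a self-contained stand-in that mirrors the documented struct shape. `PipelineSketch` is not Dala's implementation, and it assumes stages are kept in insertion order, which the specs above do not state.

```elixir
defmodule PipelineSketch do
  # Stand-in with the same fields as the documented t() type.
  defstruct ref: nil, stages: []

  def new, do: %__MODULE__{}

  # Assumption: stages are appended, so stages/1 returns insertion order.
  def add(%__MODULE__{stages: stages} = pipeline, stage) when is_map(stage) do
    %{pipeline | stages: stages ++ [stage]}
  end

  def stage_count(%__MODULE__{stages: stages}), do: length(stages)

  def stages(%__MODULE__{stages: stages}), do: stages
end

pipeline =
  PipelineSketch.new()
  |> PipelineSketch.add(%{
    op: :run_kernel,
    kernel: :blur,
    inputs: [make_ref()],
    output: make_ref(),
    params: %{radius: 3}
  })
  |> PipelineSketch.add(%{
    op: :run_kernel,
    kernel: :grayscale,
    inputs: [make_ref()],
    output: make_ref(),
    params: %{}
  })

# Inspect the pipeline without running it.
PipelineSketch.stage_count(pipeline)
Enum.map(PipelineSketch.stages(pipeline), & &1.kernel)
```

Since the struct is immutable, the value returned by each `add/2` call is the one that carries the new stage, which is why the result of the chain is bound to `pipeline`.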

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
