# `ExBurn.Serving`
[🔗](https://github.com/ohhi-vn/ex_burn/blob/main/lib/ex_burn/serving.ex#L1)

`Nx.Serving` integration for ExBurn.

Provides batched, concurrent inference using `Nx.Serving` so that ExBurn
can be used in Bumblebee-style production pipelines.

## Usage

    # Define a serving for a compiled model
    serving =
      ExBurn.Serving.new(model,
        batch_size: 32,
        batch_timeout: 50,
        partitions: System.schedulers_online()
      )

    # Run batched inference with the ExBurn serving struct
    ExBurn.Serving.run(serving, input_tensor)

## Options

  * `:batch_size` - Maximum number of inputs to batch together (default: `32`)
  * `:batch_timeout` - Maximum time, in milliseconds, to wait for a batch to fill before running it (default: `50`)
  * `:partitions` - Number of serving partitions (default: scheduler count)
  * `:padding` - Whether to pad partial batches up to `:batch_size` (default: `false`)
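The `:batch_size` and `:batch_timeout` options describe the usual dynamic-batching rule: a batch runs as soon as it is full, or when the timeout expires, whichever comes first. The sketch below illustrates that rule, plus optional padding, in plain Elixir. `BatchingSketch` and the `{:infer, input}` message shape are hypothetical, and here the timeout window starts when `collect/2` is called; this is not how `Nx.Serving` or ExBurn implement batching internally.

```elixir
defmodule BatchingSketch do
  # Hypothetical illustration of dynamic batching; not ExBurn or Nx.Serving code.

  # Collects up to `batch_size` inputs sent to the current process as
  # {:infer, input} messages. Returns early with whatever has arrived once
  # `batch_timeout` milliseconds have elapsed since the call started.
  def collect(batch_size, batch_timeout) do
    deadline = System.monotonic_time(:millisecond) + batch_timeout
    do_collect([], 0, batch_size, deadline)
  end

  # Batch is full: stop waiting and return inputs in arrival order.
  defp do_collect(acc, count, batch_size, _deadline) when count >= batch_size do
    Enum.reverse(acc)
  end

  defp do_collect(acc, count, batch_size, deadline) do
    remaining = max(deadline - System.monotonic_time(:millisecond), 0)

    receive do
      {:infer, input} -> do_collect([input | acc], count + 1, batch_size, deadline)
    after
      # Timeout expired: run with a partial (possibly empty) batch.
      remaining -> Enum.reverse(acc)
    end
  end

  # Pads a partial batch up to `batch_size` with `filler` and also returns
  # the number of real entries, so outputs for filler rows can be dropped later.
  def pad(batch, batch_size, filler) do
    real = length(batch)
    {batch ++ List.duplicate(filler, max(batch_size - real, 0)), real}
  end

  # Keeps only the outputs that belong to real (non-filler) inputs.
  def unpad(outputs, real), do: Enum.take(outputs, real)
end
```

For example, if three `{:infer, x}` messages are already in the mailbox, `collect(2, 50)` returns `[1, 2]` without waiting, and a following `collect(2, 10)` returns `[3]` after roughly 10 ms. `pad([1, 2, 3], 4, 0)` returns `{[1, 2, 3, 0], 3}`, and `unpad/2` later keeps only the first 3 outputs. Padding keeps every batch the same shape, which typically matters for compiled backends that would otherwise recompile for each new input shape, at the cost of computing outputs for the filler rows.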

# `t`

```elixir
@type t() :: %ExBurn.Serving{
  batch_size: pos_integer(),
  batch_timeout: pos_integer(),
  model: ExBurn.Model.t(),
  padding: boolean(),
  partitions: pos_integer()
}
```
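The fields of `t/0` are the model plus the four documented options. As a minimal sketch, assuming the defaults listed under Options and reading "scheduler count" as `System.schedulers_online/0`, keyword options could be validated into a struct of this shape with `Keyword.validate!/2`. `ServingConfigSketch` is a hypothetical module for illustration, not ExBurn's own code, and any term stands in for the `ExBurn.Model.t()` value.

```elixir
defmodule ServingConfigSketch do
  # Hypothetical struct with the same fields as ExBurn.Serving.t();
  # not ExBurn's implementation. `model` is treated as an opaque term.
  @enforce_keys [:model]
  defstruct [:model, :batch_size, :batch_timeout, :partitions, :padding]

  # Merges `opts` with the documented defaults, rejects unknown keys,
  # checks value types, and builds the struct.
  def new(model, opts \\ []) do
    opts =
      Keyword.validate!(opts,
        batch_size: 32,
        batch_timeout: 50,
        # Assumption: "scheduler count" read as System.schedulers_online/0.
        partitions: System.schedulers_online(),
        padding: false
      )

    # The pos_integer() fields in the type must be positive integers.
    for key <- [:batch_size, :batch_timeout, :partitions] do
      value = Keyword.fetch!(opts, key)

      if not (is_integer(value) and value > 0) do
        raise ArgumentError,
              "#{inspect(key)} must be a positive integer, got: #{inspect(value)}"
      end
    end

    if not is_boolean(opts[:padding]) do
      raise ArgumentError, ":padding must be a boolean, got: #{inspect(opts[:padding])}"
    end

    struct!(__MODULE__, [{:model, model} | opts])
  end
end
```

`Keyword.validate!/2` (available since Elixir 1.13) raises on unknown keys, so a misspelled option such as `batch_sizes:` fails loudly instead of being silently ignored.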

# `build`

```elixir
@spec build(
  ExBurn.Model.t(),
  keyword()
) :: Nx.Serving.t()
```

Builds an `Nx.Serving` for the given model and options.

This is the primary entry point for production use. The returned
`Nx.Serving` can be run inline with `Nx.Serving.run/2`, or started
under your application's supervision tree and called with
`Nx.Serving.batched_run`.

## Examples

    serving =
      ExBurn.Serving.build(model,
        batch_size: 16,
        batch_timeout: 100
      )

    # Run inference
    output = Nx.Serving.run(serving, input)

# `new`

```elixir
@spec new(
  ExBurn.Model.t(),
  keyword()
) :: t()
```

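Creates a new ExBurn serving for the given compiled model, using the
options described in the module documentation.

Returns an `ExBurn.Serving` struct (`t/0`) holding the model and its
batching options. The struct can be used directly with `run/2` or
passed to `Nx.Serving`.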

# `run`

```elixir
@spec run(t(), Nx.Tensor.t()) :: Nx.Tensor.t()
```

Runs inference on a single input tensor using the serving.

This is a convenience wrapper around `Nx.Serving.run/2`.

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
