ExBurn.Serving (ex_burn v0.1.0)

Nx.Serving integration for ExBurn.

Provides batched, concurrent inference using Nx.Serving so that ExBurn can be used in Bumblebee-style production pipelines.

Usage

# Define a serving for a compiled model
serving =
  ExBurn.Serving.new(model,
    batch_size: 32,
    batch_timeout: 50,
    partitions: System.schedulers_online()
  )

# Run inference through the ExBurn wrapper (new/2 returns an ExBurn.Serving struct)
ExBurn.Serving.run(serving, input_tensor)

Options

:batch_size - Maximum number of inputs to batch together (default: 32)
:batch_timeout - Maximum time in milliseconds to wait for a batch to fill before running it (default: 50)
:partitions - Number of serving partitions (default: scheduler count)
:padding - Whether to pad partial batches up to :batch_size (default: false)
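
As a rough illustration of how the documented defaults might combine with caller-supplied options, the sketch below resolves an option list with Keyword.validate!/2 from the Elixir standard library. The module ServingOptsSketch and its functions are hypothetical and not part of ExBurn, and reading "scheduler count" as System.schedulers_online() is an assumption; the library may resolve its options differently.

```elixir
defmodule ServingOptsSketch do
  # Hypothetical helper, not part of ExBurn: mirrors the defaults documented above.
  # Assumption: "scheduler count" means System.schedulers_online().
  def defaults do
    [
      batch_size: 32,
      batch_timeout: 50,
      partitions: System.schedulers_online(),
      padding: false
    ]
  end

  # Fills in defaults for missing keys and raises ArgumentError on unknown keys.
  def resolve(opts \\ []) do
    Keyword.validate!(opts, defaults())
  end
end

opts = ServingOptsSketch.resolve(batch_size: 16, padding: true)

# Caller-supplied values win; everything else falls back to the defaults.
IO.inspect(Keyword.fetch!(opts, :batch_size), label: "batch_size")
IO.inspect(Keyword.fetch!(opts, :batch_timeout), label: "batch_timeout")
```

Rejecting unknown keys early, as Keyword.validate!/2 does, is one way to catch a misspelled option such as :batch_timout before it silently falls back to a default.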

Summary

Types

t()

Functions

build(model, opts \\ [])

Builds an Nx.Serving for the given model and options.

new(model, opts \\ [])

Creates a new ExBurn serving for the given compiled model.

run(serving, input)

Runs inference on a single input tensor using the serving.

Types

t()

@type t() :: %ExBurn.Serving{
  batch_size: pos_integer(),
  batch_timeout: pos_integer(),
  model: ExBurn.Model.t(),
  padding: boolean(),
  partitions: pos_integer()
}

Functions

build(model, opts \\ [])

@spec build(
  ExBurn.Model.t(),
  keyword()
) :: Nx.Serving.t()

Builds an Nx.Serving for the given model and options.

This is the primary entry point for production use. The returned Nx.Serving can be run in the calling process with Nx.Serving.run/2, or started under your application's supervision tree and called with Nx.Serving.batched_run/2.

Examples

serving =
  ExBurn.Serving.build(model,
    batch_size: 16,
    batch_timeout: 100
  )

# Run inference
output = Nx.Serving.run(serving, input)
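
For the supervised case mentioned above, a minimal sketch follows. Since ExBurn is not assumed to be installable here, Nx.Serving.jit/1 stands in for the value returned by ExBurn.Serving.build/2. The process name SketchServing, the doubling function, and the Nx version requirement are illustrative assumptions. How options given to build/2 interact with the process-level :batch_size and :batch_timeout is not specified on this page.

```elixir
Mix.install([{:nx, "~> 0.7"}])

# Stand-in for ExBurn.Serving.build(model, ...): an Nx.Serving that doubles its input.
# Assumption: a serving built by ExBurn would be supervised the same way.
serving = Nx.Serving.jit(fn x -> Nx.multiply(x, 2) end)

children = [
  # These batching options belong to the Nx.Serving process itself.
  {Nx.Serving, serving: serving, name: SketchServing, batch_size: 16, batch_timeout: 100}
]

{:ok, _pid} = Supervisor.start_link(children, strategy: :one_for_one)

# batched_run/2 sends the request to the named process, which can merge it
# with requests from other callers that arrive within the batch timeout.
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3])])
result = Nx.Serving.batched_run(SketchServing, batch)
IO.inspect(result)
```

Nx.Serving.run/2 executes in the calling process and does not batch across callers, so the supervised form with Nx.Serving.batched_run/2 is the one that benefits from concurrent requests.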

new(model, opts \\ [])

@spec new(
  ExBurn.Model.t(),
  keyword()
) :: t()

Creates a new ExBurn serving for the given compiled model.

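Returns an %ExBurn.Serving{} struct holding the model and the batching options. Pass it to run/2 to run inference; to obtain an Nx.Serving.t() for use with Nx.Serving functions, use build/2.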

run(serving, input)

@spec run(t(), Nx.Tensor.t()) :: Nx.Tensor.t()

Runs inference on a single input tensor using the serving.

This is a convenience wrapper around Nx.Serving.run/2.