YOLO.Model behaviour (YOLO v0.2.0)

Defines a behaviour for implementing YOLO object detection models.

This module provides the structure for loading and running YOLO models for object detection. The default implementation is YOLO.Models.YoloV8, but you can create custom implementations for other YOLO variants.

Required Callbacks

To implement this behaviour, you need to define these functions:

  • preprocess/3: Prepares an input image for the model

    • Takes a model struct, input image, and options
    • Returns {preprocessed_tensor, scaling_config}
    • See YOLO.Models.YoloV8 for an example implementation
  • postprocess/4: Processes the model's raw output into detected objects

    • Takes model struct, model output tensor, scaling config, and options
    • Returns a list of detections, each as [cx, cy, w, h, prob, class_idx]
    • Handles tasks like non-maximum suppression and coordinate scaling

Optional Callbacks

  • init/2: Initializes the model. This is called when the model is loaded and is the place to precompute model-specific data, such as grids and expanded strides, and store it in model_data for use by the other callbacks.

Types

  • t(): The model struct containing:

    • :ref - Reference to loaded ONNX model
    • :model_impl - Module implementing this behaviour
    • :shapes - Input/output tensor shapes
    • :classes - Map of class indices to labels
    • :model_data - Model-specific data, e.g. grids and expanded strides for YOLOX
  • detected_object(): Map containing detection results:

    • :bbox - Bounding box coordinates (cx, cy, w, h)
    • :class - Detected class name
    • :class_idx - Class index
    • :prob - Detection probability

Summary

Callbacks

Initializes the model. This is called when the model is loaded and is the place to precompute model-specific data, such as grids and expanded strides, and store it in model_data for use by the other callbacks.

Post-processes the model's raw output to produce a list of detected objects.

Prepares input image tensors for the model.

Types

classes()

@type classes() :: %{required(integer()) => String.t()}

detected_object()

@type detected_object() :: %{
  bbox: %{cx: integer(), cy: integer(), w: integer(), h: integer()},
  class: String.t(),
  class_idx: integer(),
  prob: float()
}
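
For illustration, a detected_object() value might look like this (the class label depends on the classes() map loaded with the model; "person" here assumes COCO labels):

```elixir
# A sample detection, assuming COCO class labels where index 0 is "person".
detection = %{
  bbox: %{cx: 320, cy: 240, w: 100, h: 180},
  class: "person",
  class_idx: 0,
  prob: 0.92
}
```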

shape()

@type shape() :: {integer(), integer()}

t()

@type t() :: %YOLO.Model{
  classes: classes() | nil,
  model_data: term() | nil,
  model_impl: module(),
  ref: term(),
  shapes: %{required(:input | :output) => tuple()}
}

Callbacks

init(model, options)

@callback init(model :: t(), options :: Keyword.t()) :: t()

Initializes the model. This is called when the model is loaded and is the place to precompute model-specific data, such as grids and expanded strides, and store it in model_data for use by the other callbacks.

Parameters

  • model - the model struct being loaded
  • options - options passed to YOLO.load/2

Returns

The updated model struct (t()), typically with model_data populated.
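
As a sketch of what an init/2 implementation might do (build_grids/1 and the values it returns are hypothetical placeholders; a YOLOX-style model would derive real grids and expanded strides from the input shape):

```elixir
defmodule MyYOLO.Init do
  # Minimal sketch of an init/2 callback: precompute data from the
  # model's input shape and store it in :model_data for later callbacks.
  def init(model, _options) do
    grids = build_grids(model.shapes.input)
    %{model | model_data: grids}
  end

  # Hypothetical helper with placeholder values; a real implementation
  # would generate actual grids and expanded strides here.
  defp build_grids({_batch, _channels, h, w}) do
    %{strides: [8, 16, 32], input: {h, w}}
  end
end
```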

postprocess(model, model_output, scaling_config, options)

@callback postprocess(
  model :: t(),
  model_output :: Nx.Tensor.t(),
  scaling_config :: YOLO.FrameScalers.ScalingConfig.t(),
  options :: Keyword.t()
) :: [[float()]]

Post-processes the model's raw output to produce a list of detected objects.

The raw output from the model is a tensor containing bounding box coordinates and class probabilities for each candidate detection.

For example, YOLOv8 outputs a {1, 84, 8400} tensor where:

  • 84 represents 4 bbox coordinates + 80 class probabilities
  • 8400 represents the number of candidate detections

Returns a list of detections where each detection is a list of 6 elements:

[cx, cy, w, h, prob, class_idx]

where:

  • cx, cy: center x,y coordinates of bounding box
  • w, h: width and height of bounding box
  • prob: detection probability
  • class_idx: class index
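
Putting the row format and the classes() map together, turning one row into a detected_object() map could look like the following (to_detected_object is a hypothetical helper, not part of the library):

```elixir
classes = %{0 => "person", 1 => "bicycle"}

# Hypothetical helper: convert a [cx, cy, w, h, prob, class_idx] row
# plus a classes map into a detected_object() map.
to_detected_object = fn [cx, cy, w, h, prob, class_idx], classes ->
  %{
    bbox: %{cx: cx, cy: cy, w: w, h: h},
    class: Map.get(classes, class_idx),
    class_idx: class_idx,
    prob: prob
  }
end

to_detected_object.([320, 240, 100, 180, 0.92, 0], classes)
```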

The implementation should:

  1. Filter low probability detections
  2. Apply non-maximum suppression (NMS) to remove overlapping boxes
  3. Scale the coordinates back to the original image size using the scaling_config and the YOLO.FrameScaler module, since detections are produced at the model's input resolution rather than the original image size

See YOLO.Models.YoloV8.postprocess/4 for a reference implementation.
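
The three steps above can be sketched in plain Elixir over already-decoded rows. Real implementations operate on Nx tensors for performance, and the 0.25/0.45 thresholds below are illustrative defaults, not the library's:

```elixir
defmodule PostprocessSketch do
  @prob_threshold 0.25
  @iou_threshold 0.45

  # rows: [[cx, cy, w, h, prob, class_idx], ...]
  # Step 1: drop low-probability rows; Step 2: greedy NMS by descending prob.
  def nms(rows) do
    rows
    |> Enum.filter(fn [_, _, _, _, prob, _] -> prob >= @prob_threshold end)
    |> Enum.sort_by(fn [_, _, _, _, prob, _] -> -prob end)
    |> keep([])
  end

  defp keep([], acc), do: Enum.reverse(acc)

  defp keep([best | rest], acc) do
    rest = Enum.reject(rest, fn row -> iou(best, row) > @iou_threshold end)
    keep(rest, [best | acc])
  end

  # Intersection-over-union of two center-format {cx, cy, w, h} boxes.
  defp iou([cx1, cy1, w1, h1 | _], [cx2, cy2, w2, h2 | _]) do
    inter = overlap(cx1, w1, cx2, w2) * overlap(cy1, h1, cy2, h2)
    union = w1 * h1 + w2 * h2 - inter
    if union > 0, do: inter / union, else: 0.0
  end

  # Length of the 1-D overlap between two centered intervals.
  defp overlap(c1, s1, c2, s2) do
    max(0.0, min(c1 + s1 / 2, c2 + s2 / 2) - max(c1 - s1 / 2, c2 - s2 / 2))
  end
end
```

Step 3 (coordinate scaling) is omitted here because it depends on the scaling_config produced by preprocess/3.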

preprocess(model, image, options)

@callback preprocess(model :: t(), image :: term(), options :: Keyword.t()) ::
  {Nx.Tensor.t(), YOLO.FrameScalers.ScalingConfig.t()}

Prepares input image tensors for the model.

Parameters

  • model - The YOLO.Model struct containing model information
  • image - Input image in the implementation's native format (e.g. Evision.Mat)
  • options - Keyword list of options:
    • :frame_scaler - Module implementing YOLO.FrameScaler behaviour (required)

Returns

  • {input_tensor, scaling_config} tuple where:
    • input_tensor is the preprocessed Nx tensor ready for model input, with shape {1, channels, height, width}
    • scaling_config contains scaling/padding info for postprocessing

See YOLO.Models.YoloV8.preprocess/3 for a reference implementation.
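
To illustrate the kind of scaling information preprocess/3 must produce, here is a sketch of computing a letterbox scale and padding for a square input (the map's field names are illustrative, not the real YOLO.FrameScalers.ScalingConfig struct):

```elixir
defmodule LetterboxSketch do
  # Computes the uniform scale and symmetric {x, y} padding needed to fit
  # an image of {width, height} into a square target (e.g. 640x640) while
  # preserving its aspect ratio.
  def scaling_config({width, height}, target \\ 640) do
    scale = min(target / width, target / height)
    scaled_w = round(width * scale)
    scaled_h = round(height * scale)

    %{
      scale: scale,
      padding: {div(target - scaled_w, 2), div(target - scaled_h, 2)}
    }
  end
end
```

postprocess/4 would later invert this transform to map detections back to original-image coordinates.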