YOLO.Models.Ultralytics (YOLO v0.1.2)

Ultralytics model implementation for preprocessing input images and postprocessing detections using non-maximum suppression (NMS).

Supports YOLOv8 and YOLOv11 models trained on the COCO dataset (80 classes).

Functions

postprocess(model, model_output_nx, scaling_config, opts)

@spec postprocess(
  model :: YOLO.Model.t(),
  model_output :: Nx.Tensor.t(),
  scaling_config :: YOLO.FrameScalers.ScalingConfig.t(),
  opts :: Keyword.t()
) :: [[float()]]

Post-processes the model's raw output to produce a filtered list of detected objects.

The raw output tensor has shape {1, 84, 8400} where:

- The first dimension (1) is the batch size
- The second dimension (84) contains 4 bbox coordinates + 80 class probabilities per detection
- The third dimension (8400) is the number of candidate detections
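
The layout above can be illustrated with a toy example in plain Elixir lists. This is only a sketch: it uses 2 classes and 3 candidates instead of 80 and 8400, assumes the batch dimension has already been dropped, and the module `OutputLayoutSketch` and its functions are hypothetical names, not part of the library (which operates on Nx tensors).

```elixir
defmodule OutputLayoutSketch do
  # Turns a list of channel rows (channels x candidates) into a list of
  # candidates, each holding all of its channel values.
  def transpose(rows) do
    rows
    |> Enum.zip()
    |> Enum.map(&Tuple.to_list/1)
  end

  # Splits one candidate into its box and its highest-scoring class.
  def best_class([cx, cy, w, h | class_probs]) do
    {prob, class_idx} =
      class_probs
      |> Enum.with_index()
      |> Enum.max_by(fn {p, _idx} -> p end)

    [cx, cy, w, h, prob, class_idx]
  end
end

# Toy output: 6 channels (4 bbox values + 2 class probabilities)
# by 3 candidate detections.
channels_by_candidates = [
  [100.0, 200.0, 300.0],  # cx
  [50.0, 60.0, 70.0],     # cy
  [20.0, 30.0, 40.0],     # w
  [10.0, 15.0, 25.0],     # h
  [0.9, 0.1, 0.3],        # class 0 probability
  [0.05, 0.8, 0.2]        # class 1 probability
]

candidates = OutputLayoutSketch.transpose(channels_by_candidates)
detections = Enum.map(candidates, &OutputLayoutSketch.best_class/1)
IO.inspect(detections)
```

After the transpose, each inner list describes one candidate, which is the per-detection view that the later filtering steps work on.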

The processing steps are:

- Reshapes and transposes the output to shape {8400, 84}
- Applies non-maximum suppression (NMS) to filter overlapping detections
- Scales bounding boxes back to the original image dimensions
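
To make the NMS step concrete, here is a minimal greedy NMS sketch in plain Elixir. It is not the implementation behind YOLO.NMS.run/3: the module `NmsSketch`, the function names, and the argument order are hypothetical, and it assumes detections are already in `[cx, cy, w, h, prob, class_idx]` form and are compared regardless of class.

```elixir
defmodule NmsSketch do
  # Intersection over union of two boxes given as [cx, cy, w, h | _rest].
  def iou([cx1, cy1, w1, h1 | _], [cx2, cy2, w2, h2 | _]) do
    left = max(cx1 - w1 / 2, cx2 - w2 / 2)
    right = min(cx1 + w1 / 2, cx2 + w2 / 2)
    top = max(cy1 - h1 / 2, cy2 - h2 / 2)
    bottom = min(cy1 + h1 / 2, cy2 + h2 / 2)

    intersection = max(right - left, 0) * max(bottom - top, 0)
    union = w1 * h1 + w2 * h2 - intersection

    if union > 0, do: intersection / union, else: 0.0
  end

  # Greedy NMS: drop low-probability detections, then keep the most
  # probable boxes and discard any box overlapping a kept one too much.
  def suppress(detections, prob_threshold, iou_threshold) do
    detections
    |> Enum.filter(fn [_, _, _, _, prob, _] -> prob >= prob_threshold end)
    |> Enum.sort_by(fn [_, _, _, _, prob, _] -> prob end, :desc)
    |> Enum.reduce([], fn candidate, kept ->
      if Enum.any?(kept, &(iou(&1, candidate) > iou_threshold)) do
        kept
      else
        [candidate | kept]
      end
    end)
    |> Enum.reverse()
  end
end

detections = [
  [100.0, 100.0, 50.0, 50.0, 0.9, 0],
  [102.0, 101.0, 50.0, 50.0, 0.8, 0],  # near-duplicate of the first box
  [300.0, 300.0, 40.0, 40.0, 0.7, 1],
  [500.0, 500.0, 40.0, 40.0, 0.1, 2]   # below the probability threshold
]

kept = NmsSketch.suppress(detections, 0.25, 0.45)
IO.inspect(kept)
```

In this example the second box is suppressed because it overlaps the first almost entirely, and the fourth is removed by the probability threshold before any overlap check.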

Arguments

- model - YOLO.Model struct containing model metadata
- model_output - Raw output tensor from model inference, with shape {1, 84, 8400}
- scaling_config - Scaling configuration returned by the preprocessing step
- opts - Keyword list of options:
  - :prob_threshold - Minimum probability for a detection to be kept
  - :iou_threshold - IoU threshold for non-maximum suppression
  - :nms_fun - Optional custom NMS function (defaults to YOLO.NMS.run/3)

Returns

List of detections where each detection is a list [cx, cy, w, h, prob, class_idx]:

- cx, cy - Center coordinates of the bounding box
- w, h - Width and height of the bounding box
- prob - Detection probability
- class_idx - Class index (0-79, corresponding to the COCO classes)
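
Since each detection is a flat list, downstream code often converts it into a more descriptive structure. The following is a minimal sketch, assuming the `[cx, cy, w, h, prob, class_idx]` layout described above. `DetectionSketch` and `to_map/1` are hypothetical names, and the class-name lookup covers only the first three COCO classes for brevity.

```elixir
defmodule DetectionSketch do
  # First three COCO class names only; a real lookup would cover all 80.
  @class_names %{0 => "person", 1 => "bicycle", 2 => "car"}

  # Converts a center-format detection into a map with corner coordinates.
  def to_map([cx, cy, w, h, prob, class_idx]) do
    # The spec returns floats, so the class index is truncated to an integer.
    idx = trunc(class_idx)

    %{
      x_min: cx - w / 2,
      y_min: cy - h / 2,
      x_max: cx + w / 2,
      y_max: cy + h / 2,
      prob: prob,
      class_idx: idx,
      class_name: Map.get(@class_names, idx, "unknown")
    }
  end
end

detection = DetectionSketch.to_map([320.0, 240.0, 100.0, 50.0, 0.87, 2.0])
IO.inspect(detection)
```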

preprocess(model, image, options)

@spec preprocess(YOLO.Model.t(), term(), Keyword.t()) ::
  {Nx.Tensor.t(), YOLO.FrameScalers.ScalingConfig.t()}

Preprocesses an input image to match the model's required format.

The preprocessing steps are:

- Scales and pads the image to match the model input dimensions while preserving aspect ratio
- Converts RGB to BGR color format (Ultralytics models expect BGR input)
- Normalizes pixel values to the [0, 1] range by dividing by 255
- Transposes dimensions to the channels-first layout the model expects ({3, height, width})
- Adds a batch dimension
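
The arithmetic behind the first and third steps can be sketched in plain Elixir. This is an illustration only: `LetterboxSketch` and the returned map keys are hypothetical and do not reflect the actual fields of YOLO.FrameScalers.ScalingConfig, and the sketch assumes padding is split evenly on both sides, which the library may or may not do.

```elixir
defmodule LetterboxSketch do
  # Computes the resize factor and padding needed to fit a source image of
  # {src_w, src_h} into a {dst_w, dst_h} model input without distortion.
  # Assumption: the scaled image is centered, so padding is split evenly.
  def fit({src_w, src_h}, {dst_w, dst_h}) do
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)

    %{
      scale: scale,
      scaled_size: {scaled_w, scaled_h},
      pad_x: div(dst_w - scaled_w, 2),
      pad_y: div(dst_h - scaled_h, 2)
    }
  end

  # Maps an 8-bit channel value to the [0, 1] range.
  def normalize(value), do: value / 255
end

# A 1920x1080 frame fitted into a 640x640 model input.
config = LetterboxSketch.fit({1920, 1080}, {640, 640})
IO.inspect(config)
```

Here the frame is scaled by one third to 640x360 and padded by 140 pixels at the top and bottom, so the aspect ratio is preserved.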

Arguments

- model - YOLO.Model struct containing model metadata
- image - Input image in the implementation's native format
- options - Keyword list of options, including the :frame_scaler module

Returns

{input_tensor, scaling_config} tuple where:
- input_tensor has shape {1, 3, height, width}
- scaling_config contains scaling/padding info for postprocessing
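
To show why the scaling configuration is carried over to postprocessing, the sketch below maps a detection from model input coordinates back to the original image. It assumes a configuration holding a scale factor and per-axis padding; the keys `:scale`, `:pad_x`, and `:pad_y` and the module `RescaleSketch` are hypothetical and are not the documented fields of YOLO.FrameScalers.ScalingConfig.

```elixir
defmodule RescaleSketch do
  # Undoes scaling and padding for one detection: subtract the padding
  # from the center, then divide every spatial value by the scale factor.
  def to_original(
        [cx, cy, w, h, prob, class_idx],
        %{scale: scale, pad_x: pad_x, pad_y: pad_y}
      ) do
    [
      (cx - pad_x) / scale,
      (cy - pad_y) / scale,
      w / scale,
      h / scale,
      prob,
      class_idx
    ]
  end
end

# Example: a 1280x720 image scaled by 0.5 into a 640x640 input,
# with 140 pixels of padding above and below.
config = %{scale: 0.5, pad_x: 0, pad_y: 140}
original = RescaleSketch.to_original([320.0, 320.0, 100.0, 50.0, 0.9, 0.0], config)
IO.inspect(original)
```

The center of the model input (320, 320) maps back to the center of the original image (640, 360), and the box dimensions double.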