Image.Detection (image_vision v0.2.0)

Object detection — where are the objects in this image?

Pass a Vix.Vips.Image.t/0 to detect/2 and get back a list of detected objects with their class labels, confidence scores, and bounding boxes.

Quick start

iex> car = Image.open!("./test/support/images/lamborghini-forsennato-concept.jpg")
iex> [%{label: _, score: _, box: _} | _] = Image.Detection.detect(car)

Default model

The default is RT-DETR, a real-time, transformer-based detector whose authors report higher COCO accuracy than comparable YOLOv8 models. It is Apache 2.0 licensed (unlike YOLOv8/11, which are AGPL). The ONNX export is hosted on HuggingFace at onnx-community/rtdetr_r50vd and is downloaded on the first call via ImageVision.ModelCache.

Model: onnx-community/rtdetr_r50vd / onnx/model.onnx (~175 MB).
Classes: 80 standard COCO classes (person, bicycle, car, …).
Output: per-query class scores (sigmoid) and bounding boxes in cxcywh form (centre x, centre y, width, height). RT-DETR is NMS-free by design, so no Non-Maximum Suppression post-processing is required.
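The cxcywh output differs from the {x, y, width, height} boxes this module returns, so a conversion step is implied. The following is a minimal sketch of that conversion, not the library's own code. It assumes the model's boxes are normalised to [0.0, 1.0]; the BoxSketch module and its function name are hypothetical.

```elixir
defmodule BoxSketch do
  # Hypothetical helper, not part of Image.Detection.
  # Assumes {cx, cy, w, h} are normalised to [0.0, 1.0].
  # Returns {x, y, width, height} in pixels, with (x, y) the top-left corner.
  def cxcywh_to_xywh({cx, cy, w, h}, image_width, image_height) do
    # Clamp so the result fits non_neg_integer/pos_integer box fields.
    x = max(round((cx - w / 2) * image_width), 0)
    y = max(round((cy - h / 2) * image_height), 0)
    width = max(round(w * image_width), 1)
    height = max(round(h * image_height), 1)
    {x, y, width, height}
  end
end

# A centred box covering a quarter of the width and half of the height
# of a 640x480 image.
BoxSketch.cxcywh_to_xywh({0.5, 0.5, 0.25, 0.5}, 640, 480)
# => {240, 120, 160, 240}
```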

Drawing detections

Use draw_bbox_with_labels/2 to overlay detections on the original image:

image
|> Image.Detection.detect()
|> Image.Detection.draw_bbox_with_labels(image)

Optional dependency

This module is only available when the optional Ortex dependency is included in your application's mix.exs.
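A dependency list along these lines would make the module available. This is a sketch only: the :image_vision package name is inferred from the page title, and both version requirements are assumptions that should be checked against the current releases on Hex.

```elixir
# mix.exs (fragment)
defp deps do
  [
    # Package name and version requirement assumed from "image_vision v0.2.0".
    {:image_vision, "~> 0.2"},
    # Optional ONNX runtime binding; version requirement is an assumption.
    {:ortex, "~> 0.1"}
  ]
end
```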

Summary

Types

detection()

A single detected object.

Functions

classes()

Returns the list of class labels the default model can detect.

detect(image, options \\ [])

Detects objects in an image and returns a list of detections sorted by descending confidence.

draw_bbox_with_labels(detections, image, options \\ [])

Draws bounding boxes with class labels onto an image.

Types

detection()

@type detection() :: %{
  label: String.t(),
  score: float(),
  box: {non_neg_integer(), non_neg_integer(), pos_integer(), pos_integer()}
}

A single detected object.

:label is one of the 80 COCO class names, e.g. "person".
:score is the confidence score, a float in [0.0, 1.0].
:box is {x, y, width, height} in pixel coordinates of the original image. (x, y) is the top-left corner.
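Because detections are plain maps, they can be processed with ordinary Enum functions. The sketch below uses made-up sample data and a hypothetical DetectionSketch module to show two common follow-up steps: converting a box to corner coordinates and counting detections per label.

```elixir
defmodule DetectionSketch do
  # Hypothetical helpers operating on detection/0 maps.

  # {x, y, width, height} -> {left, top, right, bottom}.
  # right and bottom are exclusive (one past the last pixel).
  def corners(%{box: {x, y, width, height}}) do
    {x, y, x + width, y + height}
  end

  # Number of detections for each class label.
  def count_by_label(detections) do
    Enum.frequencies_by(detections, & &1.label)
  end
end

# Illustrative data only, not real model output.
detections = [
  %{label: "car", score: 0.94, box: {120, 80, 400, 220}},
  %{label: "person", score: 0.81, box: {30, 60, 50, 140}},
  %{label: "car", score: 0.62, box: {500, 100, 90, 60}}
]

DetectionSketch.corners(hd(detections))
# => {120, 80, 520, 300}

DetectionSketch.count_by_label(detections)
# => %{"car" => 2, "person" => 1}
```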

Functions

classes()

@spec classes() :: [String.t()]

Returns the list of class labels the default model can detect.

Returns

A list of 80 COCO class names as binaries, in the order used by RT-DETR.

detect(image, options \\ [])

@spec detect(image :: Vix.Vips.Image.t(), options :: Keyword.t()) :: [detection()]

Detects objects in an image and returns a list of detections sorted by descending confidence.

Arguments

image is any Vix.Vips.Image.t/0.
options is a keyword list of options.

Options

:min_score is the minimum confidence score, a float in [0.0, 1.0], that a detection must meet to be returned. The default is 0.5.
:repo is the HuggingFace repository for the model. The default is "onnx-community/rtdetr_r50vd".
:filename is the ONNX file path within the repository. The default is "onnx/model.onnx". Use "onnx/model_quantized.onnx" (~45 MB INT8) for a much smaller model with some accuracy loss.

Returns

A list of detection/0 maps, sorted by descending :score.
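Since the result is sorted by descending :score, a stricter threshold can be applied afterwards without running the model again. The sketch below assumes that ordering; the ThresholdSketch module and the sample data are hypothetical.

```elixir
defmodule ThresholdSketch do
  # Hypothetical helper: keeps detections scoring at least `min_score`.
  # Relies on the list being sorted by descending :score, so it stops
  # at the first detection that falls below the threshold.
  def at_least(detections, min_score) do
    Enum.take_while(detections, fn %{score: score} -> score >= min_score end)
  end
end

# Illustrative data only, already sorted by descending score.
detections = [
  %{label: "car", score: 0.94, box: {120, 80, 400, 220}},
  %{label: "person", score: 0.81, box: {30, 60, 50, 140}},
  %{label: "car", score: 0.62, box: {500, 100, 90, 60}}
]

ThresholdSketch.at_least(detections, 0.8)
# => the first two detections

# The strongest detection only.
Enum.take(detections, 1)
```

For an unsorted list, Enum.filter/2 with the same predicate is the safer choice.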

Examples

iex> car = Image.open!("./test/support/images/lamborghini-forsennato-concept.jpg")
iex> [%{label: _, score: _, box: _} | _] =
...>   Image.Detection.detect(car, min_score: 0.5)

draw_bbox_with_labels(detections, image, options \\ [])

@spec draw_bbox_with_labels([detection()], Vix.Vips.Image.t(), Keyword.t()) ::
  Vix.Vips.Image.t()

Draws bounding boxes with class labels onto an image.

Builds an SVG overlay — one box and label per detection — and composites it onto the image. Each distinct class label gets a consistent colour so multiple detections of the same class are easy to identify at a glance.
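To make the overlay idea concrete, here is a rough sketch of how one box and label per detection could be rendered as SVG. It is not the library's implementation: the OverlaySketch module, the element layout, and the fixed stroke width and font size are all assumptions, and the compositing step is left out.

```elixir
defmodule OverlaySketch do
  # Hypothetical: renders one detection as an SVG rect plus a text label.
  # Labels are interpolated as-is, which is fine for plain COCO class
  # names but would need XML escaping for arbitrary text.
  def element(%{label: label, score: score, box: {x, y, w, h}}, colour) do
    percent = round(score * 100)

    ~s(<rect x="#{x}" y="#{y}" width="#{w}" height="#{h}" fill="none" stroke="#{colour}" stroke-width="2"/>) <>
      ~s(<text x="#{x}" y="#{y}" font-size="13" fill="#{colour}">#{label} #{percent}%</text>)
  end

  # Wraps all elements in an SVG document sized to the image.
  def svg(detections, width, height, colour) do
    body = Enum.map_join(detections, "\n", &element(&1, colour))
    ~s(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">\n#{body}\n</svg>)
  end
end

# Illustrative data only.
detections = [%{label: "car", score: 0.94, box: {120, 80, 400, 220}}]
OverlaySketch.svg(detections, 640, 480, "red")
```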

Arguments

detections is the list returned from detect/2.
image is the image upon which detection was run.
options is a keyword list of options.

Options

:opacity is the opacity of the label background, a float in [0.0, 1.0]. The default is 0.85. Use 1.0 for fully opaque label backgrounds.
:stroke_width is the bounding box stroke width in pixels. The default is 2.
:font_size is the label text size in pixels. The default is 13.
:palette is a list of CSS colour strings used to assign colours to labels. The palette cycles if there are more distinct labels than colours. The default is a 10-colour high-contrast palette.
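The cycling behaviour of :palette can be pictured with a small sketch. This is one plausible way to assign colours, not necessarily how the library does it; the PaletteSketch module and the two-colour palette are hypothetical.

```elixir
defmodule PaletteSketch do
  # Hypothetical: maps each distinct label to a palette colour in order
  # of first appearance, wrapping around when the palette runs out.
  # The palette must be non-empty (Stream.cycle/1 raises on an empty list).
  def assign(labels, palette) do
    labels
    |> Enum.uniq()
    |> Enum.zip(Stream.cycle(palette))
    |> Map.new()
  end
end

# Three distinct labels, two colours: the third label reuses the first colour.
PaletteSketch.assign(["car", "person", "car", "dog"], ["red", "blue"])
# => %{"car" => "red", "dog" => "red", "person" => "blue"}
```

In practice the labels would come from the detections, for example Enum.map(detections, & &1.label).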

Returns

The annotated Vix.Vips.Image.t/0.

Examples

iex> car = Image.open!("./test/support/images/lamborghini-forsennato-concept.jpg")
iex> annotated =
...>   car
...>   |> Image.Detection.detect()
...>   |> Image.Detection.draw_bbox_with_labels(car)
iex> match?(%Vix.Vips.Image{}, annotated)
true