Image.Detection answers "where are the objects in this image and what are they?". It returns a list of bounding boxes with class labels and confidence scores.

Basic detection

iex> street = Image.open!("street.jpg")
iex> detections = Image.Detection.detect(street)
iex> hd(detections)
%{label: "person", score: 0.97, box: {120, 45, 60, 180}}

Each detection is a map with:

  • :label — one of 80 COCO class names (person, bicycle, car, dog, …)
  • :score — confidence score in [0.0, 1.0]
  • :box — {x, y, width, height} in pixel coordinates of the original image

Results are sorted by descending confidence.
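Because each detection is a plain map, the result list composes with ordinary Enum functions. For example, counting how many objects of each class were found — a sketch using hypothetical sample values in the shape detect/2 returns:

```elixir
# Hypothetical detections, shaped like the output of Image.Detection.detect/2.
detections = [
  %{label: "person", score: 0.97, box: {120, 45, 60, 180}},
  %{label: "person", score: 0.88, box: {300, 50, 55, 170}},
  %{label: "car", score: 0.76, box: {10, 200, 150, 90}}
]

# Count detections per class label.
counts = Enum.frequencies_by(detections, & &1.label)
# counts == %{"person" => 2, "car" => 1}
```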

Filtering by confidence

The default minimum score is 0.5. Raise it to get only high-confidence detections:

iex> Image.Detection.detect(street, min_score: 0.8)
[%{label: "person", score: 0.94, box: {120, 45, 60, 180}}, ...]
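Since `:min_score` re-runs inference, an alternative is to detect once and filter the cached list in memory. A sketch with hypothetical values:

```elixir
# Hypothetical cached result from a single detect/2 call.
detections = [
  %{label: "person", score: 0.94, box: {120, 45, 60, 180}},
  %{label: "dog", score: 0.62, box: {40, 210, 80, 70}}
]

# Keep only high-confidence detections without re-running the model.
high_confidence = Enum.filter(detections, &(&1.score >= 0.8))
# high_confidence == [%{label: "person", score: 0.94, box: {120, 45, 60, 180}}]
```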

Drawing bounding boxes

draw_bbox_with_labels/2 annotates the image:

iex> detections = Image.Detection.detect(street)
iex> annotated = Image.Detection.draw_bbox_with_labels(detections, street)
iex> Image.save!(annotated, "annotated.jpg")

It takes the same image that detect/2 was called on, so the two compose in a pipeline:

iex> street
...> |> Image.Detection.detect()
...> |> Image.Detection.draw_bbox_with_labels(street)
...> |> Image.save!("annotated.jpg")

Available classes

The default RT-DETR model is trained on COCO 80 classes:

iex> Image.Detection.classes()
["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", ...]

Using a different model

detect/2 accepts :repo and :filename to swap in any RT-DETR-family ONNX model from HuggingFace:

# Smaller R18 backbone (~80 MB) — faster, slightly less accurate
iex> Image.Detection.detect(image, repo: "onnx-community/rtdetr_r18vd")

# Quantized variant of the default (~45 MB INT8) — much smaller download, some accuracy cost
iex> Image.Detection.detect(image, filename: "onnx/model_quantized.onnx")

For one-off use, pass options per call. To make a non-default model the project default, you can wrap the call:

defp detect(image), do: Image.Detection.detect(image, repo: "onnx-community/rtdetr_r18vd")

To pre-download a model into the cache:

mix image_vision.download_models --detect

(The download task fetches the configured default; if you've changed the repo for a single call site, the cache will populate on first use.)

Caveat: COCO 80 labels are hardcoded

detect/2 maps class indices to label strings using a baked-in COCO 80 list (detection.ex). RT-DETR models trained on a different label set (e.g. Open Images, custom domains) will produce indices the wrapper can't translate — labels will be wrong even though boxes and scores are correct. For non-COCO models, use the underlying Ortex.run/2 directly with the model's own id2label.
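If you do drive a non-COCO model through Ortex yourself, the index-to-label step is just a map lookup against the model's own id2label. A minimal sketch — the label mapping and class indices below are hypothetical, standing in for values you would read from the model's config and output tensor:

```elixir
# Hypothetical id2label for a custom two-class model.
id2label = %{0 => "helmet", 1 => "no_helmet"}

# Raw class indices as they might come out of the model's output tensor.
class_ids = [1, 0, 0]

# Translate indices to labels; Map.fetch!/2 raises on an unknown index,
# which surfaces a model/label-set mismatch early.
labels = Enum.map(class_ids, &Map.fetch!(id2label, &1))
# labels == ["no_helmet", "helmet", "helmet"]
```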

Default model

RT-DETR (onnx-community/rtdetr_r50vd) is a real-time transformer-based detector that outperforms YOLOv8 on COCO while being Apache 2.0 licensed (YOLOv8/11 are AGPL). It is NMS-free — no Non-Maximum Suppression post-processing is needed.

Model weights are downloaded on first call and cached. Configure the cache directory with:

config :image_vision, :cache_dir, "/path/to/cache"

Dependencies

Detection requires :ortex. Add to mix.exs:

{:ortex, "~> 0.1"}