Image.Segmentation (image_vision v0.2.0)

Image segmentation — which pixels belong to which object?

Two complementary entry points cover different use cases:

segment/2 — promptable segmentation via SAM 2. Click a point or draw a box and get back a precise mask for that object. Great for "cut out this foreground" or "mask this product".
segment_panoptic/2 — class-labeled segmentation via DETR-panoptic. Every region in the image gets a class label (person, car, sky, road…). Great for "what's in this image and where?"

Quick start

# Mask the object at the centre of the image
iex> image = Image.open!("photo.jpg")
iex> %{mask: mask, score: _} = Image.Segmentation.segment(image)

# Mask the object at a specific point
iex> %{mask: mask} = Image.Segmentation.segment(image, prompt: {:point, 320, 240})

# Class-label every region
iex> segments = Image.Segmentation.segment_panoptic(image)
iex> Enum.map(segments, & &1.label)
["person", "car", "road", "sky"]

Composing results with an image

# Make the masked object the only visible content (alpha mask)
iex> {:ok, cutout} = Image.Segmentation.apply_mask(image, mask)

# Colour-coded overlay of all segments
iex> overlay = Image.Segmentation.compose_overlay(image, segments)

Default models

Promptable — SharpAI/sam2-hiera-tiny-onnx (SAM 2 Tiny, Apache 2.0, encoder ~128 MB + decoder ~20 MB). Downloaded on first call via ImageVision.ModelCache.
Class-labeled — Xenova/detr-resnet-50-panoptic (DETR ResNet-50, Apache 2.0, ~172 MB). 250 COCO panoptic classes covering everyday things and stuff.

Both can be overridden via options — see segment/2 and segment_panoptic/2 for details.

Optional dependency

This module is only available when the optional Ortex dependency is included in your application's mix.exs.
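
As a rough sketch, the dependency entries might look like the fragment below. The image_vision requirement is inferred from the version in this page's title, and the Ortex requirement is an assumption; check Hex for the current releases before copying either.

```elixir
# mix.exs (fragment). Both version requirements are assumptions, not taken from this page.
defp deps do
  [
    {:image_vision, "~> 0.2"},
    # Optional dependency that makes Image.Segmentation available
    {:ortex, "~> 0.1"}
  ]
end
```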

Summary

Types

mask_result()

A mask returned by segment/2.

segment()

A segmented region returned by segment_panoptic/2.

Functions

apply_mask(image, mask)

Applies a mask as the alpha channel of an image.

compose_overlay(image, segments, options \\ [])

Overlays colour-coded segment masks on an image.

segment(image, options \\ [])

Segments an object in an image using SAM 2.

segment_panoptic(image, options \\ [])

Segments and labels every region in an image using DETR-panoptic.

Types

mask_result()

@type mask_result() :: %{score: float(), mask: Vix.Vips.Image.t()}

A mask returned by segment/2.

:score — SAM IoU prediction score.
:mask — single-band Vix.Vips.Image.t/0 in original image dimensions; white pixels are the segmented object.

segment()

@type segment() :: %{label: String.t(), score: float(), mask: Vix.Vips.Image.t()}

A segmented region returned by segment_panoptic/2.

:label — COCO panoptic class name, e.g. "person" or "road".
:score — confidence score in [0.0, 1.0].
:mask — single-band Vix.Vips.Image.t/0; white (255) pixels belong to this segment, black (0) do not.

Functions

apply_mask(image, mask)

@spec apply_mask(Vix.Vips.Image.t(), Vix.Vips.Image.t()) ::
  {:ok, Vix.Vips.Image.t()} | {:error, Image.error()}

Applies a mask as the alpha channel of an image.

White pixels in the mask become fully opaque; black pixels become fully transparent. The result is an RGBA image suitable for compositing or exporting with transparency.

Arguments

image is any Vix.Vips.Image.t/0.
mask is a single-band Vix.Vips.Image.t/0 of the same dimensions, such as the :mask field of mask_result/0 or segment/0.

Returns

{:ok, image} — an RGBA Vix.Vips.Image.t/0, or
{:error, reason}.

Examples

iex> image = Image.open!("./test/support/images/puppy.webp")
iex> %{mask: mask} = Image.Segmentation.segment(image)
iex> {:ok, cutout} = Image.Segmentation.apply_mask(image, mask)
iex> Image.bands(cutout)
4
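
The steps above can be combined into one function that saves the cutout to disk. This is a minimal sketch, not part of the library: CutoutExample and cutout_to_png/2 are hypothetical names, and the sketch assumes Image.open/1 and Image.write/2 from the Image library return ok/error tuples. Note that segment/2 returns a bare map, so it is matched with = rather than <-.

```elixir
defmodule CutoutExample do
  @moduledoc "Hypothetical helper: segment the centre object and save it as a transparent PNG."

  # Image.open/1, apply_mask/2 and Image.write/2 return {:ok, _} | {:error, _},
  # so they use `<-` and any error tuple falls through as the function's result.
  # segment/2 returns a plain map, so it is matched with `=`.
  def cutout_to_png(input_path, output_path) do
    with {:ok, image} <- Image.open(input_path),
         %{mask: mask} = Image.Segmentation.segment(image),
         {:ok, cutout} <- Image.Segmentation.apply_mask(image, mask),
         {:ok, _written} <- Image.write(cutout, output_path) do
      {:ok, output_path}
    end
  end
end
```

PNG is used for the output because the result carries an alpha channel, which formats such as JPEG cannot store.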

compose_overlay(image, segments, options \\ [])

@spec compose_overlay(Vix.Vips.Image.t(), [segment() | mask_result()], Keyword.t()) ::
  Vix.Vips.Image.t()

Overlays colour-coded segment masks on an image.

Each segment gets a distinct colour. Useful for visualising the output of segment_panoptic/2.

Arguments

image is any Vix.Vips.Image.t/0.
segments is the list returned by segment_panoptic/2, or any list of segment/0 or mask_result/0 maps.
options is a keyword list of options.

Options

:alpha — opacity of the overlay as a float in [0.0, 1.0]. The default is 0.5.
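
To get a feel for what an opacity value means, the sketch below applies conventional alpha blending to a single RGB pixel. It is illustrative only: this page does not say how compose_overlay/3 blends internally, so the formula and the OverlayBlend module are assumptions.

```elixir
defmodule OverlayBlend do
  @moduledoc "Illustrative only: conventional alpha blending of one RGB pixel."

  # alpha = 0.0 leaves the pixel unchanged; alpha = 1.0 replaces it with the overlay colour.
  def blend({r, g, b}, {cr, cg, cb}, alpha) when alpha >= 0.0 and alpha <= 1.0 do
    mix = fn base, colour -> round(alpha * colour + (1.0 - alpha) * base) end
    {mix.(r, cr), mix.(g, cg), mix.(b, cb)}
  end
end

# A mid-grey pixel under a pure red overlay at the default opacity of 0.5
OverlayBlend.blend({100, 100, 100}, {255, 0, 0}, 0.5)
# => {178, 50, 50}
```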

Returns

The annotated Vix.Vips.Image.t/0.

Examples

iex> image = Image.open!("./test/support/images/puppy.webp")
iex> segments = Image.Segmentation.segment_panoptic(image)
iex> overlay = Image.Segmentation.compose_overlay(image, segments)
iex> match?(%Vix.Vips.Image{}, overlay)
true

segment(image, options \\ [])

@spec segment(Vix.Vips.Image.t(), Keyword.t()) :: mask_result() | [mask_result()]

Segments an object in an image using SAM 2.

Accepts an optional point or box prompt to select which object to segment. With no prompt, the centre of the image is used.

Arguments

image is any Vix.Vips.Image.t/0.
options is a keyword list of options.

Options

:prompt selects what to segment:
- :auto — segment the object at the image centre (default).
- {:point, x, y} — segment the object at pixel (x, y).
- {:box, x, y, w, h} — segment the object inside the box.
- A list of {:point, x, y} tuples for multi-point prompting.
:multimask — when true, returns all three SAM candidate masks as a list sorted by descending score. When false (default), returns only the best mask as a single mask_result/0.
:min_score — minimum IoU score to return when :multimask is true. The default is 0.0.
:repo — HuggingFace repo for the SAM 2 ONNX models. Default is "SharpAI/sam2-hiera-tiny-onnx".
:encoder_file — encoder ONNX filename within the repo. Default is "encoder.onnx".
:decoder_file — decoder ONNX filename within the repo. Default is "decoder.onnx".
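
The options above form an ordinary keyword list. The sketch below builds one with a point prompt computed from relative coordinates, which can be handy when the image size varies. PromptExample and relative_point/4 are hypothetical helpers, not part of the library.

```elixir
defmodule PromptExample do
  @moduledoc "Hypothetical helper for building a {:point, x, y} prompt."

  # fx and fy are fractions of the width and height, nominally in 0.0..1.0.
  # The result is clamped so that it always lies inside the image.
  def relative_point(width, height, fx, fy) when width > 0 and height > 0 do
    x = (fx * width) |> round() |> max(0) |> min(width - 1)
    y = (fy * height) |> round() |> max(0) |> min(height - 1)
    {:point, x, y}
  end
end

# Options for a point prompt at the centre of a 640x480 image,
# asking for every candidate mask that scores at least 0.8
options = [
  prompt: PromptExample.relative_point(640, 480, 0.5, 0.5),
  multimask: true,
  min_score: 0.8
]
```

The resulting list would be passed as the second argument to segment/2. In real code the dimensions can come from Image.width/1 and Image.height/1.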

Returns

A mask_result/0 map when :multimask is false.
A list of mask_result/0 maps sorted by descending :score when :multimask is true.
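
Because the return shape depends on :multimask, calling code that accepts either form may want to normalise it first. A minimal sketch, assuming the list can be empty once :min_score has filtered the candidates; MaskResultExample is a hypothetical module, and plain atoms stand in for the mask images so the sketch runs without a model.

```elixir
defmodule MaskResultExample do
  @moduledoc "Hypothetical helper that normalises the two return shapes of segment/2."

  # multimask: true gives a list, assumed here to be possibly empty
  def best([]), do: nil
  def best(results) when is_list(results), do: Enum.max_by(results, & &1.score)
  # multimask: false gives a single map
  def best(%{score: _, mask: _} = result), do: result
end

# Atoms stand in for Vix.Vips.Image structs
candidates = [
  %{score: 0.71, mask: :mask_b},
  %{score: 0.93, mask: :mask_a},
  %{score: 0.40, mask: :mask_c}
]

MaskResultExample.best(candidates)
# => %{score: 0.93, mask: :mask_a}
```

Since the documented list is already sorted by descending score, List.first/1 would also work; Enum.max_by/2 simply avoids relying on the ordering.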

Examples

iex> image = Image.open!("./test/support/images/puppy.webp")
iex> %{score: score, mask: mask} = Image.Segmentation.segment(image)
iex> score > 0.0
true
iex> match?(%Vix.Vips.Image{}, mask)
true

segment_panoptic(image, options \\ [])

@spec segment_panoptic(Vix.Vips.Image.t(), Keyword.t()) :: [segment()]

Segments and labels every region in an image using DETR-panoptic.

Returns one segment per detected object or region, each with a class label, confidence score, and a binary mask. Covers 250 COCO panoptic categories including everyday objects (person, car, dog) and background regions (sky, road, grass).

Arguments

image is any Vix.Vips.Image.t/0.
options is a keyword list of options.

Options

:min_score — minimum confidence score to include a segment. The default is 0.5.
:repo — HuggingFace repo for the ONNX model. Default is "Xenova/detr-resnet-50-panoptic".
:model_file — ONNX filename within the repo. Default is "onnx/model.onnx". Use "onnx/model_quantized.onnx" (~44 MB) for a much smaller model with some accuracy loss.

Returns

A list of segment/0 maps sorted by descending :score. May be empty if no segment meets :min_score.
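
The returned list is plain Elixir data, so the usual Enum functions apply. The sketch below filters and counts segments by label. SegmentsExample is a hypothetical module, and the sample data uses atoms in place of real mask images so that it runs without a model.

```elixir
defmodule SegmentsExample do
  @moduledoc "Hypothetical helpers for post-processing a list of segment maps."

  # Keep only the segments whose label is in `labels`
  def with_labels(segments, labels) do
    Enum.filter(segments, &(&1.label in labels))
  end

  # Count segments per label, e.g. to see how many "person" regions were found
  def label_counts(segments) do
    Enum.frequencies_by(segments, & &1.label)
  end
end

# Sample data shaped like the documented segment/0 maps
segments = [
  %{label: "person", score: 0.98, mask: :mask_1},
  %{label: "car", score: 0.91, mask: :mask_2},
  %{label: "person", score: 0.87, mask: :mask_3},
  %{label: "sky", score: 0.66, mask: :mask_4}
]

people = SegmentsExample.with_labels(segments, ["person"])
counts = SegmentsExample.label_counts(segments)
```

A filtered list such as people can then be handed to compose_overlay/3 to highlight only those regions.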

Examples

iex> image = Image.open!("./test/support/images/puppy.webp")
iex> segments = Image.Segmentation.segment_panoptic(image)
iex> is_list(segments)
true
iex> Enum.all?(segments, &match?(%{label: _, score: _, mask: _}, &1))
true