Image.Classification (image_vision v0.2.0)


Image classification — what's in this image?

Pass a Vix.Vips.Image.t/0 to classify/2 or labels/2 and get back human-readable labels like "sports car" or "Blenheim spaniel". Pass it to embed/2 to get a fixed-size feature vector you can use for similarity search or downstream learning.

Quick start

iex> puppy = Image.open!("./test/support/images/puppy.webp")
iex> [label | _rest] = Image.Classification.labels(puppy)
iex> label
"Blenheim spaniel"

Default models

The defaults are chosen for permissive licensing (Apache 2.0), reasonable size (<400 MB), and broad applicability:

  • Classification: facebook/convnext-tiny-224. ~110 MB, ~82.1% top-1 accuracy on ImageNet, Apache 2.0. Returns one of 1000 ImageNet labels with a confidence score.

  • Embedding: facebook/dinov2-base. ~340 MB, Apache 2.0. Returns a 768-dim feature vector. Useful for "find similar images", clustering, or as input to a custom classifier.

Power users can override every default through configuration or classifier/1 options — see the configuration section below.

Configuration

Both classifier and embedder are configurable independently. The defaults are:

# config/runtime.exs
config :image_vision, :classifier,
  model: {:hf, "facebook/convnext-tiny-224"},
  featurizer: {:hf, "facebook/convnext-tiny-224"},
  model_options: [],
  featurizer_options: [],
  batch_size: 10,
  name: Image.Classification.Server,
  autostart: true

config :image_vision, :embedder,
  model: {:hf, "facebook/dinov2-base"},
  featurizer: {:hf, "facebook/dinov2-base"},
  model_options: [],
  featurizer_options: [],
  batch_size: 10,
  name: Image.Classification.EmbeddingServer,
  autostart: false
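Because each entry is described as being merged over the defaults, an application should only need to list the keys it changes. As a minimal sketch, assuming that merge applies to application config in the same way it does to classifier/1 and embedder/1 options, the following starts the embedding serving at boot and lowers its batch size. The value 4 is purely illustrative.

```elixir
# config/runtime.exs
import Config

# Only the keys that differ from the defaults shown above are listed.
# Assumption: omitted keys fall back to the documented defaults.
config :image_vision, :embedder,
  autostart: true,
  batch_size: 4
```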

Servings and supervision

Bumblebee servings are heavyweight processes — a model load can take several seconds and consume hundreds of megabytes. Each classification or embedding entry point runs against a named serving process so the model loads once and is reused.

By default the classifier serving is autostarted by ImageVision.Supervisor when the :image_vision application starts. The embedding serving is not autostarted (most apps don't need it).

To run a serving in your own supervision tree, set autostart: false and use classifier/1 or embedder/1 to get a child spec:

# application.ex
def start(_type, _args) do
  children = [
    Image.Classification.classifier(),
    Image.Classification.embedder(model: {:hf, "facebook/dinov2-large"})
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end
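When a serving runs under a non-default name, callers need to point at it explicitly. The sketch below pairs the documented :name option of embedder/1 with the documented :server option of embed/2. MyApp.Embedder is a hypothetical process name chosen for illustration, and the snippet assumes embedder/1 returned a child spec rather than an error tuple.

```elixir
# In the supervision tree: start the embedder under a custom name.
# MyApp.Embedder is a hypothetical name, not part of the library.
children = [
  Image.Classification.embedder(name: MyApp.Embedder)
]

Supervisor.start_link(children, strategy: :one_for_one)

# At the call site: direct embed/2 to that serving via :server.
image = Image.open!("./test/support/images/puppy.webp")
embedding = Image.Classification.embed(image, server: MyApp.Embedder)
```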

Optional dependency

This module is only available when Bumblebee, Nx, and an Nx compiler such as EXLA are configured in your application's mix.exs.
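A dependency list along the following lines would typically satisfy that requirement. This is a sketch only: the version requirements are assumptions rather than values taken from this package, so check Hex for the releases that image_vision actually supports, and swap EXLA for another Nx compiler if you prefer.

```elixir
# mix.exs
defp deps do
  [
    {:image_vision, "~> 0.2"},
    # Version requirements below are assumptions; verify them on Hex.
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.9"},
    {:exla, "~> 0.9"}
  ]
end
```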

Summary

Functions

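classifier(configuration \\ Application.get_env(:image_vision, :classifier, []))

Returns a child spec suitable for starting an image classification process as part of a supervision tree.

classify(image, options \\ [])

Classifies an image and returns the full prediction map.

embed(image, options \\ [])

Computes a feature vector embedding of an image.

embedder(configuration \\ Application.get_env(:image_vision, :embedder, []))

Returns a child spec suitable for starting an image embedding process as part of a supervision tree.

labels(image, options \\ [])

Classifies an image and returns the labels that meet a minimum confidence score.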

Functions

classifier(configuration \\ Application.get_env(:image_vision, :classifier, []))

@spec classifier(configuration :: Keyword.t()) ::
  {Nx.Serving, Keyword.t()} | {:error, Image.error()}

Returns a child spec suitable for starting an image classification process as part of a supervision tree.

Arguments

  • configuration is a keyword list merged over the default configuration.

Options

  • :model is any image classification model supported by Bumblebee. The default is {:hf, "facebook/convnext-tiny-224"}.

  • :featurizer is any image featurizer supported by Bumblebee. The default is {:hf, "facebook/convnext-tiny-224"}.

  • :model_options is a keyword list of options passed to Bumblebee.load_model/2. The default is [].

  • :featurizer_options is a keyword list of options passed to Bumblebee.load_featurizer/2. The default is [].

  • :name is the name of the serving process. The default is Image.Classification.Server.

  • :batch_size is the maximum batch size, passed to Bumblebee.Vision.image_classification/3. The default is 10.

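Returns

  • {Nx.Serving, options}, a child spec tuple suitable for a supervisor's child list, or

  • {:error, reason}.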

classify(image, options \\ [])

@spec classify(image :: Vix.Vips.Image.t(), Keyword.t()) ::
  %{predictions: [%{label: String.t(), score: float()}]}
  | {:error, Image.error()}

Classifies an image and returns the full prediction map.

Arguments

  • image is any Vix.Vips.Image.t/0.

Options

  • :backend is any valid Nx backend. The default is Nx.default_backend/0.

  • :server is the name of the serving process. The default is Image.Classification.Server.

Returns

  • A map of the form %{predictions: [%{label: String.t(), score: float()}]}, or

  • {:error, reason}.

Examples

iex> puppy = Image.open!("./test/support/images/puppy.webp")
iex> %{predictions: [%{label: _label, score: _score} | _rest]} =
...>   Image.Classification.classify(puppy)
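Since classify/2 can return either a prediction map or an error tuple, calling code usually needs to handle both shapes. The following is a minimal sketch of one way to pick the highest-scoring prediction. PredictionHelpers is a hypothetical module, and the sample map is hand-written data in the documented shape, not real model output.

```elixir
defmodule PredictionHelpers do
  @moduledoc "Hypothetical helpers for the result shapes documented for classify/2."

  # Returns {:ok, prediction} for the highest-scoring entry.
  def top(%{predictions: [_ | _] = predictions}) do
    {:ok, Enum.max_by(predictions, & &1.score)}
  end

  # An empty prediction list has no top entry.
  def top(%{predictions: []}), do: :none

  # Errors pass through unchanged.
  def top({:error, _reason} = error), do: error
end

# Hand-written sample data, not real model output.
result = %{
  predictions: [
    %{label: "tabby", score: 0.12},
    %{label: "Blenheim spaniel", score: 0.81}
  ]
}

{:ok, %{label: "Blenheim spaniel"}} = PredictionHelpers.top(result)
```

In an application, the argument to top/1 would be the value returned by Image.Classification.classify(image).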

embed(image, options \\ [])

@spec embed(image :: Vix.Vips.Image.t(), options :: Keyword.t()) ::
  Nx.Tensor.t() | {:error, Image.error()}

Computes a feature vector embedding of an image.

Embeddings are fixed-size dense vectors. Two images with similar visual content will have similar embeddings, making this useful for similarity search, clustering, or as input to a downstream classifier.

Arguments

  • image is any Vix.Vips.Image.t/0.

Options

  • :backend is any valid Nx backend. The default is Nx.default_backend/0.

  • :server is the name of the embedding serving process. The default is Image.Classification.EmbeddingServer.

Returns

  • An Nx.Tensor of shape {embedding_size} (e.g. {768} for DINOv2-base), or

  • {:error, reason}.

Examples

iex> puppy = Image.open!("./test/support/images/puppy.webp")
iex> embedding = Image.Classification.embed(puppy)
iex> Nx.shape(embedding)
{768}
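To make "similar embeddings" concrete, a common choice is cosine similarity, which compares the direction of two vectors. The sketch below is one possible approach, not something this module provides: Similarity is a hypothetical helper that works on plain lists of floats, and the two-element vectors are hand-written stand-ins for real 768-dim embeddings.

```elixir
defmodule Similarity do
  @moduledoc "Hypothetical helper: cosine similarity over plain lists of floats."

  # cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way.
  # Raises ArithmeticError if either vector has zero length (norm of 0.0).
  def cosine(a, b) when length(a) == length(b) do
    dot = a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    dot / (norm(a) * norm(b))
  end

  # Sorts {id, vector} candidates from most to least similar to the query.
  def rank(query, candidates) do
    Enum.sort_by(candidates, fn {_id, vector} -> cosine(query, vector) end, :desc)
  end

  # Euclidean norm of a vector.
  defp norm(v), do: v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
end

# Tiny hand-written vectors stand in for real embeddings.
query = [1.0, 0.0]
candidates = [far: [0.0, 1.0], near: [0.9, 0.1]]

[{:near, _}, {:far, _}] = Similarity.rank(query, candidates)
```

With real data, Nx.to_flat_list(embedding) converts the tensor returned by embed/2 into such a list. For large collections, computing the dot products in Nx or using a dedicated vector index is likely to be faster.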

embedder(configuration \\ Application.get_env(:image_vision, :embedder, []))

@spec embedder(configuration :: Keyword.t()) ::
  {Nx.Serving, Keyword.t()} | {:error, Image.error()}

Returns a child spec suitable for starting an image embedding process as part of a supervision tree.

Embeddings are fixed-size feature vectors useful for similarity search, clustering, or as input to a downstream classifier.

Arguments

  • configuration is a keyword list merged over the default configuration.

Options

  • :model is any image embedding model supported by Bumblebee. The default is {:hf, "facebook/dinov2-base"}.

  • :featurizer is any image featurizer supported by Bumblebee. The default is {:hf, "facebook/dinov2-base"}.

  • :model_options is a keyword list of options passed to Bumblebee.load_model/2. The default is [].

  • :featurizer_options is a keyword list of options passed to Bumblebee.load_featurizer/2. The default is [].

  • :name is the name of the serving process. The default is Image.Classification.EmbeddingServer.

  • :batch_size is the maximum batch size. The default is 10.

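Returns

  • {Nx.Serving, options}, a child spec tuple suitable for a supervisor's child list, or

  • {:error, reason}.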

labels(image, options \\ [])

@spec labels(image :: Vix.Vips.Image.t(), options :: Keyword.t()) ::
  [String.t()] | {:error, Image.error()}

Classifies an image and returns the labels that meet a minimum confidence score.

Arguments

  • image is any Vix.Vips.Image.t/0.

Options

  • :backend is any valid Nx backend. The default is Nx.default_backend/0.

  • :min_score is the minimum score, a float between 0.0 and 1.0, that a label must meet to be returned. The default is 0.5.

  • :server is the name of the serving process. The default is Image.Classification.Server.

Returns

  • A list of labels, which may be empty if no prediction meets :min_score, or

  • {:error, reason}.

Examples

iex> car = Image.open!("./test/support/images/lamborghini-forsennato-concept.jpg")
iex> Image.Classification.labels(car)
["sports car", "sport car"]
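The relationship between labels/2 and the prediction map from classify/2 can be sketched as a filter on score. This is an illustration under stated assumptions, not the library's implementation: LabelFilter is a hypothetical module, "meets" is read as greater than or equal to, and comma-separated class names are assumed to be split into individual labels, which would be consistent with the two-element output above. The sample map is hand-written, not real model output.

```elixir
defmodule LabelFilter do
  @moduledoc "Hypothetical sketch of score-based label filtering."

  # Keeps predictions scoring at least min_score, then returns their labels.
  # Assumption: a comma-separated label such as "sports car, sport car"
  # is split into its individual names.
  def labels(%{predictions: predictions}, min_score \\ 0.5) do
    predictions
    |> Enum.filter(&(&1.score >= min_score))
    |> Enum.flat_map(&String.split(&1.label, ", "))
  end
end

# Hand-written sample data, not real model output.
result = %{
  predictions: [
    %{label: "sports car, sport car", score: 0.92},
    %{label: "convertible", score: 0.04}
  ]
}

["sports car", "sport car"] = LabelFilter.labels(result)
[] = LabelFilter.labels(result, 0.99)
```

With the real function, the threshold is passed as an option, for example Image.Classification.labels(car, min_score: 0.2).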