Image Classification

Copy Markdown View Source

Image.Classification answers "what's in this image?" and "how similar are these two images?".

Getting labels

The simplest entry point is labels/2. It returns a list of human-readable labels for whatever is most prominent in the image:

iex> puppy = Image.open!("puppy.jpg")
iex> Image.Classification.labels(puppy)
["Blenheim spaniel"]

iex> car = Image.open!("lamborghini.jpg")
iex> Image.Classification.labels(car)
["sports car", "sport car"]

Labels come from the model's training dataset (ImageNet-1k for the default ConvNeXt model — 1000 everyday categories). The default minimum confidence threshold is 0.5; adjust with :min_score:

iex> Image.Classification.labels(puppy, min_score: 0.8)
["Blenheim spaniel"]

iex> Image.Classification.labels(puppy, min_score: 0.1)
["Blenheim spaniel", "cocker spaniel", "papillon"]

Getting raw predictions with scores

classify/2 returns the full prediction map including scores:

iex> %{predictions: preds} = Image.Classification.classify(puppy)
iex> hd(preds)
%{label: "Blenheim spaniel", score: 0.9327}

Computing embeddings

embed/2 returns a 768-dimensional feature vector. Vectors for visually similar images will be close together in this space — useful for "find images like this one" or feeding into a custom classifier.

iex> v1 = Image.Classification.embed(puppy)
iex> v2 = Image.Classification.embed(other_puppy)

# Cosine similarity: closer to 1.0 = more similar
iex> cos_sim = Nx.dot(v1, v2) / (Nx.norm(v1) * Nx.norm(v2)) |> Nx.to_number()
0.97

Configuration

The classification serving is autostarted when the :image_vision application starts. To use a larger, more accurate model, set it in config/runtime.exs:

config :image_vision, :classifier,
  model: {:hf, "facebook/convnext-large-224-22k-1k"},
  featurizer: {:hf, "facebook/convnext-large-224-22k-1k"}

The embedder is configured the same way, independently:

config :image_vision, :embedder,
  model: {:hf, "facebook/dinov2-large"},
  featurizer: {:hf, "facebook/dinov2-large"}

Any image classification or image embedding model that Bumblebee can load works as a drop-in replacement — anything from the HuggingFace image-classification or feature-extraction catalogue with a corresponding featurizer. Larger models trade speed and memory for accuracy. The label set will be whatever the chosen model was trained on — for example, convnext-large-224-22k-1k returns the same 1000 ImageNet labels as the default, while a model fine-tuned on a different dataset returns that dataset's labels.

To manage the serving yourself (e.g. in an umbrella app):

# config/runtime.exs
config :image_vision, :classifier, autostart: false

# application.ex
children = [Image.Classification.classifier()]

Configuration changes take effect at application start. After editing config/runtime.exs, restart the application; the new model is downloaded on first call and cached.

To pre-download a configured model so first-call latency is eliminated:

mix image_vision.download_models --classify

Dependencies

Classification requires :bumblebee, :nx, and an Nx backend such as :exla. Add to mix.exs:

{:bumblebee, "~> 0.6"},
{:exla, "~> 0.9"},