# `Image.Captioning`
[🔗](https://github.com/elixir-image/image_vision/blob/v0.2.0/lib/captioning.ex#L2)

Image captioning — generates a natural-language description of an
image.

Pass a `t:Vix.Vips.Image.t/0` to `caption/2` and get back a string
like `"a small dog sitting on a wooden floor"` or `"a man riding
a horse with a bird of prey"`.

## Quick start

    # The captioner serving is heavyweight and not autostarted by
    # default. Either set `autostart: true` in config (see below)
    # or add the child spec to your own supervision tree:
    #
    #     children = [Image.Captioning.captioner()]

    iex> _puppy = Image.open!("./test/support/images/puppy.webp")
    iex> # Image.Captioning.caption(puppy)
    iex> # => "a brown and white puppy sitting on a white surface"

## Default model

[`Salesforce/blip-image-captioning-base`](https://huggingface.co/Salesforce/blip-image-captioning-base)
— BSD-3-Clause licensed, ~990 MB. The base BLIP variant fine-tuned
for image captioning. Solid baseline quality across general subject
matter.

Note that this is by far the heaviest of the library's default
models — the first call (or first app boot with `autostart: true`)
blocks on a ~990 MB download from HuggingFace.

## Configuration

Configure in `config/runtime.exs`:

    config :image_vision, :captioner,
      model: {:hf, "Salesforce/blip-image-captioning-base"},
      featurizer: {:hf, "Salesforce/blip-image-captioning-base"},
      tokenizer: {:hf, "Salesforce/blip-image-captioning-base"},
      generation_config: {:hf, "Salesforce/blip-image-captioning-base"},
      model_options: [],
      featurizer_options: [],
      tokenizer_options: [],
      generation_config_options: [],
      batch_size: 1,
      name: Image.Captioning.Server,
      autostart: false

To use the larger and higher-quality variant:

    config :image_vision, :captioner,
      model: {:hf, "Salesforce/blip-image-captioning-large"},
      featurizer: {:hf, "Salesforce/blip-image-captioning-large"},
      tokenizer: {:hf, "Salesforce/blip-image-captioning-large"},
      generation_config: {:hf, "Salesforce/blip-image-captioning-large"}

## Servings and supervision

BLIP is a multi-module model (vision encoder, text decoder,
cross-attention) and loading it takes several seconds. The
captioning entry point runs against a named serving process so the
model loads once and is reused across calls.

The serving is not autostarted by default — most apps either don't
need captioning at all or want explicit control over when the
download happens. To run it in your own supervision tree:

    # application.ex
    def start(_type, _args) do
      children = [Image.Captioning.captioner()]
      Supervisor.start_link(children, strategy: :one_for_one)
    end

Or set `autostart: true` to have `ImageVision.Supervisor` start it
when the `:image_vision` application starts.
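The autostart route is a one-line config change; every other captioner option keeps its default:

```elixir
# config/runtime.exs — opt in to autostart. The serving (and its
# ~990 MB model download, on first boot) then starts with the
# :image_vision application.
config :image_vision, :captioner,
  autostart: true
```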

## Optional dependency

This module is only available when [Bumblebee](https://hex.pm/packages/bumblebee),
[Nx](https://hex.pm/packages/nx), and an Nx compiler such as
[EXLA](https://hex.pm/packages/exla) are configured in your
application's `mix.exs`.
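A minimal dependency set might look like the following sketch. The version requirements are illustrative assumptions, not pinned by this library; check Hex for current releases:

```elixir
# mix.exs — illustrative versions.
defp deps do
  [
    {:image_vision, "~> 0.2"},
    {:bumblebee, "~> 0.5"},
    {:nx, "~> 0.7"},
    {:exla, "~> 0.7"}
  ]
end
```

You will typically also want EXLA as the default Nx backend, e.g. `config :nx, default_backend: EXLA.Backend` in your config.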

# `caption`

```elixir
@spec caption(image :: Vix.Vips.Image.t(), options :: Keyword.t()) ::
  String.t() | {:error, Image.error()}
```

Generates a natural-language caption for an image.

### Arguments

* `image` is any `t:Vix.Vips.Image.t/0`.

* `options` is a keyword list of options.

### Options

* `:backend` is any valid Nx backend used for the image-to-tensor
  conversion. The default is `Nx.default_backend/0`.

* `:server` is the name of the captioning serving process. The
  default is `Image.Captioning.Server`.

### Returns

* The caption as a `t:String.t/0`, or

* `{:error, reason}` if the input could not be processed.

### Examples

    iex> _puppy = Image.open!("./test/support/images/puppy.webp")
    iex> # Image.Captioning.caption(puppy)
    iex> # => "a small dog sitting on a wooden surface"
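The options compose; for instance, targeting a custom-named serving and an explicit backend (the `MyApp.Captioner` name is a hypothetical placeholder for a serving you started yourself):

```elixir
# Assumes a captioner serving was started under the name
# MyApp.Captioner and that EXLA is available.
Image.Captioning.caption(puppy,
  server: MyApp.Captioner,
  backend: EXLA.Backend
)
```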

# `captioner`

```elixir
@spec captioner(configuration :: Keyword.t()) ::
  {Nx.Serving, Keyword.t()} | {:error, Image.error()}
```

Returns a child spec suitable for starting an image captioning
process as part of a supervision tree.

### Arguments

* `configuration` is a keyword list merged over the default
  configuration.

### Options

* `:model` is any BLIP-family image captioning model supported by
  Bumblebee. The default is `{:hf, "Salesforce/blip-image-captioning-base"}`.

* `:featurizer` is the BLIP featurizer. The default is
  `{:hf, "Salesforce/blip-image-captioning-base"}`.

* `:tokenizer` is the BLIP tokenizer. The default is
  `{:hf, "Salesforce/blip-image-captioning-base"}`.

* `:generation_config` is a Bumblebee generation config repo. The
  default is `{:hf, "Salesforce/blip-image-captioning-base"}`.

* `:model_options`, `:featurizer_options`, `:tokenizer_options`,
  and `:generation_config_options` are keyword lists passed to
  the corresponding `Bumblebee.load_*` functions. Defaults are `[]`.

* `:name` is the name of the serving process. The default is
  `Image.Captioning.Server`.

* `:batch_size` is the maximum batch size. The default is `1`.

### Returns

* A child spec tuple suitable for `Supervisor.start_link/2`, or

* `{:error, reason}` if the model could not be loaded.
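As a sketch of overriding the defaults, the configuration argument can rename the serving and raise the batch size when placing it in a supervision tree (`MyApp.Captioner` and the batch size of `4` are illustrative assumptions):

```elixir
# application.ex — a custom-named captioner serving.
def start(_type, _args) do
  children = [
    Image.Captioning.captioner(
      name: MyApp.Captioner,
      batch_size: 4
    )
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end
```

Calls then target it via the `:server` option: `Image.Captioning.caption(image, server: MyApp.Captioner)`.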

---

*Consult [api-reference.md](api-reference.md) for a complete listing*
