Bumblebee.Vision (Bumblebee v0.4.2)

High-level tasks related to vision.

Types

image()

@type image() :: Nx.Container.t()

A term representing an image.

Either Nx.Tensor or a struct implementing Nx.Container and resolving to a tensor, with the following properties:

  • HWC order
  • RGB color channels
  • alpha channel may be present, but it's usually stripped out
  • integer type (:s or :u)
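
As a rough illustration of the properties above, the following sketch builds a tiny 2x2 image tensor by hand and drops its alpha channel. It assumes Nx is available as a dependency; the pixel values and variable names are made up for the example.

```elixir
# Assumes Nx is installed, e.g. Mix.install([{:nx, "~> 0.6"}]).

# A 2x2 RGBA image in HWC order: {height, width, channels}, unsigned 8-bit.
rgba =
  Nx.tensor(
    [
      [[255, 0, 0, 255], [0, 255, 0, 255]],
      [[0, 0, 255, 255], [255, 255, 255, 128]]
    ],
    type: :u8
  )

# Keep only the first three channels (RGB), discarding alpha.
rgb = Nx.slice_along_axis(rgba, 0, 3, axis: 2)

{height, width, channels} = Nx.shape(rgb)
IO.inspect({height, width, channels}, label: "HWC shape")
IO.inspect(Nx.type(rgb), label: "type")
```

In practice an image is more likely to come from a decoder such as StbImage, as in the examples further down, than to be written out by hand.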

image_classification_input()

@type image_classification_input() :: image()

image_classification_output()

@type image_classification_output() :: %{
  predictions: [image_classification_prediction()]
}

image_classification_prediction()

@type image_classification_prediction() :: %{score: number(), label: String.t()}

image_embedding_input()

@type image_embedding_input() :: image()

image_embedding_output()

@type image_embedding_output() :: %{embedding: Nx.Tensor.t()}

image_to_text_input()

@type image_to_text_input() :: image()

image_to_text_output()

@type image_to_text_output() :: %{results: [image_to_text_result()]}

image_to_text_result()

@type image_to_text_result() :: %{text: String.t()}

Functions

image_classification(model_info, featurizer, opts \\ [])

@spec image_classification(
  Bumblebee.model_info(),
  Bumblebee.Featurizer.t(),
  keyword()
) :: Nx.Serving.t()

Builds serving for image classification.

The serving accepts image_classification_input/0 and returns image_classification_output/0. A list of inputs is also supported.

Options

  • :top_k - the number of top predictions to include in the output. If the configured value is higher than the number of labels, all labels are returned. Defaults to 5

  • :compile - compiles all computations for predefined input shapes during serving initialization. Should be a keyword list with the following keys:

    • :batch_size - the maximum batch size of the input. Inputs are optionally padded to always match this batch size

    It is advised to set this option in production and also configure a defn compiler using :defn_options to maximally reduce inference time.

  • :scores_function - the function to use for converting logits to scores. Should be one of :softmax, :sigmoid, or :none. Defaults to :softmax

  • :defn_options - the options for JIT compilation. Defaults to []

  • :preallocate_params - when true, explicitly allocates params on the device configured by :defn_options. You may want to set this option when using partitioned serving, to allocate params on each of the devices. Defaults to false
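
The options above might be combined as in the following sketch. The module name `ClassificationSketch`, its function names, and the chosen values are hypothetical; the sketch also assumes that Bumblebee and EXLA are installed and that the model can be downloaded from Hugging Face.

```elixir
defmodule ClassificationSketch do
  # Hypothetical helper that collects the documented options in one place.
  def serving_opts(batch_size) do
    [
      # Return only the three highest-scoring labels.
      top_k: 3,
      # Convert logits to scores with softmax (the documented default).
      scores_function: :softmax,
      # Compile once, at initialization, for batches of this size.
      compile: [batch_size: batch_size],
      # Assumes the EXLA compiler is available.
      defn_options: [compiler: EXLA]
    ]
  end

  # Loads the model and featurizer, then builds the serving.
  def build(repo \\ "microsoft/resnet-50", batch_size \\ 4) do
    {:ok, model_info} = Bumblebee.load_model({:hf, repo})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, repo})

    Bumblebee.Vision.image_classification(model_info, featurizer, serving_opts(batch_size))
  end
end
```

With `serving = ClassificationSketch.build()`, a call such as `Nx.Serving.run(serving, [image_a, image_b])` passes a list of inputs, which is expected to yield one output map per image.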

Examples

{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

serving = Bumblebee.Vision.image_classification(resnet, featurizer)

image = StbImage.read_file!(path)
Nx.Serving.run(serving, image)
#=> %{
#=>   predictions: [
#=>     %{label: "Egyptian cat", score: 0.979233980178833},
#=>     %{label: "tabby, tabby cat", score: 0.00679466687142849},
#=>     %{label: "tiger cat", score: 0.005290505941957235},
#=>     %{label: "lynx, catamount", score: 0.004550771787762642},
#=>     %{label: "Siamese cat, Siamese", score: 1.1611092486418784e-4}
#=>   ]
#=> }
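
The returned map is plain Elixir data, so it can be handled with the standard library. The sketch below reuses part of the output shown above; the 0.5 threshold and the variable names are arbitrary choices for illustration.

```elixir
output = %{
  predictions: [
    %{label: "Egyptian cat", score: 0.979233980178833},
    %{label: "tabby, tabby cat", score: 0.00679466687142849},
    %{label: "tiger cat", score: 0.005290505941957235}
  ]
}

# Pick the highest-scoring prediction without relying on list order.
best = Enum.max_by(output.predictions, & &1.score)

# Keep only the labels whose score reaches an arbitrary threshold.
confident =
  output.predictions
  |> Enum.filter(&(&1.score >= 0.5))
  |> Enum.map(& &1.label)

IO.inspect(best.label, label: "best")
IO.inspect(confident, label: "confident")
```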

image_embedding(model_info, featurizer, opts \\ [])

@spec image_embedding(
  Bumblebee.model_info(),
  Bumblebee.Featurizer.t(),
  keyword()
) :: Nx.Serving.t()

Builds serving for image embeddings.

The serving accepts image_embedding_input/0 and returns image_embedding_output/0. A list of inputs is also supported.

Options

  • :output_attribute - the attribute of the model output map to retrieve. When the output is a single tensor (rather than a map), this option is ignored. Defaults to :pooled_state

  • :embedding_processor - a post-processing step to apply to the embedding. Supported values: :l2_norm. By default the output is returned as is

  • :compile - compiles all computations for predefined input shapes during serving initialization. Should be a keyword list with the following keys:

    • :batch_size - the maximum batch size of the input. Inputs are optionally padded to always match this batch size

    It is advised to set this option in production and also configure a defn compiler using :defn_options to maximally reduce inference time.

  • :defn_options - the options for JIT compilation. Defaults to []

  • :preallocate_params - when true, explicitly allocates params on the device configured by :defn_options. You may want to set this option when using partitioned serving, to allocate params on each of the devices. Defaults to false

Examples

{:ok, clip} =
  Bumblebee.load_model({:hf, "openai/clip-vit-base-patch32"},
    module: Bumblebee.Vision.ClipVision
  )
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/clip-vit-base-patch32"})
serving = Bumblebee.Vision.image_embedding(clip, featurizer)
image = StbImage.read_file!(path)
Nx.Serving.run(serving, image)
#=> %{
#=>   embedding: #Nx.Tensor<
#=>     f32[768]
#=>     [-0.43403682112693787, 0.09786412119865417, -0.7233262062072754, -0.7707743644714355, 0.5550824403762817, -0.8923342227935791, 0.2687447965145111, 0.9633643627166748, 0.3520320951938629, 0.43195801973342896, 2.1438512802124023, -0.6542983651161194, -1.9736307859420776, 0.1611439287662506, 0.24555791914463043, 0.16985465586185455, 0.9012499451637268, 1.0657984018325806, 1.087411642074585, -0.5864712595939636, 0.3314521908760071, 0.8396108150482178, 0.3906593322753906, 0.13463366031646729, 0.2605385184288025, -0.07457947731018066, 0.4735124707221985, -0.41367805004119873, 0.18244807422161102, 1.4741417169570923, -5.807061195373535, 0.38920706510543823, 0.057687126100063324, 0.060301072895526886, 0.9680367708206177, 0.9670255184173584, 1.3876476287841797, -0.15498873591423035, -0.969764232635498, -0.38127464056015015, 0.05450016260147095, 2.2317700386047363, -0.07926210761070251, -0.11876475065946579, -1.5408644676208496, 0.7505669593811035, 0.9280041456222534, -0.3571934103965759, -1.1390857696533203, ...]
#=>   >
#=> }
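
Embeddings are commonly compared by cosine similarity. The sketch below computes it with Nx on two made-up two-element vectors standing in for real embeddings; it assumes Nx is installed. If the `:l2_norm` embedding processor normalizes to unit length, as its name suggests, the explicit normalization step here would be redundant and a plain dot product would suffice.

```elixir
# Assumes Nx is installed, e.g. Mix.install([{:nx, "~> 0.6"}]).

# Stand-ins for two embedding tensors returned by the serving.
a = Nx.tensor([3.0, 4.0])
b = Nx.tensor([4.0, 3.0])

# Scale a vector to unit L2 norm.
normalize = fn t -> Nx.divide(t, Nx.LinAlg.norm(t)) end

# The dot product of unit vectors is their cosine similarity.
similarity =
  normalize.(a)
  |> Nx.dot(normalize.(b))
  |> Nx.to_number()

IO.inspect(similarity, label: "cosine similarity")
```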

image_to_text(model_info, featurizer, tokenizer, generation_config, opts \\ [])

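@spec image_to_text(
  Bumblebee.model_info(),
  Bumblebee.Featurizer.t(),
  Bumblebee.Tokenizer.t(),
  Bumblebee.Text.GenerationConfig.t(),
  keyword()
) :: Nx.Serving.t()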

Builds serving for image-to-text generation.

The serving accepts image_to_text_input/0 and returns image_to_text_output/0. A list of inputs is also supported.

Options

  • :seed - random seed to use when sampling. By default the current timestamp is used

  • :compile - compiles all computations for predefined input shapes during serving initialization. Should be a keyword list with the following keys:

    • :batch_size - the maximum batch size of the input. Inputs are optionally padded to always match this batch size

    It is advised to set this option in production and also configure a defn compiler using :defn_options to maximally reduce inference time.

  • :defn_options - the options for JIT compilation. Defaults to []

  • :preallocate_params - when true, explicitly allocates params on the device configured by :defn_options. You may want to set this option when using partitioned serving, to allocate params on each of the devices. Defaults to false
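
For production use, a serving is usually started as a supervised process so that concurrent requests can be batched together. The following is a minimal sketch of that setup using `Nx.Serving`; the module name, the process name, and the batch values are hypothetical. The process-level `:batch_size` would typically match the one given under `:compile`.

```elixir
defmodule CaptionServingSketch do
  # Hypothetical process name, used only for illustration.
  @serving_name CaptionServingSketch.Serving

  # Returns a child spec tuple suitable for a supervisor's children list.
  def child_spec_for(serving, batch_size \\ 4) do
    {Nx.Serving,
     serving: serving,
     name: @serving_name,
     batch_size: batch_size,
     # Wait up to 100 ms for a batch to fill before running it.
     batch_timeout: 100}
  end

  # Sends a single image to the running serving process.
  def caption(image) do
    Nx.Serving.batched_run(@serving_name, image)
  end
end
```

The tuple returned by `child_spec_for/2` would go into the children list of an application supervisor, after which `CaptionServingSketch.caption(image)` can be called from any process.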

Examples

{:ok, blip} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Salesforce/blip-image-captioning-base"})

{:ok, generation_config} =
  Bumblebee.load_generation_config({:hf, "Salesforce/blip-image-captioning-base"})

serving =
  Bumblebee.Vision.image_to_text(blip, featurizer, tokenizer, generation_config,
    defn_options: [compiler: EXLA]
  )

image = StbImage.read_file!(path)
Nx.Serving.run(serving, image)
#=> %{results: [%{text: "a cat sitting on a chair"}]}
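
The generated text can be pulled out of the output with ordinary pattern matching. This sketch reuses the output shown above; the variable names are illustrative.

```elixir
output = %{results: [%{text: "a cat sitting on a chair"}]}

# Pattern match when exactly one result is expected.
%{results: [%{text: caption}]} = output

# Or collect every generated text, whatever the number of results.
captions = Enum.map(output.results, & &1.text)

IO.puts(caption)
IO.inspect(captions, label: "captions")
```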