Bumblebee (Bumblebee v0.5.3)

Pre-trained Axon models for easy inference and boosted training.

Bumblebee provides state-of-the-art, configurable Axon models. On top of that, it streamlines the process of loading pre-trained models by integrating with Hugging Face Hub and 🤗 Transformers.

Usage

You can load one of the supported models by specifying the model repository:

{:ok, model_info} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

Then you are ready to make predictions:

inputs = Bumblebee.apply_tokenizer(tokenizer, "Hello Bumblebee!")
outputs = Axon.predict(model_info.model, model_info.params, inputs)

Tasks

On top of bare models, Bumblebee provides a number of "servings" that act as end-to-end pipelines for specific tasks.

serving = Bumblebee.Text.fill_mask(model_info, tokenizer)
Nx.Serving.run(serving, "The capital of [MASK] is Paris.")
#=> %{
#=>   predictions: [
#=>     %{score: 0.9279842972755432, token: "france"},
#=>     %{score: 0.008412551134824753, token: "brittany"},
#=>     %{score: 0.007433671969920397, token: "algeria"},
#=>     %{score: 0.004957548808306456, token: "department"},
#=>     %{score: 0.004369721747934818, token: "reunion"}
#=>   ]
#=> }

As you can see, the serving pre-processes the text input, runs the model, and post-processes its output into more structured data. In the above example we run the serving on the fly; for production usage, however, you can start the serving as a process and it will automatically batch requests from multiple clients. Processing inputs in batches is usually much more efficient, since it takes advantage of the parallel capabilities of the target device, which is particularly relevant for GPUs. For more details, read the Nx.Serving docs.
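As a minimal sketch of the production setup described above, the serving can be started under a supervisor and called with Nx.Serving.batched_run/2. The process name MyApp.FillMask and the batch_size and batch_timeout values are assumptions chosen for illustration, not recommendations.

```elixir
# Assumes Bumblebee and an Nx backend (such as EXLA) are already installed.
{:ok, model_info} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

serving = Bumblebee.Text.fill_mask(model_info, tokenizer)

# MyApp.FillMask is a hypothetical process name. batch_size and
# batch_timeout (in milliseconds) are illustrative values.
children = [
  {Nx.Serving, serving: serving, name: MyApp.FillMask, batch_size: 8, batch_timeout: 100}
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

# Any process can now send requests; concurrent calls are grouped into batches.
%{predictions: predictions} =
  Nx.Serving.batched_run(MyApp.FillMask, "The capital of [MASK] is Paris.")
```

In an application you would typically place the child specification in your supervision tree rather than starting a supervisor by hand.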

For more examples see the Examples notebook.

Note
The models are generally large, so make sure to configure an efficient Nx backend, such as EXLA or Torchx.
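One way to configure the backend in a script or notebook is sketched below. It assumes EXLA is the chosen backend; the version requirements are illustrative.

```elixir
# Versions are illustrative; pick the ones that match your project.
Mix.install(
  [
    {:bumblebee, "~> 0.5.3"},
    {:exla, ">= 0.0.0"}
  ],
  # Make EXLA the default backend for all Nx tensors.
  config: [nx: [default_backend: EXLA.Backend]]
)
```

In a Mix project the equivalent is usually a `config :nx, default_backend: EXLA.Backend` entry, or a call to Nx.global_default_backend(EXLA.Backend) at startup.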

Hugging Face Hub

Hugging Face Hub is a platform hosting models, datasets and demo apps (Spaces), all using Git repositories (with Git LFS for large files). For further information check out the Hub documentation and explore the model repositories.

Models

Model repositories are regular Git repositories, therefore they can store arbitrary files. However, most repositories store models saved using the Python Transformers library. Bumblebee is an Elixir counterpart of Transformers and allows for importing those models, as long as they are implemented in Bumblebee.

A repository in the Transformers format does not store an actual model, only the trained parameters and a configuration file. The configuration file specifies the model type (e.g. BERT) and high-level properties, such as the number of layers and their size. The model implementation lives in the library code (both Transformers and Bumblebee). When loading a model, the library fetches the configuration and builds a matching model, then it fetches the trained parameters to pair them with the model. The key takeaway is that in order to use any given model, it needs to have an implementation in Bumblebee.
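The loading steps can be sketched with the functions documented on this page. This is only an illustration of the sequence (configuration, then model, then parameters), not a description of how load_model/2 is implemented internally.

```elixir
repo = {:hf, "microsoft/resnet-50"}

# 1. Fetch the configuration file and turn it into a model specification.
{:ok, spec} = Bumblebee.load_spec(repo)

# 2. Build the Axon model from the specification alone.
#    No trained parameters are involved at this point.
model = Bumblebee.build_model(spec)

# 3. load_model/2 covers both steps above and also fetches the trained
#    parameters, returning them together with the model.
{:ok, %{model: _model, params: _params, spec: _spec}} = Bumblebee.load_model(repo)
```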

Model repository

Here is a list of files commonly found in a repository following the Transformers format.

config.json - model configuration, specifies the model type and model-specific options. You can think of this as a blueprint for how the model should be constructed
pytorch_model.bin - raw model parameters (tensors) serialized from a PyTorch model using PyTorch format (supported by Bumblebee)
model.safetensors - raw model parameters (tensors) serialized from a PyTorch model using Safetensors (supported by Bumblebee)
flax_model.msgpack, tf_model.h5 - raw model parameters (tensors) serialized from Flax and TensorFlow models respectively (not supported by Bumblebee)
tokenizer.json, tokenizer_config.json - tokenizer configuration, describes how to convert text input to model inputs (tensors). See Tokenizer support
preprocessor_config.json - featurizer configuration, describes how to convert real-world input (image, audio) to model inputs (tensors)
generation_config.json - a set of configuration options specific to text generation, such as token sampling strategy and various constraints

Model support

As pointed out above, in order to load a model, the given model type must be implemented in Bumblebee. To find out whether the model is supported you can call Bumblebee.load_model({:hf, "model-repo"}) or use this tool to run a number of checks against the repository.

If you prefer to poke around the code, open the config.json file in the model repository and copy the class name under "architectures". Next, search the Bumblebee codebase for that keyword. If you find a match, this indicates the model is supported.

Also note that certain repositories include multiple models in separate subdirectories, for example stabilityai/stable-diffusion-2. In such a case use Bumblebee.load_model({:hf, "model-repo", subdir: "..."}).
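A quick support check can be sketched by matching on the result of load_model/2. The "unet" subdirectory name is an assumption about the layout of that repository; adjust it to the directory you actually need.

```elixir
# "unet" is an assumed subdirectory name, used here for illustration.
repo = {:hf, "stabilityai/stable-diffusion-2", subdir: "unet"}

result =
  case Bumblebee.load_model(repo) do
    {:ok, model_info} ->
      # The model type is implemented in Bumblebee and the files were loaded.
      {:supported, model_info.spec.__struct__}

    {:error, message} ->
      # The message describes why loading failed, for example an
      # unsupported model type or missing files.
      {:unsupported, message}
  end
```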

Tokenizer support

The Transformers library distinguishes two types of tokenizer implementations:

"slow tokenizer" - a tokenizer implemented in Python and stored as tokenizer_config.json and a couple of extra files
"fast tokenizer" - a tokenizer implemented in Rust and stored in a single file - tokenizer.json

Bumblebee relies on the Rust implementations (through bindings to Tokenizers) and therefore always requires the tokenizer.json file. Many repositories only include files for a "slow tokenizer". When you stumble upon such a repository, there are two options you can try.

First, if the repository is clearly a fine-tuned version of another model, you can look for tokenizer.json in the original model repository. For example, textattack/bert-base-uncased-yelp-polarity only includes tokenizer_config.json, but it is a fine-tuned version of bert-base-uncased, which does include tokenizer.json. Consequently, you can safely load the model from textattack/bert-base-uncased-yelp-polarity and tokenizer from bert-base-uncased.
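The first option might look as follows. The sketch assumes that google-bert/bert-base-uncased (the identifier used elsewhere on this page) is the same base repository, and it uses the text classification serving purely as an illustration.

```elixir
# Model parameters come from the fine-tuned repository...
{:ok, model_info} =
  Bumblebee.load_model({:hf, "textattack/bert-base-uncased-yelp-polarity"})

# ...while the tokenizer comes from the base model, which ships tokenizer.json.
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

serving = Bumblebee.Text.text_classification(model_info, tokenizer)
result = Nx.Serving.run(serving, "The food was great and the staff were friendly.")
```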

Otherwise, the Transformers library includes conversion rules to load a "slow tokenizer" and convert it to a corresponding "fast tokenizer", which is possible in most cases. You can generate the tokenizer.json file using this tool. Once successful, you can follow the steps to submit a PR adding tokenizer.json to the model repository. Note that you do not have to wait for the PR to be merged; instead, you can copy the commit SHA from the PR and load the tokenizer with Bumblebee.load_tokenizer({:hf, "model-repo", revision: "..."}).

Summary

Models

build_model(spec, opts \\ [])

Builds an Axon model according to the given specification.

load_model(repository, opts \\ [])

Loads a pre-trained model from a model repository.

load_spec(repository, opts \\ [])

Loads model specification from a model repository.

Featurizers

apply_featurizer(featurizer, input, opts \\ [])

Featurizes input with the given featurizer.

load_featurizer(repository, opts \\ [])

Loads featurizer from a model repository.

Tokenizers

apply_tokenizer(tokenizer, input, opts \\ [])

Tokenizes and encodes input with the given tokenizer.

load_tokenizer(repository, opts \\ [])

Loads tokenizer from a model repository.

Schedulers

load_scheduler(repository, opts \\ [])

Loads scheduler from a model repository.

scheduler_init(scheduler, num_steps, sample_template, prng_key)

Initializes state for a new scheduler loop.

scheduler_step(scheduler, state, sample, prediction)

Predicts sample at the previous timestep using the given scheduler.

Types

model_info()

A model together with its state and metadata.

repository()

A location to fetch model files from.

Functions

cache_dir()

Returns the directory where downloaded files are stored.

configure(config, options \\ [])

Builds or updates a configuration object with the given options.

load_generation_config(repository, opts \\ [])

Loads generation config from a model repository.

Models

build_model(spec, opts \\ [])

@spec build_model(
  Bumblebee.ModelSpec.t(),
  keyword()
) :: Axon.t()

Builds an Axon model according to the given specification.

Options

:type - either a type or Axon.MixedPrecision policy to apply to the model

Example

spec = Bumblebee.configure(Bumblebee.Vision.ResNet, architecture: :base, embedding_size: 128)
model = Bumblebee.build_model(spec)

load_model(repository, opts \\ [])

@spec load_model(
  repository(),
  keyword()
) :: {:ok, model_info()} | {:error, String.t()}

Loads a pre-trained model from a model repository.

The model is downloaded and cached on your disk; use cache_dir/0 to find the location.

Parameters precision

On GPUs, computations that use a lower-precision numeric type can be faster and use less memory, while still providing valid results. You can configure the model to use a particular type by passing the :type option, such as :bf16.

Some repositories have multiple variants of the parameter files with different numeric types. The variant is usually indicated in the file extension and you can load a particular file by specifying :params_variant or :params_filename. Note, however, that this does not determine the numeric type used for inference. The file type is relevant in the context of download bandwidth and disk space. If you want to use a lower precision for inference, make sure to also specify :type.
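The two settings can be sketched side by side. The second call assumes that the repository ships parameter files with an "fp16" variant in the "unet" subdirectory; that layout is an assumption, so check the repository file listing first.

```elixir
# Cast parameters and computation to bf16, regardless of how the
# downloaded file stores them.
{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"}, type: :bf16)

# Assumption: this repository contains "<name>.fp16.<ext>" parameter files.
# :params_variant only selects which file is downloaded; :type still
# decides the numeric type used for inference.
{:ok, unet} =
  Bumblebee.load_model(
    {:hf, "CompVis/stable-diffusion-v1-4", subdir: "unet"},
    params_variant: "fp16",
    type: :bf16
  )
```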

Options

:spec - the model specification to use when building the model. By default the specification is loaded using load_spec/2
:spec_overrides - additional options to configure the model specification with. This is a shorthand for using load_spec/2, configure/2 and passing as :spec
:module - the model specification module. By default it is inferred from the configuration file, if that is not possible, it must be specified explicitly
:architecture - the model architecture, must be supported by :module. By default it is inferred from the configuration file
:params_variant - when specified, instead of loading parameters from "<name>.<ext>", loads from "<name>.<variant>.<ext>"
:params_filename - the file with the model parameters to be loaded
:log_params_diff - whether to log missing, mismatched and unused parameters. By default diff is logged only if some parameters cannot be loaded
:backend - the backend to allocate the tensors on. It is either an atom or a tuple in the shape {backend, options}
:type - either a type or Axon.MixedPrecision policy to apply to the model. Passing this option automatically casts parameters to the desired type

Examples

By default the model type is inferred from configuration, so loading is as simple as:

{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
%{model: model, params: params, spec: spec} = resnet

You can explicitly specify a different architecture, in which case matching parameters are still loaded:

{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"}, architecture: :base)

To further customize the model, you can also pass the specification:

{:ok, spec} = Bumblebee.load_spec({:hf, "microsoft/resnet-50"})
spec = Bumblebee.configure(spec, num_labels: 10)
{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"}, spec: spec)

Or as a shorthand, you can pass just the options to override:

{:ok, resnet} =
  Bumblebee.load_model({:hf, "microsoft/resnet-50"}, spec_overrides: [num_labels: 10])

load_spec(repository, opts \\ [])

@spec load_spec(
  repository(),
  keyword()
) :: {:ok, Bumblebee.ModelSpec.t()} | {:error, String.t()}

Loads model specification from a model repository.

Options

:module - the model specification module. By default it is inferred from the configuration file, if that is not possible, it must be specified explicitly
:architecture - the model architecture, must be supported by :module. By default it is inferred from the configuration file

Examples

{:ok, spec} = Bumblebee.load_spec({:hf, "microsoft/resnet-50"})

You can explicitly specify a different architecture:

{:ok, spec} = Bumblebee.load_spec({:hf, "microsoft/resnet-50"}, architecture: :base)

Featurizers

apply_featurizer(featurizer, input, opts \\ [])

@spec apply_featurizer(Bumblebee.Featurizer.t(), any(), keyword()) :: any()

Featurizes input with the given featurizer.

Options

:defn_options - the options for JIT compilation. Note that this is only relevant for featurizers implemented with Nx. Defaults to []

Examples

featurizer = Bumblebee.configure(Bumblebee.Vision.ConvNextFeaturizer)
{:ok, img} = StbImage.read_file(path)
inputs = Bumblebee.apply_featurizer(featurizer, [img])

load_featurizer(repository, opts \\ [])

@spec load_featurizer(
  repository(),
  keyword()
) :: {:ok, Bumblebee.Featurizer.t()} | {:error, String.t()}

Loads featurizer from a model repository.

Options

:module - the featurizer module. By default it is inferred from the preprocessor configuration file, if that is not possible, it must be specified explicitly

Examples

{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
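A loaded featurizer is typically paired with the model from the same repository. In the sketch below a constant gray tensor stands in for a real image, which is an assumption made to keep the example free of image files.

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

# Placeholder image in height-width-channels order; use a real image in practice.
image = Nx.broadcast(Nx.tensor(128, type: :u8), {224, 224, 3})

# The featurizer resizes and normalizes the input into model inputs.
inputs = Bumblebee.apply_featurizer(featurizer, [image])
outputs = Axon.predict(model_info.model, model_info.params, inputs)

# One row of logits per input image.
logits_shape = Nx.shape(outputs.logits)
```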

Tokenizers

apply_tokenizer(tokenizer, input, opts \\ [])

@spec apply_tokenizer(
  Bumblebee.Tokenizer.t(),
  Bumblebee.Tokenizer.input() | [Bumblebee.Tokenizer.input()],
  keyword()
) :: any()

Tokenizes and encodes input with the given tokenizer.

Examples

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})
inputs = Bumblebee.apply_tokenizer(tokenizer, ["The capital of France is [MASK]."])

load_tokenizer(repository, opts \\ [])

@spec load_tokenizer(
  repository(),
  keyword()
) :: {:ok, Bumblebee.Tokenizer.t()} | {:error, String.t()}

Loads tokenizer from a model repository.

Options

:type - the tokenizer type. By default it is inferred from the configuration files, if that is not possible, it must be specified explicitly

Examples

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

Schedulers

load_scheduler(repository, opts \\ [])

@spec load_scheduler(
  repository(),
  keyword()
) :: {:ok, Bumblebee.Scheduler.t()} | {:error, String.t()}

Loads scheduler from a model repository.

Options

:module - the scheduler module. By default it is inferred from the scheduler configuration file, if that is not possible, it must be specified explicitly

Examples

{:ok, scheduler} =
  Bumblebee.load_scheduler({:hf, "CompVis/stable-diffusion-v1-4", subdir: "scheduler"})

scheduler_init(scheduler, num_steps, sample_template, prng_key)

@spec scheduler_init(
  Bumblebee.Scheduler.t(),
  non_neg_integer(),
  Nx.Tensor.t(),
  Nx.Tensor.t()
) :: {Bumblebee.Scheduler.state(), Nx.Tensor.t()}

Initializes state for a new scheduler loop.

Returns a pair of {state, timesteps}, where state is an opaque container expected by scheduler_step/4 and timesteps is a sequence of subsequent timesteps for model forward pass.

Note that the number of timesteps may not match num_steps exactly. num_steps parameterizes the sampling points; however, depending on the method, sampling certain points may require multiple forward passes of the model, and each element in timesteps corresponds to a single forward pass.

scheduler_step(scheduler, state, sample, prediction)

@spec scheduler_step(
  Bumblebee.Scheduler.t(),
  Bumblebee.Scheduler.state(),
  Nx.Tensor.t(),
  Nx.Tensor.t()
) :: {Bumblebee.Scheduler.state(), Nx.Tensor.t()}

Predicts sample at the previous timestep using the given scheduler.

Takes the current sample and prediction (usually noise) returned by the model at the current timestep. Returns {state, prev_sample}, where state is the updated scheduler loop state and prev_sample is the predicted sample at the previous timestep.

Note that some schedulers require several forward passes of the model (and a couple of calls to this function) to make an actual prediction for the previous sample.
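The interplay of scheduler_init/4 and scheduler_step/4 can be sketched as a plain loop over the timesteps. The latent shape and the predict_noise function are hypothetical stand-ins for a real denoising model, and the sketch runs eagerly, whereas a real pipeline would more likely run the loop inside a compiled numerical function.

```elixir
{:ok, scheduler} =
  Bumblebee.load_scheduler({:hf, "CompVis/stable-diffusion-v1-4", subdir: "scheduler"})

# Hypothetical sample shape; a real pipeline derives it from the model.
shape = {1, 4, 64, 64}
prng_key = Nx.Random.key(0)

{state, timesteps} =
  Bumblebee.scheduler_init(scheduler, 10, Nx.template(shape, :f32), prng_key)

# Start from random noise.
{sample, _key} = Nx.Random.normal(prng_key, 0.0, 1.0, shape: shape)

# Hypothetical placeholder for the model forward pass at a given timestep.
predict_noise = fn sample, _timestep -> Nx.multiply(sample, 0.0) end

# One scheduler step per element of timesteps, threading the state through.
{_state, sample} =
  timesteps
  |> Nx.to_flat_list()
  |> Enum.reduce({state, sample}, fn timestep, {state, sample} ->
    prediction = predict_noise.(sample, timestep)
    Bumblebee.scheduler_step(scheduler, state, sample, prediction)
  end)
```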

Types

model_info()

@type model_info() :: %{model: Axon.t(), params: map(), spec: Bumblebee.ModelSpec.t()}

A model together with its state and metadata.

repository()

@type repository() ::
  {:hf, String.t()} | {:hf, String.t(), keyword()} | {:local, Path.t()}

A location to fetch model files from.

Can be either:

{:hf, repository_id} - the repository on Hugging Face. Options may be passed as the third element:
- :revision - the specific model version to use, it can be any valid git identifier, such as branch name, tag name, or a commit hash
- :cache_dir - the directory to store the downloaded files in. Defaults to the standard cache location for the given operating system. You can also configure it globally by setting the BUMBLEBEE_CACHE_DIR environment variable
- :offline - if true, only cached files are accessed and missing files result in an error. You can also configure it globally by setting the BUMBLEBEE_OFFLINE environment variable to true
- :auth_token - the token to use as HTTP bearer authorization for remote files
- :subdir - the directory within the repository where the files are located
{:local, directory} - the directory containing model files
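The repository forms listed above might be used as follows. The "main" revision and the local path are illustrative assumptions.

```elixir
# Pin a specific revision (branch name, tag name, or commit hash).
pinned = {:hf, "google-bert/bert-base-uncased", revision: "main"}
{:ok, tokenizer} = Bumblebee.load_tokenizer(pinned)

# Once the files are cached, the same repository can be read without
# network access; missing files result in an error.
offline = {:hf, "google-bert/bert-base-uncased", revision: "main", offline: true}
{:ok, _tokenizer} = Bumblebee.load_tokenizer(offline)

# A local directory with model files (hypothetical path, not loaded here).
local = {:local, "/path/to/model"}
```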

Functions

cache_dir()

@spec cache_dir() :: String.t()

Returns the directory where downloaded files are stored.

Defaults to the standard cache location for the given operating system. Can be configured with the BUMBLEBEE_CACHE_DIR environment variable.
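For example, the location can be overridden for the current OS process before any files are downloaded. The directory below is an arbitrary choice for illustration.

```elixir
# Hypothetical directory; any writable path works.
System.put_env("BUMBLEBEE_CACHE_DIR", "/tmp/bumblebee_cache")

# Returns the directory that subsequent downloads are stored in.
dir = Bumblebee.cache_dir()
```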

configure(config, options \\ [])

@spec configure(
  module() | Bumblebee.Configurable.t(),
  keyword()
) :: Bumblebee.Configurable.t()

Builds or updates a configuration object with the given options.

Expects a configuration struct or a module supporting configuration. These are usually configurable:

model specification (Bumblebee.ModelSpec)
featurizer (Bumblebee.Featurizer)
scheduler (Bumblebee.Scheduler)

Examples

To build a new configuration, pass a module:

featurizer = Bumblebee.configure(Bumblebee.Vision.ConvNextFeaturizer)
spec = Bumblebee.configure(Bumblebee.Vision.ResNet, architecture: :for_image_classification)

Similarly, you can update an existing configuration:

featurizer = Bumblebee.configure(featurizer, resize_method: :bilinear)
spec = Bumblebee.configure(spec, embedding_size: 128)

load_generation_config(repository, opts \\ [])

Loads generation config from a model repository.

Generation config includes a number of model-specific properties, so it is usually best to load the config and configure it further, rather than building it from scratch.

See Bumblebee.Text.GenerationConfig for all the available options.

Options

:spec_module - the model specification module. By default it is inferred from the configuration file, if that is not possible, it must be specified explicitly. Some models have extra options related to generation and those are loaded into a separate struct, stored under the :extra_config attribute

Examples

{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai-community/gpt2"})

generation_config = Bumblebee.configure(generation_config, max_new_tokens: 10)
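The configured struct is then usually passed to a text generation serving. The following sketch assumes the Bumblebee.Text.generation/3 serving and an arbitrary prompt.

```elixir
repo = {:hf, "openai-community/gpt2"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

# Limit the length of the generated continuation.
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 10)

serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)
%{results: [%{text: text}]} = Nx.Serving.run(serving, "Elixir is")
```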