Bumblebee.Text.Generation behaviour (Bumblebee v0.1.2)

An interface for language models supporting sequence generation.

Summary

Callbacks

Initializes an opaque cache input for iterative inference.

Functions

Builds a numerical definition that generates sequences of tokens using the given language model.

Initializes an opaque cache input for iterative inference.

Callbacks

init_cache(spec, batch_size, max_length, inputs)
@callback init_cache(
  spec :: Bumblebee.ModelSpec.t(),
  batch_size :: pos_integer(),
  max_length :: pos_integer(),
  inputs :: map()
) :: Nx.Tensor.t() | Nx.Container.t()

Initializes an opaque cache input for iterative inference.

Functions

build_generate(model, spec, opts \\ [])
@spec build_generate(Axon.t(), Bumblebee.ModelSpec.t(), keyword()) ::
  (params :: map(), inputs :: map() -> Nx.t())

Builds a numerical definition that generates sequences of tokens using the given language model.

The model should be either a decoder or an encoder-decoder. Tokens are generated autoregressively: the decoder is run iteratively until the termination criteria are met.

In the case of encoder-decoder models, the encoder is run only once and its intermediate state is reused across all decoding iterations.

The length of the generated sequence is not fixed; however, it can be controlled via several options.

Note that either :max_new_tokens or :max_length must be specified.
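
For example, with a decoder-only model (a minimal sketch; the "gpt2" checkpoint, prompt, and option values are illustrative, and Bumblebee.Tokenizer.decode/2 is assumed for turning the generated token ids back into text):

{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})

# Build the generation function once...
generate =
  Bumblebee.Text.Generation.build_generate(model_info.model, model_info.spec,
    max_new_tokens: 20
  )

# ...then invoke it with the model parameters and tokenized inputs
inputs = Bumblebee.apply_tokenizer(tokenizer, "Elixir is")
token_ids = generate.(model_info.params, inputs)
Bumblebee.Tokenizer.decode(tokenizer, token_ids)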

Options

  • :max_new_tokens - the maximum number of tokens to be generated, ignoring the number of tokens in the prompt

  • :min_new_tokens - the minimum number of tokens to be generated, ignoring the number of tokens in the prompt

  • :max_length - the maximum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :max_new_tokens, which ignores the number of tokens in the prompt

  • :min_length - the minimum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :min_new_tokens, which ignores the number of tokens in the prompt

  • :decoder_start_token_id - the id of the initial token when generating from scratch, in the case of encoder-decoder models

  • :bos_token_id - the id of the beginning-of-sequence token

  • :eos_token_id - the id of the end-of-sequence token

  • :pad_token_id - the id of the padding token

  • :forced_bos_token_id - the id of the token to force as the first generated token

  • :forced_eos_token_id - the id of the token to force as the last generated token when :max_length is reached

The default token option values are taken from the given model specification when available.
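
For instance, to override the defaults taken from the specification (a hedged sketch; 50256 is GPT-2's end-of-sequence token id and is used here purely as an illustration):

generate =
  Bumblebee.Text.Generation.build_generate(model_info.model, model_info.spec,
    max_new_tokens: 16,
    # Explicit ids take precedence over the values in model_info.spec
    eos_token_id: 50256,
    pad_token_id: 50256
  )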

init_cache(spec, batch_size, max_length, inputs)
@spec init_cache(Bumblebee.ModelSpec.t(), pos_integer(), pos_integer(), map()) ::
  Nx.t()

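Initializes an opaque cache input for iterative inference.

For a custom decoding loop, the cache can be initialized and threaded through the model inputs manually. A minimal sketch, assuming a spec and inputs obtained as in the build_generate/3 example above, and assuming the model exposes a "cache" input as Bumblebee's decoder models do (the batch size and length values are illustrative):

# Allocate a cache sized for a batch of 1 and a maximum length of 50
cache = Bumblebee.Text.Generation.init_cache(model_info.spec, 1, 50, inputs)

# Pass the cache alongside the regular model inputs
inputs = Map.put(inputs, "cache", cache)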