Bumblebee.Text.Generation behaviour (Bumblebee v0.1.2)
An interface for language models supporting sequence generation.
Summary
Callbacks
Initializes an opaque cache input for iterative inference.
Functions
Builds a numerical definition that generates sequences of tokens using the given language model.
Initializes an opaque cache input for iterative inference.
Callbacks
@callback init_cache( spec :: Bumblebee.ModelSpec.t(), batch_size :: pos_integer(), max_length :: pos_integer(), inputs :: map() ) :: Nx.Tensor.t() | Nx.Container.t()
Initializes an opaque cache input for iterative inference.
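The cache layout is entirely model-specific. As an illustration only, a hypothetical model module could implement the callback by allocating zeroed key/value tensors for each decoder block together with an iteration index; the spec fields used here (hidden_size, num_attention_heads, num_blocks) are made up for the sketch and are not part of any particular Bumblebee spec:

```elixir
defmodule MyModel do
  @behaviour Bumblebee.Text.Generation

  @impl true
  def init_cache(spec, batch_size, max_length, _inputs) do
    # Hypothetical spec fields, for illustration only
    head_size = div(spec.hidden_size, spec.num_attention_heads)
    kv_shape = {batch_size, max_length, spec.num_attention_heads, head_size}

    # One zeroed key/value pair per decoder block
    blocks =
      for _ <- 1..spec.num_blocks do
        %{key: Nx.broadcast(0.0, kv_shape), value: Nx.broadcast(0.0, kv_shape)}
      end
      |> List.to_tuple()

    # The index tracks the current decoding position across iterations
    %{blocks: blocks, index: Nx.tensor(0)}
  end
end
```

Any Nx.Container.t() works as the return value, since the cache is treated as opaque by the generation loop and is only interpreted by the model itself.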
Functions
@spec build_generate(Axon.t(), Bumblebee.ModelSpec.t(), keyword()) :: (params :: map(), inputs :: map() -> Nx.t())
Builds a numerical definition that generates sequences of tokens using the given language model.
The model should be either a decoder or an encoder-decoder. The tokens are generated by iterative inference using the decoder (autoregression), until the termination criteria are met.
In case of encoder-decoder models, the corresponding encoder is run only once and the intermediate state is reused during all iterations.
The length of the generated sequence is not fixed; however, it can be controlled via several options.
Note that either :max_new_tokens or :max_length must be specified.
Options
* :max_new_tokens - the maximum number of tokens to be generated, ignoring the number of tokens in the prompt
* :min_new_tokens - the minimum number of tokens to be generated, ignoring the number of tokens in the prompt
* :max_length - the maximum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :max_new_tokens, which ignores the number of tokens in the prompt
* :min_length - the minimum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :min_new_tokens, which ignores the number of tokens in the prompt
* :decoder_start_token_id - the id of the initial token when generating from scratch, in case of encoder-decoder models
* :bos_token_id - the id of the beginning-of-sequence token
* :eos_token_id - the id of the end-of-sequence token
* :pad_token_id - the id of the padding token
* :forced_bos_token_id - the id of the token to force as the first generated token
* :forced_eos_token_id - the id of the token to force as the last generated token when :max_length is reached
The default token option values are taken from the given model specification when available.
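As a minimal sketch, assuming a GPT-2 checkpoint from the Hugging Face Hub, generation could be built and run roughly like this (the prompt and token budget are illustrative):

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})

# Tokenize the prompt into model inputs (input ids and attention mask)
inputs = Bumblebee.apply_tokenizer(tokenizer, "Elixir is")

# Build the generation function from the model and its specification
generate =
  Bumblebee.Text.Generation.build_generate(model_info.model, model_info.spec,
    max_new_tokens: 12
  )

# Run autoregressive generation and decode the resulting token ids
token_ids = generate.(model_info.params, inputs)
Bumblebee.Tokenizer.decode(tokenizer, token_ids)
```

Since the result is a numerical definition, in practice you would typically compile it with a backend such as EXLA rather than invoke it with the default Nx backend.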
@spec init_cache(Bumblebee.ModelSpec.t(), pos_integer(), pos_integer(), map()) :: Nx.t()
Initializes an opaque cache input for iterative inference.
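As a sketch, reusing model_info and inputs from the example above, the cache could be initialized like this (the batch size and maximum length are illustrative):

```elixir
# Initialize an opaque cache for a batch of one sequence and a maximum
# total length of 100 tokens. The returned structure is model-specific
# and is passed back to the model on each decoding step.
cache = Bumblebee.Text.Generation.init_cache(model_info.spec, 1, 100, inputs)
```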