# Bumblebee.Text.Generation behaviour (Bumblebee v0.2.0)

An interface for language models supporting sequence generation.

# Summary

## Callbacks

Initializes an opaque cache input for iterative inference.

## Functions

Builds a numerical definition that generates sequences of tokens using the given language model.

Initializes an opaque cache input for iterative inference.

# init_cache(spec, batch_size, max_length, inputs)

```elixir
@callback init_cache(
  spec :: Bumblebee.ModelSpec.t(),
  batch_size :: pos_integer(),
  max_length :: pos_integer(),
  inputs :: map()
) :: Nx.Tensor.t() | Nx.Container.t()
```

Initializes an opaque cache input for iterative inference.

# build_generate(model, spec, opts \\ [])

```elixir
@spec build_generate(Axon.t(), Bumblebee.ModelSpec.t(), keyword()) ::
        (params :: map(), inputs :: map() -> Nx.t())
```

Builds a numerical definition that generates sequences of tokens using the given language model.

The model should be either a decoder or an encoder-decoder. The tokens are generated by iterative inference using the decoder (autoregression), until the termination criteria are met.

In case of encoder-decoder models, the corresponding encoder is run only once and the intermediate state is reused during all iterations.

The length of the generated sequence is not fixed; however, it can be controlled via several options.

Note that either :max_new_tokens or :max_length must be specified.

## Options

• :max_new_tokens - the maximum number of tokens to be generated, ignoring the number of tokens in the prompt

• :min_new_tokens - the minimum number of tokens to be generated, ignoring the number of tokens in the prompt

• :max_length - the maximum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :max_new_tokens, which ignores the number of tokens in the prompt

• :min_length - the minimum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :min_new_tokens, which ignores the number of tokens in the prompt

• :decoder_start_token_id - the id of the initial token when generating from scratch, in case of encoder-decoder models

• :bos_token_id - the id of the beginning-of-sequence token

• :eos_token_id - the id of the end-of-sequence token

• :pad_token_id - the id of the padding token

• :forced_bos_token_id - the id of the token to force as the first generated token

• :forced_eos_token_id - the id of the token to force as the last generated token when :max_length is reached

The default token option values are taken from the given model specification when available.
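As a minimal sketch of how `build_generate/3` might be called directly, assuming a GPT-2 checkpoint from the Hugging Face Hub (the repository name, prompt, and `:max_new_tokens` value here are purely illustrative):

```elixir
# Load a decoder-only model and its tokenizer (downloads the checkpoint).
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})

# Build the generation function from the Axon model and its spec.
generate_fun =
  Bumblebee.Text.Generation.build_generate(model_info.model, model_info.spec,
    max_new_tokens: 20
  )

# Tokenize a prompt, generate token ids, and decode them back to text.
inputs = Bumblebee.apply_tokenizer(tokenizer, "Elixir is")
token_ids = generate_fun.(model_info.params, inputs)
Bumblebee.Tokenizer.decode(tokenizer, token_ids)
```

Note that the returned function is a numerical definition, so it can also be compiled ahead of time (for example via `Nx.Defn.jit/2`) and invoked repeatedly with different inputs.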

# init_cache(spec, batch_size, max_length, inputs)

```elixir
@spec init_cache(Bumblebee.ModelSpec.t(), pos_integer(), pos_integer(), map()) ::
        Nx.t()
```

Initializes an opaque cache input for iterative inference.
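A hypothetical sketch of preparing a cache for step-by-step decoding; the `spec` and `inputs` values are assumed to come from a previously loaded model, and the `"cache"` input name is an assumption about how the decoder consumes it:

```elixir
# Initialize an opaque cache for a batch of 2 sequences, generating
# up to 64 tokens, then thread it into the model inputs so each
# decoding step can reuse the attention state from previous steps.
cache = Bumblebee.Text.Generation.init_cache(spec, 2, 64, inputs)
inputs_with_cache = Map.put(inputs, "cache", cache)
```

The cache is opaque by design: callers only construct it with the right shape and pass it through; its internal layout is owned by the model implementation.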