Bumblebee.Text.Generation behaviour (Bumblebee v0.1.2)
An interface for language models supporting sequence generation.
Summary
Callbacks
Initializes an opaque cache input for iterative inference.
Functions
Builds a numerical definition that generates sequences of tokens using the given language model.
Initializes an opaque cache input for iterative inference.
Callbacks
@callback init_cache( spec :: Bumblebee.ModelSpec.t(), batch_size :: pos_integer(), max_length :: pos_integer(), inputs :: map() ) :: Nx.Tensor.t() | Nx.Container.t()
Initializes an opaque cache input for iterative inference.
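The cache layout is entirely model-specific. As an illustration only, a hypothetical model module could implement the callback by allocating zeroed key/value tensors for each decoder block together with an iteration index; the spec fields used here (hidden_size, num_attention_heads, num_blocks) are made up for the sketch and are not part of any particular Bumblebee spec:

```elixir
defmodule MyModel do
  @behaviour Bumblebee.Text.Generation

  @impl true
  def init_cache(spec, batch_size, max_length, _inputs) do
    # Hypothetical spec fields, for illustration only
    head_size = div(spec.hidden_size, spec.num_attention_heads)
    kv_shape = {batch_size, max_length, spec.num_attention_heads, head_size}

    # One zeroed key/value pair per decoder block
    blocks =
      for _ <- 1..spec.num_blocks do
        %{key: Nx.broadcast(0.0, kv_shape), value: Nx.broadcast(0.0, kv_shape)}
      end
      |> List.to_tuple()

    # The index tracks the current decoding position across iterations
    %{blocks: blocks, index: Nx.tensor(0)}
  end
end
```

Any Nx.Container.t() works as the return value, since the cache is treated as opaque by the generation loop and is only interpreted by the model itself.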
Functions
@spec build_generate(Axon.t(), Bumblebee.ModelSpec.t(), keyword()) :: (params :: map(), inputs :: map() -> Nx.t())
Builds a numerical definition that generates sequences of tokens using the given language model.
The model should be either a decoder or an encoder-decoder. The tokens are generated by iterative inference using the decoder (autoregression), until the termination criteria are met.
In case of encoder-decoder models, the corresponding encoder is run only once and the intermediate state is reused during all iterations.
The length of the generated sequence is not fixed; however, it can be controlled via several options.
Note that either :max_new_tokens or :max_length must be specified.
Options
* :max_new_tokens - the maximum number of tokens to be generated, ignoring the number of tokens in the prompt
* :min_new_tokens - the minimum number of tokens to be generated, ignoring the number of tokens in the prompt
* :max_length - the maximum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :max_new_tokens, which ignores the number of tokens in the prompt
* :min_length - the minimum length of the sequence to be generated. Note that this length includes the length of the input prompt (including padding). In general, prefer :min_new_tokens, which ignores the number of tokens in the prompt
* :decoder_start_token_id - the id of the initial token when generating from scratch, in case of encoder-decoder models
* :bos_token_id - the id of the beginning-of-sequence token
* :eos_token_id - the id of the end-of-sequence token
* :pad_token_id - the id of the padding token
* :forced_bos_token_id - the id of the token to force as the first generated token
* :forced_eos_token_id - the id of the token to force as the last generated token when :max_length is reached
The default token option values are taken from the given model specification when available.
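As a minimal sketch, assuming a GPT-2 checkpoint from the Hugging Face Hub, generation could be built and run roughly like this (the prompt and token budget are illustrative):

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})

# Tokenize the prompt into model inputs (input ids and attention mask)
inputs = Bumblebee.apply_tokenizer(tokenizer, "Elixir is")

# Build the generation function from the model and its specification
generate =
  Bumblebee.Text.Generation.build_generate(model_info.model, model_info.spec,
    max_new_tokens: 12
  )

# Run autoregressive generation and decode the resulting token ids
token_ids = generate.(model_info.params, inputs)
Bumblebee.Tokenizer.decode(tokenizer, token_ids)
```

Since the result is a numerical definition, in practice you would typically compile it with a backend such as EXLA rather than invoke it with the default Nx backend.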
@spec init_cache(Bumblebee.ModelSpec.t(), pos_integer(), pos_integer(), map()) :: Nx.t()
Initializes an opaque cache input for iterative inference.
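As a sketch, reusing model_info and inputs from the example above, the cache could be initialized like this (the batch size and maximum length are illustrative):

```elixir
# Initialize an opaque cache for a batch of one sequence and a maximum
# total length of 100 tokens. The returned structure is model-specific
# and is passed back to the model on each decoding step.
cache = Bumblebee.Text.Generation.init_cache(model_info.spec, 1, 100, inputs)
```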