# Bumblebee.Text.ClipText (Bumblebee v0.2.0)
The CLIP model for text encoding.
## Architectures

  * `:base` - the base text model
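As a minimal usage sketch, the text encoder can be loaded from a CLIP checkpoint with `Bumblebee.load_model/2`, pinning the `:module` and `:architecture` options to this module. The checkpoint name below is an assumption chosen for illustration:

```elixir
# Load only the text encoder part of a CLIP checkpoint
# ("openai/clip-vit-base-patch32" is an example checkpoint).
{:ok, model_info} =
  Bumblebee.load_model({:hf, "openai/clip-vit-base-patch32"},
    module: Bumblebee.Text.ClipText,
    architecture: :base
  )
```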
## Inputs

  * `"input_ids"` - `{batch_size, sequence_length}`

    Indices of input sequence tokens in the vocabulary.

  * `"attention_mask"` - `{batch_size, sequence_length}`

    Mask indicating which tokens to attend to. This is used to ignore padding tokens, which are added when processing a batch of sequences with different length.

  * `"position_ids"` - `{batch_size, sequence_length}`

    Indices of positions of each input sequence token in the position embeddings.
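For illustration, here is a sketch of assembling these inputs by hand and running the model loaded in the previous snippet. The token ids are made-up placeholders; in practice they come from the CLIP tokenizer via `Bumblebee.load_tokenizer/1` and `Bumblebee.apply_tokenizer/2`:

```elixir
# Hypothetical batch of two sequences padded to length 5.
# The actual id values here are placeholders, not real CLIP tokens.
inputs = %{
  "input_ids" =>
    Nx.tensor([
      [49406, 320, 1125, 49407, 0],
      [49406, 320, 49407, 0, 0]
    ]),
  # Zeros mask out the padding tokens at the end of each sequence.
  "attention_mask" =>
    Nx.tensor([
      [1, 1, 1, 1, 0],
      [1, 1, 1, 0, 0]
    ])
}

# Run the model; `model_info` comes from Bumblebee.load_model/2 above.
outputs = Axon.predict(model_info.model, model_info.params, inputs)
```

When `"position_ids"` is omitted, sequential positions are assumed, so most callers only pass `"input_ids"` and `"attention_mask"`.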
## Configuration

  * `:vocab_size` - the vocabulary size of the token embedding. This corresponds to the number of distinct tokens that can be represented in model input and output. Defaults to `49408`

  * `:max_positions` - the vocabulary size of the position embedding. This corresponds to the maximum sequence length that this model can process. Typically this is set to a large value just in case, such as 512, 1024 or 2048. Defaults to `77`

  * `:hidden_size` - the dimensionality of hidden layers. Defaults to `512`

  * `:num_blocks` - the number of Transformer blocks in the encoder. Defaults to `12`

  * `:num_attention_heads` - the number of attention heads for each attention layer in the encoder. Defaults to `8`

  * `:intermediate_size` - the dimensionality of the intermediate layer in the transformer feed-forward network (FFN) in the encoder. Defaults to `2048`

  * `:activation` - the activation function. Defaults to `:quick_gelu`

  * `:attention_dropout_rate` - the dropout rate for attention weights. Defaults to `0.0`

  * `:layer_norm_epsilon` - the epsilon used by the layer normalization layers. Defaults to `1.0e-5`

  * `:output_hidden_states` - whether the model should return all hidden states. Defaults to `false`

  * `:output_attentions` - whether the model should return all attentions. Defaults to `false`

  * `:num_labels` - the number of labels to use in the last layer for the classification task. Defaults to `2`

  * `:id_to_label` - a map from class index to label. Defaults to `%{}`
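These options are changed through `Bumblebee.configure/2` and the resulting spec is passed back to `Bumblebee.load_model/2`. A short sketch, where the overridden values are arbitrary examples rather than recommended settings:

```elixir
# Start from the spec of a loaded model and override an option;
# output_hidden_states: true is just an example value.
spec = Bumblebee.configure(model_info.spec, output_hidden_states: true)

{:ok, model_info} =
  Bumblebee.load_model({:hf, "openai/clip-vit-base-patch32"}, spec: spec)
```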