Bottleneck Adapter modules for parameter-efficient finetuning.
Adapter layers are small bottleneck modules inserted between frozen pretrained layers. Each adapter consists of a down-projection, nonlinearity, and up-projection with a residual connection, adding only a small number of trainable parameters.
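For scale: with hidden_size: 768 and bottleneck_size: 64 (the defaults in the examples below), the adapter adds 768 × 64 + 64 = 49,216 parameters for the down-projection and 64 × 768 + 768 = 49,920 for the up-projection — roughly 99k trainable parameters, versus about 590k for a single 768 × 768 dense layer.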
Architecture
Input x [batch, hidden_size]
  |
  +---> Down-project to bottleneck [batch, bottleneck_size]
  |        |
  |        v
  |     Activation (ReLU)
  |        |
  |        v
  |     Up-project [batch, hidden_size]
  |        |
  v        v
  x   +   adapter_output
      |
      v
Output [batch, hidden_size]
Usage
# Standalone adapter
adapter = Adapter.build(hidden_size: 768, bottleneck_size: 64)
# Wrap an existing layer with an adapter
input = Axon.input("input", shape: {nil, 768})
original_output = Axon.dense(input, 768, name: "pretrained_layer")
adapted = Adapter.wrap(original_output, hidden_size: 768, bottleneck_size: 64)
References
- Houlsby et al., "Parameter-Efficient Transfer Learning for NLP" (ICML 2019)
- https://arxiv.org/abs/1902.00751
Summary
Functions
adapter_block/3 - Build the adapter bottleneck: down-project -> activate -> up-project -> residual add.
build/1 - Build a standalone bottleneck adapter.
output_size/1 - Get the output size of an adapter (same as input).
wrap/2 - Wrap an existing layer output with an adapter (residual bottleneck).
Types
@type build_opt() :: {:activation, atom()} | {:bottleneck_size, pos_integer()} | {:hidden_size, pos_integer()}
Options for build/1.
Functions
@spec adapter_block(Axon.t(), pos_integer(), keyword()) :: Axon.t()
Build the adapter bottleneck: down-project -> activate -> up-project -> residual add.
Parameters
input - Axon input node
hidden_size - Input/output dimension
Options
:bottleneck_size - Bottleneck dimension (default: 64)
:activation - Activation function (default: :relu)
:name - Layer name prefix
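For orientation, a minimal sketch of the graph adapter_block/3 builds, assuming standard Axon combinators; the layer-name suffixes are illustrative, not necessarily the module's actual naming scheme:

# Sketch only; the shipped adapter_block/3 may differ in details.
def adapter_block(input, hidden_size, opts \\ []) do
  bottleneck_size = Keyword.get(opts, :bottleneck_size, 64)
  activation = Keyword.get(opts, :activation, :relu)
  name = Keyword.get(opts, :name, "adapter")

  adapter =
    input
    |> Axon.dense(bottleneck_size, name: "#{name}_down")
    |> Axon.activation(activation, name: "#{name}_activation")
    |> Axon.dense(hidden_size, name: "#{name}_up")

  # Residual add: output = input + adapter(input)
  Axon.add(input, adapter, name: "#{name}_residual")
end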
build/1
Build a standalone bottleneck adapter.
Options
:hidden_size - Input/output dimension (required)
:bottleneck_size - Bottleneck dimension (default: 64)
:activation - Activation function (default: :relu)
:name - Layer name prefix (default: "adapter")
Returns
An Axon model: [batch, hidden_size] -> [batch, hidden_size]
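A usage sketch, assuming build/1 returns an ordinary single-input Axon model:

adapter = Adapter.build(hidden_size: 768, bottleneck_size: 64)
{init_fn, predict_fn} = Axon.build(adapter)
params = init_fn.(Nx.template({1, 768}, :f32), %{})
output = predict_fn.(params, Nx.iota({1, 768}, type: :f32))
# output has shape {1, 768}: same as the input, per the contract above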
@spec output_size(keyword()) :: pos_integer()
Get the output size of an adapter (same as input).
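For example, assuming it reads the same :hidden_size option as build/1:

Adapter.output_size(hidden_size: 768)
#=> 768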
wrap/2
Wrap an existing layer output with an adapter (residual bottleneck).
Inserts the adapter after the given layer with a residual connection:
output = layer_output + adapter(layer_output)
Parameters
layer_output - Axon node from the existing (frozen) layer
Options
:hidden_size - Hidden dimension matching the layer output (required)
:bottleneck_size - Bottleneck dimension (default: 64)
:activation - Activation function (default: :relu)
:name - Layer name prefix (default: "adapter")
Returns
An Axon node with the adapted output.
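A typical fine-tuning sketch: freeze the pretrained layer (here via Axon.freeze/1) so only the adapter's parameters receive gradient updates; the layer name is illustrative:

input = Axon.input("features", shape: {nil, 768})

# Frozen pretrained layer: its parameters are excluded from training
frozen =
  input
  |> Axon.dense(768, name: "pretrained_layer")
  |> Axon.freeze()

adapted = Adapter.wrap(frozen, hidden_size: 768, bottleneck_size: 64)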