Bottleneck Adapter modules for parameter-efficient finetuning.
Adapter layers are small bottleneck modules inserted between frozen pretrained layers. Each adapter consists of a down-projection, nonlinearity, and up-projection with a residual connection, adding only a small number of trainable parameters.
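For scale: with hidden_size: 768 and bottleneck_size: 64 (the defaults in the examples below), the adapter adds 768 × 64 + 64 = 49,216 parameters for the down-projection and 64 × 768 + 768 = 49,920 for the up-projection — roughly 99k trainable parameters, versus about 590k for a single 768 × 768 dense layer.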
Architecture
Input x [batch, hidden_size]
  |
  +---> Down-project to bottleneck [batch, bottleneck_size]
  |        |
  |        v
  |     Activation (ReLU)
  |        |
  |        v
  |     Up-project [batch, hidden_size]
  |        |
  v        v
  x   +   adapter_output
      |
      v
Output [batch, hidden_size]
Usage
# Standalone adapter
adapter = Adapter.build(hidden_size: 768, bottleneck_size: 64)
# Wrap an existing layer with an adapter
input = Axon.input("input", shape: {nil, 768})
original_output = Axon.dense(input, 768, name: "pretrained_layer")
adapted = Adapter.wrap(original_output, hidden_size: 768, bottleneck_size: 64)
References
- Houlsby et al., "Parameter-Efficient Transfer Learning for NLP" (ICML 2019)
- https://arxiv.org/abs/1902.00751
Summary
Functions
adapter_block/3 - Build the adapter bottleneck: down-project -> activate -> up-project -> residual add.
build/1 - Build a standalone bottleneck adapter.
output_size/1 - Get the output size of an adapter (same as input).
wrap/2 - Wrap an existing layer output with an adapter (residual bottleneck).
Types
@type build_opt() :: {:activation, atom()} | {:bottleneck_size, pos_integer()} | {:hidden_size, pos_integer()}
Options for build/1.
Functions
@spec adapter_block(Axon.t(), pos_integer(), keyword()) :: Axon.t()
Build the adapter bottleneck: down-project -> activate -> up-project -> residual add.
Parameters
input - Axon input node
hidden_size - Input/output dimension
Options
:bottleneck_size - Bottleneck dimension (default: 64)
:activation - Activation function (default: :relu)
:name - Layer name prefix
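For orientation, a minimal sketch of the graph adapter_block/3 builds, assuming standard Axon combinators; the layer-name suffixes are illustrative, not necessarily the module's actual naming scheme:

# Sketch only; the shipped adapter_block/3 may differ in details.
def adapter_block(input, hidden_size, opts \\ []) do
  bottleneck_size = Keyword.get(opts, :bottleneck_size, 64)
  activation = Keyword.get(opts, :activation, :relu)
  name = Keyword.get(opts, :name, "adapter")

  adapter =
    input
    |> Axon.dense(bottleneck_size, name: "#{name}_down")
    |> Axon.activation(activation, name: "#{name}_activation")
    |> Axon.dense(hidden_size, name: "#{name}_up")

  # Residual add: output = input + adapter(input)
  Axon.add(input, adapter, name: "#{name}_residual")
end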
build/1
Build a standalone bottleneck adapter.
Options
:hidden_size - Input/output dimension (required)
:bottleneck_size - Bottleneck dimension (default: 64)
:activation - Activation function (default: :relu)
:name - Layer name prefix (default: "adapter")
Returns
An Axon model: [batch, hidden_size] -> [batch, hidden_size]
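A usage sketch, assuming build/1 returns an ordinary single-input Axon model:

adapter = Adapter.build(hidden_size: 768, bottleneck_size: 64)
{init_fn, predict_fn} = Axon.build(adapter)
params = init_fn.(Nx.template({1, 768}, :f32), %{})
output = predict_fn.(params, Nx.iota({1, 768}, type: :f32))
# output has shape {1, 768}: same as the input, per the contract above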
@spec output_size(keyword()) :: pos_integer()
Get the output size of an adapter (same as input).
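For example, assuming it reads the same :hidden_size option as build/1:

Adapter.output_size(hidden_size: 768)
#=> 768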
wrap/2
Wrap an existing layer output with an adapter (residual bottleneck).
Inserts the adapter after the given layer with a residual connection:
output = layer_output + adapter(layer_output)
Parameters
layer_output - Axon node from the existing (frozen) layer
Options
:hidden_size - Hidden dimension matching the layer output (required)
:bottleneck_size - Bottleneck dimension (default: 64)
:activation - Activation function (default: :relu)
:name - Layer name prefix (default: "adapter")
Returns
An Axon node with the adapted output.
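A typical fine-tuning sketch: freeze the pretrained layer (here via Axon.freeze/1) so only the adapter's parameters receive gradient updates; the layer name is illustrative:

input = Axon.input("features", shape: {nil, 768})

# Frozen pretrained layer: its parameters are excluded from training
frozen =
  input
  |> Axon.dense(768, name: "pretrained_layer")
  |> Axon.freeze()

adapted = Adapter.wrap(frozen, hidden_size: 768, bottleneck_size: 64)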