Edifice.Meta.MixtureOfAgents (Edifice v0.2.0)

Mixture of Agents: N proposer models feed into an aggregator.

Implements a multi-agent architecture in which N independent "proposer" transformer stacks process the same input in parallel. Their outputs are concatenated along the feature dimension, projected to the aggregator width, and fed into an "aggregator" transformer stack (typically larger) that combines the proposals.

Architecture

Input [batch, seq, embed_dim]
      |
      +----+----+----+----+
      |    |    |    |    |
      v    v    v    v    v
     P1   P2   P3   P4  ...  (Proposer stacks)
      |    |    |    |    |
      v    v    v    v    v
Concatenate along feature dim
      |
      v
[batch, seq, num_proposers * proposer_hidden_size]
      |
      v
Dense projection to aggregator_hidden_size
      |
      v
+-----------------------------+
|   Aggregator Transformer    |
|   (larger, combines all)    |
+-----------------------------+
      |
      v
Final Norm -> Last Timestep
      |
      v
Output [batch, aggregator_hidden_size]
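
The shape flow in the diagram can be traced with plain arithmetic. The following is a minimal sketch in dependency-free Elixir; `MoAShapeTrace` is a hypothetical helper, not part of Edifice, and it assumes each proposer emits `[batch, seq, proposer_hidden_size]`, which is what the concatenated width in the diagram implies.

```elixir
defmodule MoAShapeTrace do
  # Hypothetical helper (not part of Edifice): reproduces the shapes shown
  # in the architecture diagram, using the documented defaults of build/1.
  def trace(opts) do
    batch = Keyword.get(opts, :batch, 1)
    seq = Keyword.get(opts, :window_size, 60)
    embed = Keyword.fetch!(opts, :embed_dim)
    n = Keyword.get(opts, :num_proposers, 4)
    p = Keyword.get(opts, :proposer_hidden_size, 128)
    a = Keyword.get(opts, :aggregator_hidden_size, 256)

    [
      input: {batch, seq, embed},
      # Assumption: every proposer keeps the sequence axis and emits p features.
      each_proposer: {batch, seq, p},
      # Concatenation along the feature axis multiplies the width by n.
      concatenated: {batch, seq, n * p},
      # Dense projection to the aggregator width.
      projected: {batch, seq, a},
      # Final norm, then only the last timestep is kept.
      output: {batch, a}
    ]
  end
end

shapes = MoAShapeTrace.trace(embed_dim: 287)
IO.inspect(shapes)
```

With the documented defaults the concatenated feature width is 4 * 128 = 512, which the dense projection then reduces to 256.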

Design

Each proposer is a lightweight transformer stack that can specialize on different aspects of the input. The aggregator is typically larger and learns to combine the diverse proposals into a unified representation.

Usage

alias Edifice.Meta.MixtureOfAgents

model = MixtureOfAgents.build(
  embed_dim: 287,
  num_proposers: 4,
  proposer_hidden_size: 128,
  aggregator_hidden_size: 256,
  proposer_layers: 2,
  aggregator_layers: 2
)

References

  • Wang et al., "Mixture-of-Agents Enhances Large Language Model Capabilities" (2024)

Summary

Types

Options for build/1.

Functions

Build a Mixture of Agents model.

Get the output size of a MixtureOfAgents model.

Get recommended defaults for MixtureOfAgents.

Types

build_opt()

@type build_opt() ::
  {:aggregator_hidden_size, pos_integer()}
  | {:aggregator_layers, pos_integer()}
  | {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_proposers, pos_integer()}
  | {:proposer_hidden_size, pos_integer()}
  | {:proposer_layers, pos_integer()}
  | {:window_size, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a Mixture of Agents model.

Options

  • :embed_dim - Input embedding dimension (required)
  • :num_proposers - Number of proposer stacks (default: 4)
  • :proposer_hidden_size - Hidden size for each proposer (default: 128)
  • :aggregator_hidden_size - Hidden size for the aggregator (default: 256)
  • :proposer_layers - Number of layers per proposer (default: 2)
  • :aggregator_layers - Number of aggregator layers (default: 2)
  • :num_heads - Number of attention heads (default: 4)
  • :dropout - Dropout rate (default: 0.1)
  • :window_size - Input sequence length, i.e. the seq dimension of the input (default: 60)

Returns

An Axon model outputting [batch, aggregator_hidden_size].
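
One way to check the documented output shape is to ask Axon for it from input templates. This is a sketch under stated assumptions: Edifice, Axon and Nx are already project dependencies, and every model input is a `[batch, seq, embed_dim]` sequence. The input name is not documented here, so the sketch reads it from `Axon.get_inputs/1` instead of hard-coding one.

```elixir
# Assumes Edifice, Axon and Nx are already dependencies of the project.
alias Edifice.Meta.MixtureOfAgents

model = MixtureOfAgents.build(embed_dim: 287, window_size: 60)

# Discover the input name(s) rather than guessing them.
# Assumption: each input is a [batch, seq, embed_dim] sequence.
templates =
  model
  |> Axon.get_inputs()
  |> Map.new(fn {name, _shape} -> {name, Nx.template({2, 60, 287}, :f32)} end)

output_shape = Axon.get_output_shape(model, templates)
IO.inspect(output_shape, label: "output shape")
```

Per the Returns section above, a batch of 2 with the default aggregator_hidden_size should give a shape of the form [batch, aggregator_hidden_size].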

output_size(opts \\ [])

@spec output_size(keyword()) :: pos_integer()

Get the output size of a MixtureOfAgents model.
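
A possible use of output_size/1 is sizing a separately defined downstream head. The sketch below assumes that output_size/1 returns the width of the model's last output dimension; the input name "features", the layer name "head", and the 10 output units are illustrative choices, not part of Edifice.

```elixir
# Assumes Edifice and Axon are already dependencies of the project.
alias Edifice.Meta.MixtureOfAgents

opts = [embed_dim: 287, aggregator_hidden_size: 256]

# Assumption: this is the feature width produced by build/1 with the same opts.
feature_size = MixtureOfAgents.output_size(opts)

# Hypothetical head that consumes the aggregator features.
head =
  Axon.input("features", shape: {nil, feature_size})
  |> Axon.dense(10, name: "head")
```

Passing the same options to build/1 and output_size/1 keeps the head's input width in step with the model if aggregator_hidden_size changes later.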