Mixture of Agents: N proposer models feed into an aggregator.
Implements a multi-agent architecture in which N independent "proposer" transformer stacks process the same input in parallel. Their outputs are concatenated along the feature dimension, projected to the aggregator's hidden size, and fed into a larger "aggregator" transformer stack that combines the proposals.
Architecture
Input [batch, seq, embed_dim]
      |
 +----+----+----+----+
 |    |    |    |    |
 v    v    v    v    v
 P1   P2   P3   P4  ...   (Proposer stacks)
 |    |    |    |    |
 v    v    v    v    v
 Concatenate along feature dim
      |
      v
 [batch, seq, num_proposers * proposer_hidden]
      |
      v
 Dense projection to aggregator_hidden
      |
      v
 +-----------------------------+
 |   Aggregator Transformer    |
 |   (larger, combines all)    |
 +-----------------------------+
      |
      v
 Final Norm -> Last Timestep
Output [batch, aggregator_hidden_size]
Design
Each proposer is a lightweight transformer stack that can specialize on different aspects of the input. The aggregator is typically larger and learns to combine the diverse proposals into a unified representation.
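The shape changes in the architecture diagram can be traced with plain arithmetic. The sketch below is illustrative only: `ShapeTrace` is a hypothetical helper, not part of the library, and it assumes each proposer emits `proposer_hidden_size` features per timestep, as the concatenated shape in the diagram implies. Default values are taken from the documented options.

```elixir
defmodule ShapeTrace do
  @moduledoc "Hypothetical helper: traces tensor shapes through the diagram's stages."

  # Returns an ordered keyword list of {stage, shape} pairs.
  def trace(batch, seq, opts) do
    embed_dim = Keyword.fetch!(opts, :embed_dim)
    n = Keyword.get(opts, :num_proposers, 4)
    p_hidden = Keyword.get(opts, :proposer_hidden_size, 128)
    a_hidden = Keyword.get(opts, :aggregator_hidden_size, 256)

    [
      input: {batch, seq, embed_dim},
      # Assumption: every proposer emits p_hidden features per timestep.
      proposer_out: {batch, seq, p_hidden},
      # Concatenating n proposer outputs along the feature axis.
      concat: {batch, seq, n * p_hidden},
      # Dense projection down (or up) to the aggregator width.
      projection: {batch, seq, a_hidden},
      aggregator_out: {batch, seq, a_hidden},
      # Only the last timestep is kept after the final norm.
      last_timestep: {batch, a_hidden}
    ]
  end
end

ShapeTrace.trace(32, 60, embed_dim: 287)
|> Enum.each(fn {stage, shape} -> IO.puts("#{stage}: #{inspect(shape)}") end)
```

With the defaults, four proposers of width 128 give a concatenated feature dimension of 512, which the dense projection maps to 256 before the aggregator runs.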
Usage
model = MixtureOfAgents.build(
  embed_dim: 287,
  num_proposers: 4,
  proposer_hidden_size: 128,
  aggregator_hidden_size: 256,
  proposer_layers: 2,
  aggregator_layers: 2
)
References
- Wang et al., "Mixture-of-Agents Enhances Large Language Model Capabilities" (2024)
Summary
Functions
Build a Mixture of Agents model.
Get the output size of a MixtureOfAgents model.
Get recommended defaults for MixtureOfAgents.
Types
@type build_opt() ::
  {:aggregator_hidden_size, pos_integer()}
  | {:aggregator_layers, pos_integer()}
  | {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_proposers, pos_integer()}
  | {:proposer_hidden_size, pos_integer()}
  | {:proposer_layers, pos_integer()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a Mixture of Agents model.
Options
- :embed_dim - Input embedding dimension (required)
- :num_proposers - Number of proposer stacks (default: 4)
- :proposer_hidden_size - Hidden size for each proposer (default: 128)
- :aggregator_hidden_size - Hidden size for the aggregator (default: 256)
- :proposer_layers - Number of layers per proposer (default: 2)
- :aggregator_layers - Number of aggregator layers (default: 2)
- :num_heads - Number of attention heads (default: 4)
- :dropout - Dropout rate (default: 0.1)
- :window_size - Sequence length (default: 60)
Returns
An Axon model outputting [batch, aggregator_hidden_size].
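To make the option handling concrete, here is a minimal sketch of how the documented defaults might combine with caller-supplied options. `MoAOptions.resolve/1` is a hypothetical helper written for illustration; the library's actual resolution logic is not shown in this page and may differ. The default values are copied from the options list above.

```elixir
defmodule MoAOptions do
  # Hypothetical helper, not part of the library: mirrors the documented defaults.
  @defaults [
    num_proposers: 4,
    proposer_hidden_size: 128,
    aggregator_hidden_size: 256,
    proposer_layers: 2,
    aggregator_layers: 2,
    num_heads: 4,
    dropout: 0.1,
    window_size: 60
  ]

  def resolve(opts) do
    # :embed_dim is required and has no default; raises KeyError if it is absent.
    _ = Keyword.fetch!(opts, :embed_dim)

    # Caller-supplied values override the defaults.
    Keyword.merge(@defaults, opts)
  end
end

opts = MoAOptions.resolve(embed_dim: 287, num_proposers: 6)
IO.inspect(opts[:num_proposers])
# prints 6
IO.inspect(opts[:dropout])
# prints 0.1
```

Failing fast on a missing :embed_dim keeps the error close to the call site rather than surfacing later as a shape mismatch.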
@spec output_size(keyword()) :: pos_integer()
Get the output size of a MixtureOfAgents model.
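Since the model outputs [batch, aggregator_hidden_size], the output size presumably equals the :aggregator_hidden_size option. The stand-in below sketches that reading; `OutputSizeSketch` is a hypothetical module, and falling back to 256 when the option is missing is an assumption based on the documented default, not confirmed behaviour of the real function.

```elixir
defmodule OutputSizeSketch do
  # Hypothetical stand-in for MixtureOfAgents.output_size/1.
  # Assumption: falls back to the documented default of 256.
  @default_aggregator_hidden_size 256

  @spec output_size(keyword()) :: pos_integer()
  def output_size(opts) do
    Keyword.get(opts, :aggregator_hidden_size, @default_aggregator_hidden_size)
  end
end

# Example: sizing a downstream layer from the same options used to build the model.
opts = [embed_dim: 287, aggregator_hidden_size: 512]
IO.inspect(OutputSizeSketch.output_size(opts))
# prints 512
```

A helper like this lets downstream code size its input layer from the same keyword list passed to build/1, without hard-coding the aggregator width.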
@spec recommended_defaults() :: keyword()
Get recommended defaults for MixtureOfAgents.