Graphical Abstract
Abstract
A mixture of experts (MoE) model is a neural network architecture that replaces the traditional dense network with a routing network and multiple expert sub-networks. During inference, the routing network selects which expert sub-networks to activate for each input, so only a subset of the experts processes it. Because of this sparse activation mechanism, MoE models significantly reduce the computational cost of training and inference compared to dense models of comparable performance, making it possible to scale up model size under a given compute budget.
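To make the sparse-activation idea concrete, the following is a minimal sketch of a top-k routed MoE layer in PyTorch. The class name TopKMoE, the layer sizes, and the choice of two active experts per token are illustrative assumptions, not the implementation described in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely activated MoE layer: a router picks k of n experts per token."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)            # routing network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                       # x: (tokens, d_model)
        scores = self.router(x)                                 # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)     # keep only k experts per token
        weights = F.softmax(topk_scores, dim=-1)                # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Only k of n_experts run per token, so per-token compute scales with k rather than n_experts.
tokens = torch.randn(16, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 64])
```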