Graphical Abstract
Abstract
A mixture of experts (MoE) model is a neural network architecture that replaces the traditional dense network with a routing network and multiple expert sub-networks. During inference, the routing network selects which expert sub-networks to activate for each input, so only a subset of the experts processes it. Because of this sparse activation mechanism, MoE models significantly reduce the computational cost of training and inference compared to dense models of comparable performance, making it possible to scale up model size under a given compute budget.
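To make the sparse-activation idea concrete, the following is a minimal sketch of a top-k routed MoE layer in PyTorch. The class name TopKMoE, the layer sizes, and the choice of two active experts per token are illustrative assumptions, not the implementation described in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely activated MoE layer: a router picks k of n experts per token."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)            # routing network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                       # x: (tokens, d_model)
        scores = self.router(x)                                 # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)     # keep only k experts per token
        weights = F.softmax(topk_scores, dim=-1)                # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Only k of n_experts run per token, so per-token compute scales with k rather than n_experts.
tokens = torch.randn(16, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 64])
```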