
Mixture of Experts

  • Abstract: A mixture of experts (MoE) model is a neural network architecture that replaces a traditional dense network with a routing network and multiple expert sub-networks. During inference, the routing network selects which expert sub-networks to activate for each input, so that only a subset of experts processes the given task. Because of this sparse activation mechanism, MoE models greatly reduce the computational cost of training and inference compared with dense models of comparable performance, making it possible to scale up model size under a given compute budget.
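
To make the routing mechanism described above concrete, below is a minimal sketch of a sparsely gated MoE layer with top-k routing, assuming PyTorch. The class name MoELayer, its parameters, and the shapes of the router and experts are illustrative choices for this sketch, not taken from any particular implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """A feed-forward block replaced by a routing network plus several expert sub-networks."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Routing network: produces a score for every expert, per token.
        self.router = nn.Linear(d_model, num_experts)
        # Expert sub-networks: independent feed-forward blocks of identical shape.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                            # (num_tokens, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep only the k best experts per token
        top_w = F.softmax(top_w, dim=-1)                   # normalize the selected experts' weights

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slots = (top_idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue                                   # expert e received no tokens this step
            # Sparse activation: only the tokens routed to expert e are processed by it.
            out[rows] += top_w[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out


# Example: 16 tokens of width 64 routed across 8 experts, 2 active per token.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
y = layer(torch.randn(16, 64))
print(y.shape)  # torch.Size([16, 64])
```

Production MoE systems typically add a load-balancing auxiliary loss and per-expert capacity limits so that tokens are spread evenly across experts, but the route-then-sparsely-compute pattern is the same as in this sketch.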

     
