ZHOU Hongyi, HUANG Shaomang, PAN Jianfeng, et al. A compact large language model with collaboration of experts[J]. Computing Magazine of the CCF, 2025, 1(2): 40−48. DOI: 10.11991/cccf.202506007

A compact large language model with collaboration of experts

  • Large language models (LLMs) have achieved remarkable success in natural language processing and other domains. However, limited GPU memory in industrial settings forces a trade-off between resource efficiency and model performance when LLMs are extended to multiple downstream tasks, which limits their broader deployment. To address this, we propose Compact LLM with Collaboration of Experts (CCoE), a modular and compact multi-expert architecture. CCoE efficiently and flexibly integrates multiple domain-specific experts into a unified LLM, significantly reducing memory overhead. It further employs a rule-based gating mechanism and an expert planning module to enable precise task assignment and effective expert collaboration, thereby supporting complex reasoning. Experiments on five distinct datasets show that CCoE matches the performance of domain-specific LLMs while reducing GPU memory usage by 61.3% compared with existing model ensemble methods and improving inference throughput by 76.4% over parameter-efficient multi-expert integration approaches.
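The abstract describes a rule-based gating mechanism that assigns each incoming task to one domain-specific expert. The paper's actual gating rules and expert interfaces are not reproduced here, so the following is a minimal Python sketch under assumed names (ExpertLLM, EXPERTS, route are all hypothetical): keyword rules select one expert per query, with a general expert as a fallback.

```python
from dataclasses import dataclass, field

@dataclass
class ExpertLLM:
    """Hypothetical stand-in for a domain expert attached to a shared LLM backbone."""
    domain: str
    keywords: set = field(default_factory=set)

    def matches(self, query: str) -> bool:
        # Rule: the expert fires if any of its trigger keywords appear in the query.
        tokens = set(query.lower().split())
        return bool(self.keywords & tokens)

# Illustrative expert pool; the real system's domains and rules may differ.
EXPERTS = [
    ExpertLLM("code", {"python", "function", "bug"}),
    ExpertLLM("math", {"prove", "integral", "equation"}),
    ExpertLLM("medical", {"symptom", "diagnosis", "drug"}),
]

def route(query: str) -> ExpertLLM:
    """Rule-based gate: return the first expert whose rules match,
    falling back to a general expert when no rule fires."""
    for expert in EXPERTS:
        if expert.matches(query):
            return expert
    return ExpertLLM("general")

print(route("find the bug in this python function").domain)  # -> "code"
```

Because the gate is rule-based rather than learned, routing adds negligible compute, and the expert pool can be extended without retraining the gate, which is consistent with the modularity the abstract emphasizes.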
