H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
Graphical Abstract
Abstract
To address the demand for low-batch large language model (LLM) inference at the edge, where existing near-memory processing architectures are constrained by limited computational capability and thus deliver suboptimal acceleration, we propose H2-LLM, a heterogeneous accelerator based on hybrid bonding. H2-LLM employs an innovative heterogeneous architecture, enabled by hybrid bonding, that balances computational capability and bandwidth. Furthermore, it introduces a data-centric dataflow abstraction methodology to fully exploit the speedup potential of low-batch inference, and a design space exploration (DSE) framework to automatically optimize the architectural configuration. Compared to existing in-die near-memory processing architectures and dataflow implementations, H2-LLM achieves significant improvements in both performance and energy efficiency.