Cong Li, Yihan Yin, Guangyu Sun. H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference[J]. Computing Magazine of the CCF, 2025, 1(7): 48−54, 101. DOI: 10.11991/cccf.202511009

H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference

  • Low-batch large language model (LLM) inference at the edge is poorly served by existing near-memory processing architectures, whose limited computational capability results in suboptimal acceleration. To overcome this, we propose H2-LLM, a heterogeneous accelerator based on hybrid bonding. H2-LLM employs an innovative heterogeneous architecture, enabled by hybrid bonding, to balance computational capability and memory bandwidth. Furthermore, it introduces a data-centric dataflow abstraction to fully exploit the speedup potential of low-batch inference, and a design space exploration (DSE) framework automatically optimizes the architectural configuration (a minimal illustrative DSE sketch follows below). Compared with existing in-die near-memory processing architectures and dataflow implementations, H2-LLM achieves significant improvements in both performance and energy efficiency.
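The abstract mentions a DSE framework that selects an architectural configuration automatically. The sketch below is only an illustration of that general idea, not H2-LLM's actual framework: the `DesignPoint` fields, the roofline-style latency model, the toy energy model, the candidate values, and the `explore` function are all assumptions introduced here for exposition.

```python
# Hypothetical DSE sketch: enumerate candidate configurations and pick the one
# with the best energy-delay product under a simple analytic cost model.
# All parameters and models below are illustrative assumptions, not the paper's.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DesignPoint:
    near_mem_tops: float     # assumed compute capability of near-memory units (TOPS)
    hb_bandwidth_gbs: float  # assumed hybrid-bonding bandwidth to the logic die (GB/s)
    logic_die_tops: float    # assumed compute capability of the logic die (TOPS)


def estimate_latency_ms(dp: DesignPoint, flops: float, bytes_moved: float) -> float:
    """Roofline-style latency estimate for one decoding step (illustrative only)."""
    compute_ms = flops / ((dp.near_mem_tops + dp.logic_die_tops) * 1e12) * 1e3
    memory_ms = bytes_moved / (dp.hb_bandwidth_gbs * 1e9) * 1e3
    return max(compute_ms, memory_ms)  # bounded by the slower of compute and data movement


def estimate_energy_mj(dp: DesignPoint, flops: float, bytes_moved: float) -> float:
    """Toy energy model with assumed per-FLOP and per-byte costs (0.5 pJ and 4 pJ)."""
    return flops * 0.5e-12 * 1e3 + bytes_moved * 4e-12 * 1e3


def explore(flops: float, bytes_moved: float) -> DesignPoint:
    """Exhaustively evaluate candidate configurations; return the best energy-delay product."""
    candidates = product([8.0, 16.0, 32.0],       # near_mem_tops candidates (assumed)
                         [512.0, 1024.0, 2048.0],  # hb_bandwidth_gbs candidates (assumed)
                         [16.0, 64.0])             # logic_die_tops candidates (assumed)
    points = [DesignPoint(*c) for c in candidates]
    return min(points, key=lambda dp: estimate_latency_ms(dp, flops, bytes_moved)
                                      * estimate_energy_mj(dp, flops, bytes_moved))


if __name__ == "__main__":
    # Rough numbers for one batch-1 decoding step of a 7B-parameter model:
    # ~14 GFLOPs of GEMV work and ~14 GB of weight traffic.
    best = explore(flops=14e9, bytes_moved=14e9)
    print("best design point:", best)
```

In practice a DSE framework of this kind would replace the brute-force loop with a smarter search and the analytic formulas with calibrated performance and energy models, but the structure stays the same: enumerate or sample design points, score each with a cost model, and keep the best one.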
