In-Memory Computing: A Solution to Break the Computing Power Bottleneck in the LLM Era
Graphical Abstract
Abstract
Against the backdrop of an increasingly intelligent society driven by artificial intelligence (AI), the Internet of Things (IoT), and 5G, Large Language Models (LLMs) with tens of billions to trillions of parameters—represented by GPT-4 and DeepSeek—have become core application engines owing to their high computational density. However, in data-intensive tasks, the traditional von Neumann architecture suffers from the "memory wall": frequent data movement between processor and memory limits further gains in system performance. Processing-in-memory (PIM) technology has emerged to address this problem. Its core idea is to integrate computing units near or inside memory, bypassing the traditional compute-fetch-move cycle and reducing the latency and energy consumption caused by data movement. PIM has developed in two stages: in the early era of small models, research focused on in-array computing with emerging non-volatile memory (NVM) and static random-access memory (SRAM); in the LLM era, the focus has shifted to PIM centered on high-density, low-cost dynamic random-access memory (DRAM). Typical examples include Samsung's HBM-PIM and SK Hynix's AiM. By integrating computing units deeply into DRAM arrays, these designs unlock the large internal bandwidth of memory and greatly reduce the cost of data movement. Current PIM architectures fall into two main types. The first is the asynchronous-driven mode based on memory-mapped I/O (MMIO), which supports independent programming of PIM cores and offers high flexibility, but requires a complex software stack. The second is the synchronous-driven mode based on central processing unit (CPU) instruction-set extensions, which offers better ecosystem compatibility but faces challenges such as insufficient instruction-issue bandwidth and bus contention. Although PIM shows great potential in energy efficiency and parallelism, its large-scale commercialization still requires solving a series of key system-level issues, including CPU compatibility, communication bottlenecks, data-layout adaptation, software-hardware ecosystem construction, algorithm co-design, and thermal management. In summary, PIM is a strategic direction for breaking through the computing power bottleneck of LLMs and driving innovation in computing paradigms. Its successful deployment depends on interdisciplinary collaboration, and it will play a key role in the future of computing.