Hybrid Attention Mechanism is Efficient and Effective for Deep Thinking
Graphical Abstract
Abstract
Large language models demonstrate strong deep reasoning capabilities by generating long chains of thought for complex problems. However, the quadratic complexity of the self-attention mechanism creates substantial computational and memory overhead when processing long sequences with numerous reasoning tokens, limiting the efficiency of deep reasoning models during both training and inference. While existing work focuses on post-processing optimizations for inference efficiency, training-stage efficiency remains largely unaddressed. We observe that reasoning processes exhibit locality, which makes hybrid attention mechanisms particularly suitable. We convert full attention models into hybrid attention models via minimal post-training and then perform deep reasoning training on this architecture. On benchmarks including AIME, MATH-500, and LiveCodeBench, our 1:1 hybrid attention model achieves performance comparable or superior to full attention models while reducing training time by 22% and key-value cache storage by 46.9% with a 64k context window.
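A minimal sketch of the 1:1 hybrid layout described above, assuming sliding-window attention as the local component; the window size, layer count, and all function names below are illustrative assumptions rather than the paper's actual configuration or implementation.

```python
# Illustrative sketch (not the authors' implementation) of a 1:1 hybrid stack:
# full attention layers alternate with sliding-window (local) attention layers,
# so only half of the layers must retain a full-length key-value cache.
import torch


def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal local mask: each token attends to itself and the previous window-1 tokens."""
    idx = torch.arange(seq_len)
    rel = idx.unsqueeze(0) - idx.unsqueeze(1)  # rel[i, j] = j - i
    return (rel <= 0) & (rel > -window)


def causal_mask(seq_len: int) -> torch.Tensor:
    """Standard full causal attention mask."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))


def attention(q, k, v, mask):
    """Plain scaled dot-product attention with a boolean keep-mask."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


# Hypothetical sizes chosen only to make the example run quickly.
num_layers, seq_len, d_head, window = 8, 1024, 64, 128
x = torch.randn(1, seq_len, d_head)
local = sliding_window_mask(seq_len, window)
full = causal_mask(seq_len)

# 1:1 interleaving: even layers local, odd layers full.
for layer in range(num_layers):
    mask = local if layer % 2 == 0 else full
    x = x + attention(x, x, x, mask)  # residual only; real layers add projections and MLPs

# KV-cache intuition: local layers keep at most `window` tokens, full layers keep all tokens.
full_cache = num_layers * seq_len
hybrid_cache = (num_layers // 2) * seq_len + (num_layers // 2) * window
print(f"relative KV cache of the hybrid stack: {hybrid_cache / full_cache:.2%}")
```

The printed ratio only illustrates why a 1:1 mix shrinks the cache; the 46.9% reduction reported in the abstract depends on the model's real layer count, window size, and 64k context length.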