Citation: Chaojun Xiao, Yewei Fang, Xu Han. Hybrid Attention Mechanism is Efficient and Effective for Deep Thinking[J]. Computing Magazine of the CCF, 2025, 1(7): 42−47. DOI: 10.11991/cccf.202511008

Hybrid Attention Mechanism is Efficient and Effective for Deep Thinking

  • Large language models demonstrate strong deep reasoning capabilities by generating chains of thought for complex problems. However, the quadratic complexity of the self-attention mechanism incurs substantial computational and memory overhead when processing long sequences with many reasoning tokens, limiting the efficiency of deep reasoning models during both training and inference. Existing work focuses on post-hoc optimizations for inference efficiency, while training-stage efficiency remains largely unaddressed. We observe that reasoning processes exhibit locality, which makes hybrid attention mechanisms particularly well suited to them. We convert full attention models into hybrid attention models via minimal post-training and then perform deep reasoning training on this architecture. On benchmarks including AIME, MATH-500, and LiveCodeBench, our 1:1 hybrid attention model achieves performance comparable or superior to full attention models while reducing training time by 22% and key-value cache storage by 46.9% with a 64k context window.
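Below is a minimal, illustrative sketch of the kind of 1:1 hybrid attention stack the abstract describes: full-attention layers interleaved with a cheaper local variant, so that only half of the layers require a full key-value cache over long reasoning traces. The choice of sliding-window attention as the efficient half, and all names and hyperparameters here (`HybridStack`, `window`, layer counts), are assumptions made for illustration; the abstract does not specify the authors' exact mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def masked_attention(q, k, v, mask):
    # q, k, v: (batch, heads, seq, head_dim); mask: (seq, seq) bool, True = attend
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


class SelfAttention(nn.Module):
    """Causal self-attention; if `window` is set, each query sees only the last `window` keys."""

    def __init__(self, dim, n_heads, window=None):
        super().__init__()
        self.n_heads, self.window = n_heads, window
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.reshape(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))

        # Causal mask; optionally restrict each query to a sliding window of recent tokens.
        mask = torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device))
        if self.window is not None:
            mask &= ~torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device),
                                diagonal=-self.window)
        y = masked_attention(q, k, v, mask)
        return self.proj(y.transpose(1, 2).reshape(b, t, d))


class HybridStack(nn.Module):
    """1:1 interleaving of full-attention and sliding-window-attention layers (assumed design)."""

    def __init__(self, dim=256, n_heads=4, n_layers=8, window=1024):
        super().__init__()
        self.layers = nn.ModuleList(
            SelfAttention(dim, n_heads, window=None if i % 2 == 0 else window)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)  # residual connection; norms and MLPs omitted for brevity
        return x


if __name__ == "__main__":
    model = HybridStack()
    out = model(torch.randn(1, 2048, 256))
    print(out.shape)  # torch.Size([1, 2048, 256])
```

In a sketch like this, the sliding-window layers only ever need to cache the most recent `window` keys and values during generation, which is where long-context key-value cache savings would come from; the specific 22% training-time and 46.9% cache reductions reported above depend on the authors' actual architecture and are not reproduced by this example.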
