Jailbreak Attack and Defense of Large Language Models
Graphical Abstract
Abstract
In recent years, large language models (LLMs) such as ChatGPT and DeepSeek-R1 have triggered successive waves of artificial intelligence (AI) development, accelerating AI's penetration into traditional domains. However, owing to the diversity of their input content and the breadth of their user base, LLMs face significant security risks. Among these, jailbreak attacks represent one of the most critical threats: they can induce models to generate harmful content, exposing LLM service providers to malicious exploitation and regulatory violations. This article analyzes the security risks that jailbreak attacks pose to LLMs, reviews current defense methods, and examines both the challenges and potential solutions in this domain.