Jailbreak Attack and Defense of Large Language Models
Graphical Abstract
Abstract
In recent years, large language models (LLMs) such as ChatGPT and DeepSeek-R1 have triggered successive waves of artificial intelligence (AI) development, accelerating AI's penetration into traditional domains. However, owing to the diversity of their input content and the breadth of their user base, LLMs face significant security risks. Among these, jailbreak attacks represent one of the most critical threats: they can induce models to generate harmful content, exposing LLM service providers to malicious exploitation and regulatory violations. This article analyzes the security risks that jailbreak attacks pose to LLMs, reviews current defense methods, and examines both the challenges and potential solutions in this domain.