Abstract:
As cyberattack techniques continue to evolve, automated penetration testing, an essential approach for assessing system vulnerabilities, faces significant challenges: dynamic network environments, sparse feedback signals, complex multi-stage attack planning, and adaptive defense mechanisms. In recent years, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, contextual reasoning, and multi-step task planning, opening new opportunities for building intelligent penetration testing systems. This paper systematically analyzes these four challenges and reviews representative LLM-powered solutions along four corresponding dimensions: dynamic environment modeling, strategy optimization under sparse rewards, causal reasoning over multi-stage attack paths, and adaptive planning against evolving defenses. The review shows that LLMs exhibit strong context awareness, causal inference, and behavioral adaptability, substantially improving the intelligence and robustness of automated penetration testing frameworks. Finally, the paper outlines future directions for LLM-enabled penetration testing, including multi-modal integration, goal-driven attack reasoning, adaptive security evaluation, and trustworthy system design, offering theoretical guidance and a technical reference for the next generation of intelligent red-teaming systems.