Abstract:
Generative visual media, which spans images, video, 3D geometry, virtual reality (VR), and augmented reality (AR), is becoming a transformative force in the digital economy. Driven by diffusion models, Transformer architectures, and large-scale multimodal pretraining, it is reshaping traditional content creation and enabling more realistic, controllable, and high-fidelity generation. This report summarizes the key insights of the 23rd CCF Beautiful Lake Seminar, which brought together experts from across the country to discuss the theoretical foundations, computational architectures, and interdisciplinary applications of generative visual media. The discussions focused on major challenges, including integrated geometry-physics representation, native 3D feature extraction, multimodal alignment and control, CAD/CAE system integration, and multisensory VR content generation. The seminar identified the core bottlenecks, challenges, and development pathways of generative visual media and reached a consensus on future actions. This report distills twelve key scientific and technological questions, outlines five categories of application scenarios, and highlights the role of generative visual media in advancing China’s technological breakthroughs and industrial deployment, providing strategic guidance for its application in intelligent manufacturing, cultural industries, and national security.