Generative Visual Media in the Era of Foundation Models: Opportunities and Challenges—Insights from the 23rd CCF Beautiful Lake Seminar
Xin Yang, Beibei Wang, Jie Guo, Jiazhi Xia, Lili Wang, Lin Lyu, Libin Liu, Xuejin Chen, Lin Gao, Ran He, Kai Xu, Kun Zhou
Abstract
Generative visual media, which spans images, videos, 3D geometry, and virtual reality (VR) and augmented reality (AR) content, is becoming a transformative force in the digital economy. Driven by diffusion models, Transformer architectures, and large-scale multimodal pretraining, it is reshaping traditional content creation paradigms and enabling more realistic, controllable, and high-fidelity generation. This report summarizes the key insights of the 23rd CCF Beautiful Lake Seminar, which brought together experts from across the country to discuss the theoretical foundations, computational architectures, and interdisciplinary applications of generative visual media. The discussions focused on major challenges, including geometry-physics integrated representation, native 3D feature extraction, multimodal alignment and control, CAD/CAE system integration, and multisensory VR content generation. The seminar identified the core bottlenecks, challenges, and development pathways of generative visual media and reached a consensus on future actions. This report distills twelve key scientific and technological questions, outlines five categories of application scenarios, and highlights the role of generative visual media in advancing China’s technological breakthroughs and industrial deployment, providing strategic guidance for its application in intelligent manufacturing, cultural industries, and national security.