Abstract:
As artificial intelligence evolves from information processing to physical interaction, embodied intelligence has emerged as a critical paradigm for enabling agents to perceive, reason, and act in unstructured, open environments. However, bridging the sim-to-real gap remains a significant challenge: agents struggle with spatiotemporal generalization, long-horizon task consistency, and robust decision-making under physical constraints. Drawing on insights from the CCF YOCSEF Xi’an forum, this article systematically reviews the core scientific problems of embodied intelligence in open scenarios. It elaborates on the mechanisms of spatiotemporal cognition, including affective state regulation for temporal continuity and multimodal semantic fusion for spatial perception. It then discusses the evolution of system architectures toward hierarchical decision-making frameworks and world models, and outlines the key technical challenges and an ecosystem-oriented path to industrial application. The article offers theoretical perspectives and practical guidance for realizing general-purpose robots in the real world.