Abstract:
This article traces the paradigm shift in artificial intelligence from model-centric AI (MCAI) to data-centric AI (DCAI) and proposes a data infrastructure framework designed for DCAI. The framework has two components: an AI database that supports unified multimodal data management, and DataFlow, an integrated platform for data preparation and dynamic training. The architecture addresses the limitations of conventional data lakes and data processing tools and establishes an efficient synergy between data and models. Through applications in large-scale model pretraining and enterprise knowledge base construction, we show that DCAI infrastructure can improve model performance while lowering the barrier to AI development. Our solution offers a systematic approach to supporting AI's evolution toward next-generation intelligent computing paradigms.