Generalization of Graph Machine Learning: A Review and Perspective

张子威; 王啸; 崔鹏; 朱文武

doi:10.11991/cccf.202601006

Abstract: Graph machine learning, a core tool for analyzing and processing relational data, has attracted wide attention from both academia and industry. As research deepens and application scenarios grow more complex, generalization has become a key bottleneck for the further development of graph foundation models and the practical deployment of graph machine learning. The independent and identically distributed (i.i.d.) assumption, a cornerstone of traditional machine learning theory, does not hold for graph data, so model performance cannot be guaranteed under distribution shifts, diverse tasks, and different domains. This paper systematically reviews research progress on the generalization of graph machine learning and organizes it into a three-dimensional taxonomy: 1) generalization of graph data, which concerns a model's ability to generalize from training data to unseen graph structures, in particular graph out-of-distribution (OOD) generalization methods for handling distribution shifts; 2) generalization of graph tasks, which concerns a model's ability to handle different graph tasks, with emphasis on the graph pre-training and fine-tuning paradigm and the graph prompt learning paradigm; 3) generalization of graph domains, a further challenge that aims to generalize models across graph data from different domains, for which we summarize methods based on large language models (LLMs) and methods based on feature and structure alignment. Building on this review, we outline potential future research directions, including theoretical foundations, evaluation frameworks, multimodal and knowledge integration, generalist models for structured data, and the boundary of graph foundation models. By summarizing and looking ahead at research on generalization, this work aims to advance the theoretical understanding and practical development of graph machine learning.
