Generalization of Graph Machine Learning: A Review and Perspective
-
Abstract
Graph machine learning has become a core tool for analyzing and processing relational data and has received extensive attention from both academia and industry. As research deepens and application scenarios grow more complex, generalization has emerged as a critical bottleneck constraining both the further development of graph foundation models and the practical deployment of graph machine learning. In graph data, the independent and identically distributed (i.i.d.) assumption, a cornerstone of traditional machine learning theory, fails to hold, resulting in unreliable model performance under distribution shifts, diverse tasks, and cross-domain scenarios. This work provides a systematic review of, and a forward-looking perspective on, recent advances in the generalization of graph machine learning. We decompose the problem into a three-dimensional taxonomy of generalization: 1) Generalization of Graph Data: investigating the model’s ability to generalize to unseen graph structures, particularly graph out-of-distribution (OOD) generalization under distribution shifts; 2) Generalization of Graph Task: focusing on the model’s capacity to handle diverse graph tasks, elaborating on the graph pre-training and fine-tuning paradigm as well as the graph prompt learning paradigm; 3) Generalization of Graph Domain: addressing the further challenge of enabling models to generalize across different graph domains, summarizing methods based on large language models (LLMs) and those leveraging feature and structural alignment strategies. Building on these discussions, this work concludes with research directions for the field, including theoretical foundations, evaluation frameworks, multimodal and knowledge integration, generalist models for structured data, and the boundary of graph foundation models.
By summarizing existing work on the generalization problem and analyzing its future prospects, this work aims to advance both the theoretical understanding and the practical development of graph machine learning.
-
-