Abstract:
Synthesis and retrosynthesis are typical problems in complex scientific computing. They involve tightly coupled multimodal information, high-dimensional combinatorial search spaces, and strong domain-specific constraints. In recent years, artificial intelligence techniques have made notable progress on related tasks. However, many existing approaches remain limited in modeling reaction processes, supporting collaborative computation across distributed data sources, and providing interpretable results. These limitations restrict their applicability in practical scientific and engineering settings.

This work studies the computational modeling of synthesis and retrosynthesis, with particular attention to how key structural transformations in reaction processes can be represented and analyzed. The study is organized around three aspects: data representation, model design, and system implementation, with a focus on reaction-level modeling and its scalability. First, a large-scale multimodal chemical reaction dataset containing approximately 3.6 million reaction records is constructed. Building on this dataset, a reaction-level graph neural network that incorporates three-dimensional structural information is developed. To address the problem of distributed data, a knowledge-driven privacy-preserving retrosynthesis framework is designed to enable collaborative modeling without direct data sharing. In addition, an interpretable reasoning procedure for retrosynthesis is introduced through a collaborative mechanism that combines large and small models.

Based on these components, a computational platform for synthesis and retrosynthesis tasks is implemented and tested on several representative reaction scenarios. Experimental results indicate that the proposed approach maintains relatively stable performance across different data scales. Nevertheless, further improvements are required for highly complex reaction systems.