Computational Intelligence Methods for Synthesis and Retrosynthesis Problems

Ding Xiaomin (丁晓敏); Chen Lei (陈雷)

doi:10.11991/cccf.202603004

Abstract

Synthesis and retrosynthesis are representative problems in complex scientific computing. Their difficulty stems mainly from tightly coupled multimodal information, high-dimensional combinatorial search spaces, and strong domain-knowledge constraints. Although artificial intelligence methods have made progress on related tasks in recent years, existing approaches remain limited in three respects: modeling the reaction process, computing collaboratively across distributed data sources, and producing interpretable results. These limitations restrict their use in real research and engineering settings.

This work studies the computational modeling of synthesis and retrosynthesis, focusing on how the key structural changes in a reaction can be represented and analyzed at the computational level. It covers three aspects (data representation, model design, and system implementation), with an emphasis on reaction-level modeling methods and their scalability. Specifically, a large-scale multimodal chemical reaction dataset of about 3.6 million reaction records is constructed, and on this basis a reaction-level graph neural network that incorporates three-dimensional structural information is proposed. To address data privacy when reaction data are held by different institutions, a knowledge-driven, privacy-preserving retrosynthesis framework is designed that supports collaborative modeling without direct data sharing. Further, a collaboration mechanism between large and small models yields a retrosynthesis reasoning process with a degree of interpretability.

Based on these components, a computational platform for synthesis and retrosynthesis tasks is developed and validated on several types of representative reaction scenarios. Experimental results show relatively stable performance across different data scales, although complex reaction scenarios still leave room for further optimization.
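
The abstract does not say how a reaction is encoded for the reaction-level graph neural network. As one common way to make "key structural changes" explicit, the sketch below builds a condensed graph of reaction (CGR): atom-mapped reactant and product graphs are superimposed so that each edge carries its bond order before and after the reaction, and the bonds whose order changes form the reaction center. The CGR is a standard representation chosen here purely for illustration, not necessarily the one used in this work. The 3D distance attached to each edge, the function names `condensed_reaction_graph` and `reaction_center`, and the example molecule and coordinates are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def condensed_reaction_graph(reactant_bonds, product_bonds, coords):
    """Superimpose atom-mapped reactant and product graphs into one reaction-level graph.

    Hypothetical illustration, not the paper's method.
    reactant_bonds, product_bonds: {frozenset({i, j}): bond_order}, keyed by atom-map numbers.
    coords: {atom_map_number: (x, y, z)}, reactant-side 3D coordinates (assumed input).
    Returns {(i, j): (order_before, order_after, distance)}; order 0 means "no bond".
    """
    edges = {}
    for bond in set(reactant_bonds) | set(product_bonds):
        i, j = sorted(bond)
        edges[(i, j)] = (
            reactant_bonds.get(bond, 0),  # bond order in the reactants
            product_bonds.get(bond, 0),  # bond order in the products
            round(dist(coords[i], coords[j]), 3),  # 3D distance as an extra edge feature
        )
    return edges


def reaction_center(edges):
    """Return the atom pairs whose bond is formed, broken, or changes order."""
    return {pair for pair, (before, after, _) in edges.items() if before != after}


# Made-up example: CH3-CH2-Br + OH(-) -> CH3-CH2-OH + Br(-), heavy atoms only.
# Atom maps: 1 = CH3 carbon, 2 = CH2 carbon, 3 = Br, 4 = O. Coordinates are illustrative only.
reactant = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}
product = {frozenset({1, 2}): 1, frozenset({2, 4}): 1}
coords = {1: (0.0, 0.0, 0.0), 2: (1.5, 0.0, 0.0), 3: (3.5, 0.0, 0.0), 4: (1.5, 3.0, 0.0)}

edges = condensed_reaction_graph(reactant, product, coords)
center = reaction_center(edges)
print(edges)  # C1-C2 unchanged; C2-Br broken; C2-O formed
print(sorted(center))  # [(2, 3), (2, 4)]
```

In a reaction-level GNN, tuples like `(order_before, order_after, distance)` would typically be embedded as edge features for message passing; how this work actually combines bond changes with three-dimensional structure is not stated in the abstract.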