Abstract:
With the widespread application of artificial intelligence in core financial services, risks arising from the vulnerability and opacity of intelligent models have become increasingly prominent, making effective risk detection a shared concern of academia and industry. However, as the task types, model forms, and business workflows of financial domain-specific models grow more complex, risk detection faces significant challenges of task diversity and object complexity. Meanwhile, requirements for data security, business stability, and regulatory auditing make financial scenarios environmentally sensitive, placing higher demands on the stability, reproducibility, and traceability of assessment results. In particular, existing methods often assume white-box access and rely mainly on one-off, pointwise evaluations, which makes them difficult to adapt to financial domain-specific models, whose behavior is a system-level decision function jointly determined by models, data, rules, and processes. They also struggle to deliver repeatable engineering capabilities under low-intrusion, sustainable operating conditions. To address these challenges, this article proposes a risk assessment technology and system for financial domain-specific models. The proposed solution takes the system-level decision function as the unified evaluation object, establishes a dual-engine mechanism consisting of a black-box risk assessment engine and a scenario generation engine, and organizes the assessment process through a four-layer architecture comprising an access layer, an orchestration layer, a method layer, and an evidence layer. The black-box risk assessment engine uses proxy and shadow mechanisms to conduct multi-dimensional risk assessment, covering privacy, accountability, and robustness, under interface-observable conditions.
The scenario generation engine provides controllable inputs for risk triggering, retest comparison, and system-level localization through synthetic data, stress scenarios, and transfer mechanisms enhanced by financial-theory constraints. In addition, independent evaluation resource pools and task scheduling are introduced to decouple computational resources, while process logging and evidence binding support result review and audit traceability. Drawing on practice from a National Key R&D Program project, this article further presents three case studies: training-data leakage risk detection for sequential recommendation models, accountability risk detection and enhancement for financial recommendation, and robustness risk detection for option implied volatility surfaces. The results verify the feasibility of the proposed solution under black-box, low-intrusion, and resource-constrained conditions. This study offers insights for building sustainable, repeatable, auditable, and standardized software systems for domain-specific model risk detection in finance and other business domains, and can also be regarded as a step toward constructing neuro-symbolic intelligent software systems.