[Cao Lilong, Deputy General Manager of the Financial Technology Department of Bank of Sichuan]Using the cloud as the basis to create a new core and empowering new finance with quality first – Sichuan Bank’s new generation distributed core system testing practice

(Source: China Financial Computer)

  Authors

Cao Lilong, deputy general manager of the Financial Technology Department of Bank of Sichuan

Wu Feng and Zhang Hu, Financial Technology Department, Bank of Sichuan

Guided by the Digital China and high-quality development strategies, the digital transformation of core systems has become the main engine and key path for banks pursuing their digital transformation goals. Bank of Sichuan is a typical example of a merger and reorganization among small and medium-sized banks. It was formed by merging the former Panzhihua City Commercial Bank and Liangshan Prefecture Commercial Bank, and from the start it faced the difficult task of integrating two heterogeneous core systems. To achieve its goal of high-quality development as a “Digital Bank of Sichuan”, the bank launched a new-generation information technology project in 2022, whose top priority was a new-generation core system built entirely on the domestic information technology application innovation (Xinchuang) stack. For this system, Bank of Sichuan built a new distributed microservice architecture on a cloud platform that is full-stack Xinchuang, elastically scalable, and able to handle high concurrency. To guarantee system quality under the new architecture, the bank built a cloud-native testing system tailored to it, ensuring that the system went live smoothly and successfully.

  1. Project background and test objectives

  1. Project background: Quality challenges faced by introducing a new architecture

Since its establishment, Bank of Sichuan has steadily advanced its digital transformation strategy. As the transformation deepened and business volume and customer numbers grew rapidly, the limitations of the original core system in functionality, scalability, and flexibility became increasingly apparent, and it could no longer meet new development needs. Guided by the bank's strategy, the new-generation full-stack Xinchuang core system was completely rebuilt on a “cloud platform + distributed” architecture. The new system is designed to support tens of millions of customers and hundreds of millions of transactions per day, with elastic scalability and enterprise-level, flexibly reusable capabilities. While the new architecture brings functional and technical advantages, it also introduces unprecedented quality risks: in distributed scenarios, data inconsistency and missing idempotence can directly cause accounting errors and financial losses; complex microservice dependencies make the failure points in a fault chain hidden and hard to find; and separating transaction processing from accounting makes data consistency harder to guarantee. Building a testing system that can handle these risks was therefore the key to putting the new-generation project into production with high quality.

  2. Testing goals: Build a solid foundation for comprehensive quality with a high-standard testing system

Core system testing aims to comprehensively verify the system's functions, performance, security, and resilience under the cloud platform's distributed microservice architecture. Testing must show that business functions are complete and accounting data accurate under a load of hundreds of millions of transactions, and that the system has strong fault tolerance and self-healing capabilities across a wide range of failure scenarios. Core system testing therefore has to confirm that the system meets the production requirements of “high availability, high stability, and strong data consistency”, so that it can go live smoothly and on schedule.

  2. System construction: test planning adapted to the engineering project

  1. Test organizational structure planning: centralized management with division of labor and collaboration

The test organization uses a matrix structure (as shown in Figure 1). Unified scheduling at the management and control layer and specialized division of labor at the execution layer combine horizontal control with vertical execution for efficient collaboration.

  2. Test environment classification planning: professional division and unified scheduling

Based on the scope of testing, Bank of Sichuan planned its test environments as a whole and divided them into functional, non-functional, and data test environments. The functional test environment covers business logic verification, interface contract testing, and compatibility testing; the non-functional test environment covers performance stress testing, chaos engineering, and disaster recovery drills; and the data test environment covers data masking, scenario data construction, and sensitive information protection.

  3. Test tool application planning: unified management and precise empowerment

Bank of Sichuan achieved unified management and targeted use of testing tools by building three key platforms: an integrated testing-tool platform, a chaos-testing mock platform, and an environment operation and maintenance monitoring platform.

(1) Integrated platform for testing tools

The integrated testing-tool platform manages all test cases in the test framework in one place and supports the intelligent application of the coupled layered, version-based precise, and conditional classification strategies at different testing stages. It automates test plan design, test task distribution, and result statistics.

(2) Chaos-testing mock platform

The chaos-testing mock platform can dynamically inject faults (such as response timeouts longer than 5,000 milliseconds, injected error codes, and tampered messages) into cloud platforms, middleware, data flows, and third-party services, enabling mock verification of abnormal conditions across the core system's end-to-end business flows (such as UnionPay payments).
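To make the fault types above concrete, here is a minimal Python sketch of how a mock layer might wrap a downstream call and inject a timeout, an error code, or a tampered message. The `FaultRule` structure, the `with_faults` wrapper, and the `E9999` error code are hypothetical illustrations rather than the platform's actual interface, and the timeout is raised as an exception instead of really waiting 5,000 ms.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


class InjectedTimeout(Exception):
    """Raised when the mock simulates a downstream call that exceeds its timeout."""


@dataclass
class FaultRule:
    # Hypothetical rule format; the article does not describe the platform's configuration.
    kind: str                  # "timeout", "error_code" or "tamper"
    probability: float         # chance (0.0 to 1.0) that the rule fires on a call
    error_code: str = "E9999"  # hypothetical error code returned by "error_code" rules
    timeout_ms: int = 5000     # simulated delay reported by "timeout" rules


def with_faults(call: Callable[[Dict], Dict], rules: List[FaultRule],
                rng: random.Random) -> Callable[[Dict], Dict]:
    """Wrap a downstream call (e.g. a payment gateway stub) with probabilistic fault injection."""
    def wrapped(request: Dict) -> Dict:
        for rule in rules:
            if rng.random() >= rule.probability:
                continue  # rule does not fire on this call
            if rule.kind == "timeout":
                # Raise instead of sleeping so the test does not really wait 5 s.
                raise InjectedTimeout(f"simulated delay > {rule.timeout_ms} ms")
            if rule.kind == "error_code":
                return {"code": rule.error_code, "msg": "injected error"}
            if rule.kind == "tamper":
                # Call the real service, then alter the amount in its response.
                response = call(request)
                return {**response, "amount": response.get("amount", 0) + 1}
        return call(request)  # no fault fired: behave like the real dependency
    return wrapped
```

For example, `with_faults(gateway, [FaultRule("timeout", 0.3)], random.Random(7))` would make roughly 30% of calls to a hypothetical `gateway` function raise `InjectedTimeout`, letting a test check that the caller's retry or compensation path handles it. Seeding the random generator keeps chaos runs reproducible.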

(3) Environment operation and maintenance monitoring platform

The environment operation and maintenance monitoring platform connects to the observability platform to monitor the test environment and verify alarms in real time, collecting data-flow metrics across the full link and pinpointing defects. Its intelligent baseline alarm, for example, automatically learns traffic patterns and identifies sudden surges and drops (with a false alarm rate below 0.1%). When linked with fault injection in chaos experiments, it automatically checks whether service performance meets the service level agreement (SLA) and detects declining service levels early.
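Baseline alarming of this kind is often built on an exponentially weighted moving average (EWMA) of the metric and its variance. The sketch below assumes that approach; the article does not say which model the platform uses, and `alpha`, `k` and `warmup` are illustrative values rather than tuned settings.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class EwmaBaseline:
    """EWMA baseline for one traffic metric (for example, transactions per minute)."""
    alpha: float = 0.1  # smoothing factor for the running mean and variance
    k: float = 4.0      # alert when a point is more than k standard deviations from the mean
    warmup: int = 10    # points to learn from before any alert is raised
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def observe(self, x: float) -> Optional[str]:
        """Feed one sample; return "spike", "slump" or None."""
        self.n += 1
        if self.n == 1:
            self.mean = x
            return None
        std = math.sqrt(self.var)
        alert = None
        if self.n > self.warmup and std > 0:
            if x > self.mean + self.k * std:
                alert = "spike"
            elif x < self.mean - self.k * std:
                alert = "slump"
        if alert is None:
            # Incremental EWMA update of mean and variance, using normal points only.
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return alert
```

Skipping the update on alerting points keeps a surge from dragging the baseline upward, at the cost of adapting slowly if traffic genuinely settles at a new level.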

  4. Test strategy design: full-cycle hierarchical classification and accurate testing

(1) Coupled layered testing strategy

Following the coupled layered testing strategy, Bank of Sichuan built a four-level progressive verification framework of “unit testing – integration testing – cluster testing – chaos testing”, providing full-stack quality protection from code development through to the production environment.

Unit testing focuses on code robustness, with branch coverage of core logic above 85%; integration testing verifies microservice interface contracts and fault isolation; cluster testing covers more than 200 end-to-end business scenarios such as deposits and loans; and chaos testing verifies the system's high-availability targets by injecting faults such as infrastructure outages, network isolation, and data anomalies, ensuring fault recovery within 30 seconds with zero data loss.

(2) Version-based precise testing strategy

Bank of Sichuan deeply integrated its DevOps platform to form a closed loop of “development-testing-release-monitoring”, increasing release efficiency by 50%. The platform's precise testing strategy covers the following key links: an automatic trigger runs the P0 test suite (unit and interface tests) whenever code is committed or merged; quality gates in the build process require code to pass style checks and quality analysis (a 100% pass rate is required for admission), intercepting defects before they leak; integration with Kubernetes (K8s) enables environment collaboration, dynamically creating test sandboxes, automatically deploying versions to multiple test environments, and running system-level tests by category; finally, chaos experiments are triggered automatically after a grayscale release to production to monitor fluctuations in business metrics.
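A minimal sketch of the commit-time selection and gate logic described above. The `CaseSpec` fields, the module-to-case mapping, and the function names are assumptions made for illustration; in a real pipeline this logic would typically be expressed in the DevOps platform's configuration rather than in application code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class CaseSpec:
    case_id: str
    priority: str              # "P0", "P1", ...
    kind: str                  # "unit", "interface", "e2e", ...
    modules: FrozenSet[str]    # modules the case exercises (hypothetical mapping)


def select_p0(cases: List[CaseSpec], changed_modules: Set[str]) -> List[CaseSpec]:
    """Pick the P0 unit and interface cases that touch at least one changed module."""
    return [c for c in cases
            if c.priority == "P0"
            and c.kind in {"unit", "interface"}
            and c.modules & changed_modules]


def quality_gate(results: Dict[str, bool], lint_violations: int) -> Tuple[bool, str]:
    """Allow the merge only if style checks are clean and every selected case passed (100% rule)."""
    if lint_violations:
        return False, f"{lint_violations} code-style violations"
    failed = sorted(cid for cid, passed in results.items() if not passed)
    if failed:
        return False, "failed: " + ", ".join(failed)
    return True, "passed"
```

Selecting by changed module keeps commit-time runs short; broader suites can still run in later stages such as the multi-environment system tests.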

(3) Conditional classification testing strategy

In terms of production-like configuration, chaos engineering experiments are run using the production environment's resource configuration, deployment configuration, technical parameters, and business parameters. The scope covers infrastructure, operating systems, databases, containers and middleware, network transmission, and business application configuration, and compatibility testing of Xinchuang equipment and environments is completed at the same time. In terms of time-consistent configuration, the more than 200 business systems, data processing and data warehouse systems, and centralized batch processing systems connected to the core system keep their dates strictly consistent with it; on this basis, tests of technical standard consistency, batch interest calculation, and business flow consistency are carried out. In terms of data capacity configuration, integrity, fault tolerance, and consistency tests are run on both migrated and newly created data in scenarios with billions of transactions, ensuring that transaction information and accounting amounts are separated in real time and remain accurate along the end-to-end link.
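For the time-consistent configuration, one simple pre-check before a batch or interest-calculation test is to confirm that every connected system reports the same business date as the core system. The sketch below assumes each system's business date has already been collected into a dictionary; the system names used with it are hypothetical.

```python
from datetime import date
from typing import Dict


def date_drift(core_date: date, system_dates: Dict[str, date]) -> Dict[str, int]:
    """Return {system: offset_in_days} for every system whose business date differs from the core's.

    A positive offset means the system is ahead of the core (it has already rolled its date).
    """
    return {name: (d - core_date).days
            for name, d in sorted(system_dates.items())
            if d != core_date}
```

For example, `date_drift(date(2025, 7, 1), {"LOAN": date(2025, 7, 1), "DW": date(2025, 6, 30)})` returns `{"DW": -1}`, flagging a data warehouse that has not yet rolled its date; an empty result means the environment is date-consistent.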

  3. Innovation practice: applying the three-dimensional layered testing theory

To address the complexity of the new architecture, the Bank of Sichuan testing team went beyond traditional functional verification and applied a three-dimensional testing theory covering the core system's “platform base, technical middleware, and business accounting” layers.

  1. Platform and base testing: consolidating the resilience of the foundation

(1) Scenario-based in-depth testing and verification of the aPaaS transaction service platform

A full-link business test case library was built, covering more than 200 core business scenarios such as deposits, loans, fund transactions, clearing, and journal entries. By analyzing real business initiation scenarios, business requirements are translated into precise system messages, and the technical base's entire processing of these messages is simulated under production conditions.

(2) Resilience testing and verification of the service platform and base

The chaos engineering platform is integrated to verify whether the core application system meets its design specifications for high availability, self-healing, node failure handling, alarming, and elastic resource scaling. In addition, disturbances such as unstable or blocked interfaces are injected to simulate abnormal conditions in the transaction service platform and evaluate their impact on the core system's business. Network faults are also injected into non-core dependencies such as the service registry, configuration center, file transfer system, COS object storage, OB database, and key management system, to verify how their failures affect core business.

  2. Technology and middleware testing: solving verification challenges in the distributed core system

(1) Dedicated testing of core distributed mechanisms

For the idempotency and duplicate-prevention mechanism, high-concurrency duplicate-request test scripts were designed, and tools were used to simulate repeated requests at more than 1,000 TPS, covering extreme scenarios such as high-frequency resubmission within a short period (for example, within 10 seconds) and concurrent submissions across more than three distributed nodes. Faults such as repeated master-slave failover and failback in the Redis cluster and Token expiration time drift were injected to verify the cooperative fault tolerance between global serial numbers (generated with algorithms such as Snowflake or a combination of timestamp and random values) and the Redis Token mechanism. For boundary conditions such as database unique-index conflicts and idempotency component exceptions, interception logs were checked against business data for consistency. The requirements were 100% interception of high-concurrency duplicate requests, zero fund loss, no missed entries in the interception logs, and system throughput fluctuation below 5%.
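The duplicate interception described above can be sketched as an atomic "set if absent with expiry" on the global serial number, which is what Redis `SET key value NX EX ttl` provides. In this self-contained sketch an in-memory `TokenStore` stands in for Redis, and `Ledger`, `simulate_duplicates`, and the `idem:` key prefix are hypothetical; the 10-second TTL mirrors the resubmission window mentioned above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


class TokenStore:
    """In-memory stand-in for Redis `SET key value NX EX ttl` (atomic set-if-absent with expiry)."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}
        self._lock = threading.Lock()

    def set_if_absent(self, key: str, ttl_s: float, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        with self._lock:
            expires_at = self._expiry.get(key)
            if expires_at is not None and expires_at > now:
                return False  # key still live: a duplicate within the TTL window
            self._expiry[key] = now + ttl_s
            return True


class Ledger:
    """Hypothetical posting service guarded by the idempotency token."""

    def __init__(self) -> None:
        self.postings: List[Tuple[str, int]] = []
        self.rejected = 0  # stands in for the interception log
        self._lock = threading.Lock()

    def transfer(self, store: TokenStore, serial_no: str, amount: int) -> str:
        # The global serial number is used as the idempotency key.
        if not store.set_if_absent(f"idem:{serial_no}", ttl_s=10):
            with self._lock:
                self.rejected += 1
            return "DUPLICATE"
        with self._lock:
            self.postings.append((serial_no, amount))
        return "OK"


def simulate_duplicates(n: int = 1000, workers: int = 16) -> Tuple[List[str], Ledger]:
    """Submit the same serial number n times concurrently and return the outcomes."""
    store, ledger = TokenStore(), Ledger()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: ledger.transfer(store, "SN0001", 100), range(n)))
    return results, ledger
```

The TTL is also why a token is normally paired with a database unique index, as in the boundary tests above: if a token expires or drifts before a slow request finishes, the unique index on the serial number is the backstop that still rejects the duplicate posting.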

In the distributed transaction robustness test, failures at every phase of Try-Confirm-Cancel (TCC) transactions were simulated, and abnormal scenarios such as a 30% Confirm/Cancel failure rate, coordinator node outages, and network partitions were injected to verify the effectiveness of the transaction suspension detector (based on a transaction status table and scheduled compensation) and the empty-rollback interception strategy (based on pre-service checks). Resource lock conflicts were also created: concurrent transactions were triggered while a global lock was still valid, to verify lock contention handling and transaction rollback efficiency. In stress scenarios, the service level objective (SLO) for transaction recovery time was no more than 3 seconds, and the exception alarm coverage of the transaction monitoring dashboard was checked. Together these tests verified eventual consistency and exception protection, ensuring no dirty data, SLA-compliant transaction recovery, and a 100% exception capture rate.
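The suspension and empty-rollback rules tested above can be illustrated with a single TCC participant that keeps a status record per global transaction ID. This is a minimal sketch assuming a dictionary in place of the transaction status table; the class and method names are hypothetical, and the coordinator, scheduled compensation, and persistence are omitted.

```python
from decimal import Decimal
from typing import Dict


class TccAccount:
    """One TCC participant that freezes funds in Try (illustrative only)."""

    def __init__(self, balance: str) -> None:
        self.balance = Decimal(balance)
        self.frozen = Decimal("0")
        # Stand-in for the transaction status table: xid -> TRIED / CONFIRMED / CANCELLED.
        self.tx_status: Dict[str, str] = {}
        self.tx_amount: Dict[str, Decimal] = {}

    def try_(self, xid: str, amount: str) -> bool:
        if xid in self.tx_status:
            # Retried Try is idempotent; a Try arriving after Cancel is rejected (anti-suspension).
            return self.tx_status[xid] == "TRIED"
        value = Decimal(amount)
        if self.balance - self.frozen < value:
            return False
        self.frozen += value
        self.tx_status[xid] = "TRIED"
        self.tx_amount[xid] = value
        return True

    def confirm(self, xid: str) -> bool:
        status = self.tx_status.get(xid)
        if status == "CONFIRMED":
            return True  # idempotent retry
        if status != "TRIED":
            return False
        value = self.tx_amount[xid]
        self.frozen -= value
        self.balance -= value
        self.tx_status[xid] = "CONFIRMED"
        return True

    def cancel(self, xid: str) -> bool:
        status = self.tx_status.get(xid)
        if status is None:
            # Empty rollback: Try never ran here. Record the Cancel so a late Try is refused.
            self.tx_status[xid] = "CANCELLED"
            return True
        if status == "CANCELLED":
            return True  # idempotent retry
        if status == "CONFIRMED":
            return False
        self.frozen -= self.tx_amount[xid]
        self.tx_status[xid] = "CANCELLED"
        return True
```

Recording a "CANCELLED" row even when nothing was frozen is what lets one status table handle both cases: the empty rollback succeeds without touching balances, and the delayed Try that would otherwise leave funds frozen indefinitely finds that row and is refused.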

In unitization capability verification, ZoneID routing accuracy and fault isolation (automatic traffic cut-off when a unit goes down) were verified by injecting custom routing policies, such as forced cross-unit routing. A 200% load was applied within a unit for closed-loop verification, measuring the closed-loop rate of service call chains (required to be at least 99.9%) and the ratio of localized data access. Disaster recovery drills were also run: a 50-millisecond network delay between the same-city active-active data centers was simulated to verify business tolerance (payment service success rate above 99.5%), and data synchronization channels between units were cut to verify degradation strategies (such as local-read priority) and data repair consistency. The goals were unit closure and reliable multi-active operation across sites, with 100% routing accuracy, automatic isolation of a failed unit within 30 seconds, and data latency within business thresholds.
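A minimal sketch of ZoneID-style routing with failover, assuming customers are assigned to units by a stable hash of the customer ID and each unit has one designated standby. The hashing scheme, the zone names, and the `ZoneRouter` class are hypothetical; the article does not describe how the bank assigns customers to units.

```python
import zlib
from typing import Dict, Iterable, Set


class ZoneRouter:
    """Route each customer to a home unit (zone), failing over to a standby unit."""

    def __init__(self, zones: Iterable[str], standby: Dict[str, str]) -> None:
        self.zones = list(zones)         # e.g. ["RZ01", "RZ02"]
        self.standby = dict(standby)     # zone -> zone that takes over its traffic
        self.down: Set[str] = set()

    def home_zone(self, customer_id: str) -> str:
        # Stable hash, so a customer always lands in the same unit (data localization).
        return self.zones[zlib.crc32(customer_id.encode()) % len(self.zones)]

    def route(self, customer_id: str) -> str:
        zone = self.home_zone(customer_id)
        if zone not in self.down:
            return zone
        backup = self.standby.get(zone)
        if backup is None or backup in self.down:
            raise RuntimeError(f"no healthy unit for {customer_id}")
        return backup  # failed unit is isolated; its traffic is cut over

    def mark_down(self, zone: str) -> None:
        self.down.add(zone)
```

A routing-accuracy test in this style checks that every request lands in its home unit while all units are healthy, and that none reaches a unit marked down.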

(2) Dedicated robustness testing of key components

In the dedicated robustness testing of key components, for the serial number generator, stress tests at the 100,000 TPS level and node-outage fault injection verified the ID uniqueness of the Snowflake algorithm under clock rollback and WorkerID conflicts, requiring a zero duplicate rate and performance fluctuation of no more than 5%. For distributed locks, scenarios such as lock-holder node outages and network partitions were simulated to test automatic lock release and deadlock detection efficiency, requiring 100% lock-failure protection and fault recovery within 1 second. For the message queue, faults such as Broker outages and network jitter were created to verify the redelivery of transaction messages, dead-letter queue control, and consumer-side idempotency, ensuring zero message loss and 100% interception of duplicate messages. For application monitoring, under extreme conditions of resource overload (CPU usage above 90%) and network latency above 500 milliseconds, metric collection delay was verified to stay within 3 seconds, and the accuracy of multi-dimensional alarms (such as a sudden rise in timeout rate) was required to be at least 99.9%.
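The clock-rollback check is the part of a Snowflake generator that the tests above target. The sketch below uses the common 41-bit timestamp, 10-bit worker ID, 12-bit sequence layout and refuses to issue IDs when the clock moves backwards; the epoch, the injectable `clock` parameter, and the choice to raise rather than wait are assumptions for illustration, not details from the article.

```python
import threading
import time
from typing import Callable


class ClockMovedBackwards(Exception):
    """Raised when the clock reads earlier than the last timestamp used."""


class Snowflake:
    """Snowflake-style IDs: 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence."""

    EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary custom epoch

    def __init__(self, worker_id: int,
                 clock: Callable[[], int] = lambda: time.time_ns() // 1_000_000) -> None:
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self.clock()
            if now < self.last_ms:
                # Clock rollback: refuse to issue IDs rather than risk a duplicate.
                raise ClockMovedBackwards(f"clock moved back by {self.last_ms - now} ms")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```

Injecting a fake clock makes the rollback scenario easy to reproduce in a test. Uniqueness across nodes still depends on every node having a distinct `worker_id`, which is why WorkerID conflicts need to be tested separately.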

  3. Business and accounting testing: ensuring the safety and accuracy of funds

To ensure fund security and accounting accuracy, business and accounting testing carried out systematic verification at four levels. First, a quasi-real-time reconciliation engine was built, together with a test framework that compares the independent accounting flow and business flow in real time, matching records precisely by global serial number to quickly identify discrepancies and trace defects. Second, for the trillion-level “T+1” daily flow, Spark-based offline verification jobs were tested, supporting multi-dimensional discrepancy analysis by channel, product, account, and so on. Third, the accounting rule engine was tested to verify the accuracy of complex interest calculation, fee amortization, and multi-dimensional ledgers (such as customer accounts and internal accounts), and logical verification of more than 2,000 accounting subjects was completed to ensure compliance with accounting standards. Finally, penetrating data consistency verification was carried out: real-time comparison tools spanning microservices and databases used global serial numbers to link upstream and downstream records and check amounts precisely, ensuring that the status at every step of the transaction chain was strongly consistent with the accounting data.
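The matching step of the reconciliation engine can be sketched as a keyed comparison of the two flows on the global serial number, followed by a per-channel count of discrepancies. In this sketch the record shape (`serial_no -> (channel, amount)`) and the discrepancy labels are assumptions; the article's T+1 verification runs on Spark, whereas this version is plain Python for clarity.

```python
from collections import Counter
from decimal import Decimal
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (channel, amount as a decimal string)


def reconcile(business: Dict[str, Record],
              accounting: Dict[str, Record]) -> Tuple[List[Tuple[str, str, str]], Counter]:
    """Match business-flow and accounting-flow records on the global serial number."""
    diffs = []
    for serial in sorted(business.keys() | accounting.keys()):
        b, a = business.get(serial), accounting.get(serial)
        if a is None:
            diffs.append((serial, b[0], "missing_in_accounting"))
        elif b is None:
            diffs.append((serial, a[0], "missing_in_business"))
        elif Decimal(b[1]) != Decimal(a[1]):
            # Compare as Decimal so "100.00" and "100.0" count as equal amounts.
            diffs.append((serial, b[0], "amount_mismatch"))
    # Toy version of the multi-dimensional analysis: discrepancy counts per channel.
    by_channel = Counter(channel for _, channel, _ in diffs)
    return diffs, by_channel
```

Amounts are compared as `Decimal` rather than `float` so that rounding artifacts cannot appear as false discrepancies.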

  4. Practical results: lessons for system testing under complex architectures at small and medium-sized banks

Testing of the core system in Bank of Sichuan's new-generation Xinchuang project was completed successfully. It not only ensured that the system went live on time and with high quality, but also produced a quality assurance methodology suited to complex architectures at small and medium-sized banks. The core experience can be summarized in the following four points:

  1. Drive the work from the strategic top level and create a new model of resource coordination

The efficient progress of testing rested fundamentally on the bank-wide commitment to its “Technology-based Banking Strategy”. First, organizational support was strengthened: testing was brought under the overall management of the “control layer”, a test command center was set up, and an elite cross-departmental team of more than 1,000 people from business, technology, and testing was deployed, ensuring fast decisions and sufficient resources. Second, high-intensity investment was ensured: testing accounted for the highest share of total technology investment in the industry, and three types of professional test environments (functional, non-functional, and data) were built, laying a solid foundation for production-level simulation. This “strategic leadership + resource coordination” model is the fundamental prerequisite for solving testing problems.

  2. Innovate testing methodology to solve distributed architecture quality problems

Facing the new challenges of cloud-native and distributed architecture, Bank of Sichuan moved away from traditional testing approaches and built a three-dimensional layered testing theory. First, reaching down to the platform base, it verified the resilience of foundational components such as the aPaaS transaction service platform through full-link business scenario coverage and chaos engineering. Second, focusing on core technical middleware, it ran dedicated robustness tests on core mechanisms such as idempotent duplicate prevention, distributed transactions, and unitized routing, eliminating the risks of fund loss and data inconsistency at the source. Third, reaching up to business accounting, it built quasi-real-time reconciliation and big-data verification capabilities under the separation of transactions and accounting, ensuring fund security and accounting accuracy. This methodology provides full-stack quality coverage from the underlying infrastructure to top-level business value.

  3. Create a high-efficiency tool chain to empower full-process quality control

Bank of Sichuan focused on tooling, automation, and intelligence to significantly improve testing efficiency. First, it built an integrated test management platform that unifies the management of test assets and processes and forms an efficient “development-test-release” DevOps loop, increasing code delivery efficiency by 1.2 times. Second, key tools such as the chaos-testing mock platform and full-link data comparison enabled accurate simulation of complex dependencies and abnormal scenarios, uncovering more than 100 deep architectural defects in total. Third, the production-grade monitoring system was connected during the testing phase to build observable operation and maintenance monitoring, so that faults could be located and scoped quickly. This toolchain was key to achieving the goals of “shortening the version iteration cycle by 30%” and going live with “zero major defects”.

  4. Build a full-cycle closed-loop management and control system to maintain the bottom line of safety and quality

The quality of complex projects depends on process control. To this end, Bank of Sichuan built a three-in-one “strategy-process-data” control system. First, it implemented the coupled layered testing strategy, forming a progressive quality safety net from unit testing to chaos experiments. Second, it strengthened fine-grained process control, ensuring the orderly execution of a large number of test cases through strict quality gates, tiered defect handling, and five rounds of end-to-end production drills, with a defect resolution rate of 99.9%. Third, it strengthened data governance and standards, implementing 100% of data standards during the test window and cleaning tens of thousands of problematic records, ensuring data accuracy and consistency at the source and building the last line of defense for stable system operation.

  

  Bank of Sichuan celebrates obtaining TMMi Level 4 certification

In July 2025, after being tested on major projects and continuously optimized, the Bank of Sichuan testing system achieved Test Maturity Model integration (TMMi) Level 4 certification. Looking ahead, Bank of Sichuan aims to achieve TMMi Level 5 certification and to advance the high-quality development of its testing system with AI.

  This article was published in “China Financial Computer” Issue 3, 2026

  
