Practical Cases of Domestic Enterprises Using AI Large Models to Empower Software Testing

How can domestic enterprises use AI large models to improve software testing efficiency and quality? This report analyzes in depth the current status and practical cases of domestic enterprises using AI large models to empower software testing.
Core content:
1. The main application directions of AI large models in the field of software testing
2. Practice cases from domestic enterprises, including CITIC Bank and Postal Savings Bank of China
3. Challenges faced when using AI large models to empower software testing
1. Overview
In recent years, with the rapid development of large language model (LLM) technology, domestic enterprises have begun to apply it to the field of software testing to improve testing efficiency and quality. This report will comprehensively analyze the current status of domestic enterprises' application of large models to empower software testing from the aspects of application direction, practical cases, implementation methods, effect data and challenges faced.
At present, the application of large models in software testing is concentrated mainly in text generation scenarios, such as test case generation, test analysis, and automated test script generation. Application in behavior generation scenarios (such as automatic execution, result analysis, and automatic program repair) is still at the exploratory stage.
2. Main Application Directions
According to the survey, domestic enterprises are using large models to empower software testing mainly in the following directions:
1. Automatic generation of test cases
Use large models to understand and analyze requirement documents, design documents, and code, and automatically generate test cases covering functional points, operation steps, and expected results, reducing the workload of testers in writing cases and improving test coverage.
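As a minimal illustration of this direction — `call_llm` is a hypothetical stand-in for any chat-completion client, and the pipe-separated output format is an assumption, not a standard:

```python
TEST_CASE_PROMPT = """You are a senior test engineer.
Requirement:
{requirement}

Output one test case per line as "title | steps | expected result"."""

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion client; returns a
    # canned response so the sketch runs offline.
    return "Login succeeds | Enter valid credentials; submit | User reaches home page"

def generate_test_cases(requirement: str) -> list[dict]:
    """Format the requirement into the prompt and parse the model's reply."""
    raw = call_llm(TEST_CASE_PROMPT.format(requirement=requirement))
    cases = []
    for line in raw.strip().splitlines():
        title, steps, expected = (part.strip() for part in line.split("|"))
        cases.append({"title": title, "steps": steps, "expected": expected})
    return cases

print(generate_test_cases("Users can log in with a username and password."))
```

A structured line format like this makes the model output easy to parse and to sync into a case management platform, at the cost of needing a retry path when the model deviates from it.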
2. Test analysis and mind map generation
Based on the text comprehension ability of the large model, business requirements are parsed, and test points, test flow charts, and mind maps are automatically generated, helping testers understand complex requirements and carry out systematic test design.
3. Automated test script generation
Combined with the code generation capabilities of large models, test automation scripts are automatically generated according to interface definitions, page elements and test case descriptions, reducing the cost of writing automated test scripts.
4. Intelligent testing Q&A
Build an intelligent question-answering system on top of a large model to give instant answers, drawn from the test knowledge base and test specifications, to questions encountered during testing, reducing document lookup time.
5. Test Data Generation
According to business rules and interface requirements, test data that conforms to business logic is automatically generated to improve the accuracy and comprehensiveness of test data.
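One way to sketch this idea is to encode business rules as validators and keep only candidate records that satisfy all of them; here the candidates come from a seeded random generator rather than a model, and the transfer-limit and account-format rules are invented for illustration:

```python
import random

# Illustrative business rules expressed as predicates over a record.
RULES = [
    lambda r: 0 < r["amount"] <= 50_000,   # hypothetical transfer limit
    lambda r: len(r["account"]) == 10,     # hypothetical account number format
]

def candidate(rng: random.Random) -> dict:
    # Candidate generator; in practice an LLM could propose these records.
    return {"amount": rng.randint(-100, 60_000),
            "account": str(rng.randint(10**9, 10**10 - 1))}

def generate_data(n: int, seed: int = 7) -> list[dict]:
    """Keep generating candidates until n records pass every rule."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        rec = candidate(rng)
        if all(rule(rec) for rule in RULES):
            out.append(rec)
    return out

print(generate_data(3))
```

Validating model-proposed records against explicit rules is what keeps the generated data "conforming to business logic" rather than merely plausible.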
6. Defect analysis and prediction
Use large models to analyze historical defect data and code features, predict potential defect risk points, and take precautions in advance.
3. Domestic Enterprise Practice Cases
1. China CITIC Bank's "second brain" testing large model
Implementation Background
Facing the complexity of banking business processes and the limits of individual testers' capabilities, CITIC Bank has built a "second brain" for testing on the basis of large model technology and high-quality data assets.
Specific methods
CITIC Bank uses advanced large model technology and relies on high-quality, professional data assets to train its models, integrating large models into every stage of the software testing life cycle in an auxiliary role, thereby improving test coverage, quality, and efficiency.
Application Effect
CITIC Bank's practice shows that large models can assist testing in the short term, act as the main force of testing in the medium term, and are expected to take the lead in testing in the long term, significantly improving testers' work efficiency and test quality.
Challenges
Data quality issues: Large models require a large amount of high-quality and professional training data, while software testing-related data is complex and specialized, making data collection and processing difficult.
Insufficient interpretability: In test applications, the credibility of the output results must be ensured, so the interpretability of large models needs to be further improved.
Robustness and stability: Large models need to be more robust when facing various abnormal situations and complex test scenarios.
2. Postal Savings Bank of China's R&D and testing large model
Implementation Background
As large model technology matures, Postal Savings Bank of China has actively explored intelligent testing, built an "R&D and testing large model", and created an end-to-end intelligent R&D solution, the "Intelligent R&D and Testing Assistant".
Specific methods
Postal Savings Bank of China has adopted the following specific methods:
Build and deploy a private large language model for testing, train it with a large amount of test data and standard specification documents, and optimize it using prompt engineering and fine-tuning.
Realize localized model deployment and use the LangChain framework to combine the large language model with the bank’s own test knowledge base and computing logic.
Conduct secondary development on the Web side, optimize user interaction through customized Prompt templates, and improve model output quality and answer accuracy.
In actual testing scenarios, the model realizes the intelligent generation and optimization of test cases, mind maps, automated test scripts, and unit test codes.
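A minimal sketch of grounding answers in a private test knowledge base, in the spirit of the setup above (the knowledge-base entries are invented, and the naive keyword-overlap retriever stands in for LangChain's embedding-based retrieval):

```python
# Invented entries standing in for a bank's internal test knowledge base.
KNOWLEDGE_BASE = [
    "Interface tests must assert both HTTP status code and response schema.",
    "Boundary values for amount fields: 0, 0.01, and the configured maximum.",
    "All test data containing customer names must be desensitized.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query; a production
    system would use an embedding index instead."""
    q_words = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question: str) -> str:
    # The retrieved context is prepended so the model answers from the
    # bank's own standards rather than from generic training data.
    context = "\n".join(retrieve(question))
    return f"Answer using the bank's test standards below.\n{context}\n\nQ: {question}"

print(build_prompt("What boundary values should an amount field test cover?"))
```

The same pattern — retrieve from a private corpus, then inject into the prompt — underlies the test Q&A and case-generation scenarios described above.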
Application scenarios and effects
Test case generation: The large model can complete a requirements analysis of several thousand words in ten-odd seconds and generate detailed test cases covering key functional points, operation steps, and expected results, greatly shortening test case writing time.
Requirements mind map and flow chart: effectively help testers interpret requirements and convert them into intuitive flow charts or mind maps, improving the efficiency of requirements analysis.
Automated test script generation: Based on interfaces and business scenarios, test scripts are automatically generated through large models, which improves the efficiency and coverage of interface testing and integration testing.
Intelligent support for unit testing: Directly analyze the source code to expand the scope and coverage of unit test analysis.
Future development plans
Deepen the research on model interpretability, focus on the reasoning process and generated results of large models, and improve credibility.
Continue to optimize model training and use high-quality and more diverse test data to continuously improve model performance.
Expand applicable scenarios and build large-scale and diverse test data sets to better meet specific testing needs.
Improve the testing solution and build a more complete and continuous testing support system by connecting with the software testing platform.
Exploring technology integration, combining interactive technologies such as augmented reality and virtual reality, to provide a more intuitive test and analysis environment.
3. Baidu Intelligent Testing Assistant TestMate
Implementation Background
Baidu has developed the intelligent testing assistant TestMate to serve Baidu's internal intelligent testing work, aiming to improve testing efficiency and quality.
Specific methods
The design and implementation of TestMate include the following key points:
Basic capabilities based on large models, such as intent recognition, memory management, and multi-round interactions.
Built-in general test domain knowledge and atomic testing capabilities ensure that general requirements of the testing phase are covered.
It provides a capability center through which users can customize prompts, automated use case templates, and private business knowledge. Users can also assemble upstream and downstream capabilities in series to connect different functions, building an exclusive intelligent testing assistant that fits the enterprise's own business needs.
Application Effect
TestMate provides highly intelligent services for Baidu's internal testing work, helping to improve internal testing efficiency and accuracy. Through rich visual interactive components, it achieves effective linkage with the requirements management platform, use case management platform, and interface management platform.
4. Huawei's test automation code generation based on LLM
Implementation Background
Huawei has chosen large-model-assisted test automation code generation as the breakthrough point for applying large models to intelligent testing, aiming to raise the level of test automation.
Specific methods
Huawei's LLM-assisted test code generation method mainly consists of the following:
Build a unified system-test code generation solution, starting from analysis of the enterprise-level, domain-specific business context.
Establish a hierarchical large model training system: in the L1 stage, use massive raw test code for incremental pre-training; in the L2 stage, use supervised fine-tuning (SFT), combined with standardized prompt engineering and data cleaning (following 56 data quality inspection specifications), to improve generation accuracy.
Introduce a Retrieval-Augmented Generation (RAG) strategy: prompts are constructed from global information, use-case-level information, and step information, and historical AW code and text script pairs are retrieved to keep generated content consistent with existing test knowledge.
Integrate the TestMate IDE plug-in to achieve online reasoning and streaming code generation, and support post-processing (such as indentation repair, deduplication, and abnormal content repair).
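A toy sketch of this layered prompt assembly (the `HISTORY` pairs, the `aw.*` identifiers, and the word-overlap retrieval are all illustrative, not Huawei's actual implementation):

```python
# Invented historical (text step, AW code) pairs standing in for a real index.
HISTORY = [
    ("log in to the management console", 'aw.login(user="admin")'),
    ("configure a new cell", "aw.cell.create(band=5)"),
]

def retrieve_examples(step_text: str) -> list[tuple[str, str]]:
    """Toy retrieval: keep history entries sharing a word with the new step."""
    step_words = set(step_text.lower().split())
    return [(t, code) for t, code in HISTORY
            if step_words & set(t.lower().split())]

def build_codegen_prompt(global_ctx: str, case_ctx: str, step_text: str) -> str:
    # Layer the three context levels, then append retrieved examples so the
    # model imitates the existing AW vocabulary instead of inventing calls.
    examples = "\n".join(f"# {t}\n{code}" for t, code in retrieve_examples(step_text))
    return (f"Product context: {global_ctx}\n"
            f"Test case: {case_ctx}\n"
            f"Similar historical steps and AW code:\n{examples}\n"
            f"Generate AW code for the step: {step_text}")

print(build_codegen_prompt("wireless product line", "handover regression",
                           "log in to the operator console"))
```

Retrieving concrete historical pairs is what anchors the generated code to the product line's existing method library.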
Application Effect
Deployment and practice results show that Huawei's LLM-assisted test code generation solution has achieved good application results in multiple product lines:
In the pilot of the wireless product line production environment, the subjective acceptance of the generated test code reached about 70%, of which about 31% of the code could be used directly, 41% required minor modifications, and 28% was judged to be unusable.
The data storage product line shows that the adoption rate of the newly generated approximately 200 use cases and more than 1,800 test steps is over 60%, and the overall script writing efficiency has been significantly improved (nearly 1 times the efficiency improvement).
The application results of the Cloud Core product line show that the adoption rate is around 65%.
As of the end of June 2024, about 3,000 people were using large-model-assisted test automation code generation, covering more than 60 products, with more than 400,000 lines of code generated.
Challenges
The main challenges Huawei faces in the LLM-assisted test code generation process include:
Data quality and corpus issues: The training corpus is incomplete and lacks diversity; in particular, incomplete test case (TC) information in prompt engineering and the small number of AW code samples limit the generation effect.
Prompt design and context completeness: Generating high-quality code depends on the design of prompts. Currently, there is a need to further improve the prompt extraction of global information, use case level, and step level.
Model foundation and training strategy: There are differences in the effects between different base models (such as self-developed Pangu, Starcoder, GLM2, etc.), and the unified training and fine-tuning strategy (combination of SFT and RAG) still needs to be optimized.
Product line differences and data mixed training risks: There are differences in test codes and method libraries between different product lines. Mixing them for training may interfere with each other and affect the consistency of model generation.
5. ByteDance's full-link intelligent testing system
Implementation Background
ByteDance flexibly combines software engineering analysis methods with large models to provide full-link efficiency gains for QA scenarios, building a full-link intelligent testing system.
Specific methods
ByteDance's full-link intelligent testing system construction includes the following important steps:
Combining software engineering analysis methods with large model technology, it provides full-link efficiency improvement capabilities for QA scenarios.
Build multiple atomic capability modules, including intelligent unit testing, intelligent grading, test case generation and quality assessment.
Provide service functions such as visual configuration and measurement dashboards, thus forming a full-link intelligent R&D quality assurance system.
Intelligent unit test generation implementation method
ByteDance's intelligent unit test generation is mainly achieved by combining the large model (LLM) with engineered deep program analysis. The specific process includes:
Data collection: Use real business traffic collection and interface automation to obtain input and output parameters and dependency data during actual runtime.
Data distillation: The collected data is disassembled, abstracted, and desensitized to ensure data authenticity and provide high-quality corpus for model generation.
Model generation: Using customized prompts and chain-of-thought schemes, large models are combined with the upstream engineering data analysis to generate unit tests.
Automatic correction: For syntax errors and assertion failures encountered during the generation process, syntax and runtime correction techniques are used to ensure that the generated test cases are compilable and runnable.
Integration process: Seamlessly integrate the generated test cases with the existing test framework to improve overall test coverage and R&D efficiency.
Application Effect
Test coverage has been significantly improved: repository-level coverage has reached an average of 40%, and some repositories have risen to 60% after retraining.
The quality of unit testing has been improved: the assertion pass rate of a single method has reached 83.09%, and the compilation and running success rate of the overall test cases has been improved.
Generation volume results: 2,627 test cases have been generated, and the business team has recognized the authenticity and effectiveness of the test data.
Optimization of R&D efficiency and iteration cycle: Automatically generating unit tests reduces the labor cost of writing tests, allowing developers to focus more on business logic and shorten the time to go online and iterate.
6. Exploration of test cases generated by Dewu based on AIGC
Implementation background and methods
Dewu has built a complete test case generation tool by leveraging artificial intelligence, large models, and retrieval-augmented generation (RAG) technology. The system is divided into three main stages: user input, test point analysis and integration, and use case generation.
Specific process
User input stage: select the requirement module, copy the function points in the requirement PRD and input them into the system.
Test point analysis and integration phase: AI automatically extracts test points, allowing users to complete the test point list through commands or manual adjustments.
Use case generation phase: The system automatically generates corresponding test cases, supports adjustment of generation results, and finally synchronizes them to the use case management platform.
Supports multiple rounds of dialogue and preprocessing: Improves generation results through point-by-point input, expert experience intervention, demand preprocessing, etc.
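Under the assumption of stubbed model calls (both `extract_test_points` and `generate_case` below are canned stand-ins for LLM requests), the three-stage flow might look like:

```python
def extract_test_points(prd_text: str) -> list[str]:
    # Stubbed LLM call: extract test points from the PRD function points.
    return ["login with valid password", "login with wrong password"]

def generate_case(point: str) -> dict:
    # Stubbed LLM call: expand one test point into a structured case.
    return {"title": point, "steps": f"Perform: {point}", "expected": "matches spec"}

def run_pipeline(prd_text: str, manual_points=()) -> list[dict]:
    """Stage 1: take PRD input. Stage 2: extract points and apply the user's
    manual adjustments. Stage 3: generate one case per final point."""
    points = extract_test_points(prd_text) + list(manual_points)
    return [generate_case(p) for p in points]

cases = run_pipeline("PRD: user login",
                     manual_points=["login lockout after 5 failed tries"])
print(len(cases))  # 3
```

Keeping the test point list editable between stages is what lets expert experience intervene before cases are generated and synced to the case management platform.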
Application Effect
Dewu used large models to generate test cases and achieved significant positive results:
Significantly improved efficiency: Generating test cases based on AI can save an average of about 40% of the time required to write test cases.
High generation accuracy: In simple demand scenarios, the adoption rate of test cases generated with AI assistance reaches over 90%.
After integrating RAG technology, the generated test points are more comprehensive and the use case adoption rate is significantly improved (for example, one requirement increased from 72% to 85%, and another requirement increased from 52% to 94%).
Overall, the generated test cases cover the core functional points and main scenarios.
7. Other corporate practices
Other companies such as Tencent (Case Copilot), Youku (test case generation), iFlytek (AiTest), etc. are also actively exploring the application of big models in software testing in their respective fields and have achieved varying degrees of success.
4. Implementation Effect and Data Analysis
1. Improved testing efficiency
According to the collected data, applying large models in software testing can significantly improve testing efficiency:
Dewu: Using large models to generate test cases saves an average of about 40% of the time required to write them.
Huawei: The efficiency of writing test automation scripts has nearly doubled.
Postal Savings Bank of China: The large model can complete a requirements analysis of several thousand words and generate test cases in ten-odd seconds, greatly shortening writing time.
2. Generation quality and adoption rate
The quality and adoption rate of test cases generated by large models are at a high level:
Dewu: In simple requirement scenarios, the test case adoption rate reached over 90%.
Huawei: The subjective acceptance rate of test code generated for the wireless product line is about 70%, of which 31% can be used directly and 41% requires minor modifications; the data storage and Cloud Core product lines report adoption rates of over 60% and about 65% respectively.
ByteDance: The assertion pass rate of unit test methods reached 83.09%.
3. Improved coverage
ByteDance: Average repository-level code coverage reached 40%, and some repositories rose to 60% after optimization.
Dewu and Postal Savings Bank of China: The generated test cases cover core functional points and major scenarios.
4. Application Scale
Huawei (as of the end of June 2024): Nearly 3,000 people are using large-model-assisted test automation code generation, covering more than 60 products, with more than 400,000 lines of generated code.
ByteDance: 2,627 test cases have been generated and approved by the business team.
5. Implementation Challenges and Solutions
1. Data quality challenges
Challenge: Obtaining high-quality, professional training data is one of the main difficulties in applying large models to software testing.
Solution:
Huawei introduced 56 data quality inspection specifications and the automatic cleaning tool Gaia.
Postal Savings Bank of China used test data and knowledge accumulated over the years to train its private large language testing model at the L2 level.
ByteDance ensures the authenticity and validity of training data by collecting real business traffic and applying data distillation.
2. Model Reasoning and Generation Challenges
Challenges: Generation takes too long, the accuracy of generated results is not high, and there are many duplicated use cases.
Solution:
Dewu chooses more efficient models (such as GPT-4o-mini) to reduce generation time.
Huawei and Postal Savings Bank of China improve generation quality through prompt engineering and model tuning.
Many companies use RAG combined with a professional knowledge base to improve generation accuracy and coverage.
3. Challenges of adaptability in complex scenarios
Challenge: Insufficient adaptability to complex requirements and scenarios, resulting in low adoption rates and coverage.
Solution:
Huawei adopts a phased approach: SFT tuning first to handle old features, then RAG to accelerate generation for new features.
Postal Savings Bank of China builds larger and more diverse test data sets for specific scenarios.
Dewu strengthens its handling of complex scenarios through continuous batch dialogue and expert experience input.
4. Integration and implementation challenges
Challenge: Integrating large models with existing testing processes, tools, and platforms.
Solution:
Baidu TestMate provides rich visual interactive components that can be embedded in a Web UI to link with existing platforms.
Huawei integrates the TestMate IDE plug-in to achieve online inference and streaming code generation.
Postal Savings Bank of China integrates intelligent R&D and testing capabilities into its DevOps and testing platforms for seamless integration.
6. Future Development Trends
1. Technological evolution trend
Application of large models in behavior generation scenarios (such as use case execution, result analysis, and automatic program repair) will continue to deepen.
The introduction of multimodal large models will make testing smarter, able to handle multiple input forms including images and audio.
Customized model training will become more common, with companies performing domain tuning and vertical-scenario optimization on top of open-source large models.
2. Application scenario expansion
Automatic defect repair: making the entire process from defect detection to automatic code repair intelligent.
Full-scenario testing: covering the entire process from requirements analysis to acceptance testing.
Intelligent test operation and maintenance: using large models to manage test environments, govern test data, and more.
3. Organizational change trends
Transformation of testing roles: testers shift from writing tests to designing test strategies and analyzing results.
Reconstruction of the test system: new test platforms and processes are built around large models.
Human-machine collaboration: people and large models work together, each playing to their strengths.
7. Conclusion and Recommendations
1. Conclusion
The application of large models in software testing has achieved notable results, mainly improved test efficiency, improved test quality, and enhanced test coverage. However, current applications still center on text generation scenarios, and challenges remain in data quality, model inference, and adaptability to complex scenarios.
2. Recommendations for Enterprises
Develop a phased application strategy: start with relatively mature scenarios such as test case generation, and gradually expand to more complex scenarios.
Focus on data accumulation and governance: build a high-quality test database as the foundation for large model applications.
Adopt a hybrid technology route: combine RAG, prompt engineering, model fine-tuning, and other techniques to improve generation quality.
Emphasize integration with existing systems: ensure seamless integration of large models with existing testing platforms and tools.
Focus on improving testers' capabilities: train testers to master large model application skills and adapt to changing roles.
Continuously evaluate and optimize: establish an effective evaluation system, quantify the application effect of large models, and optimize continuously.