Some thoughts on building an AI testing platform

How can AI technology revolutionize API performance testing? An in-depth look at how an AI testing platform is built and applied.
Core content:
1. How the AI testing platform was built and the efficiency gains it delivered
2. Detailed explanation of the platform architecture and the functions of each module
3. The key role and thinking of AI in simplifying the testing process
Last fiscal year, our team built an AI testing platform from 0 to 1.
In the field of API performance testing, we use AI to automatically generate stress testing use cases and to automate the end-to-end API stress testing workflow: pre-test resource preparation, stress test execution and performance report output, and post-test resource cleanup. This has greatly improved the efficiency of API stress testing.
Platform Architecture
As shown in the figure, the platform includes four modules: stress testing case generation, stress testing script generation, stress testing script execution, and performance report generation.
1) Stress test case generation: Automatically construct a prompt from the target API and its documentation, call the LLM inference service with that prompt to generate the case context, and finally use the context to render a predefined API stress test case template into a stress test case in YAML format. The correctness of the generated case can then be manually confirmed and calibrated.
2) Stress test script generation: Convert the YAML stress test case into an executable JMeter stress test script.
3) Stress test script execution: Automatically create a JMeter Pod cluster on Kubernetes, distribute the stress test script to the Pods, and launch an elastic distributed stress test.
4) Performance report generation: Collect the JMeter stress test results and output performance reports covering API throughput, latency, and error rate. Performance issues can then be identified by manually analyzing the reports and stress test logs.
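The case-generation module above can be sketched as a minimal pipeline. Everything here is illustrative, not the platform's actual code: the function names, the stubbed LLM call, and the template fields are assumptions.

```python
import string

# Hypothetical stress test case template (a real platform might use Jinja
# or a richer YAML template instead of string.Template).
CASE_TEMPLATE = string.Template("""\
api: $api
method: $method
concurrency: $concurrency
duration_seconds: $duration
""")


def build_prompt(api_name: str, api_doc: str) -> str:
    """Step 1: construct a prompt from the target API and its documentation."""
    return (f"From the API documentation below, extract stress-test "
            f"parameters for {api_name}:\n{api_doc}")


def call_llm(prompt: str) -> dict:
    """Step 2: call the LLM inference service.

    Stubbed here with a fixed context; a real implementation would send
    the prompt to an inference endpoint and parse its response.
    """
    return {"api": "CreateOrder", "method": "POST",
            "concurrency": 50, "duration": 300}


def render_case(context: dict) -> str:
    """Step 3: render the template into a YAML stress test case."""
    return CASE_TEMPLATE.substitute(context)


def generate_case(api_name: str, api_doc: str) -> str:
    """Prompt construction -> LLM inference -> template rendering."""
    return render_case(call_llm(build_prompt(api_name, api_doc)))


print(generate_case("CreateOrder", "POST /orders creates an order asynchronously."))
```

The rendered YAML would then feed the script-generation module; manual confirmation of the case fits naturally between rendering and script generation.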
Use case generation
We abstracted and modeled the API stress testing workflow and defined a set of common API stress test case templates.
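As an illustration only (not the platform's real template), such a template might look like the following, with placeholders to be filled from the LLM-generated context:

```yaml
# Hypothetical API stress test case template; the {{ ... }} placeholders
# are rendered from the LLM-generated case context.
api: "{{ api_name }}"
method: "{{ http_method }}"
payload: {{ payload_example }}
concurrency: {{ concurrency }}
duration_seconds: {{ duration }}
checks:
  poll_api: "{{ read_api }}"          # e.g. ReadXXX for async creation APIs
  status_field: "{{ status_field }}"  # e.g. XXX.status
  success_value: "{{ success_state }}"
```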
For example, to stress test an asynchronous creation API (CreateXXX), we must poll the corresponding query API (ReadXXX), check the status of the created resource (XXX.status), and measure the time it takes for the status to reach the success state (Init -> Processing -> Success) as the actual latency of the asynchronous API.
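The polling logic for such an asynchronous API can be sketched as follows; `read_resource` is a hypothetical stand-in for calling the ReadXXX API and extracting XXX.status:

```python
import time


def wait_for_success(read_resource, resource_id: str,
                     success_state: str = "Success",
                     interval: float = 0.5, timeout: float = 60.0) -> float:
    """Poll the query API until the resource reaches the success state,
    returning the elapsed time as the async API's actual latency."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        status = read_resource(resource_id)  # e.g. ReadXXX(id).status
        if status == success_state:
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError(
        f"{resource_id} did not reach {success_state} within {timeout}s")


# Usage with a fake query API that walks Init -> Processing -> Success:
states = iter(["Init", "Processing", "Success"])
elapsed = wait_for_success(lambda _id: next(states), "order-1", interval=0.01)
print(f"async latency: {elapsed:.2f}s")
```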
Inferring the concrete values of "status" and the "success state" requires purpose-built prompts.
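A sketch of what such a prompt might look like; the wording, field names, and expected JSON shape are illustrative assumptions, not the platform's actual prompt:

```python
def build_status_prompt(api_name: str, read_api: str, api_doc: str) -> str:
    """Hypothetical prompt asking the LLM to infer the status field
    and its success value from the API documentation."""
    return (
        f"You are analyzing the asynchronous API {api_name}.\n"
        f"Its query API is {read_api}.\n"
        f"From the documentation below, answer in JSON:\n"
        f'  {{"status_field": ..., "success_value": ...}}\n'
        f"where status_field is the response field reporting creation "
        f"progress,\n"
        f"and success_value is the value indicating creation succeeded.\n\n"
        f"Documentation:\n{api_doc}"
    )


prompt = build_status_prompt(
    "CreateCluster", "ReadCluster",
    "ReadCluster returns {status: Init | Processing | Success}")
print(prompt)
```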
Three points of thought
1) AI makes the test platform simpler
People often complain that developers do not test their own code, but given a simple, easy-to-use testing tool, who would refuse to test?
Previous API stress testing platforms had two major problems: 1) stress test scripts had to be written manually, a high entry barrier; 2) the tools only automated the execution steps, while preparing and releasing stress test resources still required manual work.
After our AI stress testing platform launched, our R&D colleagues completed performance stress tests of hundreds of APIs in a short period and discovered dozens of performance risks, including serious concurrency bugs that had been hidden for years (and would have been disastrous if triggered in production).
A simple, easy-to-use testing platform helps and encourages R&D to do more testing and improve product quality. The emergence of AI undoubtedly gives us a huge opportunity to simplify the testing platform.
2) Modeling makes the test platform simpler
The original intention of a testing platform is to make testing simpler, but in recent years we have seen too many things make testing platforms ever more complicated, so much so that some people are calling for de-platformization.
When designing a simple testing platform, AI is only an aid; what is fundamentally required is the ability to abstract, which comes from insight into the essence of the specific testing business.
For example, in the AI stress testing platform, we modeled the API stress testing workflow and abstracted a general API stress testing use case template. Even without AI assistance, R&D can significantly improve efficiency by writing stress testing use cases based on this template.
3) Make good use of AI: think big, start small
AI is powerful, but we do not attempt to use AI to solve all problems in the field of API stress testing, nor do we attempt to use AI to generate all the content of API stress testing use cases.
Instead, we only use the large model's strong document-understanding and reasoning capabilities to generate the key information needed to instantiate the stress test case template.
Essentially, we are just using AI to solve small, specific, and highly deterministic problems one after another, and then integrating the results into a solution.
The direct benefit is higher AI-generation accuracy. Overall, the accuracy of our generated use cases exceeds 80%, far higher than the 20-30% adoption rate typical of AI code generation tools (such as GitHub Copilot and Tongyi Lingma).
The indirect benefit is building everyone's confidence in AI: when rolling out AI, first make it deliver tangible value locally, then expand its scope step by step.