Why Intelligent Testing Should Start with "Use Case Continuation"

Recommendation
Start with the continuation of use cases, a new starting point for intelligent testing.
Core content:
1. The advantages and necessity of use case continuation in testing intelligence
2. Matching LLM capabilities to solve the pain points of test case writing
3. Reduce project risks and enhance the team's confidence in AI testing
Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)
Why start with the continuation of use cases?
1. Test analysis and design place the highest demands on business and technical understanding in the entire testing activity
When doing this kind of work, testers need to refer not only to the requirements document/PRD for the business change at hand, but also to many other kinds of knowledge: past similar requirements, the system implementation, the business background, and so on. Even though LLMs themselves are already quite capable, the software engineering practices of most teams are still at the workshop stage and their level of R&D digitalization is very low, so they cannot feed the LLM any context beyond the PRD through RAG or similar methods, which severely limits what the LLM can achieve in test design.
The use case continuation scenario is comparatively simple: it relies mainly on the patterns and logic of existing test cases and depends little on additional context. Even when a team's digitalization level is limited, the LLM can learn from and imitate the existing historical use cases, avoiding the "information blind spots" caused by insufficient context. It can therefore play a role within a controllable scope, improve the efficiency of test case writing, and keep the AI initiative from failing because of the team's digitalization shortcomings.
2. Matching the LLM's strengths to the pain points of test case writing
Test case writing involves a great deal of repetitive work (as shown in the figure below, test steps and data under the same test point are highly similar), which not only drains manual effort but is also seen as typical "low-value, energy-consuming" testing work.
One of the LLM's core capabilities is generating content by learning existing patterns. "Continuing new use cases by referring to previous ones" fits this capability exactly, and follows the same logic as the "code continuation" already widely used by AI in software development. Using an LLM for use case continuation can efficiently handle repetitive, mechanical writing tasks, freeing testers from simple labor to focus on higher-value work such as complex scenario design and business risk analysis. This approach makes full use of the LLM's capabilities while directly addressing a practical pain point in test case writing, making it an efficient entry point for combining AI with testing work.
3. Reduce project risks and enhance team confidence
The use case continuation feature is relatively simple and well defined, its technical complexity and implementation uncertainty are low, and the risk of failure is controllable. Compared with directly attempting complex AI applications such as test analysis and design, use case continuation makes it easier to achieve quick, phased results (such as generating a certain number of test cases that meet requirements), letting the team genuinely feel how AI improves testing efficiency. This builds confidence in applying AI to testing more deeply, reduces resistance to introducing the new technology, and lays a foundation of trust for more complex AI testing applications later (such as intelligent test analysis and automated scenario design).
I. Core objectives of the solution
Note: the following content was generated by Doubao, and its consistency with the actual implementation plan is not guaranteed. The internal project has completed a POC covering continuation from test points to test cases to automated test cases.
The following is a complete solution design for LLM-based test case continuation. It realizes automatic expansion from "test points" to "complete test cases (including steps and data)", combining knowledge base retrieval, prompt engineering and a verification mechanism to ensure generation quality and business adaptability:
Input: test points (such as "payment amount is consistent with order amount")
Output: structured test cases (preconditions, test steps, test data, expected results)
Core capability: the LLM generates test steps and data that comply with business rules based on historical use case patterns, supporting full coverage of normal/abnormal/boundary scenarios.
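To make this input/output contract concrete, here is a minimal sketch of the target data structure in Python; the field names and types are illustrative assumptions, not a schema prescribed by the solution.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TestCase:
    """Structured test case produced by the continuation pipeline (illustrative schema)."""
    test_point: str                                   # e.g. "payment amount is consistent with order amount"
    preconditions: List[str] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)
    # one entry per data class: {"type": "normal" / "abnormal" / "boundary", plus column values}
    test_data: List[Dict[str, str]] = field(default_factory=list)
    expected_results: List[str] = field(default_factory=list)  # one per step
```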
II. Solution architecture and technology stack
III. Core module design
1. Test point parsing module
· Keyword extraction:
Use NLP tools (such as spaCy) to parse test points and extract functional modules (such as "payment" and "order"), test types (such as "functional test" and "boundary conditions"), and core objects (such as "amount" and "payment method").
Example: Input "The payment amount is consistent with the order amount" → Functional module = payment, test type = functional test, core object = amount consistency.
· Scenario classification:
Map to the preset test scenario label (normal process / abnormal process / boundary condition / compliance) to guide the subsequent generation direction.
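The parsing step above can start out as a simple rule-based mapping before introducing a full NLP pipeline. Below is a minimal, dependency-free sketch; the keyword vocabularies are illustrative assumptions (the solution itself suggests an NLP tool such as spaCy for this job).

```python
# Rule-based sketch of test point parsing: extract functional modules and map the
# test point to a preset scenario label. Vocabularies are illustrative assumptions.
MODULE_KEYWORDS = {
    "payment": ["payment", "pay", "refund"],
    "order": ["order", "checkout"],
    "user center": ["login", "profile", "account"],
}
SCENARIO_KEYWORDS = {
    "abnormal": ["fail", "error", "invalid", "insufficient"],
    "boundary": ["limit", "maximum", "minimum", "boundary"],
    "compliance": ["compliance", "regulation"],
}

def parse_test_point(test_point: str) -> dict:
    text = test_point.lower()
    modules = [m for m, words in MODULE_KEYWORDS.items() if any(w in text for w in words)]
    # fall back to the normal-flow label when no abnormal/boundary/compliance cue is found
    scenario = next((label for label, words in SCENARIO_KEYWORDS.items()
                     if any(w in text for w in words)), "normal")
    return {"test_point": test_point, "modules": modules, "scenario": scenario}

print(parse_test_point("The payment amount is consistent with the order amount"))
# -> {'test_point': '...', 'modules': ['payment', 'order'], 'scenario': 'normal'}
```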
2. Historical use case knowledge base construction
· Data source:
Collect historical manually written high-quality test cases (including complete steps, data, expected results), and store them by functional module (payment, order, user center, etc.).
Annotate key features: such as "payment method = balance payment", "test type = abnormal scenario", "data type = boundary value".
· Retrieval strategy:
Based on RAG (retrieval-augmented generation): match the test point keywords against the use case labels in the knowledge base and return the top 3 similar use cases as reference context for LLM generation.
Example: Enter "Password-free payment limit verification" → retrieve the historical use case "PAY-015 Password-free payment limit verification" and its step template.
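As a sketch of this retrieval strategy, the snippet below scores historical use cases by keyword overlap with the parsed test point and returns the top 3 as reference context. A production setup would more likely use embeddings or a vector store; the knowledge base entries shown are simplified stand-ins for real historical cases.

```python
# Keyword-overlap retrieval sketch (stands in for a proper RAG retriever).
def retrieve_similar_cases(parsed: dict, knowledge_base: list[dict], top_k: int = 3) -> list[dict]:
    query_terms = set(parsed["test_point"].lower().split()) | set(parsed["modules"])

    def score(case: dict) -> int:
        case_terms = set(case["title"].lower().split()) | set(case.get("labels", []))
        return len(query_terms & case_terms)

    return sorted(knowledge_base, key=score, reverse=True)[:top_k]

# Simplified knowledge base entries (titles taken from the examples in this article).
knowledge_base = [
    {"id": "PAY-015", "title": "Password-free payment limit verification",
     "labels": ["payment", "boundary"], "steps": ["..."]},
    {"id": "PAY-016", "title": "The payment amount is consistent with the order amount",
     "labels": ["payment", "order", "normal"], "steps": ["..."]},
]
```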
3. LLM generation engine (core logic)
Prompt engineering design:
Basic prompt template (guide LLM to generate structured content):
[Task] According to the test points "{Test points}", expand the complete test case, including:
1. Preconditions (need to clarify user status and data preparation, such as "user balance 50 yuan, bind bank card")
2. Test steps (describe the operation in steps, each step does not exceed 20 words, such as "1. Select balance payment")
3. Test data (divided into normal/abnormal/boundary data, such as normal data = order amount 80 yuan, abnormal data = order amount 0 yuan)
4. Expected results (corresponding to the steps, clarify the status/data changes, such as "1. The system prompts 'insufficient balance'")
[Reference use case] {Similar historical use case content}
[Business rules] {Supplementary business rules, such as "The limit of password-free payment is 100 yuan, and secondary verification is required if it exceeds"}
Data generation guidance:
Require the LLM to generate data in three classes:
· Normal data: legal input that complies with business rules (such as order amount = 99 yuan, ≤ the password-free limit)
· Abnormal data: illegal input that violates the rules (such as order amount = -50 yuan, a negative amount)
· Boundary data: critical values that exactly trigger a rule (such as order amount = 100 yuan, equal to the password-free limit)
· Generation optimization:
Limit the output format to table or structured text to facilitate subsequent parsing and import into test management tools.
For complex scenarios (such as cross-system interaction), add interface call logic to the prompt (such as "order status update interface needs to be called after payment is successful").
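A minimal sketch of how the generation engine could assemble the prompt from the template, the retrieved reference cases, and the business rules is shown below. `call_llm` is a hypothetical placeholder for whatever chat-completion client the team actually uses (Doubao, an OpenAI-compatible endpoint, etc.), not a real API.

```python
# Prompt assembly sketch for the LLM generation engine.
PROMPT_TEMPLATE = """[Task] According to the test point "{test_point}", expand a complete test case, including:
1. Preconditions (clarify user status and data preparation)
2. Test steps (step by step, each step no more than 20 words)
3. Test data (split into normal / abnormal / boundary data)
4. Expected results (one per step, stating status/data changes)
Output the four sections as structured text or a table so they can be parsed automatically.

[Reference use cases]
{reference_cases}

[Business rules]
{business_rules}"""

def build_prompt(test_point: str, similar_cases: list[dict], business_rules: str) -> str:
    reference = "\n---\n".join(
        f"{c['id']} {c['title']}\n" + "\n".join(c["steps"]) for c in similar_cases
    )
    return PROMPT_TEMPLATE.format(test_point=test_point,
                                  reference_cases=reference,
                                  business_rules=business_rules)

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: plug in the team's own LLM client here.
    raise NotImplementedError
```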
4. Verification and optimization module
· Automated verification rules:
Format verification: Check whether the 4 necessary fields (preconditions, steps, data, expected results) are included, and whether the number of steps is ≥3.
Logic verification:
Whether the test data covers the 3 scenario types (normal/abnormal/boundary), with at least 1 item in each category.
Whether the expected results correspond one to one with the steps and data (for example, when a negative amount is entered, the expected result should include an "error prompt").
· Manual review interface:
For use cases that fail the automated verification, push them to the tester for manual correction (such as supplementing business details and adjusting the order of steps).
Establish a feedback loop: Feed the manually corrected use cases back to the knowledge base to optimize the LLM training data.
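The automated verification rules above translate almost directly into code. The sketch below checks a generated case against them, using the illustrative `TestCase` structure defined earlier; any returned issues would route the case to manual review.

```python
# Automated verification sketch: four required sections, >=3 steps, all three data
# classes, step/result correspondence, and an error prompt when abnormal data exists.
def verify_case(case: "TestCase") -> list[str]:
    issues = []
    if not case.preconditions:
        issues.append("missing preconditions")
    if len(case.steps) < 3:
        issues.append("fewer than 3 test steps")
    data_types = {row.get("type") for row in case.test_data}
    for required in ("normal", "abnormal", "boundary"):
        if required not in data_types:
            issues.append(f"missing {required} test data")
    if len(case.expected_results) != len(case.steps):
        issues.append("expected results do not correspond to steps one by one")
    if "abnormal" in data_types and not any(
            "prompt" in r.lower() or "error" in r.lower() for r in case.expected_results):
        issues.append("abnormal data present but no error prompt in expected results")
    return issues  # an empty list means the case passed automated verification
```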
IV. Implementation step example
Input: Test point "The payment amount is consistent with the order amount"
1. Parsing and retrieval:
o Keywords: payment, amount, order, consistency → matched to the historical use case "PAY-016 The payment amount is consistent with the order amount (including freight/tax)".
o Business rule: payment amount = product price + freight + tax; discounts can be deducted.
2. LLM generates output:
[Preconditions]
1. User logs in, the order includes a product price of 100 yuan, a shipping fee of 10 yuan, and a tax fee of 5 yuan, and a 50 yuan coupon can be used
2. The payment method can be selected as balance/bank card, and the balance is sufficient
[Test steps]
1. Submit the order and choose to use the coupon
2. Confirm the amount payable and select the payment method (such as balance payment)
3. Enter the payment password to complete the payment
[Test data]
| Data type | Product price (yuan) | Discount amount (yuan) | Shipping fee (yuan) | Tax (yuan) | Remarks |
|------------|----------------|----------------|------------|------------|------------|
| Normal data | 100 | 50 | 10 | 5 | |
| Abnormal data | -50 | 0 | 10 | 5 | Negative product price, illegal input |
| Boundary data | 0 | 0 | 10 | 5 | Product price 0 yuan, normal shipping and tax |
[Expected results]
1. The amount payable is displayed as 100+10+5-50=65 yuan
2. After successful payment, the order status is updated to "paid"
3. The account balance is reduced by 65 yuan, and the payment voucher amount is consistent with the order
3. Verification and output:
Automatic verification: data covers 3 categories, steps correspond to expected results → passed.
Output: structured test cases (can be imported into Excel or TestRail).
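As a small illustration of this export step, the sketch below writes a verified case to Excel with one row per test step using pandas (openpyxl is needed as the Excel writer); the column names are assumptions chosen to match the structure shown above.

```python
import pandas as pd

def export_to_excel(case: "TestCase", path: str = "generated_cases.xlsx") -> None:
    # One row per step keeps the step/expected-result pairing visible in the sheet.
    rows = [
        {"Test point": case.test_point,
         "Preconditions": "\n".join(case.preconditions),
         "Step no.": i + 1,
         "Test step": step,
         "Expected result": expected}
        for i, (step, expected) in enumerate(zip(case.steps, case.expected_results))
    ]
    pd.DataFrame(rows).to_excel(path, index=False)
```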
V. Key technical points
1. Knowledge base cold start:
In the initial stage, 50-100 standard cases can be manually written as seed data to cover core scenarios and guide LLM to learn business models.
2. Dynamic prompt adjustment:
According to the quality of the generated results, adjust the prompt words in real time (such as adding "steps must include exception handling" and "data must be marked with boundary values" and other detailed requirements).
3. Integration with test management tools:
o Synchronize the generated use cases to Jira, TestLink and other tools through API, supporting batch import and status management.
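A sketch of that synchronization step is shown below. The endpoint path, payload fields, and authentication are hypothetical placeholders; the real Jira/TestLink APIs define their own schemas and should be consulted before implementation.

```python
import requests

def sync_case(case: "TestCase", base_url: str, token: str) -> int:
    """Push one generated test case to a test management tool (hypothetical endpoint)."""
    payload = {
        "title": case.test_point,
        "preconditions": case.preconditions,
        "steps": [{"action": s, "expected": e}
                  for s, e in zip(case.steps, case.expected_results)],
    }
    resp = requests.post(f"{base_url}/api/test-cases",  # hypothetical endpoint
                         json=payload,
                         headers={"Authorization": f"Bearer {token}"},
                         timeout=10)
    resp.raise_for_status()
    return resp.status_code
```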
VI. Effect evaluation indicators
| Metric | Definition | Target value |
|---|---|---|
| Generation success rate | Proportion of generated use cases with correct format and logic | ≥90% |
| Manual adjustment rate | Proportion of generated use cases requiring manual adjustment | ≤20% |
| Scenario coverage | Coverage of the three scenario types (normal/abnormal/boundary) by generated use cases | 100% |
| Business compliance rate | Proportion of generated use cases compliant with business rules | ≥95% |
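If review outcomes are recorded per generated case, the four indicators reduce to simple proportions; the record field names below are assumptions for illustration.

```python
# Compute the evaluation indicators from per-case review records (illustrative fields).
def evaluate(records: list[dict]) -> dict:
    total = len(records) or 1
    return {
        "generation_success_rate": sum(r["passed_verification"] for r in records) / total,
        "manual_adjustment_rate": sum(r["manually_adjusted"] for r in records) / total,
        "scenario_coverage": sum(r["covers_all_3_data_classes"] for r in records) / total,
        "business_compliance_rate": sum(r["business_compliant"] for r in records) / total,
    }
```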
VII. Risks and responses
· Deviation in understanding of business rules: LLM may misunderstand complex rules (such as calculation of installment handling fees), and it is necessary to clearly attach a link to the rule description document or an example in the prompt.
· Incomplete data generation: the LLM is required to generate data according to the "normal + abnormal + boundary" template, and a regeneration mechanism is triggered when any class is missing.
· Poor quality of historical use cases: In the initial stage, all generated use cases are manually reviewed, low-quality historical data is gradually eliminated, and a high-quality knowledge base is built.
Through this solution, efficient expansion from "test points" to "complete use cases" can be achieved, greatly reducing the time testers spend on repetitive work while ensuring that the generated content is standardized and adapted to the business, and providing a reusable engineering path for putting AI + testing into practice.