Build a DeepSeek test case generation system knowledge base in 8 minutes

Quickly build the DeepSeek test case generation system to improve software testing expertise.
Core content:
1. The importance of a knowledge base in AI test case generation
2. System architecture and key technical point analysis
3. Implementation details of knowledge base construction and enhanced search engine
1. Background and system positioning
Previously, I shared two articles in this 8-minute series on using DeepSeek to empower software testing, which drew many like-minded readers into the discussion. Those articles built the basic test case generation capability. Today's article focuses on the knowledge base.
- Build a DeepSeek-powered test case tool in 8 minutes (Polaris School, WeChat Official Account)
- Build a DeepSeek API intelligent testing engine in 8 minutes: before the coffee is cold, the test report is ready (Polaris School, WeChat Official Account)
Building on that foundation, this system introduces retrieval-augmented generation (RAG), which feeds domain documents and historical test cases into the generation step so that the results better match real business scenarios.
1.1 Why do we need a knowledge base?
Traditional AI generation solutions have two major pain points:
- Lack of domain knowledge: large models have no knowledge of an enterprise's private documents (such as requirement specifications and interface documents).
- Wasted historical experience: past test cases are not effectively reused.
This system addresses both with a lightweight RAG architecture (no vector database required):
- Intelligent parsing of PDF documents ➡️ builds the domain knowledge base
- Semantic retrieval of historical test cases ➡️ forms an experience reuse mechanism
- Dynamic enhancement of the generation prompt ➡️ improves the professionalism of the test cases
A quick comparison shows the effect:
- Upload documents to the knowledge base.
- First generation run, without knowledge base enhancement: the designed test cases have nothing to do with mobile phone number login.
- Second generation run, with knowledge base enhancement: the designed test cases cover registering in the system with a mobile phone number and include many more details.
2. Core Logic Analysis
2.1 System Architecture Overview
2.2 Description of key technical points
2.2.1 Knowledge base building module
Innovations:
- Assign each paragraph a unique ID
- Split into natural paragraphs, preserving contextual semantics
- Filter out invalid short text (<20 characters)
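The three points above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the function name `split_into_paragraphs`, the hash-based ID, and the dictionary layout are all assumptions, and the PDF text is assumed to have been extracted already.

```python
import hashlib
import re

MIN_LENGTH = 20  # paragraphs shorter than this are treated as noise


def split_into_paragraphs(text: str, source: str) -> list[dict]:
    """Split raw document text into natural paragraphs with stable IDs."""
    paragraphs = []
    # A blank line (possibly containing whitespace) marks a paragraph break.
    for raw in re.split(r"\n\s*\n", text):
        # Collapse internal line breaks and runs of spaces.
        content = " ".join(raw.split())
        if len(content) < MIN_LENGTH:
            continue  # drop headers, page numbers and other short debris
        # Hashing source + content gives an ID that is stable across re-imports.
        digest = hashlib.md5(f"{source}:{content}".encode("utf-8")).hexdigest()
        paragraphs.append({"id": digest[:12], "source": source, "content": content})
    return paragraphs
```

Deriving the ID from the content means that re-uploading the same document yields the same IDs, which makes deduplication straightforward. A random UUID would work too if that property is not needed.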
2.2.2 Enhanced search engine
Design considerations:
- Easier to implement than the BM25 algorithm
- Computational efficiency: O(n) complexity, real-time response over thousands of records
- Highly interpretable results, suitable for debugging
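To make these considerations concrete, here is a small pure-Python sketch of TF-IDF retrieval with cosine similarity (section 3.2 names TF-IDF as the chosen approach). The function names, the smoothed IDF formula, and the regex tokenizer are assumptions for illustration; Chinese text would need a word segmenter instead of the regex.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: simple lowercase word tokens; Chinese text needs a segmenter.
    return re.findall(r"\w+", text.lower())


def tfidf_search(query: str, docs: list[str], top_k: int = 3) -> list[tuple[int, float]]:
    """Return (document index, cosine score) pairs for the best matches."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms present in every document keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    def vectorize(tokens):
        counts = Counter(tokens)
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    q_vec = vectorize(tokenize(query))
    # One linear pass over the documents: O(n) comparisons per query.
    scores = [(i, cosine(q_vec, vectorize(tokens))) for i, tokens in enumerate(tokenized)]
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in ranked[:top_k] if s > 0]
```

Because every score is a sum of per-term weights, it is easy to print which terms contributed to a match, which is where the interpretability comes from.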
2.2.3 Dynamic Prompt Engineering
Enhancement strategy:
- Knowledge fragment truncation (each fragment ≤ 512 characters)
- Prioritization: domain knowledge > historical test cases
- Strong format constraints (JSON Schema injection)
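One possible way to combine these three strategies is sketched below. The schema fields (`title`, `steps`, `expected`), the section titles, and the function name `build_prompt` are hypothetical; only the 512-character limit and the ordering rule come from the list above.

```python
import json

MAX_FRAGMENT_CHARS = 512  # per-fragment limit stated above

# Hypothetical output schema; the field names are illustrative only.
CASE_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "steps": {"type": "array", "items": {"type": "string"}},
            "expected": {"type": "string"},
        },
        "required": ["title", "steps", "expected"],
    },
}


def build_prompt(requirement: str, domain: list[str], history: list[str]) -> str:
    """Assemble an enhanced prompt: domain knowledge first, then historical cases."""

    def section(title: str, fragments: list[str]) -> str:
        # Truncate each fragment so one long paragraph cannot crowd out the rest.
        clipped = [f[:MAX_FRAGMENT_CHARS] for f in fragments]
        return f"{title}:\n" + "\n".join(f"- {c}" for c in clipped)

    parts = [f"Requirement:\n{requirement}"]
    if domain:
        parts.append(section("Domain knowledge", domain))
    if history:
        parts.append(section("Historical test cases", history))
    # Inject the schema so the model is told exactly which JSON shape to return.
    parts.append(
        "Return only a JSON array that conforms to this JSON Schema:\n"
        + json.dumps(CASE_SCHEMA, indent=2)
    )
    return "\n\n".join(parts)
```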
3. Analysis of Key Technology Selection
3.1 What is RAG?
Retrieval-Augmented Generation improves generation quality through the following process:
User question → Knowledge retrieval → Prompt enhancement → Large model generation → Result output
Differences from traditional generation:
- Real-time knowledge: no need to retrain the model
- Data security: sensitive information does not leave the domain
- Controllability of results: search results guide the direction of generation
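The flow can be written down in a few lines. In this sketch the retrieval step and the model call are passed in as plain functions, so the names `rag_generate`, `retrieve`, and `generate` are placeholders rather than part of any real API.

```python
from typing import Callable


def rag_generate(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """Run the retrieve -> enhance -> generate flow for one question."""
    # Knowledge retrieval: look up fragments related to the question.
    fragments = retrieve(question)
    # Prompt enhancement: prepend the fragments as context, if any were found.
    if fragments:
        context = "\n".join(f"- {f}" for f in fragments)
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
    else:
        prompt = question
    # Large model generation: the model itself is untouched, only its input changes.
    return generate(prompt)
```

Note that new knowledge enters only through `retrieve`, which is why updating the knowledge base takes effect immediately without retraining.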
3.2 Why not use a vector database?
Although vector databases (such as ChromaDB) are widely used in RAG, this system chooses the TF-IDF+CSV file storage solution for the following reasons:
Suitable for:
- Small and medium-sized teams that want to quickly verify the value of RAG
- Domain documents that are updated infrequently (weekly)
- Test data size < 100,000
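For a sense of how little code CSV storage needs, here is a sketch using only the standard library. The column layout (`id`, `source`, `content`) and both function names are assumptions; the article does not show the actual file format.

```python
import csv
from pathlib import Path

FIELDS = ["id", "source", "content"]  # hypothetical column layout


def save_paragraphs(path: Path, paragraphs: list[dict]) -> None:
    """Write all paragraphs to a CSV file, replacing any previous content."""
    # newline="" lets the csv module handle line breaks inside fields correctly.
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(paragraphs)


def load_paragraphs(path: Path) -> list[dict]:
    """Read the knowledge base back; a missing file means an empty knowledge base."""
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

With weekly updates and a modest data size, rewriting the whole file on each update is acceptable and keeps the storage layer trivially inspectable in a spreadsheet.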
4. Quick Deployment Guide
4.1 Environmental Preparation
4.1.1 Installing Python packages
4.1.2 Obtaining an API key
- Register an account with any large model provider. This article uses Tencent Cloud.
- Create an application → get a key of the form sk-xxxx
- Replace the key in your code: headers = {"Authorization": "Bearer sk-xxxx"}
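Rather than pasting the key into the source file, it can be read from an environment variable. The sketch below assumes a variable named `LLM_API_KEY`, which is a made-up name; the `Authorization: Bearer` header format is the one shown above.

```python
import os


def build_headers(env_var: str = "LLM_API_KEY") -> dict[str, str]:
    """Build request headers from a key stored in an environment variable."""
    api_key = os.environ.get(env_var, "")
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable to your sk-... key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

This keeps the key out of version control and lets each environment use its own credentials.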
4.2 System Startup
4.3 Functional Verification Process
Upload domain documents:
- Go to the "Knowledge Base Management" page
- Upload a requirement document or interface document in PDF format
- View the processed knowledge paragraphs
Generate enhanced test cases:
- Check "Use knowledge base enhancement"
- View the generated boundary value test cases
Export results:
- Directly copy the JSON example
- Export to Excel via Pandas:
    import pandas as pd
    pd.DataFrame(new_cases).to_excel("output.xlsx")
5. Performance optimization suggestions (optional, for readers with the time and interest to optimize further)
5.1 Hierarchical storage of knowledge base
Prioritize high-level knowledge fragments when searching
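One simple way to express this is to multiply each fragment's relevance score by a weight for its level. The level names and weights below are invented for illustration, as is the `rank_by_level` function.

```python
# Hypothetical priority levels: a higher weight means the fragment ranks earlier.
LEVEL_WEIGHTS = {"core": 1.5, "normal": 1.0, "archive": 0.6}


def rank_by_level(hits: list[dict], top_k: int = 3) -> list[dict]:
    """Re-rank retrieval hits by relevance score multiplied by a level weight."""

    def weighted(hit: dict) -> float:
        # Fragments without a level are treated as "normal".
        return hit["score"] * LEVEL_WEIGHTS.get(hit.get("level", "normal"), 1.0)

    return sorted(hits, key=weighted, reverse=True)[:top_k]
```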
5.2 Cache Mechanism
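A minimal sketch of what caching could look like, assuming repeated queries are common: wrap the retrieval step in `functools.lru_cache`. The `search_knowledge_base` function is a stand-in for the real search and not the system's actual code.

```python
from functools import lru_cache


def search_knowledge_base(query: str) -> list[str]:
    """Stand-in for the real retrieval step (hypothetical)."""
    return [f"fragment for {query}"]


@lru_cache(maxsize=256)
def cached_search(query: str) -> tuple[str, ...]:
    # Return a tuple so that callers cannot mutate the cached value.
    return tuple(search_knowledge_base(query))
```

After new documents are uploaded, `cached_search.cache_clear()` should be called so that stale results are not served.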
5.3 Asynchronous Processing
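As a sketch of asynchronous processing, blocking model calls can be moved to worker threads with `asyncio.to_thread` (Python 3.9+) so that several requirements are generated concurrently. Here `generate_cases` is a placeholder that sleeps instead of calling a model, and the concurrency limit of 4 is an arbitrary assumption.

```python
import asyncio
import time


def generate_cases(requirement: str) -> str:
    """Stand-in for a blocking model call (hypothetical)."""
    time.sleep(0.1)
    return f"cases for {requirement}"


async def generate_many(requirements: list[str], limit: int = 4) -> list[str]:
    """Run blocking generation calls in worker threads, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(req: str) -> str:
        async with semaphore:
            return await asyncio.to_thread(generate_cases, req)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(r) for r in requirements))
```

A call such as `asyncio.run(generate_many(["login", "logout"]))` returns the results in input order. The semaphore keeps the number of simultaneous requests below whatever rate limit the model provider enforces.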
6. Expansion Direction
- Multimodal support: parsing requirements documents in images (OCR technology)
- Automated review: add a test case quality scoring model
- CI/CD integration: automatic triggering with Jenkins/GitLab