How to extend prompts with RAG?

Mastering RAG technology will make your testing work more efficient and targeted.
Core content:
1. The definition and core workflow of RAG technology
2. How to improve the accuracy and efficiency of testing work through RAG technology
3. The concept of context window and its application in testing work
In this era where AI is deeply involved in software development, a test engineer who is still stuck at "how to write automated test scripts" is falling behind. We have entered a new era of "how to work with large models".
Today I want to walk you through a technical term you may have heard of but are not yet familiar with, RAG (Retrieval-Augmented Generation), along with a concept you will not be able to avoid from now on: the context window. Don't worry, neither is complicated; let's start with a small story from everyday life.
A story of search and answer
Let's say you are testing a travel reservation system and you ask the model: "I want to test the risks associated with deleting a reservation."
If you ask this question directly to a large language model, it might give you some generic suggestions based on what it saw during training, such as "consider permission verification" or "check whether there are any unpaid orders".
But you know very well that this system has its own business logic, detailed user stories, functional descriptions, and even specific exception flows. The model has never seen any of these documents; it is just guessing blindly.
At this time, RAG comes on the scene.
What is RAG and how can it help you ask smarter questions?
The core idea of the RAG system is actually very straightforward: check the information first, then answer the questions .
Its workflow is roughly as follows:
Receive your questions : For example, "I want to test the risk points associated with deleting a reservation."
Go to the document library to retrieve relevant content : such as user stories, requirement documents, interface descriptions, etc. in the project.
Select the most relevant ones : for example, three user stories about the "delete reservation" scenario.
Send these documents + your question to the large model together : through a standard prompt design, tell it: "This is the user story, this is the user's question, please recommend test risks based on this information."
At this point, the model's answer is no longer guesswork, but a more relevant, contextual answer grounded in its understanding of the background documents .
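Put into code, those four steps might look like the minimal sketch below. It stands in TF-IDF similarity from scikit-learn for a real vector database, and the documents, question, and call_llm() helper are illustrative placeholders rather than part of any real system.

```python
# A minimal sketch of the "retrieve first, then answer" workflow.
# TF-IDF stands in for a real vector database; documents, question,
# and call_llm() are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "User story: deleting a reservation removes child orders before the parent order.",
    "User story: users can change the reservation date up to 24 hours before check-in.",
    "Interface description: DELETE /reservations/{id} requires owner or admin permission.",
]
question = "I want to test the risk points associated with deleting a reservation."

# Steps 2-3: retrieve and keep only the most relevant fragments.
n = len(documents)
matrix = TfidfVectorizer().fit_transform(documents + [question])
scores = cosine_similarity(matrix[n], matrix[:n]).flatten()
top_docs = [documents[i] for i in scores.argsort()[::-1][:2]]

# Step 4: send the selected documents together with the question to the large model.
prompt = (
    "You are an assistant responsible for test design.\n"
    "This is the user story:\n" + "\n".join(top_docs) + "\n"
    "The user's question is: " + question + "\n"
    "Please list the recommended test risk points."
)
# answer = call_llm(prompt)   # hypothetical client call; plug in your own model here
print(prompt)
```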
It's like asking in a group chat, "How should I test this module?" Someone who hasn't read the documents will just say "test the boundary values", which is a cliché. But someone who has read the requirements first can tell you, "The logic here deletes the child order before the parent order, and the error handling is easy to miss." That is a valuable answer .
So why not send all the documents to the large model?
Sounds great, right? Then why not simply feed in all the documents at once? Because we have to face a practical limitation: the context window .
What is a context window?
In simple terms, a large model is like an assistant that can only read 10 pages of a book at a time. You can't expect it to read your 500-page test document in one go and answer questions.
Taking Meta's LLaMA-2 as an example, its default context window is 4096 tokens , which is roughly equivalent to the content of 10 pages of a book. This "window" is the upper limit of the model's memory. If it exceeds this number, it will either truncate the subsequent information or directly report an error.
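To make the limit concrete, here is a rough sketch of checking whether a prompt fits. It uses tiktoken, an OpenAI tokenizer, so the counts for LLaMA-2 are only approximate, and the 512-token reserve for the answer is an assumption of mine, not something from the article.

```python
# A rough sketch of checking whether a prompt fits inside a 4096-token window.
# tiktoken ships OpenAI tokenizers, so counts for LLaMA-2 are only approximate;
# the 512-token reserve for the answer is an arbitrary assumption.
import tiktoken

CONTEXT_WINDOW = 4096        # LLaMA-2's default window
RESERVED_FOR_ANSWER = 512    # leave room for the model's own output

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_window(prompt: str) -> bool:
    """Return True if the prompt leaves enough room for the answer."""
    return len(enc.encode(prompt)) <= CONTEXT_WINDOW - RESERVED_FOR_ANSWER

def truncate_to_window(prompt: str) -> str:
    """Crudely cut off excess tokens (real pipelines retrieve less instead)."""
    tokens = enc.encode(prompt)[: CONTEXT_WINDOW - RESERVED_FOR_ANSWER]
    return enc.decode(tokens)
```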
You may ask: "Isn't 10 pages enough?" For everyday queries, it is. But in testing work, a complete requirements analysis document can run to dozens of pages; code analysis, exception cases, and interface definitions also take up a lot of space, not to mention scenarios like call-chain tracing or microservice dependency analysis.
Therefore, in RAG we must select which key content to include . Just as you cannot copy the whole book into your notes before an exam and only write down the most critical chapters, we use retrieval tools to determine which document fragments are most relevant and send only those to the model.
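As a sketch of that selection step, the snippet below builds a small semantic index and keeps only the top fragments. It assumes the faiss-cpu and sentence-transformers packages are installed; the embedding model name and the document fragments are illustrative, not taken from any real project.

```python
# A minimal sketch of "pick only the most relevant fragments" with a vector index.
# Assumes faiss-cpu and sentence-transformers; fragment text is illustrative.
import faiss
from sentence_transformers import SentenceTransformer

fragments = [
    "Deleting a reservation removes child orders first, then the parent order.",
    "Users can change the reservation date up to 24 hours before check-in.",
    "Unpaid orders block deletion and must be cancelled by the payment service.",
    "The loyalty-points service is notified asynchronously after a deletion.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # a common small embedding model
vectors = encoder.encode(fragments, normalize_embeddings=True)

index = faiss.IndexFlatIP(vectors.shape[1])         # inner product on normalized vectors = cosine
index.add(vectors)

query = encoder.encode(["risks of deleting a reservation"], normalize_embeddings=True)
_, top_ids = index.search(query, 2)                 # keep only the 2 most relevant fragments

selected = [fragments[i] for i in top_ids[0]]
print(selected)   # only these go into the prompt, not the whole document set
```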
What does the structure of a prompt look like? Here's an example
Based on the logic above, a RAG prompt has roughly the following structure:
You are an assistant responsible for test design. You need to output recommended test risk points based on the user story provided.
This is the user story: {relevant_document}
The user's question is: {user_input}
Please list the recommended test risk points.
Through this prompt design, the model not only knows what you are asking, but also knows what role it should play, which documents it should rely on, and what type of output you expect . This is far more reliable than vaguely asking "What do you think I should test?"
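In code, that structure is simply a template with two slots. Here is a minimal sketch; the placeholder names mirror the ones above, and the filled-in content is illustrative.

```python
# One way to turn the structure above into a reusable template.
# Placeholder names mirror the article; the filled-in values are illustrative.
PROMPT_TEMPLATE = (
    "You are an assistant responsible for test design. "
    "You need to output recommended test risk points based on the user story provided.\n"
    "This is the user story:\n{relevant_document}\n"
    "The user's question is:\n{user_input}\n"
    "Please list the recommended test risk points."
)

prompt = PROMPT_TEMPLATE.format(
    relevant_document="Deleting a reservation removes child orders first, then the parent order.",
    user_input="I want to test the risk points associated with deleting a reservation.",
)
print(prompt)
```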
Key knowledge points that testers need to master
If you are a test engineer, especially a mid-level or senior tester who wants to use AI to assist in work, I suggest that you at least master the following key points:
Understand the limits of the context window : you need to know how much a model can read at a time, otherwise you will keep running into "the prompt is too long" errors.
Master the use of retrieval tools : for example, use vector databases (such as FAISS, Weaviate) to index documents to facilitate semantic search.
Be able to design structured prompts : clearly tell the model what role you want it to play, what content you want it to refer to, and what structure you want it to output.
Know how to "cut" : faced with a pile of documents, you must learn to judge which ones are key and which can be ignored (a simple chunking sketch follows this list).
Pay attention to the evolution of context windows in the future : some new models (such as Claude 3 and GPT-4 Turbo) already support hundreds of thousands of tokens. This restriction will be gradually relaxed in the future, but it will not disappear completely.
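For the "cutting" point above, here is one simple sketch of splitting a long document into overlapping chunks before indexing. The character-based sizes are arbitrary assumptions; real projects often split on headings or paragraphs instead.

```python
# A simple sketch of cutting a long document into overlapping chunks before indexing.
# Chunk size and overlap are arbitrary; splitting on headings/paragraphs is often better.
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character-based chunks that overlap slightly,
    so a sentence cut at a boundary still appears intact in one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Example: a 50-page requirements document becomes dozens of small chunks,
# each small enough to be indexed and retrieved independently.
# chunks = chunk_text(open("requirements.md", encoding="utf-8").read())
```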
RAG Structure Diagram
In outline, the structure is: user question → retrieve relevant fragments from the document library → select the most relevant ones → assemble them with the question into a prompt → the large model generates a grounded answer.
RAG is not some exotic black-box technology; it is more a way of thinking about how to use AI efficiently . Just as you would not dump every question on an assistant in a single breath, you prepare the material first and then ask for advice .
Understanding the limitations of the context window, like knowing that the assistant can only look at a few pages at a time, is essential for working with models.