8 Common Problems and Solutions When Implementing a RAG System

Written by
Silas Grey
Updated on: June 9, 2025
Recommendation

An in-depth look at the challenges of putting RAG systems into production and how to respond to them. Core content: (1) an overview of 8 frequently asked questions when implementing a RAG system; (2) why knowledge-base data processing matters and how to optimize it; (3) the responsibilities of, and misunderstandings about, the large language model in a RAG system.

 

I also have a zero-basics primer for readers who are new to RAG.

This article focuses on the specific problems that come up during implementation, and how to solve them.

There are eight common problems in total:

  1. The model ignores the "knowledge base" and answers on its own
  2. The same question is not answered accurately every time; it is often answered wrong
  3. Answers are incomplete even though the "knowledge base" clearly contains the full answer
  4. Correct and wrong answers coexist in the same response
  5. Answers are vague and generic with no details, or else bury the user in every trivial detail
  6. Answers do not display the accompanying images
  7. Responses are very slow, or even crash outright
  8. Answers are one-sided because the appropriate supporting material was not correctly "selected"

Before we begin

When building RAG products or systems, 90% of the effort should go into processing the knowledge-base data.

"A clever woman can't cook without rice."

In a RAG system, the model only adds its value at the very last step. This "clever cook" cannot turn rotten vegetables into a meal that fills anyone's stomach.

Of the eight common problems above, only #1 and #5 are the "clever cook's" fault; the rest happen because we feed her rotten vegetables.

1. The model answers on its own, ignoring the knowledge base

The root cause of this problem is a misunderstanding of what RAG actually is.

In a RAG system, the large language model is responsible for exactly two things:

  1. Deciding whether the question can be answered from the supplied material
  2. Editing that material into the answer

The model "goes its own way" precisely because these responsibilities were never made explicit.

Choosing RAG in the first place means we no longer trust the answers the large language model generates on its own.

Be consistent about that distrust: if you do not trust the model's own answers, do not entrust it with producing them; limit it to editing.

The standard way to describe the role and task in a RAG system:

Role: a character with no initiative of its own, such as an assistant
Task: receive the question and the supporting material, and edit them into the output text
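
A minimal sketch of this role/task structure in code, assuming an OpenAI-style chat-message format; the exact wording is illustrative, not a canonical prompt:

```python
# A role/task prompt that follows the structure above: the model is an
# editor with no initiative of its own, not an oracle.
SYSTEM_PROMPT = (
    "You are an editing assistant with no knowledge of your own.\n"
    "You will receive a user question and supporting material.\n"
    "First decide whether the material can answer the question.\n"
    "- If it can, edit the material into a concise answer.\n"
    "- If it cannot, reply exactly: 'No answer found in the knowledge base.'"
)

def build_messages(question: str, support: str) -> list[dict]:
    """Assemble a chat request that keeps the model in its editor role."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Supporting material:\n{support}\n\nQuestion: {question}"},
    ]
```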

Any attempt to constrain the large language model with "prayer-style" prompts like the following is the mark of an amateur:

  1. Don't make things up
  2. Don't generate answers that don't exist
  3. Make sure your answer is accurate (respect the facts)
  4. …

2. Unstable answers

There are two core reasons:

  1. The question itself is the problem: it cannot stably retrieve the correct answer
  2. The data or the ranking is the problem: the correct answer cannot be stably recalled

Yes, "User's Problem" has a problem that is not a user's problem, it is yours.

We cannot demand that users be professional users, because professional users probably wouldn't need our product...

Most users do not ask questions the way we write "knowledge material", with complete and comprehensive descriptions.

Most of the time, their questions look more like this:

  1. Is there any product documentation?
  2. Clicking the login button does nothing
  3. Can it run on Apple?

LightRAG did not attract 17K stars with its query optimization for nothing; I strongly recommend studying its engineering optimizations.
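
One of those optimizations is rewriting vague user queries before retrieval. A minimal sketch of the idea, with a hypothetical `llm_complete` callable standing in for whatever model API you use (this is an illustration, not LightRAG's actual code):

```python
REWRITE_PROMPT = """Rewrite the user's question into a complete, self-contained
search query for a product knowledge base. Expand vague references and add
likely synonyms. Return only the rewritten query.

User question: {question}"""

def rewrite_query(question: str, llm_complete) -> str:
    """llm_complete is a hypothetical callable: prompt string -> completion string."""
    return llm_complete(REWRITE_PROMPT.format(question=question)).strip()

# A terse question like "Can it run on Apple?" might come back as
# "Is the product compatible with Apple devices (macOS / iOS)?"
```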

Assuming you have already pinned down the model's responsibility (editing), the key factor that shapes its answer is what reference material we hand it.

Which supporting data reaches the large language model depends on two factors:

  1. Can the information that answers the user's question be retrieved at all?
  2. Once retrieved, is it ranked near the front?

Because real-world data comes in all sorts of strange shapes, no scheme currently guarantees that the right data is retrieved 100% of the time.

But the following three approaches help:

  1. Preprocess the data properly to ensure sensible segmentation (at minimum, no complete answer gets chopped apart)
  2. Do secondary processing on the data, such as extracting keywords and likely questions for each chunk (see the sketch after this list)
  3. Choose a high-dimensional embedding model to strengthen semantic recognition
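
A sketch of the secondary processing in item 2, again with a hypothetical `llm_complete` callable; error handling for malformed model output is omitted:

```python
import json

ENRICH_PROMPT = """For the document chunk below, return JSON with two fields:
"keywords" (3-8 search keywords) and "questions" (2-5 questions a user
might ask that this chunk answers).

Chunk:
{chunk}"""

def enrich_chunk(chunk: str, llm_complete) -> dict:
    """Secondary processing: store keywords and likely questions alongside
    the chunk so that terse user queries can still match it."""
    meta = json.loads(llm_complete(ENRICH_PROMPT.format(chunk=chunk)))
    return {
        "text": chunk,
        "keywords": meta["keywords"],    # indexed for keyword search
        "questions": meta["questions"],  # embedded for semantic search
    }
```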

As for the ranking problem (factor 2), blindly paying for an expensive rerank model is not the only solution.

Spending more effort tuning the hybrid-search weights and score thresholds, and taking a close look at what actually gets recalled, often produces better results.
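
A minimal sketch of that tuning surface: blending a vector score with a keyword score and applying a score threshold. The `alpha` and `min_score` values are placeholders to tune against your own recall data, not recommendations:

```python
def rank_chunks(candidates, alpha=0.7, min_score=0.45):
    """candidates: iterable of (chunk, vector_score, keyword_score) tuples,
    with both scores already normalized to [0, 1]."""
    scored = []
    for chunk, vec, kw in candidates:
        # Blend semantic and keyword relevance; alpha is the hybrid weight.
        score = alpha * vec + (1 - alpha) * kw
        if score >= min_score:  # the score threshold filters out noise
            scored.append((chunk, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```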

3. Incomplete answers

This problem has essentially one cause: the document segmentation is unreasonable, and the complete answer was cut apart.

Only one of the pieces was recalled.

There is only one fix: inspect the recalled chunks and re-segment the document.

Don't be too lazy to define custom separators, and don't just fall back to splitting by character count.

One thing to watch out for: in most knowledge-base tools, the "split by separator" setting and the "maximum chunk length" setting take effect at the same time.
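
A toy illustration of that interaction, assuming both settings apply together the way most knowledge-base tools do; the separator and length values are illustrative:

```python
def split_chunks(text: str, separator: str = "\n## ", max_len: int = 800) -> list[str]:
    """Split on the separator first, then enforce the maximum length.
    Because both settings apply together, a section longer than max_len
    still gets cut even when the separators look correct - and this hard
    cut is exactly where complete answers get chopped apart."""
    chunks: list[str] = []
    for section in text.split(separator):
        while len(section) > max_len:
            chunks.append(section[:max_len])  # hard cut at the length limit
            section = section[max_len:]
        if section.strip():
            chunks.append(section)
    return chunks
```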

Even more important: those incomplete answers are a main source of large-language-model hallucinations!

4. Correct and wrong answers coexist

Two reasons:

  1. The recalled chunks contain irrelevant passages, which push the large language model into hallucinating
  2. The prompt used to generate the answer is not constraining enough

A larger Top-K in the recall strategy is not automatically better: without a similarity-threshold constraint, the more chunks you recall, the more irrelevant material you feed in.
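
A minimal sketch of recall with both controls, where the `top_k` and `min_similarity` values are illustrative starting points:

```python
def recall(hits, top_k: int = 5, min_similarity: float = 0.5):
    """hits: (chunk, similarity) pairs sorted by similarity, descending.
    Apply the similarity floor BEFORE the Top-K cap: a large top_k without
    a threshold just pads the context with irrelevant chunks."""
    return [(chunk, sim) for chunk, sim in hits if sim >= min_similarity][:top_k]
```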

If you have no proper fix at this stage, the only remaining lever is the final "safety-net" prompt: tell the large language model how to judge which passages are valid and how to discard the irrelevant ones.

I usually add this sentence to the prompt that generates the final answer:

Check how relevant each piece of supporting material is to the user's question. Some material may have been retrieved by mistake and cannot answer the question; you may choose not to use it.

5. No details / every detail

This is a response-format problem, the same one as "I can't get the RAG system to answer in the format I specified".

Essentially, either you never spelled the format out in the prompt, or you did and the instruction failed to stick.

There are only two solutions to this problem:

  1. Show an example of the desired answer instead of describing the requirements
  2. Put the constraint at the end of the prompt; it is worth repeating it in the user prompt as well (both are shown in the sketch below)
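
A sketch combining both solutions: the answer format is shown as an example rather than described, and the constraint is repeated at the very end of the user prompt. The format itself is invented for illustration:

```python
ANSWER_EXAMPLE = """Example of the expected answer:

**Cause**: one sentence.
**Fix**: numbered steps, five at most.
**Note**: one optional caveat."""

FORMAT_RULE = "Answer strictly in the format of the example above."

def build_final_prompt(question: str, support: str) -> str:
    # Solution 1: show an example instead of describing requirements.
    # Solution 2: repeat the constraint at the very end of the user prompt,
    # where models are most likely to obey it.
    return (
        f"{ANSWER_EXAMPLE}\n\n"
        f"Supporting material:\n{support}\n\n"
        f"Question: {question}\n\n"
        f"{FORMAT_RULE}"
    )
```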

6. Images are not displayed

All knowledge materials should be converted into Markdown before being segmented.

Word and PDF are formats made for people to read.

What the large language model ultimately receives can be completely different from what you see.

That is especially true for image-related content.

First make sure you deeply understand the basic principles of RAG, and then think about why the image is not displayed; otherwise the following fix will not really solve the problem.

Tell the large language model to output images as proper Markdown (or HTML tags). It is best to annotate each image with a caption so the model can decide when to include it.
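
A sketch of both halves: an instruction telling the model to reuse image Markdown verbatim, and a preprocessing helper that fills in missing alt text from a hand-maintained caption map (the `captions` dict is a hypothetical input):

```python
import re

# Instruction appended to the answer-generation prompt.
IMAGE_RULE = (
    "The supporting material may contain images written as Markdown, e.g. "
    "![caption](url). When an image helps answer the question, copy its "
    "Markdown line into your answer verbatim; never invent image URLs."
)

def annotate_images(markdown: str, captions: dict[str, str]) -> str:
    """Fill in empty alt text with a human-written caption so the model can
    judge when an image is relevant. `captions` maps url -> caption and is
    assumed to be maintained by hand."""
    def fill(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        return f"![{alt or captions.get(url, '')}]({url})"
    return re.sub(r"!\[(.*?)\]\((.*?)\)", fill, markdown)
```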

7. Slow responses

Besides the quality of the model itself, context length is the other major factor in how fast the first token arrives.

Even if the extra token cost stings, you must take response speed into account and process the supporting data into segments.

2,000 tokens is a reasonable upper limit for chunk length. Once the context grows past a certain size, the time to first token will exceed one second, and if you cannot use streaming output, the overall response time can exceed ten seconds.

By the way, remind the engineers on the team to leave a generous API response timeout...
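
A sketch using the official openai Python client, which supports both a client-level timeout and streaming; the model name and 60-second timeout are illustrative values (the client reads OPENAI_API_KEY from the environment):

```python
from openai import OpenAI

# Client-level timeout in seconds; leave generous headroom for long answers.
client = OpenAI(timeout=60.0)

def stream_answer(messages: list[dict]) -> str:
    """Stream the completion so the user sees the first token quickly,
    even when the full answer takes many seconds to generate."""
    parts = []
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # render tokens as they arrive
        parts.append(delta)
    return "".join(parts)
```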

8. One-sided, unsystematic answers

The biggest problem brought by segmentation is the fragmentation of knowledge.

The main impact of "fragmented knowledge" on a RAG system is on the comprehensiveness of recall: chunks that cannot answer the question directly, but would serve as background or related information, are almost never retrieved.

There are currently two popular solutions:

  1. Knowledge-graph enhancement
  2. Agentic enhancement

At the moment I am more optimistic about graph enhancement; the agentic route demands much more engineering-side optimization and prompt discipline (and, frankly, domestic models cannot handle agentic workflows well yet).

I recommend studying Microsoft's GraphRAG project.