How to improve the recall accuracy of RAG knowledge base documents?

Written by
Audrey Miles
Updated on: July 12, 2025
Recommendation

Master practical skills to improve the performance of a RAG system.

Core content:
1. The impact of document segmentation granularity on recall accuracy
2. Post-retrieval sorting technology and its application
3. Hybrid retrieval strategy and methods to improve recall rate

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

In a RAG system, improving the recall accuracy of knowledge base documents is crucial to the user experience of the whole system.

Today, I will introduce in detail how to improve the recall accuracy of knowledge base documents from four angles: document segmentation granularity, post-retrieval sorting, hybrid retrieval, and RAG-Fusion. I hope it will be helpful to you.

Document segmentation granularity

In a RAG system, document segmentation divides large documents into smaller text blocks for more efficient vector representation and retrieval. The segmentation granularity directly affects retrieval efficiency and the recall rate.

If the segmentation granularity is too fine, contextual information is lost, so the retrieved text blocks cannot accurately reflect the meaning of the original text, which reduces the recall rate.

However, if the segmentation granularity is too coarse, the context is retained, but each retrieved block may cover too much material, which adds noise and also hurts the recall rate.

Therefore, it is important to choose a moderate segment size. However, there is no standard answer for how to segment; it has to be analyzed case by case.

For example, technical documentation or legal documents can be segmented by paragraph or chapter; news reports or blog posts can be segmented by sentence or paragraph.

Another practice in the industry is overlapping segmentation, in which adjacent text blocks share some text so that the context at block boundaries is more complete. If you are interested, you can learn more about it.
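As a rough illustration of overlapping segmentation, here is a minimal sketch that slices text into fixed-size character windows. The function name `chunk_with_overlap` and the parameters `chunk_size` and `overlap` are assumptions made for this example; real pipelines often split on sentence or paragraph boundaries instead of raw characters.

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Tiny demo: windows of 10 characters, each sharing 3 characters with the previous one.
chunks = chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
for chunk in chunks:
    print(chunk)
```

A larger `overlap` preserves more boundary context but stores and embeds more duplicated text, so the value is usually tuned together with `chunk_size`.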

Sorting after retrieval

Typically, the initial retrieval results may contain a large number of documents that are related to the query but not actually highly relevant. With reranking, the more relevant documents can be moved to the front, thereby improving the recall accuracy.

For example, statistical score-based reranking aggregates candidate result lists from multiple sources and recalculates the score of every result, using either weighted scores across the recall channels or the reciprocal rank fusion (RRF) algorithm. This method is simple to compute, low-cost, and efficient, and is widely used in latency-sensitive scenarios.

In addition, you can use deep-learning-based reranking with a model such as BAAI/bge-reranker-v2-m3 to better analyze the relevance between the question and each document. This method is more accurate, but it also costs more, so it suits scenarios with high retrieval accuracy requirements.
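To make the rank-based fusion concrete, here is a minimal sketch of reciprocal rank fusion over two candidate lists. The document ids and the two lists are made up for illustration, and `k=60` is a commonly used smoothing constant rather than a value taken from this article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher positions contribute more.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two recall channels.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because only ranks are used, the channels do not need comparable score scales, which is one reason this kind of fusion is cheap to apply.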
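The model-based reranking step can be sketched as a function that scores every (query, document) pair and keeps the best ones. In the sketch below, `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in so the example runs without downloading a model; in a real system `score_fn` would wrap a reranker model such as BAAI/bge-reranker-v2-m3.

```python
from typing import Callable


def rerank(
    query: str,
    docs: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Score each (query, doc) pair and return the top_k docs, best first."""
    scored = [(score_fn(query, doc), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def toy_overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer (NOT a real reranker): fraction of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


candidates = [
    "Vector databases store embeddings",
    "How to reset a forgotten password",
    "Reset your password from the account settings page",
]
top = rerank("how to reset password", candidates, toy_overlap_score, top_k=2)
print(top)
```

Since a model scorer runs once per candidate, this step is usually applied only to the small candidate set returned by the first-stage retrieval.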

Hybrid retrieval

A single retrieval method may not fully capture the user's query intent. Hybrid retrieval can significantly improve the recall rate by combining multiple retrieval methods.

Keyword-matching recall is simple and direct and can quickly find documents containing specific keywords; vector-matching recall handles semantic matching better and makes recall more comprehensive. Combining the two balances the speed and accuracy of recall.

In addition, full-text search can capture complete information in a document, but the computational cost is high; vector search is computationally efficient, but may lose contextual information. Combining the two search methods also makes full use of their respective advantages and improves the recall rate.
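One simple way to combine the two signals is a weighted sum of a keyword score and a vector similarity. The sketch below is an assumption-laden toy: `hybrid_search`, `keyword_score`, and the weight `alpha` are hypothetical names, the two-dimensional embeddings are hand-made, and a production system would typically use BM25 and a real embedding model instead.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (a crude keyword match)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * keyword score + (1 - alpha) * vector similarity.

    docs is a list of (doc_id, text, embedding) tuples.
    """
    results = []
    for doc_id, text, embedding in docs:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, embedding)
        results.append((doc_id, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hand-made toy corpus: one doc matches only by keyword, one only by meaning, one by both.
docs = [
    ("kw_only", "refund policy for orders", [0.0, 1.0]),
    ("sem_only", "how to get money back", [1.0, 0.0]),
    ("both", "refund policy and getting money back", [0.9, 0.1]),
]
ranked = hybrid_search("refund policy", [1.0, 0.0], docs, alpha=0.5)
print(ranked)
```

Setting `alpha` closer to 1 favors exact keyword matches, while values closer to 0 favor semantic similarity; the right balance depends on the corpus and the queries.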

RAG-Fusion

RAG-Fusion may be less familiar. It is an optimization method that combines multi-query recall with a result fusion strategy: it issues several queries and fuses their results to improve recall rate and accuracy.

First, use multiple different queries to obtain more comprehensive content. These queries can be generated from variants, synonyms, near-synonyms, or semantically related words of the original query. Of course, we can also have an LLM generate them, that is, rephrase the user's original question in different ways and run each rephrasing as a separate query.

Then the results of all queries are merged, and weighted sorting is performed based on the importance of each query's results to ensure that the final results are highly relevant and comprehensive.
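The two steps can be sketched end to end as follows. This is one possible reading of the weighted fusion, using a weighted form of reciprocal rank fusion; the function `rag_fusion`, the per-query `weights`, the `FAKE_INDEX` lookup, and the hard-coded query variants (standing in for LLM-generated rephrasings) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Optional


def rag_fusion(
    queries: list[str],
    retrieve: Callable[[str], list[str]],
    weights: Optional[list[float]] = None,
    k: int = 60,
    top_n: int = 5,
) -> list[str]:
    """Run every query, then fuse the ranked results with weighted reciprocal ranks."""
    if weights is None:
        weights = [1.0] * len(queries)  # treat all query variants as equally important
    scores: dict[str, float] = defaultdict(float)
    for query, weight in zip(queries, weights):
        for rank, doc_id in enumerate(retrieve(query), start=1):
            scores[doc_id] += weight / (k + rank)
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[:top_n]


# Hypothetical retriever: a fixed lookup table instead of a real vector store.
FAKE_INDEX = {
    "how do I reset my password": ["doc_reset", "doc_login"],
    "steps to change account password": ["doc_change", "doc_reset"],
    "forgot password recovery": ["doc_recover", "doc_reset", "doc_login"],
}


def fake_retrieve(query: str) -> list[str]:
    return FAKE_INDEX.get(query, [])


# The first entry is the user's original question; the others stand in for LLM rewrites.
query_variants = list(FAKE_INDEX)
fused_docs = rag_fusion(query_variants, fake_retrieve, weights=[2.0, 1.0, 1.0], top_n=3)
print(fused_docs)
```

Here the original question gets twice the weight of its rephrasings, so a document it ranks highly is harder to displace; equal weights reduce this to plain reciprocal rank fusion.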

Summary

Improving the recall accuracy of knowledge base documents in a RAG system requires work on multiple fronts. With reasonable document segmentation granularity, post-retrieval sorting, hybrid retrieval, and RAG-Fusion, the recall rate can be significantly improved, giving users more accurate and valuable answers. In practice, you still need to choose the appropriate strategies for your specific scenario and needs.