How to improve the recall accuracy of RAG knowledge base documents?

Master practical techniques for improving the performance of a RAG system.
Core content:
1. The impact of document segmentation granularity on recall accuracy
2. Post-retrieval reranking and its applications
3. Hybrid retrieval strategies and methods for improving the recall rate
In a RAG system, improving the recall accuracy of knowledge base documents is crucial to the user experience of the whole system.
Today I will explain in detail how to improve the recall accuracy of knowledge base documents from four angles: document segmentation granularity, post-retrieval reranking, hybrid retrieval, and RAG-Fusion. I hope you find it helpful.
Document segmentation granularity
In a RAG system, document segmentation splits large documents into smaller text chunks so that they can be vectorized and retrieved more efficiently. The segmentation granularity directly affects both retrieval efficiency and the recall rate.
If the granularity is too fine, context is lost: the retrieved chunks no longer reflect the meaning of the original text accurately, which lowers the recall rate.
If the granularity is too coarse, context is preserved, but each chunk covers too much material, which adds noise and also hurts the recall rate.
It is therefore important to choose a moderate chunk size. There is no standard answer for how to split, however, and each case needs its own analysis.
For example, technical or legal documents can be split by paragraph or chapter, while news reports or blog posts can be split by sentence or paragraph.
Another common industry practice is overlapping segmentation, in which adjacent chunks share some text at their boundaries so that context is not cut off at chunk edges.
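Overlapping segmentation can be sketched in a few lines. The example below is a minimal illustration that splits by character count; the function name chunk_with_overlap and the default sizes are assumptions for demonstration, and a real system would more likely split on sentence or paragraph boundaries.

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap preserves more context across chunk boundaries but stores more duplicated text, so it is a trade-off to tune per document type.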
Reranking after retrieval
Typically, the initial retrieval results contain many documents that are only loosely related to the query rather than highly relevant. Reranking moves the most relevant documents to the front, which improves recall accuracy.
For example, reranking based on statistical scoring aggregates candidate lists from multiple sources and recomputes a score for every result, using either a weighted sum of the multi-way recall scores or the reciprocal rank fusion (RRF) algorithm. This approach is simple to compute, cheap, and fast, so it is widely used in latency-sensitive scenarios.
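As a rough illustration of reciprocal rank fusion, each document receives 1 / (k + rank) from every list it appears in, and the contributions are summed. The sketch below assumes each source returns a list of document IDs ordered best first; the function name is hypothetical, and k = 60 is a commonly used default rather than a value taken from this article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]
    vector_hits = ["b", "c", "d"]
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Because only ranks are used, the raw scores from different retrievers never need to be put on a common scale, which is part of why this method is cheap to apply.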
Alternatively, you can rerank with a deep learning model such as BAAI/bge-reranker-v2-m3, which analyzes the relevance between the question and each document more closely. This approach is more accurate but also more expensive, so it suits scenarios with high accuracy requirements.
Hybrid retrieval
A single retrieval method may not fully capture the intent behind a user's query. Hybrid retrieval combines multiple retrieval methods and can improve the recall rate significantly.
Keyword-based recall is simple and direct, and it quickly finds documents that contain specific keywords. Vector-based recall handles semantic matching better and makes recall more comprehensive. Combining the two balances recall speed and accuracy.
Similarly, full-text search can capture the complete information in a document but can be computationally expensive, while vector search is computationally efficient but may lose contextual information. Combining these two methods also lets each compensate for the other's weaknesses and improves the recall rate.
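One simple way to combine the two kinds of recall is a weighted sum of normalized scores. The sketch below assumes keyword scores (for example BM25 values) and vector scores (for example cosine similarities) are already available as dictionaries keyed by document ID; the function names, the min-max normalization, and the weight alpha are illustrative assumptions, not a prescribed method.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so different retrievers become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every document as equally good.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    keyword_scores: dict[str, float],
    vector_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Return (doc_id, fused_score) pairs; alpha is the weight of the vector score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        # A document missing from one retriever contributes 0 for that retriever.
        doc_id: alpha * vec.get(doc_id, 0.0) + (1.0 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    keyword_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
    vector_scores = {"b": 0.9, "c": 0.5, "d": 0.1}
    for doc_id, score in hybrid_rank(keyword_scores, vector_scores):
        print(doc_id, round(score, 3))
```

Raising alpha favors semantic matches, while lowering it favors exact keyword matches; the right balance depends on the query mix of the application.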
RAG-Fusion
RAG-Fusion may be less familiar. It is an optimization method that combines multi-query recall with a result fusion strategy, improving both the recall rate and accuracy.
First, run several different queries to obtain more comprehensive content. These queries can be built from variants, synonyms, near-synonyms, or semantically related terms of the original query. In practice, an LLM can generate them: it rephrases the user's original question in several ways, and each rephrasing is used as a separate query.
Then, the results are merged and ranked with weights that reflect the importance of each query's results, so that the final list is both highly relevant and comprehensive.
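The two steps can be sketched end to end as follows. This is only a toy illustration under several assumptions: the query variants are written by hand in place of an LLM call, toy_retrieve is a hypothetical word-overlap retriever over a three-document corpus, and the fusion step is a weighted form of reciprocal rank fusion in which each query's weight is chosen arbitrarily.

```python
from collections import defaultdict
from typing import Callable

# Toy corpus standing in for a real knowledge base (hypothetical data).
DOCS = {
    "doc1": "reset your password from the account settings page",
    "doc2": "change login credentials in the security tab",
    "doc3": "billing invoices are sent monthly",
}


def toy_retrieve(query: str) -> list[str]:
    """Rank documents by how many query words they contain (placeholder retriever)."""
    query_words = set(query.lower().split())
    overlap = {doc_id: len(query_words & set(text.split())) for doc_id, text in DOCS.items()}
    hits = [doc_id for doc_id, count in overlap.items() if count > 0]
    return sorted(hits, key=lambda doc_id: (-overlap[doc_id], doc_id))


def rag_fusion(
    weighted_queries: list[tuple[str, float]],
    retrieve: Callable[[str], list[str]],
    k: int = 60,
) -> list[str]:
    """Run every query variant, then fuse the rankings with weighted reciprocal ranks."""
    scores: dict[str, float] = defaultdict(float)
    for query, weight in weighted_queries:
        for rank, doc_id in enumerate(retrieve(query), start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # The original question gets a higher weight than its rephrasing (an assumption).
    queries = [("how to reset password", 1.0), ("change login credentials", 0.5)]
    print(rag_fusion(queries, toy_retrieve))
```

In this toy run, the rephrased query surfaces a document that shares no words with the original question, which is the kind of gain multi-query recall is meant to provide.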
Summary
Improving the recall accuracy of knowledge base documents in a RAG system takes work on several fronts. With sensible document segmentation granularity, post-retrieval reranking, hybrid retrieval, and RAG-Fusion, the recall rate can be improved significantly, giving users more accurate and valuable answers. In practice, you still need to choose and tune these strategies according to your specific scenario and requirements.