How to recall data with high quality in RAG applications - Research on recall strategies

Written by
Jasper Cole
Updated on: June 22, 2025

Explore the efficient strategy of data recall in RAG technology and reveal its application prospects in the field of artificial intelligence.

Core content:
1. RAG technology principles and the importance of data recall
2. Difficulties and challenges of RAG recall strategy
3. Detailed explanation of common recall strategies and application scenarios

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

Data recall is an important area of RAG technology, and different recall strategies can even produce completely different results.



The core principle of RAG is simple: maintain an external database, and before the large model answers a question, first retrieve relevant content from that database and feed it to the model together with the question.


However, because documents are complex, truly high-quality document processing is hard to achieve, and this leads to a variety of problems when recalling data.


High-quality data recall is therefore a problem every RAG system must address; below, we briefly introduce several common recall strategies.





Recall Strategy




There are two main difficulties in RAG: upstream document processing and data recall. Since the large model itself cannot judge the quality of the documents fed into it, recall quality must be controlled by the system builder, and that control can only be enforced through technical means.


Setting document processing aside, the most important task in a RAG application is therefore solving the data-recall problem.


The essence of recall is simple: quickly and accurately find data related to the question in an external database. For example, suppose the user asks: "How do I learn artificial intelligence?"


The system then needs to quickly find content related to artificial intelligence in a large volume of external data in many forms: books, videos, papers, and more.


Consider the world's existing body of knowledge: it spans at least 800 fields, if not a thousand. No single person can work in, let alone understand, all of them. So if someone wants to get started quickly in a new field, what should they do?


They can start by searching the Internet for information about the field; but with so many documents in the world, how does a search engine know which data is relevant?


This is what search engines need to solve, and this is also what RAG needs to solve.




RAG performs semantic search based on a neural-network model, which makes it very different from traditional character-matching search. Its most visible manifestation is vector computation; this is why RAG systems typically pair with a dedicated vector database for vector search.


That said, RAG is not limited to vector databases. Its essence is to find relevant data quickly; it does not care whether that data lives in a vector database or a traditional relational database. In other words, RAG is largely independent of how data is persisted; persistence is only one part of a RAG system.



Recall Strategy


There are many ways to implement a recall strategy in RAG. The simplest is based on traditional character matching and keyword search; the currently popular alternative is semantic retrieval based on vector computation.


What is semantic search?


The so-called semantics means that you not only have to hear what I say, but you also have to understand what I mean.


For example, when someone asks whether you have eaten yet, it may be just a greeting, or it may mean they want to treat you to a meal and chat. The same words mean different things in different contexts.


RAG's main recall strategies are as follows:


  • Retrieval based on traditional character matching and word segmentation

  • Semantic retrieval based on vector calculation

  • Data re-ranking (Rerank)

  • Question splitting

  • Multi-way recall




Retrieval based on traditional character matching and word segmentation


Before the emergence of large models, search engines mainly relied on character matching and word segmentation; the common technical carriers were relational databases and tokenized-retrieval tools such as Elasticsearch (ES).


In some business scenarios, RAG still uses these technologies because they are mature, the solutions are complete, and the results are good.
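To make the idea concrete, here is a minimal sketch of character-matching retrieval: score each document by how often the query's terms appear in it. The tokenizer, corpus, and scoring function are illustrative toys, not the behavior of any particular search engine; production systems use proper analyzers and ranking functions such as BM25.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive word tokenizer; real systems use proper segmenters/analyzers.
    return re.findall(r"\w+", text.lower())

def keyword_score(query, doc):
    # Score a document by how many times it contains the query's terms.
    q_terms = set(tokenize(query))
    d_counts = Counter(tokenize(doc))
    return sum(d_counts[t] for t in q_terms)

docs = [
    "Artificial intelligence tutorials for beginners",
    "A guide to cooking pasta at home",
    "Learning artificial intelligence step by step",
]
query = "how to learn artificial intelligence"
ranked = sorted(docs, key=lambda d: keyword_score(query, d), reverse=True)
```

Note that pure character matching misses variants: "learning" does not match "learn" here, which is exactly the gap semantic retrieval closes.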


Semantic retrieval based on vector calculation


Semantic retrieval based on vector calculation is usually done through vector databases, or through traditional relational databases that support vector operations. Its essence is to convert text into vectors with an embedding model, and then measure their similarity with metrics such as Euclidean distance or cosine similarity.
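The cosine-similarity step can be sketched as follows. The three-dimensional "embeddings" below are made-up toy vectors; a real system would obtain them from an embedding model and the dimensionality would be in the hundreds or thousands.

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for embedding-model output.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "intro to AI": [0.8, 0.2, 0.1],
    "pasta recipes": [0.0, 0.1, 0.9],
}
best = max(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]))
```

A vector database performs essentially this comparison, but over millions of vectors using approximate nearest-neighbor indexes rather than a linear scan.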


Data re-ranking (Rerank)


Re-ranking is also based on vector calculation. The principle is to re-score the results retrieved in the first stage and keep the data with the highest similarity to the query.


For example, a search for Sun Wukong may return a lot of related content: introductions to the Four Great Classical Novels, the story of the three battles with the White Bone Demon, or the havoc in heaven.


But perhaps all you want is "Havoc in Heaven" or the "three battles with the White Bone Demon"; in that case there is no need to return an introduction to the Four Great Classical Novels.


By re-ranking, we can screen the recalled data a second time and achieve more accurate matching.
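A minimal sketch of the two-stage idea, using the Sun Wukong example: the first stage returns a candidate list, and a finer-grained scorer re-orders it. The `overlap_score` function here is a stand-in; in practice this role is often filled by a cross-encoder model that reads the query and each passage together.

```python
def rerank(query, candidates, score_fn, top_k=2):
    # Re-score first-stage candidates and keep only the best matches.
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

def overlap_score(query, passage):
    # Hypothetical fine-grained scorer: fraction of query words present.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

candidates = [
    "Sun Wukong and the Four Great Classical Novels",
    "Sun Wukong wreaks havoc in heaven",
    "Sun Wukong battles the White Bone Demon three times",
]
top = rerank("havoc in heaven", candidates, overlap_score, top_k=1)
```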




Question splitting


The principle of question splitting is simple: use a large model to analyze the user's question and generate several similar questions, then run recall with each of them to improve the accuracy of the recalled data.


For example, the user’s question is: I want to travel, do you have any suggestions? 


The large model can then split this into several similar questions, such as: "I want to go somewhere beautiful"; "I want to find a place to relax"; or "I want to see the country's scenic rivers and mountains."


Splitting yields multiple similar questions, which lets the system recall relevant content from more angles in the vector database or other stores; re-ranking is then applied to find the most relevant results.
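The flow above can be sketched as follows. Both `fake_llm` and `fake_retrieve` are stubs invented for illustration; real code would call an actual model API for the rephrasings and a vector-database query for retrieval.

```python
def split_question(question, llm):
    # Ask the model for semantically similar rephrasings, one per line.
    prompt = ("Generate 3 alternative phrasings of the following "
              "question, one per line:\n" + question)
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def recall_with_splitting(question, llm, retrieve):
    # Retrieve for the original question plus each variant, then dedupe.
    variants = [question] + split_question(question, llm)
    recalled = []
    for q in variants:
        for doc in retrieve(q):
            if doc not in recalled:
                recalled.append(doc)
    return recalled

# Stub LLM and retriever, for demonstration only.
fake_llm = lambda prompt: ("Where is a scenic place to visit?\n"
                           "Where can I relax?\n"
                           "Best sightseeing spots?")
fake_retrieve = lambda q: (["travel guide"]
                           if "visit" in q or "travel" in q
                           else ["leisure tips"])

recalled = recall_with_splitting("I want to travel, any suggestions?",
                                 fake_llm, fake_retrieve)
```

The variants surface documents ("leisure tips") that the original phrasing alone would have missed.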


Multi-way recall


The principle of multi-way recall is also simple: retrieve related content through several different strategies, models, or channels. It resembles question splitting, but question splitting starts from the question, while multi-way recall starts from the retrieval strategy or channel.


To give a more concrete example: if you want to learn about an industry, you can search the public Internet, visit a specialized industry forum or community, or talk to professionals.


Recalling data through a variety of different means and channels in this way is called multi-way recall.
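Results from different channels must then be merged into one list. One common merging scheme is reciprocal rank fusion (RRF), sketched below with made-up document IDs; the constant `k = 60` is the value conventionally used in RRF, not something prescribed by this article.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    # Merge ranked lists from different retrievers: each document earns
    # 1 / (k + rank) from every list it appears in; sort by total score.
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a keyword channel and a vector channel.
keyword_results = ["doc_a", "doc_b", "doc_c"]
vector_results = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_results, vector_results])
```

Documents that rank well in multiple channels (here `doc_b`) float to the top, which is exactly the benefit multi-way recall is after.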




Of course, recall technology is not limited to the RAG field. In the traditional search engine field, recall technology also plays an important role; therefore, RAG technology can also be applied to the search engine field.


Of course, here we only introduce simple and common recall strategies. In the specific practice of RAG, there will also be some special recall methods and strategies in different scenarios, such as data classification, indexing, and new technologies such as knowledge graphs.


Most importantly, these recall methods are not mutually exclusive; in many scenarios they are combined to achieve higher recall quality. In particular, with large amounts of data, relying entirely on vector-similarity calculation places demands on compute and response time that are often unacceptable.


Therefore, in the case of large amounts of data, a common recall method is to first perform a fast inexact match and then perform a more accurate similarity calculation.
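This coarse-then-fine pattern can be sketched as below: a cheap term-overlap filter prunes the corpus, and exact cosine similarity is computed only over the survivors. The documents and two-dimensional vectors are toy data; real systems typically use an inverted index or an approximate-nearest-neighbor index for the first stage.

```python
import math

def coarse_filter(query_terms, docs):
    # Stage 1: fast inexact match -- keep docs sharing any query term.
    return [d for d in docs if query_terms & set(d["text"].lower().split())]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(y * y for y in b)))

def two_stage_recall(query_terms, query_vec, docs, top_k=1):
    # Stage 2: exact similarity, but only over the small candidate set.
    candidates = coarse_filter(query_terms, docs)
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return candidates[:top_k]

docs = [
    {"text": "learn artificial intelligence", "vec": [0.9, 0.1]},
    {"text": "cooking pasta", "vec": [0.1, 0.9]},
    {"text": "intelligence testing history", "vec": [0.5, 0.5]},
]
top = two_stage_recall({"artificial", "intelligence"}, [1.0, 0.0], docs)
```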