Combining reasoning models with RAG search: 5 small projects you can start right away

Combining reasoning models with RAG search, these 5 practical projects help you break through the limitations of traditional retrieval and reach a new level of intelligent search!
Core content:
1. Dynamic agentic retrieval: using a reasoning model to optimize the RAG pipeline and reduce missed information
2. Think-while-searching framework: giving large reasoning models the ability to supplement knowledge in real time
3. Multimodal expansion: bringing unstructured data such as images into the reasoning process
In the direction of combining reasoning models with RAG search, we introduce 5 small projects that are easy to get started with.
The first project explores how to introduce a reflection module into the RAG pipeline to reduce mis-retrieval and missed information.
The second project explores how to extend the reasoning model's chain of thought so that it searches while it thinks.
The third project explores using reinforcement learning to train the reasoning model's planning and tool-calling abilities.
The fourth project builds a mature search tool on top of the techniques from the second and third projects.
The fifth project brings multimodal data such as images into the reasoning process.
Introducing reflection into the search process
Traditional RAG (retrieval-augmented generation) is relatively rigid. It typically runs a similarity search, re-ranks the candidates by how well they match, and passes the seemingly most reliable fragments to the large language model (LLM) to generate an answer. This approach depends heavily on the re-ranking model: if it performs poorly, key information is easily missed, or wrong information is fed to the LLM, and the generated answer suffers.
The r1-reasoning-rag project takes a different approach. It uses the strong reasoning ability of large reasoning models (LRMs) to turn the fixed filtering step into a flexible, dynamic mechanism the authors call "agentic retrieval." Under this mechanism the AI behaves with a degree of autonomy: it actively explores missing information and keeps refining its own strategy during retrieval, forming a virtuous cycle that supplies the LLM with more accurate content.
For example, given the multi-hop question "In The Investiture of the Gods, who is the master of Nezha's master?", the project does not search blindly like traditional RAG. Using DeepSeek-R1's reasoning ability, it first analyzes the question and understands that it must identify Nezha's master before it can look up that master's own master.
Next, it retrieves "Nezha's master is Taiyi Zhenren" from the knowledge base. The agent does not stop there: through the reflection module it realizes that the key information, Taiyi Zhenren's master (that is, the master of Nezha's master), is still missing.
So it automatically searches again until it obtains the complete, accurate answer: Yuanshi Tianzun. Throughout this process the agent keeps reflecting on the information gathered so far and adjusting its search strategy to ensure the final answer is complete and correct.
Project address: https://github.com/deansaco/r1-reasoning-rag
Search-o1 uses reasoning models to think and search
Large Reasoning Models (LRMs), such as OpenAI-o1, have demonstrated impressive capabilities in long-step reasoning. However, as the reasoning process becomes longer, the problem of insufficient knowledge reserves gradually becomes prominent, frequently causing uncertainty and even leading to errors.
Search-o1 proposes a framework that, for the first time, tightly integrates an agentic retrieval-augmented generation (RAG) mechanism into the reasoning process of large reasoning models, giving the model a strong ability to supplement its own knowledge. When an LRM hits an uncertain knowledge point mid-reasoning, Search-o1 automatically triggers a retrieval, fetches relevant information from external knowledge sources, and extends the model's chain of thought.
At the same time, since retrieved documents are often lengthy and contain a lot of redundancy, the project designed a dedicated Reason-in-Documents module. It deeply analyzes the retrieved documents, distills the key information, and injects it into the reasoning chain so that the coherence of the reasoning process is not disturbed.
The paper's experimental results are strong, but the main thing to take away is the idea of using reasoning models to think and search at the same time.
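That idea can be illustrated with a toy sketch: a special token in the reasoning chain triggers retrieval, and a crude "Reason-in-Documents" step distills the document before it re-enters the chain. The token format, the `search` stub, and the word-overlap distillation below are all assumptions for illustration, not Search-o1's actual implementation.

```python
import re

def search(query: str) -> str:
    """Stand-in for an external knowledge source."""
    corpus = {
        "boiling point of ethanol":
            "Ethanol boils at 78.37 C at 1 atm. " + "Unrelated filler text. " * 5,
    }
    return corpus.get(query, "")

def reason_in_documents(query: str, doc: str, max_sentences: int = 2) -> str:
    """Crude distillation: keep only sentences sharing words with the query,
    so lengthy, redundant documents do not disturb the reasoning chain."""
    terms = set(query.lower().split())
    kept = [s for s in doc.split(". ") if terms & set(s.lower().split())]
    return ". ".join(kept[:max_sentences])

def run_reasoning(chain: list) -> list:
    """Walk a reasoning chain; steps like <|search|>query<|end|> trigger
    retrieval and are replaced by the distilled document."""
    out = []
    for step in chain:
        m = re.match(r"<\|search\|>(.*)<\|end\|>", step)
        if m:
            out.append(reason_in_documents(m.group(1), search(m.group(1))))
        else:
            out.append(step)
    return out

steps = run_reasoning([
    "The question asks for the boiling point of ethanol.",
    "<|search|>boiling point of ethanol<|end|>",
    "So the answer is about 78 C.",
])
print(steps[1])  # Ethanol boils at 78.37 C at 1 atm
```

Note how the retrieved filler text never reaches the chain: only the distilled sentence is injected, which is the role Reason-in-Documents plays.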
Project address: https://github.com/sunnynexus/Search-o1
Search-R1 framework introduces reinforcement learning to enhance reasoning capabilities
The first two projects combine reasoning and search to good effect, but neither introduces a training mechanism. Building on them, Search-R1 introduces reinforcement learning, injecting new vitality into the development of intelligent agents.
Through reinforcement learning, Search-R1 lets the agent learn and grow through continuous interaction with the environment. After each action, the environment feeds back a reward signal: a positive reward makes the agent more likely to repeat the action, while a negative reward makes it less likely. Under this mechanism the agent gradually learns to reflect on its own behavior more efficiently and to optimize its tool-calling strategy.
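The reward dynamic described above can be illustrated with a minimal softmax-policy bandit: the preference for a rewarded action rises, and for a punished one falls. The two actions, the reward values, and the learning rate below are toy choices for illustration, not Search-R1's actual training setup.

```python
import math
import random

random.seed(0)
prefs = {"search": 0.0, "answer": 0.0}  # action preferences (logits)
LR = 0.5

def softmax(p: dict) -> dict:
    z = sum(math.exp(v) for v in p.values())
    return {a: math.exp(v) / z for a, v in p.items()}

def step(reward_fn) -> None:
    probs = softmax(prefs)
    action = random.choices(list(probs), weights=probs.values())[0]
    r = reward_fn(action)
    # REINFORCE-style update: positive reward raises the chosen action's
    # preference, negative reward lowers it.
    for a in prefs:
        grad = (1 - probs[a]) if a == action else -probs[a]
        prefs[a] += LR * r * grad

# A toy environment that rewards searching and punishes answering blindly.
for _ in range(200):
    step(lambda a: 1.0 if a == "search" else -1.0)

probs = softmax(prefs)
print(probs["search"])  # close to 1.0: the agent has learned to search first
```

The same loop structure scales up to Search-R1's setting, where the "actions" are real search-tool calls and the reward comes from answer quality.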
Project address: https://github.com/PeterGriffinJin/Search-R1
WebThinker Project
WebThinker cleverly integrates the ideas behind Search-o1 and Search-R1, moving beyond purely theoretical research to an initial working search application.
When you ask the AI a question such as "What models does OpenAI have, and what are the differences between them?", it can not only answer, but also search for the latest information, browse web content in depth, and generate a well-structured research report. In the official demo, WebThinker starts its "think-search-write" closed loop from the user's question alone and finally outputs a professional report covering model technical details, application-scenario comparisons, and development history, with no human intervention along the way.
Different from traditional single search, WebThinker has developed an intelligent web page interaction module:
Jump between pages by clicking interactive elements such as links and buttons
Decide, based on the information gathered so far, whether to search further or browse more deeply
Automatically extract key information and fold it into the reasoning process
This "thinking while searching" mode lets the model start from the initial search results and progressively dig into deeper information, building a complete picture of the topic, much like a human researcher.
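The closed loop can be sketched as a simple state machine that repeatedly decides whether to search, browse deeper, or write the report. The `decide` stub below is a placeholder for the reasoning model's actual decision; the state fields and stop condition are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchState:
    question: str
    notes: list = field(default_factory=list)
    pages_browsed: int = 0

def decide(state: ResearchState) -> str:
    """Stub policy: search once, browse two pages, then write.
    WebThinker instead asks the reasoning model what to do next."""
    if not state.notes:
        return "search"
    if state.pages_browsed < 2:
        return "browse"
    return "write"

def run(question: str):
    state = ResearchState(question)
    trace = []
    while True:
        action = decide(state)
        trace.append(action)
        if action == "search":
            state.notes.append(f"search results for: {question}")
        elif action == "browse":
            state.pages_browsed += 1
            state.notes.append(f"page {state.pages_browsed} content")
        else:  # write the report from accumulated notes
            return trace, "\n".join(state.notes)

trace, report = run("What models does OpenAI have?")
print(trace)  # ['search', 'browse', 'browse', 'write']
```

The loop terminates only when the policy decides enough has been gathered, which is what makes the whole process run without human intervention.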
In addition, WebThinker constructs a large-scale reasoning-trajectory dataset during search and browsing, and uses reinforcement learning to optimize model performance.
The model is open source in four versions, available on Hugging Face:
WebThinker-QwQ-32B
WebThinker-R1-7B
WebThinker-R1-14B
WebThinker-R1-32B
Project address: https://github.com/RUC-NLPIR/WebThinker
VRAG-RL: Opening a new chapter for the next generation of RAG with multimodal reinforcement learning
The methods above all feed textual data into the thinking process, whereas VRAG lets the reasoning model dynamically bring in data from other modalities, such as images, while it thinks. When handling a problem, the reasoning model can draw on rich information across modalities, understand the question more comprehensively and deeply, and generate more accurate, richer answers.
The official VRAG-RL demo walks through a human-machine dialogue that solves a concrete question about IWMI's map of India's "Energy Divide":
User Question: In the map of India's 'Energy Divide' compiled by IWMI (International Water Management Institute) based on the 'Minor Irrigation Census', how many more Dugwells are there compared to Shallow Tubewells?
Solution process:
Step 1: Understand the problem and search for relevant information
Thought: To answer this question, we need the specific counts of dugwells and shallow tubewells from India's "Energy Divide" map. Search for: "India Energy Divide IWMI Minor Irrigation Census dugwells shallow tubewells"
Step 2: Image retrieval and preliminary analysis
Image retrieval: a map showing India's "Energy Divide" is retrieved.
Thought: the map shows proportions with different color codes but does not give the specific numbers of dugwells and shallow tubewells, so a further search for more detailed data is needed.
Step 3: Search for detailed data
Search for: "detailed counts of dugwells and shallow tubewells"
Image retrieval: a chart containing the counts of various irrigation structures is obtained.
Step 4: Focus on relevant data
Thought: the table in the image lists counts for various structures, including dugwells and shallow tubewells. The image needs to be cropped to focus on the relevant data.
Region selection: [361, 366, 721, 517] (the coordinate range of the crop)
Step 5: Crop, Select and Zoom
Action: crop, select, and zoom in on the image to see the relevant data clearly.
Step 6: Analyze specific data and get answers
Thought: the cropped image gives the exact counts of dugwells (9.20 million) and shallow tubewells (9.10 million). To find how many more dugwells there are, subtract the shallow-tubewell count from the dugwell count. Answer: there are 0.10 million (100,000) more dugwells than shallow tubewells.
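The region-select-and-zoom action in steps 4-5 amounts to cropping a bounding box and upscaling it. A minimal sketch on a synthetic image follows, using nested lists instead of a real image library; the [x1, y1, x2, y2] reading of the region is assumed from the coordinates above.

```python
def crop(image: list, region: list) -> list:
    """Cut the [x1, y1, x2, y2] bounding box out of a row-major image."""
    x1, y1, x2, y2 = region
    return [row[x1:x2] for row in image[y1:y2]]

def zoom(image: list, factor: int = 2) -> list:
    """Nearest-neighbour upscaling so small table text becomes legible."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

# Synthetic 800x600 "document page" filled with zero pixels.
img = [[0] * 800 for _ in range(600)]
patch = crop(img, [361, 366, 721, 517])
print(len(patch[0]), len(patch))    # 360 151 (width x height of the crop)
zoomed = zoom(patch)
print(len(zoomed[0]), len(zoomed))  # 720 302
```

In VRAG-RL this action is learned: the model emits the crop region itself, and reinforcement learning rewards regions that expose the data needed to answer.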