DeepRAG: Adaptive retrieval + MDP curbs factual hallucinations, lifting answer accuracy by 21.99%!

DeepRAG's technical innovations improve factual accuracy and optimize retrieval-augmented reasoning.
Core content:
1. The DeepRAG framework tackles factual hallucination in LLM reasoning
2. Retrieval-augmented reasoning is modeled as a Markov decision process (MDP)
3. The three key steps of the DeepRAG framework: binary tree search, imitation learning, and the calibration chain
❝LLMs suffer from factual hallucination during reasoning, especially with respect to the timeliness, accuracy, and coverage of their parametric knowledge. Combining reasoning with RAG also remains challenging, mainly because of ineffective task decomposition and redundant retrieval, which can introduce noise and degrade response quality. DeepRAG addresses this by modeling retrieval-augmented reasoning as a Markov Decision Process (MDP), enabling strategic and adaptive retrieval. By iteratively decomposing the query, DeepRAG dynamically decides at each step whether to retrieve external knowledge or rely on parametric reasoning. Experiments show that DeepRAG improves answer accuracy by 21.99% while also improving retrieval efficiency, demonstrating its effectiveness for optimizing retrieval-augmented reasoning.❞
DeepRAG Core Methods
DeepRAG targets the factual hallucination problem of LLMs during reasoning and proposes a new framework for strengthening reasoning in retrieval-augmented generation. Retrieval-augmented reasoning is modeled as a Markov decision process (MDP): the query is decomposed iteratively, and at each step the framework dynamically decides whether to retrieve external knowledge or rely on parametric reasoning.
Markov decision process (MDP) modeling : The processes of question decomposition, atomic decisions, and final answer generation are formalized as an MDP (S, A, P, R), where S is the set of states, A the set of actions, P the transition dynamics, and R the reward function. A state represents a partial solution to the original question, and an action combines a termination decision (continue decomposing, or produce the final answer) with an atomic decision (retrieve external knowledge, or answer from parametric knowledge).
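To make the formalization concrete, here is a minimal Python sketch of how the state, action, and reward of such an MDP could be represented. The class and field names (and the reward trade-off weight) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code) of DeepRAG's MDP elements.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubQueryStep:
    subquery: str             # the i-th subquery
    retrieved: bool           # atomic decision: retrieve or answer directly
    intermediate_answer: str  # answer produced for this subquery

@dataclass
class State:
    question: str                                               # original question
    history: List[SubQueryStep] = field(default_factory=list)   # partial solution so far

@dataclass
class Action:
    terminate: bool   # termination decision: stop and emit the final answer?
    retrieve: bool    # atomic decision: use external retrieval for the next subquery?

def reward(state: State, final_answer: str, gold: str) -> float:
    """Sketch of the reward: favor correct answers that use fewer retrievals."""
    correct = float(final_answer.strip().lower() == gold.strip().lower())
    retrieval_cost = sum(step.retrieved for step in state.history)
    return correct - 0.01 * retrieval_cost  # trade-off weight is illustrative
```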
Binary tree search : To construct reasoning paths, a binary tree search explores the answering strategies available for each subquery: answering directly from parametric knowledge, or retrieving external documents first. Each subquery therefore spawns two branches.
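A rough sketch of this two-way expansion is shown below, assuming hypothetical `llm_answer` and `retrieve_docs` helpers that wrap an LLM and a retriever.

```python
# Minimal sketch (assumptions, not the paper's code): each subquery node is
# expanded into two children, one answered from parametric knowledge and one
# answered after retrieving supporting documents.
from typing import List, Tuple

def llm_answer(question: str, subquery: str, context: List[str]) -> str:
    raise NotImplementedError  # call your LLM here

def retrieve_docs(subquery: str, k: int = 3) -> List[str]:
    raise NotImplementedError  # call your retriever (e.g. BM25) here

def expand(question: str, subquery: str) -> List[Tuple[bool, str]]:
    """Return the two branches (retrieved?, intermediate_answer) for a subquery."""
    direct = (False, llm_answer(question, subquery, context=[]))
    docs = retrieve_docs(subquery)
    with_retrieval = (True, llm_answer(question, subquery, context=docs))
    return [direct, with_retrieval]
```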
Imitation learning : Binary tree search is used to synthesize training data, and the model is trained by imitation learning on the reasoning trajectory that reaches the correct answer with the minimum retrieval cost. The procedure is roughly as follows (see the sketch after this list):
Initialize a priority queue of partial trajectories, ordered by retrieval cost. While the queue is not empty, repeat the following steps:
Pop the trajectory with the fewest retrievals and generate the next subquery. If the question should now be answered, or the history exceeds the maximum length, generate the final answer. Otherwise, branch on the new subquery: answer it directly from parametric knowledge and push the extended trajectory onto the queue; and, separately, retrieve a document, generate an intermediate answer from it, and push that extended trajectory onto the queue.
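Below is a minimal sketch of this priority-queue search. The helper functions (`generate_subquery`, `answer_directly`, `answer_with_retrieval`, `final_answer`) are assumed stand-ins for LLM and retriever calls; the structure of the search, not the exact code, is what matters.

```python
# Sketch of synthesizing a minimum-retrieval-cost trajectory via priority-queue
# search over the binary tree of subquery decisions (illustrative, not official).
import heapq
from typing import List, Optional, Tuple

MAX_STEPS = 5

def generate_subquery(question: str, history: List[dict]) -> Optional[str]:
    """Return the next subquery, or None if the question can be answered now."""
    raise NotImplementedError

def answer_directly(subquery: str) -> str:
    raise NotImplementedError

def answer_with_retrieval(subquery: str) -> str:
    raise NotImplementedError

def final_answer(question: str, history: List[dict]) -> str:
    raise NotImplementedError

def is_correct(prediction: str, gold: str) -> bool:
    return prediction.strip().lower() == gold.strip().lower()

def synthesize_trajectory(question: str, gold: str) -> Optional[List[dict]]:
    """Return the correct reasoning trajectory with the fewest retrievals."""
    counter = 0  # tie-breaker so heapq never has to compare histories
    queue: List[Tuple[int, int, List[dict]]] = [(0, counter, [])]
    while queue:
        cost, _, history = heapq.heappop(queue)  # lowest retrieval cost first
        subquery = generate_subquery(question, history)
        if subquery is None or len(history) >= MAX_STEPS:
            if is_correct(final_answer(question, history), gold):
                return history          # minimum-retrieval correct trajectory
            continue
        # Branch 1: answer the subquery from parametric knowledge only.
        direct = history + [{"subquery": subquery, "retrieved": False,
                             "answer": answer_directly(subquery)}]
        counter += 1
        heapq.heappush(queue, (cost, counter, direct))
        # Branch 2: retrieve a document, then generate an intermediate answer.
        retrieved = history + [{"subquery": subquery, "retrieved": True,
                                "answer": answer_with_retrieval(subquery)}]
        counter += 1
        heapq.heappush(queue, (cost + 1, counter, retrieved))
    return None
```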
Calibration chain : Calibrate each atomic decision so that the LLM better recognizes the limits of its internal knowledge. The specific steps include: synthesize preference data indicating, for each subquery, whether retrieval is needed; then fine-tune the LLM with the calibration-chain objective below.
The objective takes a DPO-style form: L_cal = −log σ( β·log(π_θ(y_w|x)/π_ref(y_w|x)) − β·log(π_θ(y_l|x)/π_ref(y_l|x)) ), where σ is the logistic function, β is a hyperparameter that penalizes deviation from the base (reference) model π_ref, and y_w and y_l are the preferred and dispreferred generated fragments, i.e., the direct-answer and retrieved-answer fragments as labeled by the synthesized preference data.
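For intuition, here is a minimal numeric sketch of that objective computed over precomputed sequence log-probabilities; the function name and the example numbers are illustrative assumptions.

```python
# Sketch of the calibration objective over (summed) log-probabilities.
import math

def calibration_loss(logp_w_policy: float, logp_w_ref: float,
                     logp_l_policy: float, logp_l_ref: float,
                     beta: float = 0.1) -> float:
    """-log sigmoid(beta * (preferred log-ratio - dispreferred log-ratio))."""
    margin = beta * ((logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Example: the policy raises the preferred fragment's log-prob relative to the
# reference model, so the loss drops below log(2) ≈ 0.693.
print(calibration_loss(-3.0, -4.0, -5.0, -4.5, beta=0.5))  # ≈ 0.39
```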
Experimental design
Datasets : Five open-domain question-answering datasets are used. The training data come from HotpotQA and 2WikiMultihopQA, and the test sets are HotpotQA, 2WikiMultihopQA, CAG, PopQA, and WebQuestions.
Baseline methods : CoT, CoT-Retrieve, IterDRAG, UAR, FLARE, DRAGIN, TAARE, and AutoRAG serve as baselines for comparison.
Implementation details : BM25 is used as the retriever, with Wikipedia as the external knowledge base, split into 100-token passages. Llama-3-8B-Instruct and Qwen-2.5-7B are used as the base models.
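Purely as an illustration of this kind of setup, here is a toy BM25 retrieval example using the third-party rank_bm25 package over a stand-in corpus (not the actual Wikipedia index or the authors' pipeline).

```python
# Toy BM25 retrieval over a few hand-written passages (illustrative only).
from rank_bm25 import BM25Okapi

passages = [
    "Paris is the capital and most populous city of France.",
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Berlin is the capital of Germany.",
]
tokenized_corpus = [p.lower().split() for p in passages]
bm25 = BM25Okapi(tokenized_corpus)

query = "capital of France"
top_passages = bm25.get_top_n(query.lower().split(), passages, n=2)
print(top_passages)
```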
Overall results : DeepRAG outperforms existing methods in all tested scenarios. Compared with reasoning-based and adaptive RAG baselines, it achieves gains on all datasets.
Retrieval efficiency : DeepRAG maintains high accuracy at a comparatively low retrieval cost. Confidence-based methods show limited robustness across datasets, and iterative retrieval methods require a large number of retrieval operations.
Correlation with parametric knowledge : DeepRAG scores well on F1, balanced accuracy, and MCC, successfully identifying when retrieval is necessary. FLARE, DRAGIN, and TAARE perform poorly at avoiding unnecessary retrieval despite their high accuracy.
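These metrics treat "was retrieval actually necessary?" as a binary classification problem. A toy example of computing them with scikit-learn follows; the labels and predictions are made-up values, not results from the paper.

```python
# Toy evaluation of retrieval-necessity decisions as binary classification.
from sklearn.metrics import f1_score, balanced_accuracy_score, matthews_corrcoef

# 1 = retrieval was necessary, 0 = parametric knowledge sufficed
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("F1:", f1_score(y_true, y_pred))
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print("MCC:", matthews_corrcoef(y_true, y_pred))
```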
Different reasoning strategies : Relying solely on internal knowledge yields poor results, while relying entirely on external knowledge yields high accuracy at a high retrieval cost. DeepRAG outperforms retrieval-only methods by adaptively choosing between internal and external knowledge sources.
Problem decomposition effect : Most problems require 3 to 5 steps of decomposition, and retrieval attempts are mainly concentrated in 0 to 2 rounds. DeepRAG effectively decomposes the problem while minimizing redundant retrieval.
Ablation study : In the imitation learning stage, DeepRAG-Imi is weaker on CAG but already achieves higher average performance; with the calibration chain stage added, DeepRAG reaches higher average performance still while keeping retrieval cost low.
Results and Analysis are summarized above; the following section contrasts DeepRAG with existing RAG approaches.
Differences between DeepRAG and existing RAG
Dynamic and strategic retrieval : DeepRAG models the retrieval process as a Markov decision process (MDP) and decides dynamically and strategically when to retrieve. It iteratively decomposes the query and, at each step, decides whether to retrieve external knowledge or rely only on internal parametric knowledge.
Binary tree search : DeepRAG uses binary tree search to explore different retrieval-strategy paths. This lets the model generate subqueries and adjust its retrieval strategy based on what has already been retrieved, making more effective use of external knowledge.
Knowledge boundary calibration : DeepRAG sharpens the model's awareness of its own knowledge boundaries through the chain calibration method. Training on synthesized data and preference data teaches the model to judge more accurately when external information is needed, reducing unnecessary retrieval operations.
Reduced redundant retrieval : By determining retrieval needs dynamically, DeepRAG cuts unnecessary retrieval operations, improving retrieval efficiency and reducing the risk of degraded generation quality.
End-to-end training : DeepRAG is trained end to end and leverages the generative ability of large language models to explore knowledge boundaries, without relying on additional parameters or unreliable uncertainty measures.
Multi-step reasoning : DeepRAG performs multi-step reasoning by decomposing a complex query into multiple sub-queries and deciding, at each step, whether to use parametric or external knowledge. This helps with tasks that require multi-step reasoning.
These features make DeepRAG excel in improving retrieval efficiency and answer accuracy, especially when dealing with question-answering tasks that require multi-step reasoning and are time-sensitive.
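As a recap of the adaptive decision loop described above, here is a minimal inference-time sketch. All of the helper callables are hypothetical placeholders for the underlying LLM and retriever, not the authors' API.

```python
# Minimal sketch of DeepRAG-style adaptive inference: iteratively decompose the
# question and, at each step, decide whether to retrieve before answering.
from typing import List, Optional

def next_subquery(question: str, history: List[str]) -> Optional[str]:
    """Ask the LLM for the next subquery; None means 'answer the question now'."""
    raise NotImplementedError

def should_retrieve(subquery: str) -> bool:
    """Atomic decision made by the calibrated LLM."""
    raise NotImplementedError

def retrieve(subquery: str) -> List[str]:
    raise NotImplementedError

def answer_subquery(subquery: str, docs: List[str]) -> str:
    raise NotImplementedError

def finalize(question: str, history: List[str]) -> str:
    raise NotImplementedError

def deeprag_inference(question: str, max_steps: int = 5) -> str:
    history: List[str] = []
    for _ in range(max_steps):
        subquery = next_subquery(question, history)
        if subquery is None:                                              # termination decision
            break
        docs = retrieve(subquery) if should_retrieve(subquery) else []    # atomic decision
        history.append(f"Q: {subquery}\nA: {answer_subquery(subquery, docs)}")
    return finalize(question, history)
```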
Summary
This paper proposes DeepRAG, which enhances LLMs' understanding of their own retrieval needs through self-calibration. DeepRAG decomposes queries into subqueries and uses binary tree search for data synthesis to help the model better understand its knowledge boundaries. Experimental results show that DeepRAG significantly improves the accuracy and efficiency of retrieval-augmented generation.
Shortcomings and reflections
Limitations : Although DeepRAG performs well in most cases, it still falls short on some datasets, especially the time-sensitive CAG dataset, where it underperforms some adaptive retrieval methods.
Next steps : Future work can further optimize DeepRAG's retrieval strategy, especially for time-sensitive and multi-hop factual question answering, to further improve the model's robustness and accuracy.
Questions and Answers
Question 1: How does DeepRAG construct the reasoning path through the binary tree search method?
DeepRAG constructs the reasoning path with the binary tree search method as follows:
Subquery generation : For a given question, the model first generates the first subquery and explores two answering strategies: directly using parametric knowledge, or retrieving external documents.
Path exploration : Based on the current state, the model generates subqueries and chooses between answering directly and retrieving external documents according to a preset threshold or strategy.
Recursive decomposition : For each generated subquery, the model recursively repeats the same operation until a termination condition is reached (such as producing the final answer or hitting the maximum number of iterations).
Data synthesis : In this way, DeepRAG not only decomposes the question into a series of forward-dependent sub-queries, but also thoroughly examines how each retrieval choice affects the final answer. This improves both the coherence of reasoning and the effectiveness of retrieval.
Question 2: How does DeepRAG’s imitation learning phase use binary tree search to synthesize data?
In the imitation learning stage, DeepRAG uses binary tree search to synthesize data as follows:
Priority queue initialization : A priority queue is used to efficiently explore candidate reasoning trajectories; its elements are ordered by retrieval cost.
Path construction and evaluation : The model takes the current path from the priority queue, generates the next subquery, and chooses between answering directly and retrieving external documents according to a preset threshold or strategy. The newly extended path and any retrieval results are then pushed back onto the queue.
Data collection : This process repeats until the priority queue is empty or a path that produces the correct answer is found. The collected path contains the reasoning process with the minimum retrieval cost.
Model fine-tuning : The collected synthetic data are used to fine-tune the model, optimizing its termination and atomic decisions so that it retrieves and reasons more effectively during actual generation.