DeepRAG: An intelligent-retrieval revolution in the LLM era (a measured 21.99% accuracy improvement)

DeepRAG technology revolutionizes intelligent retrieval in the LLM era, improving accuracy by 21.99%.
Core content:
1. Retrieval difficulties and pain points faced by traditional RAG
2. DeepRAG: Modeling retrieval enhancement reasoning as a Markov decision process
3. How DeepRAG optimizes retrieval efficiency and improves answer quality
In the lab next door, PhD student Xiao Li was still staring at his screen late at night, model logs scrolling frantically. The subject of his research, a state-of-the-art large language model (LLM), had just produced a confident but flawed answer. He smiled wryly and closed the dialog box.
“That’s not right.”
He rubbed his temples and thought of the recently popular RAG technique: using external knowledge bases to improve the accuracy of large models. Unfortunately, existing solutions retrieve too rigidly, and the information they fetch is often redundant, sometimes even interfering with the model's own reasoning.
Just then he stumbled upon a paper: DeepRAG: A New Paradigm for Retrieval-Enhanced Reasoning [1] (which, of course, I had recommended to him). The paper proposes a completely new approach: model retrieval-augmented reasoning as a Markov decision process (MDP), so the system can dynamically decide at each step whether to call external knowledge, optimizing retrieval efficiency while improving answer quality.
Xiao Li was stunned: wasn't this exactly the answer he had been looking for?
The dilemma of traditional RAG: it can't find what it should retrieve, and desperately retrieves what it shouldn't
Retrieval-Augmented Generation (RAG) has always been regarded as the key to solving the large model hallucination problem. However, in real applications, RAG often faces two core pain points:
1. Ineffective task decomposition and poor retrieval quality
Existing RAG methods usually take a "simple splitting + uniform retrieval" approach: break the question into several sub-questions, then retrieve relevant documents for each one. This approach has a serious flaw:
• Unreasonable splitting: some questions need no extra information, yet the system still retrieves blindly, introducing interference.
• No decision mechanism: existing methods cannot intelligently judge when retrieval is needed or how much content to retrieve.
2. Excessive retrieval and high noise reduce accuracy
Many RAG systems default to "the more, the better," forcing the large model to sift the answer out of a mass of irrelevant information and adding unnecessary noise. For example:
• You ask: "What are the latest Transformer improvements in 2024?"
• An existing RAG may retrieve piles of outdated papers and even irrelevant introductory tutorials, degrading answer quality.
At its root, the problem is that existing RAG lacks the ability to make intelligent retrieval decisions; DeepRAG was created precisely to solve this pain point.
DeepRAG: Retrieval-augmented Reasoning that Thinks Like a Human
The core idea of DeepRAG is simple: let the large model decide at each step whether retrieval is needed, just as a human would, instead of mechanically calling the external knowledge base.
1. RAG's decision engine: introducing the Markov decision process (MDP)
The biggest innovation of DeepRAG is that it models retrieval-augmented reasoning as a Markov decision process (MDP), letting the system make an intelligent choice between "retrieve" and "reason from memory" at every reasoning step:
• If the large model "knows" the answer, it reasons directly from its parametric knowledge.
• If the large model is "uncertain", retrieval is triggered and the most relevant information is precisely selected.
• The decision is dynamic; not every question is thrown at the retrieval system from the start.
This mechanism enables DeepRAG to control the retrieval process more accurately and reduce unnecessary noise.
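To make the MDP framing concrete, here is a minimal sketch of how the state and the retrieve-or-answer action could be represented. This is my own illustration rather than the paper's code; the fixed confidence threshold is a stand-in for the decision policy DeepRAG actually learns.

```python
from dataclasses import dataclass, field
from enum import Enum

class Action(Enum):
    ANSWER = "answer"      # continue reasoning from parametric memory
    RETRIEVE = "retrieve"  # query the external knowledge base first

@dataclass
class State:
    """MDP state: the question plus everything reasoned or retrieved so far."""
    question: str
    steps: list = field(default_factory=list)     # intermediate reasoning steps
    evidence: list = field(default_factory=list)  # retrieved passages

def policy(state: State, confidence: float, threshold: float = 0.8) -> Action:
    """Toy policy: retrieve only when the model is not confident enough.
    DeepRAG learns this decision; the threshold here is a placeholder."""
    return Action.ANSWER if confidence >= threshold else Action.RETRIEVE
```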
2. Retrieve step by step to avoid the information pollution of one-shot retrieval
DeepRAG retrieves iteratively instead of all at once:
• Traditional RAG retrieves every possibly relevant document in one pass, producing information redundancy.
• DeepRAG retrieves in stages during reasoning, ensuring that each fetch is necessary for the current step.
This approach prevents the model from being disturbed by irrelevant information, thereby improving the accuracy of the final answer.
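As a rough sketch of what staged retrieval looks like in code (the `llm` and `retriever` objects and their methods are assumed interfaces, not any specific library):

```python
def stepwise_rag(question: str, llm, retriever, max_steps: int = 5) -> str:
    """Staged retrieval: fetch only what the *current* reasoning step needs,
    instead of one bulk retrieval up front."""
    context: list[str] = []
    for _ in range(max_steps):
        # Ask the model for the next atomic subquery, or a final answer.
        step = llm.next_step(question, context)  # assumed interface
        if step.is_final:
            return step.text
        if step.needs_retrieval:
            # Small, focused fetch for just this subquery.
            docs = retriever.search(step.subquery, k=3)  # assumed interface
            context.extend(doc.text for doc in docs)
        else:
            # The model answered this subquery from parametric memory.
            context.append(step.text)
    return llm.finalize(question, context)  # fall back once the budget is spent
```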
3. Balancing retrieval and reasoning: let the LLM decide whether to rely on memory or look things up
The biggest highlight of DeepRAG is that it lets the LLM decide whether to answer from its existing knowledge or seek answers externally, instead of having retrieval intervene by default.
• For example, asked "In what year was Einstein born?", DeepRAG recognizes that the model already holds this fact in its parametric knowledge, so no retrieval is required.
• But when a question involves the latest research progress, DeepRAG automatically triggers retrieval and reasons over the fresh information.
This mechanism significantly reduces retrieval redundancy, making RAG not only smarter but also more efficient.
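One cheap way to approximate the "do I know this?" check is simply to ask the model to self-assess before retrieving. The paper trains this calibration end to end; the prompt-based probe below is only a rough, hypothetical stand-in:

```python
def should_retrieve(llm, question: str) -> bool:
    """Return True when the model admits it cannot answer from memory."""
    probe = (
        "Can you reliably answer the following question from your own "
        f"knowledge, without looking anything up?\nQuestion: {question}\n"
        "Reply with exactly YES or NO."
    )
    reply = llm.complete(probe)  # assumed single-string completion interface
    return reply.strip().upper().startswith("NO")
```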
Experimental results: DeepRAG improves accuracy by 21.99%
The experimental results of the paper show that DeepRAG outperforms traditional RAG on multiple benchmark datasets:
• Accuracy improved by 21.99%: DeepRAG reduces the interference caused by incorrect retrieval, making final answers more accurate.
• Retrieval efficiency improved by 35.7%: intelligent decision-making lets DeepRAG make 35.7% fewer calls to external knowledge bases than traditional RAG while answering more accurately.
• Noise reduced by 40%: step-by-step retrieval shields the model from irrelevant information and keeps answers focused.
This means DeepRAG not only makes large models' answers more accurate, but also makes the retrieval process lighter and computationally cheaper.
How to implement? 3 practical suggestions
If you want to use DeepRAG in your own project, you can refer to the following strategies:
1. Build an intelligent retrieval strategy with LangChain
DeepRAG's core concept can be approximated with an adaptive-retrieval pattern in LangChain: route each question through a "retrieve or not" decision first, so the system avoids blind retrieval.
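A minimal sketch of that routing pattern with LangChain's LCEL, assuming you already have a `rag_chain` and a `direct_chain` of your own (this is one plausible wiring, not an official DeepRAG integration):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model works here

# A lightweight classifier chain decides whether retrieval is needed at all.
router_prompt = ChatPromptTemplate.from_template(
    "Decide whether answering this question requires looking up external or "
    "recent information. Reply with exactly RETRIEVE or ANSWER.\n"
    "Question: {question}"
)
router = router_prompt | llm | StrOutputParser()

def adaptive_answer(question: str, rag_chain, direct_chain) -> str:
    """Route each question to RAG or direct answering, DeepRAG-style."""
    decision = router.invoke({"question": question})
    chain = rag_chain if "RETRIEVE" in decision.upper() else direct_chain
    return chain.invoke(question)
```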
2. Optimize RAG decisions with reinforcement learning
DeepRAG's MDP framework can be combined with reinforcement learning (RL) to continuously optimize retrieval strategies in practical applications.
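As a sketch of what that could look like, here is a tabular, bandit-style update over the two DeepRAG actions, with a reward that pays for correct answers and charges a small cost per retrieval (the state featurization and reward scale are assumptions, not the paper's setup):

```python
import random
from collections import defaultdict

ACTIONS = ["answer", "retrieve"]
Q = defaultdict(float)  # Q[(state, action)] -> estimated reward

def choose_action(state: str, eps: float = 0.1) -> str:
    """Epsilon-greedy choice between answering from memory and retrieving."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state: str, action: str, correct: bool, retrieval_cost: float = 0.2):
    """One-step update: reward correct answers, penalize each retrieval.
    A full MDP treatment would add a bootstrapped next-state term."""
    reward = (1.0 if correct else 0.0) - (retrieval_cost if action == "retrieve" else 0.0)
    Q[(state, action)] += 0.1 * (reward - Q[(state, action)])
```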
3. Design multi-round interactions to improve reasoning accuracy
Following DeepRAG's step-by-step querying idea, design multi-round interactions so that redundant information is never returned all at once; see the sketch below.
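A conversational sketch, reusing the `should_retrieve` probe from earlier; `retriever.search` and `llm.answer` are again assumed interfaces. The point is that evidence accumulates across rounds, and each turn fetches only its own small delta:

```python
def multi_round_session(llm, retriever):
    """Keep evidence across turns; retrieve only what each follow-up needs."""
    evidence: list[str] = []
    while True:
        question = input("you> ").strip()
        if not question:
            break
        if should_retrieve(llm, question):  # probe sketched earlier
            docs = retriever.search(question, k=2)  # small per-round fetch
            evidence.extend(doc.text for doc in docs)
        print("bot>", llm.answer(question, evidence))  # assumed interface
```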
DeepRAG is not the end, but a new starting point for RAG
Many people think the future of RAG is as simple as "connecting large models to databases". But the emergence of DeepRAG tells us that the essence of intelligent retrieval is teaching AI when to search, what to search for, and how much to search.
DeepRAG is not an end, but a new starting point.