In-depth analysis of innovative RAG: PIKE-RAG, DeepRAG, and the changes they bring to RAG technology

Explore the innovations in RAG technology and how PIKE-RAG and DeepRAG may lead its future.
Core content:
1. How PIKE-RAG differs from traditional RAG, and its advantages
2. PIKE-RAG's key components and their functions
3. The application of multi-layer heterogeneous graphs in knowledge base construction
PIKE-RAG: From DIY furniture to interior design
Open source code: https://github.com/microsoft/PIKE-RAG
Traditional RAG (Retrieval-Augmented Generation) is like buying a DIY furniture kit: you get all the parts and the assembly instructions, but you still have to put everything together yourself.
PIKE-RAG is entirely different. It is like hiring a professional interior designer. This "designer" first develops a deep understanding of your style preferences and practical needs, carefully selects the most suitable elements for you, and then combines and matches them so that the space is both beautiful and functional.
As shown in Figure 1, PIKE-RAG consists of several key components: document parsing, knowledge extraction, knowledge storage, knowledge retrieval, knowledge organization, task decomposition and coordination, and knowledge-based reasoning. Each component can be customized to the system's evolving needs.
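To make the modular design concrete, here is a minimal sketch of how such a pipeline could be composed in Python. All names here (build_pipeline, parse_documents, and so on) are illustrative placeholders rather than the actual PIKE-RAG API; see the repository linked above for the real interfaces.

```python
from typing import Callable

Stage = Callable[[object], object]

def build_pipeline(*stages: Stage) -> Stage:
    """Chain components so that each stage's output feeds the next."""
    def run(data: object) -> object:
        for stage in stages:
            data = stage(data)
        return data
    return run

# Stub stages standing in for the real, customizable modules.
def parse_documents(docs):
    """Document parsing: turn raw files into text chunks (stubbed)."""
    return [f"chunk from {d}" for d in docs]

def extract_knowledge(chunks):
    """Knowledge extraction: distill chunks into structured facts (stubbed)."""
    return {"facts": chunks}

def store_knowledge(kb):
    """Knowledge storage: persist the knowledge base (stubbed as a no-op)."""
    return kb

pipeline = build_pipeline(parse_documents, extract_knowledge, store_knowledge)
print(pipeline(["contract.pdf", "report.pdf"]))
```

Because each stage is just a swappable callable, any single component can be replaced without touching the rest, which is the point of the "interior designer" customization described above.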
Deep Insights
PIKE-RAG is a well-researched, detailed piece of work, and it is open source.
The following points particularly caught my interest.
The first step is document parsing (AI Exploration Journey: PDF Parsing and Document Intelligence). Since documents often contain complex tables, charts, and graphics, layout analysis (Unveiling PDF Parsing 02: Pipeline-based Approach) is applied first to preserve these multimodal elements. The charts and graphics are then processed by vision-language models (VLMs) to generate descriptions that are useful for knowledge retrieval.
This approach has two significant advantages. First, the extracted layout information helps to segment the text, ensuring that the segmented text remains intact in the context. Second, compared to the end-to-end approach (Unveiling PDF Parsing 03: Small Model-Based Approach Without OCR, Unveiling PDF Parsing 04: Large Multimodal Model-Based Approach Without OCR), this divide-and-conquer strategy can achieve a higher degree of customization while taking advantage of the pipeline-based approach (Unveiling PDF Parsing 02: Pipeline-Based Approach).
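As an illustration of this divide-and-conquer idea, here is a hedged sketch in Python. The functions detect_layout and describe_with_vlm are invented stubs standing in for a layout-analysis model and a vision-language model; they are not part of PIKE-RAG's code.

```python
from dataclasses import dataclass

@dataclass
class Block:
    kind: str      # "text", "table", or "figure"
    content: str   # raw text, or an image path for figures/charts

def detect_layout(pdf_path: str) -> list[Block]:
    """Stub for a layout-analysis step that splits a page into blocks."""
    return [Block("text", "Revenue grew 12% year over year."),
            Block("figure", "page1_chart.png")]

def describe_with_vlm(image_path: str) -> str:
    """Stub for a VLM call that turns a chart into a textual description."""
    return f"[VLM description of {image_path}]"

def parse_document(pdf_path: str) -> list[str]:
    chunks = []
    for block in detect_layout(pdf_path):
        if block.kind in ("figure", "table"):
            # Multimodal elements become retrieval-friendly text.
            chunks.append(describe_with_vlm(block.content))
        else:
            # Layout information keeps text segments contextually intact.
            chunks.append(block.content)
    return chunks

print(parse_document("report.pdf"))
```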
The second step is to construct the knowledge base as a multi-layer heterogeneous graph, as shown in Figure 2, which mainly consists of the following layers:
- Information resource layer: various information sources are treated as nodes, and the edges between nodes capture their mutual reference relationships.
- Corpus layer: the parsed information is organized into sections and chunks while preserving the document's original hierarchical structure. Multimodal content such as tables and figures is summarized by large language models (LLMs) and integrated into chunk nodes. This layer supports knowledge extraction at different levels of granularity.
- Distilled knowledge layer: key entities and their logical connections are obtained with techniques such as named entity recognition and relation extraction. By capturing these relationships explicitly, this layer supports higher-level reasoning.
In this architecture, nodes can represent various elements, such as documents, chapters, text fragments, charts, tables, and refined knowledge, while edges define the relationships between them.
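A minimal sketch of this three-layer structure using networkx is shown below. The layer names follow the description above, but the attribute schema (layer, kind, relation) is an illustration rather than PIKE-RAG's actual data model.

```python
import networkx as nx

G = nx.DiGraph()

# Information-resource layer: sources as nodes, cross-references as edges.
G.add_node("doc_A", layer="resource", kind="document")
G.add_node("doc_B", layer="resource", kind="document")
G.add_edge("doc_A", "doc_B", relation="references")

# Corpus layer: sections and chunks preserve the document hierarchy;
# a table would be summarized (e.g., by an LLM) into its chunk node.
G.add_node("doc_A/sec_1", layer="corpus", kind="section")
G.add_node("doc_A/sec_1/chunk_0", layer="corpus", kind="chunk",
           text="Q3 revenue table, summarized by an LLM ...")
G.add_edge("doc_A", "doc_A/sec_1", relation="contains")
G.add_edge("doc_A/sec_1", "doc_A/sec_1/chunk_0", relation="contains")

# Distilled-knowledge layer: entities and relations extracted from chunks.
G.add_node("Entity:ACME", layer="knowledge", kind="entity")
G.add_edge("doc_A/sec_1/chunk_0", "Entity:ACME", relation="mentions")

# Retrieval can then filter by layer, e.g. only distilled knowledge:
entities = [n for n, d in G.nodes(data=True) if d["layer"] == "knowledge"]
print(entities)
```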
Finally, PIKE-RAG divides questions into four types:
- Factual questions: extract clear, objective factual information directly from existing knowledge.
- Linkable-reasoning questions: require reasoning and association across multiple pieces of knowledge to arrive at the correct answer.
- Predictive questions: identify patterns in the data and make inferences from it to predict future outcomes.
- Creative questions: combine knowledge and logic to generate new insights or innovative solutions.
To measure the system's ability to handle these different types of questions, PIKE-RAG defines four levels (L1, L2, L3, and L4), providing a clear roadmap for gradually improving RAG's capabilities.
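To illustrate how the four question types could map onto the four capability levels, here is a hedged sketch; the keyword-based classifier is a made-up placeholder (a real system would use an LLM or a trained classifier).

```python
# Mapping of question types to capability levels, per the taxonomy above.
LEVEL_BY_TYPE = {
    "factual": "L1",
    "linkable_reasoning": "L2",
    "predictive": "L3",
    "creative": "L4",
}

def classify_question(question: str) -> str:
    """Toy placeholder: a real system would use an LLM-based classifier."""
    q = question.lower()
    if "will" in q or "forecast" in q:
        return "predictive"
    if "design" in q or "propose" in q:
        return "creative"
    if " and " in q:  # crude proxy for multi-hop association
        return "linkable_reasoning"
    return "factual"

def route(question: str) -> str:
    qtype = classify_question(question)
    return f"{qtype!r} question -> {LEVEL_BY_TYPE[qtype]} pipeline"

print(route("What was the 2023 revenue?"))
print(route("What will revenue be next quarter?"))
```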
As shown in Figure 3, RAG handles questions at different levels in different ways.
Fast delivery, intelligent routing: DeepRAG turns large language models into knowledge couriers
Image Interpretation
DeepRAG is like an experienced courier who doesn’t blindly deliver all orders. Instead, he plans the best route based on the delivery address (task decomposition). If he knows the route well (internal knowledge of the large language model), he will go directly to the destination. But if he encounters an unfamiliar street (knowledge loss), he will check the map (external search) to find the best path.
Comprehensive Overview
Traditional RAG struggles with inefficient task decomposition and redundant retrieval.
As shown on the right side of Figure 4, DeepRAG's retrieval mechanism keeps the entire retrieval process well structured and adaptive. It generates subqueries based on previously retrieved information, and atomic decisions dynamically determine whether each subquery fetches external knowledge or relies solely on the model's parametric knowledge.
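The following sketch captures that loop in Python. Every function body is an invented stub illustrating the control flow (subquery generation, the atomic retrieve-or-not decision, and evidence collection); none of it is DeepRAG's actual code.

```python
def next_subquery(question: str, history: list[str]) -> str | None:
    """Stub: an LLM would generate the next subquery from prior steps."""
    plan = ["Who founded ACME Corp?", "When was that founder born?"]
    return plan[len(history)] if len(history) < len(plan) else None

def should_retrieve(subquery: str) -> bool:
    """Stub atomic decision: retrieve only where the model lacks the fact."""
    return "born" in subquery  # pretend birth dates are outside its knowledge

def retrieve(subquery: str) -> str:
    return f"[retrieved passage answering: {subquery}]"

def answer_from_parametric_knowledge(subquery: str) -> str:
    return f"[model's internal answer to: {subquery}]"

def deeprag_style_loop(question: str) -> list[str]:
    history: list[str] = []
    while (sq := next_subquery(question, history)) is not None:
        if should_retrieve(sq):
            evidence = retrieve(sq)                          # external knowledge
        else:
            evidence = answer_from_parametric_knowledge(sq)  # internal knowledge
        history.append(f"{sq} -> {evidence}")
    return history

for step in deeprag_style_loop("When was ACME Corp's founder born?"):
    print(step)
```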
DeepRAG strengthens retrieval-based reasoning by training the large language model itself. As shown in Figure 5, DeepRAG's training process follows three key steps: (1) binary tree search, (2) imitation learning, and (3) chain of calibration.
Given a dataset, the training process first uses binary tree search to synthesize imitation-learning data, which helps the model learn effective retrieval patterns. Binary tree search is then applied again to construct preference data, which further calibrates the large language model's awareness of its own knowledge boundaries.
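To give a feel for the binary-tree-search step, here is a toy sketch: each subquery branches on "retrieve" versus "skip retrieval", and among the paths that reach the correct answer, the one with the fewest retrievals is kept as imitation-learning data. The correctness check is a made-up stub; the real procedure scores full reasoning paths against gold answers.

```python
from itertools import product

SUBQUERIES = ["sq1", "sq2", "sq3"]

def path_is_correct(decisions: tuple[bool, ...]) -> bool:
    """Stub: really you would run the path and compare with the gold answer.
    Here we pretend only sq2 genuinely requires external retrieval."""
    return decisions[1]

best = None
# Enumerate the full binary tree: each subquery either retrieves (True) or not.
for decisions in product([False, True], repeat=len(SUBQUERIES)):
    if path_is_correct(decisions):
        cost = sum(decisions)  # retrieval count; prefer cheaper paths
        if best is None or cost < best[0]:
            best = (cost, decisions)

print("cheapest correct path (retrieve flags per subquery):", best)
```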
In-depth analysis
Here are two takeaways from this research that I found especially worth sharing.
The first point is the classification of adaptive RAG methods, which can be mainly divided into three categories:
- Classifier-based methods: rely on training an additional classifier to make the retrieval decision. A typical example is Adaptive RAG, which we introduced before (Advanced RAG 11: Query Classification and Optimization); it trains a query-complexity classifier, which is a relatively small language model.
- Confidence-based methods: rely on uncertainty indicators compared against a threshold, as sketched below. FLARE is an important example of this type.
- Large language model-based methods: the retrieval decision is generated by the large language model itself. Such methods often struggle to accurately identify their own knowledge boundaries, so letting the model decide when to retrieve is unreliable. Self-RAG, which we introduced before (Advanced RAG 08: Self-RAG), is a well-known example.
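As a quick illustration of the confidence-based category, here is a FLARE-flavored sketch: retrieval is triggered whenever a generated token's probability falls below a threshold. The token probabilities are invented numbers, not output from any real model.

```python
CONFIDENCE_THRESHOLD = 0.4

def needs_retrieval(token_probs: list[float]) -> bool:
    """Trigger retrieval if the model was uncertain about any token."""
    return any(p < CONFIDENCE_THRESHOLD for p in token_probs)

# A drafted sentence with per-token probabilities (made-up values).
draft = [("The", 0.95), ("founder", 0.88), ("was", 0.91), ("Alice", 0.22)]

if needs_retrieval([p for _, p in draft]):
    low = [t for t, p in draft if p < CONFIDENCE_THRESHOLD]
    print(f"uncertain about {low} -> issue a retrieval query")
```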
It is worth mentioning that SEAKR (Unleashing AI’s Self-Awareness: How SEAKR Revolutionizes Knowledge Retrieval in Large Language Models), which we introduced earlier, uses the internal states of a large language model to determine when external knowledge support (i.e., external information retrieval) is needed. Because it relies on this more sophisticated signal, SEAKR stands out as a distinctive adaptive RAG method rather than just another large-language-model-based one.
The second point is the enhancement of reasoning capabilities in RAG. The following are some representative methods:
- Self-RAG (Advanced RAG 08: Self-RAG) and Auto-RAG improve reasoning ability within the RAG framework through automatic data synthesis.
- Search-o1 integrates retrieval into the reasoning process and builds an agent-based system, but its scope of application is currently limited to reasoning large language models such as OpenAI o1.
- AirRAG combines Monte Carlo Tree Search (MCTS) and self-consistency techniques to further enhance retrieval-based reasoning capabilities.
Unlike the methods above, which rely heavily on large numbers of retrieval operations or on specific reasoning-oriented large language models, DeepRAG provides an end-to-end solution that enables any model to perform retrieval-based reasoning step by step, according to actual need.