Insight-RAG: Rethinking Traditional RAG and Retrieval Augmentation for Large Models

Written by

Caleb Hayes

Updated on: June 26, 2025

RAG has become standard equipment for large models, but traditional methods have drawbacks such as insufficient retrieval depth and difficulty integrating information from multiple sources. For example, traditional RAG retrieves documents by surface relevance, so it easily overlooks information buried deep within a single document. In legal agreements, subtle contract terms are overlooked; in business reports, hidden data trends are missed.

To address this, researchers at Megagon Labs proposed Insight-RAG, a framework designed to better capture subtle, task-specific information and to integrate higher-quality data.

The Insight Identifier is the first step of the Insight-RAG framework. Its core task is to analyze the input query and the task context and to extract the key information needed to complete the task.

For example, if the task is to answer a question about a specific scientific concept, the Insight Identifier extracts the key entities and relations involved in the question and converts them into an "insight" that the subsequent modules can work with.

The Insight Identifier converts the input query into a sentence fragment: an unfinished sentence that the next module must complete. For example, for the question "Where was Person X born?", the Insight Identifier produces a fragment such as "Person X was born in". This format not only simplifies the question but also gives the subsequent modules a clear retrieval target.

In addition, the Insight Identifier determines whether the question expects more than one answer. For example, the question "What cities are there in California?" uses a plural noun, so the answer should be a list of cities. This judgment guides how the subsequent modules process the question.
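The output of this step can be pictured as a small record holding the sentence fragment and a flag for multi-answer questions. The sketch below is only an illustration: in Insight-RAG this step is performed by a language model, whereas the code uses two hand-written patterns as a stand-in. The names `Insight` and `identify_insight`, and the fragment chosen for the California example, are assumptions made for this example and do not come from the framework.

```python
import re
from dataclasses import dataclass


@dataclass
class Insight:
    fragment: str           # unfinished sentence for the next module to complete
    expects_multiple: bool  # True when the question asks for several answers


def identify_insight(question: str) -> Insight:
    """Toy, rule-based stand-in for the model-based Insight Identifier (assumption)."""
    q = question.strip().rstrip("?").strip()

    # "Where was/is X born" -> "X was born in" (single answer expected)
    m = re.fullmatch(r"Where (?:was|is) (.+) born", q, flags=re.IGNORECASE)
    if m:
        return Insight(fragment=f"{m.group(1)} was born in", expects_multiple=False)

    # "What <plural noun> are there in Y" -> "The <plural noun> in Y are"
    # The plural noun signals that several answers are expected.
    m = re.fullmatch(r"What (\w+) are there in (.+)", q, flags=re.IGNORECASE)
    if m:
        fragment = f"The {m.group(1)} in {m.group(2)} are"
        return Insight(fragment=fragment, expects_multiple=True)

    # Fallback: pass the question through unchanged and assume a single answer.
    return Insight(fragment=q, expects_multiple=False)


if __name__ == "__main__":
    print(identify_insight("Where was Person X born?"))
    print(identify_insight("What cities are there in California?"))
```

A real implementation would replace the two patterns with a prompt to a language model, but the shape of the result (a fragment plus a single-or-multiple flag) stays the same.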

The Insight Miner is the second step of the Insight-RAG framework. Its task is to retrieve, from the document database, specific content that is highly relevant to the sentence fragment produced by the Insight Identifier. The core of this module is a specially trained large language model that learns, through continual pre-training, to extract task-relevant insights from a large number of documents.

The researchers used the LLaMA-3.2 3B model as the Insight Miner and continually pre-trained it on the documents.

During pre-training, the model learns not only the content of the original documents but also the triples extracted from them. This dual training approach helps the model better understand the semantic relationships in the documents and retrieve specific content that is highly relevant to the input sentence fragment.
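One way to picture this dual training data is as a single list of text samples that mixes the raw documents with sentences built from the extracted triples. The sketch below is a minimal illustration under stated assumptions: the article does not say how triples are turned into training text, so the `(subject, relation, object)` layout, the sentence template, and the helper names `triple_to_sentence` and `build_pretraining_texts` are all hypothetical.

```python
from typing import List, Tuple

# Assumed triple layout: (subject, relation, object)
Triple = Tuple[str, str, str]


def triple_to_sentence(triple: Triple) -> str:
    """Verbalize a triple as a short sentence (the template is an assumption)."""
    subject, relation, obj = triple
    return f"{subject} {relation} {obj}."


def build_pretraining_texts(documents: List[str], triples: List[Triple]) -> List[str]:
    """Combine raw documents and verbalized triples into one list of samples."""
    # Keep the original documents, skipping empty strings.
    texts = [doc.strip() for doc in documents if doc.strip()]
    # Add one short sentence per extracted triple.
    texts.extend(triple_to_sentence(t) for t in triples)
    return texts


if __name__ == "__main__":
    docs = ["Person X is a scientist who grew up on the East Coast."]
    facts = [("Person X", "was born in", "New York")]
    for sample in build_pretraining_texts(docs, facts):
        print(sample)
```

Because each verbalized triple has the form "subject relation object", a model trained on such samples sees the same pattern it is later asked to finish when given a fragment such as "Person X was born in".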

The Response Generator is the last step of the Insight-RAG framework. Its task is to combine the original query with the specific content retrieved by the Insight Miner and produce a contextually rich, accurate response. The core of this module is a fine-tuned large language model that generates the complete response from the original query and the retrieved insights.

For example, if the original question is "Where was Person X born?", the sentence fragment extracted by the Insight Identifier is "Person X was born in", and the specific content retrieved by the Insight Miner is "New York", then the Response Generator produces the complete answer: "Person X was born in New York."
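The hand-off in this example can be sketched as plain string assembly: the mined content completes the fragment, and the completed insight is placed next to the original question in the input given to the generator. This is an illustration only. The prompt wording and the function names `complete_fragment` and `build_generator_prompt` are assumptions, and the call to the language model itself is left out.

```python
from typing import List


def complete_fragment(fragment: str, mined_content: str) -> str:
    """Join a sentence fragment and the mined content into a full sentence."""
    return f"{fragment.strip()} {mined_content.strip()}."


def build_generator_prompt(question: str, fragment: str, mined_contents: List[str]) -> str:
    """Assemble the text passed to the generator model (format is an assumption)."""
    # One completed insight per mined item; multi-answer questions yield several.
    insights = [complete_fragment(fragment, content) for content in mined_contents]
    lines = ["Use the insights below to answer the question.", ""]
    lines += [f"Insight: {sentence}" for sentence in insights]
    lines += ["", f"Question: {question.strip()}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_generator_prompt(
        question="Where was Person X born?",
        fragment="Person X was born in",
        mined_contents=["New York"],
    )
    print(prompt)
```

Passing a list of mined items rather than a single string leaves room for questions that expect several answers.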

With this design, Insight-RAG generates answers that draw not only on surface relevance but also on the deeper semantic information in the documents, which helps the framework perform better on complex tasks.

To evaluate the performance of the Insight-RAG framework, the researchers conducted comprehensive tests using two scientific paper datasets: AAN and OC.

The results show that the Insight-RAG framework performs well in deep information retrieval tasks. Compared with traditional RAG methods, Insight-RAG significantly improves accuracy in most cases. For example, on the AAN dataset, Insight-RAG's accuracy is about 60% higher than that of traditional RAG methods.

The Insight-RAG framework also performs very well in multi-source information aggregation tasks. Compared with traditional RAG methods, it aggregates information from multiple sources more effectively, which improves the model's performance. For example, on the OC dataset, the accuracy of Insight-RAG is about 50% higher than that of the traditional RAG method.