Just throw the documents into Dify? Get started with RAG development in one article!

Written by
Clara Bennett
Updated on: July 8, 2025
Recommendation

An in-depth interpretation of RAG technology, with a comprehensive analysis from the basics to advanced applications.

Core content:
1. What RAG is and which LLM problems it solves
2. How RAG has developed and the different types of RAG
3. How RAG improves LLM accuracy and reliability

Table of contents

1 What is RAG

2 Development of RAG
3 Conclusion



RAG, short for Retrieval-Augmented Generation, is a common auxiliary technique in today's AI applications that effectively improves the accuracy and reliability of LLM output. People sometimes joke that RAG is as simple as "throwing documents into Dify". Is it really? There are already many high-quality articles online covering the technical workflow of RAG, so this article instead approaches RAG from the perspective of its technical development, from the most basic RAG to the currently popular Graph RAG and Agentic RAG, introducing the different types of RAG and the differences between them. I hope everyone can benefit from it.






01



What is RAG


Before introducing RAG, let me first introduce two problems faced by large language models (LLMs): the knowledge cutoff and the hallucination phenomenon.


   1.1 Knowledge Cutoff


LLM training is offline, not real-time. The training data are prepared in advance, and most of them are public, open-source data, so the model's knowledge is limited to what the training data covers. For new knowledge (such as today's news) or untrained knowledge (such as unpublished data), the model has no knowledge of its own; it has only its reasoning ability.


   1.2 Hallucinations


There are many explanations for hallucination. On the one hand, an LLM is a conditional probability model: it generates text token by token, conditioning the probability distribution over its vocabulary on the preceding text. This mechanism can produce output that appears logically rigorous (high probability) but actually lacks factual basis, in other words, "nonsense delivered in a serious tone". On the other hand, LLM training compresses and distills the knowledge in the training data, and the compression is not lossless: marginal knowledge is easily distorted under the weight of mainstream knowledge, which results in hallucinations.
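
Formally, an autoregressive LLM assigns a sequence of tokens the probability

```latex
P(x_1, \dots, x_T) = \prod_{t=1}^{T} P\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

so each token is chosen for its likelihood given the preceding text, not for its factual correctness; fluent but unfounded output is a natural failure mode of this objective.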


To give an analogy, an LLM is like a candidate who has spent years preparing for an exam. When a question touches a subject he has never studied, he has no idea where to start (knowledge cutoff); when it touches a topic he has not mastered firmly, he relies on vague memories to make up an answer, true or not (hallucination).


   1.3 RAG



RAG, short for Retrieval-Augmented Generation, is a method that effectively mitigates both the knowledge cutoff and the hallucination phenomenon. It optimizes the input to a large language model (LLM) so that, before generating a response, the model can reference knowledge outside its training data as the basis for its answer. This is a cost-effective way to improve LLM output, keeping it relevant, accurate, and useful.


The RAG system has two main components:

  • Retrieval: query external data sources, such as knowledge bases, vector databases, or web search APIs. Common retrieval methods include full-text retrieval, vector retrieval, and graph retrieval.

  • Generation: provide the retrieved information to the LLM to generate the answer.
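
As a minimal sketch of how these two components fit together (the `embed` and `llm_complete` functions below are hypothetical placeholders for an embedding model and an LLM API, not any particular library):

```python
# Minimal retrieve-then-generate sketch. `embed` and `llm_complete` are
# hypothetical placeholders for an embedding model and an LLM API.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return an embedding vector for `text`."""
    raise NotImplementedError

def llm_complete(prompt: str) -> str:
    """Placeholder: call the LLM and return its completion."""
    raise NotImplementedError

def retrieve(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Retrieval: rank documents by cosine similarity to the query."""
    q = embed(query)
    scored = []
    for doc in docs:
        d = embed(doc)  # in practice, document embeddings are precomputed
        score = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((score, doc))
    return [doc for _, doc in sorted(scored, reverse=True)[:top_k]]

def rag_answer(query: str, docs: list[str]) -> str:
    """Generation: hand the retrieved context to the LLM as grounding."""
    context = "\n\n".join(retrieve(query, docs))
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return llm_complete(prompt)
```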


For our LLM "candidate", RAG is like a reference book or a "second brain": when the model encounters knowledge points it never learned or learned poorly, it can look up the reference material, which improves the accuracy of its answers.



02



RAG Development


Looking at the development of RAG in recent years, it has mainly evolved through Naive RAG, Advanced RAG, Modular RAG, Graph RAG, and the recently popular Agentic RAG.

  • Naive RAG  is the most basic implementation of RAG.

  • Advanced RAG  builds on Naive RAG and optimizes the pre-retrieval, retrieval, and post-retrieval stages.

  • Modular RAG  represents the engineering implementation of mainstream RAG.

  • Graph RAG  leverages graph retrieval to give RAG multi-hop retrieval and richer context.

  • Agentic RAG  uses Agent capabilities to enable RAG to have intelligent thinking and analysis, greatly enhancing its retrieval capabilities.


   2.1 Naive RAG


Naive RAG is the most basic implementation of the RAG system. It uses a single retrieval method, full-text search or vector search, to retrieve documents related to the query from the document collection, and feeds the retrieved documents directly to the LLM to augment generation.
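
For instance, a naive full-text-style retriever can be built on off-the-shelf TF-IDF scoring. The snippet below is one illustrative implementation using scikit-learn, not part of any specific RAG framework:

```python
# Naive retrieval with a single TF-IDF index (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "RAG combines retrieval with generation.",
    "Vector databases store embeddings for similarity search.",
    "Graph RAG adds multi-hop reasoning over knowledge graphs.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)  # index the corpus once

def naive_retrieve(query: str, top_k: int = 2) -> list[str]:
    """Score every document against the query and return the top-k."""
    q_vec = vectorizer.transform([query])
    scores = cosine_similarity(q_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:top_k]
    return [docs[i] for i in top]

print(naive_retrieve("how does retrieval augmented generation work?"))
```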


Naive RAG has several limitations:

  • Lack of semantic understanding: full-text matching relies on lexical overlap and cannot capture the semantic relationship between the query and the documents, while vector retrieval matches only indirectly through embedding similarity and its semantic understanding is limited.

  • Poor output: without advanced pre-processing and post-processing of queries and documents, the recalled documents tend to contain too much or too little information, so the final answer is often too broad.

  • Difficulty in optimization: the system relies on a single retrieval technique and does not enhance queries or documents, so optimization is confined to the retrieval technique itself.



   2.2 Advanced RAG


Advanced RAG improves on Naive RAG in the pre-retrieval, retrieval, and post-retrieval stages.


In the pre-retrieval stage: improve document quality (for example, optimize the chapter structure and enhance titles) and filter out low-quality information; tune the index structure and chunk size so that context granularity fits the application scenario; enrich the index by extracting and augmenting chunk content as embedding text; and rewrite user queries.
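
As one concrete piece of this stage, here is a minimal sketch of fixed-size chunking with overlap; the sizes are illustrative assumptions, not recommendations:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so that context
    spanning a chunk boundary is not lost at retrieval time."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```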


In the retrieval stage, the embedding model is fine-tuned with domain knowledge, or an LLM-based embedding model is used, to generate semantic vectors that capture the context more accurately.


In the post-retrieval stage, reranking improves the relevance ordering of the retrieved documents, and context compression makes the information passed to the model more focused.
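
Reranking is often done with a cross-encoder that scores each (query, document) pair jointly. The sketch below uses the sentence-transformers library; the model name is just one commonly used example, not a recommendation from this article:

```python
# Rerank retrieved documents with a cross-encoder (sentence-transformers).
from sentence_transformers import CrossEncoder

# Example model; swap in whichever reranker suits your domain.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Score each (query, doc) pair jointly and keep the best top_k."""
    scores = reranker.predict([(query, d) for d in docs])
    ranked = sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)
    return [d for _, d in ranked[:top_k]]
```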



   2.3 Modular RAG


Modular RAG is the current mainstream RAG system design. It decomposes retrieval and generation into independent, reusable components to achieve domain-specific optimization and task adaptability. Modular RAG turns the various retrieval, storage, routing, and other capabilities used by a RAG system into modules that can be rearranged for specific scenarios, for example combining multiple retrieval methods into a hybrid retrieval pipeline for better results.
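
For example, hybrid retrieval often merges the ranked lists produced by several retrievers. Reciprocal Rank Fusion (RRF) is one common, model-free way to do this; the constant k = 60 below is the typical default from the original RRF paper:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs: each doc earns
    1 / (k + rank) per list, and the totals decide the fused order."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a full-text ranking with a vector-search ranking.
fused = reciprocal_rank_fusion([
    ["doc3", "doc1", "doc7"],   # e.g., BM25 results
    ["doc1", "doc5", "doc3"],   # e.g., vector-search results
])
print(fused)  # doc1 and doc3 rise to the top
```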



   2.4 Graph RAG


Graph RAG uses graph structures to extend traditional RAG systems, leveraging graph relationships and hierarchical structure to enhance multi-hop reasoning and context richness. It can generate richer and more accurate results, especially for tasks that require relational understanding.
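
To illustrate the multi-hop idea, the sketch below expands a query's seed entities through their graph neighborhood using networkx; the graph is a toy example, and the entity-linking step that produces the seeds is assumed:

```python
# Multi-hop context expansion over a knowledge graph (networkx).
import networkx as nx

G = nx.Graph()  # toy knowledge graph
G.add_edge("RAG", "retrieval", relation="uses")
G.add_edge("retrieval", "vector database", relation="backed_by")
G.add_edge("RAG", "LLM", relation="augments")

def expand_context(seed_entities: list[str], hops: int = 2) -> set[str]:
    """Collect every entity reachable within `hops` of any seed entity."""
    context: set[str] = set()
    for seed in seed_entities:
        if seed in G:
            reached = nx.single_source_shortest_path_length(G, seed, cutoff=hops)
            context.update(reached)
    return context

# Seed entities linked from the query (entity linking itself is assumed).
print(expand_context(["RAG"]))  # {'RAG', 'retrieval', 'LLM', 'vector database'}
```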


Graph RAG has the following limitations:

  • Dependence on high-quality graph data: high-quality graph data is critical to Graph RAG, yet it is often difficult to produce, especially from unstructured plain text or data with poor annotation quality.

  • Application complexity: supporting hybrid retrieval over both unstructured data and graph data increases the complexity of the retrieval system's design and implementation.



   2.5 Agentic RAG


Unlike the earlier, static forms of RAG, Agentic RAG uses LLM-based agents capable of dynamic decision-making and tool calling to solve more complex, real-time, and multi-domain queries.


Thanks to LLM-based tool calling, Agentic RAG can use more sophisticated tools to assist retrieval, such as search engines, calculators, and other tools exposed as APIs. In addition, Agentic RAG can make dynamic decisions based on the actual retrieval scenario, such as deciding whether to search at all, which tool to search with, and whether the retrieved context is sufficient or the search should continue.
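
A minimal sketch of such an agent loop is below. `llm_decide` and the two tools are hypothetical placeholders standing in for a real tool-calling LLM and real APIs:

```python
# Sketch of an agentic RAG loop: the LLM decides whether and how to
# retrieve before answering. `llm_decide` and the tools are hypothetical stubs.

def llm_decide(question: str, context: list[str]) -> dict:
    """Placeholder: ask the LLM to pick the next action. Returns e.g.
    {"action": "search_web", "query": "..."} or
    {"action": "answer", "text": "..."}."""
    raise NotImplementedError

def search_web(query: str) -> str:   # hypothetical web-search tool
    raise NotImplementedError

def search_kb(query: str) -> str:    # hypothetical knowledge-base tool
    raise NotImplementedError

TOOLS = {"search_web": search_web, "search_kb": search_kb}

def agentic_rag(question: str, max_steps: int = 5) -> str:
    context: list[str] = []
    for _ in range(max_steps):
        decision = llm_decide(question, context)
        if decision["action"] == "answer":   # enough context gathered
            return decision["text"]
        tool = TOOLS[decision["action"]]     # dynamic tool selection
        context.append(tool(decision["query"]))
    # Step budget exhausted: ask for a final answer with what we have.
    return llm_decide(question, context)["text"]
```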





03



Conclusion

| RAG type | Features | Advantages |
| --- | --- | --- |
| Naive RAG | Single index, such as TF-IDF, BM25, or vector search | Simple and easy to implement; alleviates model hallucination |
| Advanced RAG | Document enhancement; index optimization; query rewriting; reranking | More accurate retrieval; enhanced retrieval relevance |
| Modular RAG | Hybrid retrieval; tool and API integration; modular, engineered implementation | Greater flexibility; adapts to more diverse scenarios |
| Graph RAG | Graph-structured index; multi-hop reasoning; context enrichment based on graph nodes | Relational reasoning capabilities; suited to structured data |
| Agentic RAG | LLM-based agents; dynamic decision-making and retrieval; automatic process optimization | Higher retrieval accuracy; suited to more complex, multi-domain tasks |


Looking at the development of RAG in recent years, there are several directions for its future:

  • Intelligence: as LLM applications develop, their functionality becomes more complex and the demands on RAG keep rising. Agentic RAG is the beginning of this direction; in the future, more intelligent RAG will become a "good partner" for LLMs.

  • Data diversification: Graph RAG gives RAG graph retrieval capabilities, but how can ordinary text, graph data, and other diverse data types such as code and images be integrated into a unified RAG system for indexing, retrieval, and ranking? Future complex LLM applications will test this capability.


-End-
Original author: Zhang Wenjun