The Three Major Paradigms and Technological Evolution of RAG

Written by
Silas Grey
Updated on: June 19, 2025

An in-depth analysis of how RAG advances large language models and mitigates core problems such as hallucination.

Core content:
1. Three paradigms of RAG technology: basic RAG, advanced RAG, modular RAG
2. RAG core components: retrieval optimization, generation, and enhancement technology
3. RAG evaluation framework, future research directions, and complementarity with large model fine-tuning


In the previous article, we introduced a core technique in the large-model field: RAG (Retrieval-Augmented Generation).

The technical principle of a large model is to repeatedly predict the next token, with each generated token conditioning the one that follows. In short, a large model is a probabilistic prediction machine: when decoding uses sampling, the same prompt can produce different answers, and nothing in the mechanism guarantees factual correctness, which is how the hallucination problem arises.

To put it another way: for the same prompt, or the same group of prompts, the outputs of a large model are not idempotent.
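This non-idempotence can be illustrated with a toy sketch of sampling-based decoding. The tokens and probabilities below are entirely invented; the point is only that different random states draw different tokens from the same distribution:

```python
import random

# Toy next-token distribution for one fixed prompt (probabilities are invented).
NEXT_TOKEN_PROBS = {"Paris": 0.60, "France": 0.25, "the": 0.15}

def sample_next_token(rng: random.Random) -> str:
    # Sample one token, as a decoder with temperature > 0 would.
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Different random states can pick different tokens for the same prompt.
samples = {sample_next_token(random.Random(seed)) for seed in range(50)}
```

With greedy (argmax) decoding the same prompt would always yield "Paris"; it is sampling with nonzero temperature, the default in most chat deployments, that makes outputs vary run to run.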

Since ChatGPT's release at the end of 2022, problems such as hallucination (generating plausible but incorrect information), outdated knowledge, and opaque reasoning have plagued practitioners across the industry. These problems were not meaningfully alleviated until RAG emerged.

On December 18, 2023, several well-known scholars jointly published a survey of RAG, "Retrieval-Augmented Generation for Large Language Models: A Survey". It examines the evolution of RAG's three paradigms (naive RAG, advanced RAG, modular RAG) and the key techniques behind its three core components: retrieval, generation, and augmentation. The survey also proposes an evaluation framework and future research directions for RAG, and discusses its complementarity with fine-tuning and prompt engineering, as well as its potential for multimodal expansion.

Original link: https://arxiv.org/pdf/2312.10997

If you can't access it, you can click this link: https://metaso.cn/s/LlcV6lu

What follows is my translated summary of the paper's core content, for reference only.



1. RAG’s Three Technical Paradigms

1. Naive RAG 

  • Process: Indexing → Retrieval → Generation.  
  • Example: In open-domain question answering, the user question is vectorized and then similar documents are retrieved to generate answers. For example, for the question "What are the advantages of quantum computing?", the system retrieves relevant paragraphs from Wikipedia and generates answers.
  • Problems: Low retrieval precision (e.g. recalling irrelevant content), possible hallucinations in the generation phase, and insufficient context integration.
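The Indexing → Retrieval → Generation pipeline can be sketched in a few lines. This is a toy, not a real system: the bag-of-words `embed` stands in for a neural embedding model, and `generate` merely builds the prompt a real LLM would receive:

```python
import re
from collections import Counter
from math import sqrt

DOCS = [
    "Quantum computing offers speedups for factoring and search problems.",
    "The Eiffel Tower is located in Paris.",
]

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words token counts stand in for a vector model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing: embed every document once, up front.
INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(query: str, k: int = 1) -> list:
    # Retrieval: rank documents by similarity to the embedded query.
    q = embed(query)
    ranked = sorted(INDEX, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query: str) -> str:
    # Generation stub: a real system would send this prompt to an LLM.
    context = " ".join(retrieve(query))
    return f"Context: {context}\nQuestion: {query}"
```

The "low retrieval precision" problem is already visible here: lexical overlap can rank an irrelevant document highly, which is exactly what the advanced paradigm's pre- and post-retrieval optimizations address.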

2. Advanced RAG  

  • Pre-retrieval optimization: Use sliding window indexing, fine-grained segmentation (such as splitting paragraphs by semantics), and metadata tags (such as document source and timestamp) to improve retrieval quality.  
  • Post-retrieval optimization: Filter the most relevant fragments through re-ranking and context compression.  
    • Case: In medical question answering, metadata filtering selects only the latest clinical guidelines, ensuring the timeliness of generated answers.
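A minimal sketch of post-retrieval filtering and re-ranking for the medical case above. The candidate records, field names, and scores are all invented for illustration:

```python
from datetime import date

# Hypothetical retrieved candidates, with metadata attached at indexing time.
CANDIDATES = [
    {"text": "2018 guideline: drug A is first-line.", "source": "clinical_guideline",
     "updated": date(2018, 1, 1), "score": 0.91},
    {"text": "2024 guideline: drug B is first-line.", "source": "clinical_guideline",
     "updated": date(2024, 5, 1), "score": 0.88},
    {"text": "Forum post discussing drug A.", "source": "forum",
     "updated": date(2024, 6, 1), "score": 0.80},
]

def post_retrieval_filter(candidates, source, not_before):
    # Keep only trusted, sufficiently recent fragments, then re-rank by score.
    kept = [c for c in candidates
            if c["source"] == source and c["updated"] >= not_before]
    return sorted(kept, key=lambda c: c["score"], reverse=True)
```

Note that the outdated 2018 guideline is dropped even though its raw retrieval score is highest; metadata filtering deliberately overrides similarity ranking.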

3. Modular RAG

  • Features: Modular design supports dynamic combination.  
  • Search module: supports multi-source retrieval (database, knowledge graph, API).  
  • Memory module: caches historical search results to speed up response.  
  • Routing module: Select different retrieval strategies according to the task type.  
    • Case: In the customer service system, the routing module automatically determines the user's intention (technical issue → search for product documentation, complaint → search for processing flow) to improve service efficiency.
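A routing module can be as simple as mapping detected intent to a retrieval strategy. The keywords and strategy names below are hypothetical; production routers typically use a trained classifier or an LLM call rather than keyword rules:

```python
def route(query: str) -> str:
    # Toy intent router: keyword rules pick a retrieval strategy.
    q = query.lower()
    if any(word in q for word in ("refund", "complaint", "unacceptable")):
        return "complaint_workflow"   # retrieve the complaint-handling flow
    if any(word in q for word in ("error", "install", "crash", "api")):
        return "product_docs"         # retrieve product documentation
    return "general_faq"
```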


2. RAG’s core technology components

1. Retrieval optimization

  • Data source selection: Mixed use of structured data (database tables) and unstructured data (text, PDF).  
  • Indexing strategy: Hierarchical indexing combines coarse-grained and fine-grained segmentation.
  • Query optimization: Use LLMs to rewrite user queries (such as expanding synonyms) to improve retrieval relevance.
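Query rewriting can be sketched with a static synonym table. The table here is invented for illustration; in practice an LLM would generate the rewrites:

```python
# Hypothetical synonym table; a production system would ask an LLM to rewrite.
SYNONYMS = {"car": ["automobile", "vehicle"], "buy": ["purchase"]}

def expand_query(query: str) -> list:
    # Append synonyms of each query term so the retriever matches more phrasings.
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded
```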

2. Generation enhancement

  • Contextual integration: fuse retrieved content with the LLM's internal knowledge, e.g. via chain-of-thought prompting.
  • Generation control: restrict the LLM to rely only on retrieved content (reducing hallucination) or allow mixed reasoning (improving creativity). 
  • Example: In financial report analysis, the model generates its analysis only from the latest retrieved financial data, avoiding reliance on outdated parametric knowledge.
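One common way to enforce "retrieval-only" generation is through the prompt itself. Below is a sketch of such a grounding template; the exact instruction wording is illustrative, not a standard:

```python
def grounded_prompt(question: str, passages: list) -> str:
    # Instruct the model to answer only from retrieved passages and to
    # refuse otherwise -- one common lever for reducing hallucination.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the numbered passages below. If they do not "
        "contain the answer, reply 'insufficient context'.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the passages also lets the model cite which fragment supports each claim, which aids the faithfulness evaluation discussed later.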

3. Enhancement strategies 

  • Iterative retrieval: secondary retrieval of supplementary information based on the initial generated results.  
  • Adaptive retrieval: dynamically adjust the search scope (such as expanding the time window or switching data sources).  
  • Case: After generating the first draft, the academic paper writing assistant automatically searches for relevant research to fill in logical gaps.
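Iterative retrieval is essentially a retrieve–draft–inspect loop. A schematic sketch, where `retrieve`, `generate`, and `find_gaps` are caller-supplied callables standing in for the real retriever, LLM, and gap detector:

```python
def iterative_retrieve(question, retrieve, generate, find_gaps, max_rounds=3):
    # Draft an answer, look for gaps, retrieve again for the gaps, redraft.
    context = retrieve(question)
    draft = generate(question, context)
    for _ in range(max_rounds - 1):
        gaps = find_gaps(draft)                       # e.g. unsupported claims
        if not gaps:
            break
        context = context + retrieve(" ".join(gaps))  # secondary retrieval
        draft = generate(question, context)
    return draft
```

The adaptive variant would change what `retrieve` does between rounds (widen the time window, switch data sources) instead of merely re-querying.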


3. Evaluation Methods and Challenges

1. Evaluation Metrics

  • Quality indicators: Context Relevance, Answer Faithfulness, Answer Relevance.  
  • Capability indicators: Noise Robustness, Counterfactual Robustness.  
  • Assessment tools: RAGAS (RAG-specific assessment framework), ARES (automated scoring system).
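As rough intuition for what "answer faithfulness" measures, here is a crude token-overlap proxy. Real frameworks such as RAGAS use LLM-based judges rather than lexical overlap, so treat this only as a conceptual sketch:

```python
import re

def token_set(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def answer_faithfulness(answer: str, context: str) -> float:
    # Fraction of answer tokens that also appear in the retrieved context:
    # 1.0 means every answer token is grounded, 0.0 means none are.
    a, c = token_set(answer), token_set(context)
    return len(a & c) / len(a) if a else 0.0
```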

2. Main Challenges

  • Long-context handling: overly long retrieved content slows generation (retrieval accuracy must be balanced against efficiency).  
  • Multimodal expansion: The retrieval and generation of non-text data such as images and audio are not yet mature.  
  • Tooling ecosystem: existing tools (such as LangChain) offer insufficient support for complex workflows.


4. Future Research Directions

  • Vertical optimization: Improve the accuracy and domain adaptability of RAG in professional fields (such as law and medicine).  
  • Hybrid methods: combining RAG with fine-tuning, such as training a dedicated retriever.  
  • Multimodal RAG: supports cross-modal retrieval (e.g., retrieving images based on text descriptions).


5. English-Chinese comparison table of key terms