What is RAG? Learn in one article

Written by
Clara Bennett
Updated on: June 20, 2025

Deeply understand RAG technology and improve the efficiency of large model applications.

Core content:
1. Introduction to RAG technology and its application scenarios
2. RAG's core process and working principle
3. Mainstream RAG frameworks and solutions


Large language models (LLMs) have made significant breakthroughs in natural language processing and understanding. Combining large models with concrete application scenarios can reduce costs while improving efficiency. But when a general-purpose model is applied to a specific scenario, it lacks domain knowledge and needs fine-tuning, which consumes a lot of computing resources.

Currently, retrieval-augmented generation (RAG), a common pattern for building LLM applications, combines the strong language understanding of large models with domain knowledge, improving the model's accuracy and efficiency.

What is RAG

RAG is short for "Retrieval-Augmented Generation": generation enhanced by retrieval.

First, imagine the large model as a knowledgeable teacher. When a student asks a very obscure question, the teacher may not know the answer, struggling to respond or unable to respond at all. RAG is equivalent to the teacher saying: "I don't know the answer to this question, but I have a cheat sheet. When I answer, I take a look at the cheat sheet and combine it with the knowledge already in my head."

RAG is not training: it does not rewire the large model's "brain". Rather, it works as a plug-in that improves the accuracy of the model's answers. It is a good fit for enterprises that have accumulated a large number of knowledge-base documents and want to connect them to a large model.

This way, when answering a question, the system first searches the knowledge base, so the large model can give an accurate answer.

RAG Core Process

The main RAG process is divided into two steps:

1. Retrieve content related to the question from the knowledge base;
2. Splice the relevant knowledge into the prompt and let the LLM answer based on that knowledge and the user's question.

The following is an example of a RAG prompt:

You are a senior merchant assistant, focused on answering questions that users encounter when opening a store. Please answer the user's question based on the reference content between '---'.

Reference content:
---
1. The first step to opening a store on Amazon is creating a seller account. Sellers in mainland China, Hong Kong, and Taiwan need to provide the following documents.

To help you complete the process smoothly, please gather all required materials before you begin. You can pause and resume registration at any time by saving your progress with the registration email address and password of your choice.

If you have trouble with the registration process, click Get Support on the registration page for assistance.

Registration checklist:

1. Color scans of the following corporate documents:
For companies in mainland China: company business license (business licenses of individual industrial and commercial households are not accepted)
For companies in Hong Kong, China: Certificate of Incorporation and Business Registration Ordinance
---
User question: What information do Chinese merchants need to open a store on Amazon?
Note the following requirements:
1. When answering, stick to the original reference text as much as possible.
2. If you are unable to provide an answer, reply that the user should contact human customer service.
In the example above, because relevant knowledge was added to the prompt, the large model can give an accurate answer to the question about company qualifications. Compared with asking the large model directly, RAG makes much better use of vertical-domain knowledge. In general, the main process of RAG is: first retrieve relevant knowledge, then generate an answer based on it.
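In code, the two steps might look like the following rough TypeScript sketch (`searchKnowledgeBase` and `callLLM` are hypothetical placeholders for a vector store and an LLM client, not a specific library API):

```typescript
// Hypothetical helpers: a vector-store lookup and a chat-completion call.
declare function searchKnowledgeBase(question: string): Promise<string[]>;
declare function callLLM(prompt: string): Promise<string>;

// Step 1: retrieve; Step 2: splice into the prompt and generate.
async function ragAnswer(question: string): Promise<string> {
  const chunks = await searchKnowledgeBase(question);
  const prompt = [
    "Please answer the user's question based on the reference content between '---'.",
    "---",
    chunks.join("\n"),
    "---",
    `User question: ${question}`,
  ].join("\n");
  return callLLM(prompt);
}
```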

Mainstream RAG Frameworks

Because of RAG's advantages (interpretability, no dependence on model fine-tuning, and adaptability to diverse application requirements), there are many RAG-based solutions on the market, mainly frameworks and applications:

  • Frameworks: mainly provide an SDK for developers. Users connect their own model resources and build their own application flow. Customization is high, but so is the barrier to entry. Related frameworks include LangChain, LlamaIndex, promptflow, etc. (see the sketch after this list).
  • Applications: ready to use out of the box, mostly consumer-facing knowledge-assistant products. The general flow is that users upload documents (the knowledge base) and can then run end-to-end Q&A over it (the built-in Q&A pipelines of different applications differ in key links, such as recall strategy and whether an Agent is used). Related applications include Dify, Youdao QAnything, ByteDance Coze, etc.
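For a flavor of the framework approach, here is a minimal LangChain.js sketch wiring a RAG-style prompt template to a local model. It assumes DeepSeek-R1:7B is served locally via Ollama; package paths vary between LangChain.js versions:

```typescript
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatOllama } from "@langchain/ollama";

// The framework handles templating and model wiring; the application flow
// (what to retrieve, how to splice it in) is left to the developer.
const prompt = ChatPromptTemplate.fromTemplate(
  "Answer based on the reference content between '---'.\n---\n{context}\n---\nQuestion: {question}"
);
const llm = new ChatOllama({ model: "deepseek-r1:7b" });
const chain = prompt.pipe(llm).pipe(new StringOutputParser());

const reply = await chain.invoke({
  context: "Sellers in mainland China need a company business license.",
  question: "What documents do Chinese merchants need?",
});
console.log(reply);
```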

Implement a small RAG demo

As a front-end developer, I prefer a technology stack with Node.js at its core:

  • Node.js 18
  • LangChain.js
  • LLM: DeepSeek-R1:7B

The technical solution packages each module with component-based thinking, which makes it easy to extend individual modules later.
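Below is a minimal sketch of what those modules might look like. It assumes DeepSeek-R1:7B is served locally through Ollama, adds a separate embedding model (`nomic-embed-text`, my choice, not part of the original stack), and uses LangChain.js's in-memory vector store; exact package paths vary between LangChain.js versions:

```typescript
// rag-demo.ts — minimal component-based RAG sketch (Node.js 18+, ESM).
// Assumes `ollama pull deepseek-r1:7b` and `ollama pull nomic-embed-text` have been run.
import { ChatOllama, OllamaEmbeddings } from "@langchain/ollama";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// Knowledge-base module: embed raw documents into an in-memory vector store.
async function buildStore(docs: string[]): Promise<MemoryVectorStore> {
  const embeddings = new OllamaEmbeddings({ model: "nomic-embed-text" });
  return MemoryVectorStore.fromTexts(docs, docs.map((_, i) => ({ id: i })), embeddings);
}

// Retrieval module: fetch the top-k chunks most similar to the question.
async function retrieve(store: MemoryVectorStore, question: string, k = 2): Promise<string> {
  const hits = await store.similaritySearch(question, k);
  return hits.map((doc) => doc.pageContent).join("\n");
}

// Generation module: splice the retrieved context into the prompt and call the LLM.
async function answer(store: MemoryVectorStore, question: string): Promise<string> {
  const context = await retrieve(store, question);
  const llm = new ChatOllama({ model: "deepseek-r1:7b" });
  const res = await llm.invoke(
    `Answer based on the reference content between '---'.\n---\n${context}\n---\nQuestion: ${question}`
  );
  return String(res.content);
}

// Usage: build the store from a tiny knowledge base, then ask a question.
const store = await buildStore([
  "Sellers in mainland China need a color scan of their company business license.",
  "Registration can be paused and resumed with the registration email and password.",
]);
console.log(await answer(store, "What documents do mainland Chinese sellers need?"));
```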

The demo runs successfully on my local machine.

Conclusion

So far I have only tried the basic RAG process. Compared with fine-tuning, RAG's barrier to entry is much lower, and the results are quite good. For most companies that want to apply AI, RAG is a promising direction with low cost and high ROI.