RAG: A Plug-in Knowledge Base

Written by

Jasper Cole

Updated on: June 28, 2025

What is RAG?

RAG stands for Retrieval-Augmented Generation. It acts as a plug-in knowledge base for large models: relevant material is retrieved from an external store and supplied to the model when it generates an answer.

Clarifying the knowledge boundaries of large models

To begin, it helps to be clear about the knowledge boundaries of a large model, that is, what it cannot do.
First: a large model is trained on past data, so its knowledge is not up to date.
Second: at its core a large model generates text from learned probabilities, so it can produce content that is illogical or factually wrong, a phenomenon known as hallucination.
Third: a large model is generally trained on public datasets and lacks specialized knowledge, so it needs to be supplemented with industry-specific or company-internal knowledge. In addition, to keep that specialized knowledge secure, it is generally not handed to a third-party platform for training or inference.
RAG therefore exists mainly to compensate for these shortcomings of current large models: it supplies up-to-date knowledge, grounds generation in relevant context, adds specialized knowledge, and keeps private data secure.
Those are the four problems RAG addresses.

Implementation of RAG

With the purpose of RAG clear, let's look at how it is implemented. As noted above, RAG supplements the knowledge of a large model, which means it holds a large amount of data. Using that data raises the questions of how to retrieve it and how to store it.
Retrieval and generation are the core of RAG.
On the retrieval side, improving efficiency and accuracy calls for techniques such as query optimization and index optimization.
For query optimization, common methods include Enrich, multi-way recall, and question decomposition (only these three are covered here).
Enrich: enriching the query, that is, prompting the user to supply more complete information.
Multi-way recall: retrieving for the same question along several different dimensions.
Question decomposition: breaking the question into sub-questions and answering them one by one.
Note that multi-way recall and question decomposition are fundamentally different: the former runs one question through several retrieval paths, while the latter splits the question into several smaller ones.
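
To make the difference between multi-way recall and question decomposition concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Retriever` type, the two toy retrievers, and the use of reciprocal rank fusion to merge results (the text above does not name a merge rule).

```python
from typing import Callable, Dict, List

# Hypothetical type: a retriever maps a query to a ranked list of document IDs.
Retriever = Callable[[str], List[str]]


def multi_way_recall(query: str, retrievers: List[Retriever], k: int = 60) -> List[str]:
    """Run ONE query through several retrievers and merge the rankings.

    Merging uses reciprocal rank fusion, chosen here only as an example.
    """
    scores: Dict[str, float] = {}
    for retrieve in retrievers:
        for rank, doc_id in enumerate(retrieve(query), start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def decompose_and_retrieve(sub_questions: List[str], retriever: Retriever) -> Dict[str, List[str]]:
    """Run SEVERAL sub-questions through one retriever, keeping results separate."""
    return {q: retriever(q) for q in sub_questions}


# Toy stand-ins for a keyword search and a vector search.
def keyword_retriever(query: str) -> List[str]:
    return ["doc_a", "doc_b"]


def vector_retriever(query: str) -> List[str]:
    return ["doc_b", "doc_c"]


merged = multi_way_recall("refund policy", [keyword_retriever, vector_retriever])
per_question = decompose_and_retrieve(["What is the fee?", "Who pays it?"], keyword_retriever)
```

In this sketch `doc_b` ranks first in `merged` because both retrieval paths return it, while `per_question` keeps one result list per sub-question so that each can be answered in turn.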

Index optimization includes:
Summary index: a large model generates a summary of each knowledge document, and the summary is used as the index.
Parent-child index: the document is split into large blocks, and each large block is split again into several small blocks, giving a tree-like structure.
Hypothetical question index: several hypothetical questions are generated for each document, and the questions are used as the index.
Metadata index: information about the document, such as its title, author, or abstract, is used as the index.
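
As an illustration of the parent-child index, the sketch below cuts a document into large blocks and each large block into small blocks, recording which parent every child belongs to. The fixed-size character splitting and the function names are assumptions. A common way to use such an index, also an assumption here, is to match the query against the small blocks and return the parent block as context.

```python
from typing import List, Tuple


def split_fixed(text: str, size: int) -> List[str]:
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_parent_child_index(
    document: str, parent_size: int = 400, child_size: int = 100
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Return the parent blocks and a list of (child_text, parent_id) pairs."""
    parents = split_fixed(document, parent_size)
    children: List[Tuple[str, int]] = []
    for parent_id, parent in enumerate(parents):
        for child in split_fixed(parent, child_size):
            children.append((child, parent_id))
    return parents, children


parents, children = build_parent_child_index("a" * 10 + "b" * 10, parent_size=10, child_size=5)
# Look up the parent block of the last child.
last_child_text, last_parent_id = children[-1]
context = parents[last_parent_id]
```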

The optimizations described above happen before retrieval. There is also post-retrieval optimization,
which includes hybrid retrieval, reranking, and vector filtering. (To keep things simple, multimodal RAG is not covered here.)
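
One possible reading of these post-retrieval steps is sketched below: scores from a vector search and a keyword search are blended (hybrid retrieval), weak candidates are dropped, and the rest are sorted again (reordering). The weighting parameter `alpha`, the `min_score` threshold, and the assumption that both score sets are already normalized to the range 0 to 1 are all hypothetical choices, not something the text prescribes.

```python
from typing import Dict, List, Tuple


def hybrid_scores(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend two score sets; a document missing from one set scores 0 there."""
    doc_ids = set(vector_scores) | set(keyword_scores)
    return {
        d: alpha * vector_scores.get(d, 0.0) + (1 - alpha) * keyword_scores.get(d, 0.0)
        for d in doc_ids
    }


def filter_and_rerank(
    scores: Dict[str, float], top_k: int, min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Drop candidates below `min_score`, then keep the `top_k` best."""
    kept = [(d, s) for d, s in scores.items() if s >= min_score]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:top_k]


blended = hybrid_scores({"a": 0.9, "b": 0.2}, {"b": 1.0, "c": 0.4})
ranked = filter_and_rerank(blended, top_k=2, min_score=0.3)
ranked_ids = [doc_id for doc_id, _ in ranked]
```

Production systems often rerank with a dedicated model rather than a blended score; the sort here only stands in for that step.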

With retrieval and indexing covered, let's turn to storage.
Storage generally has two parts: the knowledge documents themselves and the index vectors.
Each stored index vector usually carries a reference to its source document, so that the document can be looked up from the index.
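
The two-part storage can be sketched as a small in-memory structure: one mapping holds the documents, and a separate list holds index vectors that each point back to a document ID. The class and field names are hypothetical, the vectors are hand-written rather than produced by an embedding model, and a real system would use a vector database instead of a linear scan.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexEntry:
    vector: List[float]
    doc_id: str  # reference back to the stored document


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyStore:
    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}  # document storage
        self.index: List[IndexEntry] = []    # index vector storage

    def add(self, doc_id: str, text: str, vector: List[float]) -> None:
        self.documents[doc_id] = text
        self.index.append(IndexEntry(vector=vector, doc_id=doc_id))

    def search(self, query_vector: List[float], top_k: int = 1) -> List[str]:
        """Rank index entries by similarity, then follow doc_id to the text."""
        ranked = sorted(self.index, key=lambda e: cosine(query_vector, e.vector), reverse=True)
        return [self.documents[e.doc_id] for e in ranked[:top_k]]


store = TinyStore()
store.add("d1", "Refund policy", [1.0, 0.0])
store.add("d2", "Shipping times", [0.0, 1.0])
hits = store.search([0.9, 0.1])
```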

Beyond the implementation above, let's touch on data preprocessing: parsing, segmentation, and loading into storage.
Source data comes in formats such as PDF, Word, and Excel. These documents need to be parsed and segmented, then loaded into storage and indexed.
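
The segmentation step might look like the sketch below, which assumes the document has already been parsed to plain text. Splitting by character count and overlapping neighbouring chunks are assumptions made for illustration; the text does not specify a segmentation strategy.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Cut text into chunks of `chunk_size` characters that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a trailing fragment
        # that is already fully contained in the previous chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap=2)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.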