Technical principles for implementing RAG based on LangChain

Written by
Clara Bennett
Updated on: June 21, 2025
Recommendation

An in-depth look at how RAG works and how it helps optimize large AI models.

Core content:
1. What RAG is and how it mitigates the shortcomings of large AI models
2. An overview and comparison of common RAG frameworks
3. A step-by-step walkthrough of implementing RAG with LangChain and evaluating the results


Earlier, we introduced terms related to large models, such as AGI, RAG, and LLM. We also covered some shortcomings of large AI models at the current stage, such as the demands on training-data quantity and quality, the cost of compute and electricity, and the biggest problem of all: hallucination.

At present, the industry generally relies on RAG (Retrieval-Augmented Generation) to counter hallucination. In simple terms, it pairs a large model with a knowledge base: relevant fragments are first retrieved from the knowledge base, and the final answer is then generated from those fragments.

This approach has two advantages: it alleviates the hallucination problem and improves result quality in specific domains, and it makes information retrieval and answer generation more efficient, which in turn improves the user experience.

The various chatbots we use every day are built on this principle. Common frameworks in this space include:

  • LangChain: An open-source framework that provides rich components and tools for building RAG systems.
  • LlamaIndex: An open-source data framework focused on indexing external data and retrieving it for LLM applications.
  • RAGFlow: A newer RAG framework that emphasizes simplicity and efficiency, shipping with preset components and workflows.
  • Haystack: A widely used open-source framework that provides vector stores, retrievers, and pipeline orchestration, all key building blocks of a RAG system.
  • GraphRAG: A graph-based approach to RAG that builds a knowledge graph from the source documents to improve retrieval on complex, multi-hop questions.


Taking LangChain as an example, mitigating hallucination with RAG comes down to the following seven key steps (two code sketches follow the list):

  • Upload documents: Users upload the relevant knowledge documents. LangChain's document loaders support many formats (txt, pdf, docx, and more) and parse them into a common document representation for downstream processing.
  • Text segmentation: To make long texts easier to analyze and process, they are split into many small chunks, somewhat like TCP segmenting data before transmission and checking for lost packets.
  • Text vectorization: The chunks are converted into vectors with an embedding model and stored in a vector database.
  • Question vectorization: The user's question is converted into a vector with the same embedding model.
  • Semantic retrieval and matching: The question vector is matched against the chunk vectors in the vector store, and the top k most similar chunks (with k defined by the retrieval rule) are returned.
  • Submit the prompt to the LLM: The matched chunks and the user's question are filled into a predefined prompt template and submitted to the LLM.
  • Generate the final answer: The LLM generates the answer and returns it to the user.
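
To make steps 1 through 3 concrete, here is a minimal ingestion sketch in Python. It assumes the langchain/langchain_community package layout and a local FAISS index; the file path and embedding model name are placeholders, and import paths may vary across LangChain versions.

```python
# Ingestion: load -> split -> embed -> store (steps 1-3).
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Step 1: load the uploaded document ("knowledge.pdf" is a placeholder path).
docs = PyPDFLoader("knowledge.pdf").load()

# Step 2: split long text into overlapping chunks so each chunk fits the
# embedding model's input window and retrieval stays precise.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Step 3: embed every chunk and persist the vectors in a FAISS index.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # placeholder model
)
vector_store = FAISS.from_documents(chunks, embeddings)
vector_store.save_local("faiss_index")  # reusable local index
```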
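
And a matching query-time sketch for steps 4 through 7. It assumes an OpenAI-compatible chat model; the model name, the value of k, and the prompt wording are illustrative choices, not anything LangChain prescribes.

```python
# Query time: embed question -> retrieve top-k -> prompt -> answer (steps 4-7).
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Steps 4-5: the retriever embeds the question with the same embedding model
# and returns the k most similar chunks from the vector store.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
question = "What does the knowledge base say about X?"
matched = retriever.invoke(question)  # get_relevant_documents() on older versions
context = "\n\n".join(doc.page_content for doc in matched)

# Step 6: a prompt template that grounds the model in the retrieved text.
prompt = ChatPromptTemplate.from_template(
    "Answer ONLY from the context below. If the answer is not in the context, "
    "say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
)

# Step 7: the LLM generates the final answer and returns it to the user.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # placeholder model name
answer = llm.invoke(prompt.format_messages(context=context, question=question))
print(answer.content)
```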

To summarize with a picture, the principle of implementing RAG capabilities on top of LangChain is as follows:

At this point, we have a ChatBot that uses RAG to alleviate large-model hallucination. The next step is to evaluate how well it works.

Please note that the above alone does not guarantee a ChatBot with highly accurate answers; getting a model that matches our expectations takes continuous, extensive iteration and tuning. This is also a microcosm of the large-model training process.


To evaluate the model, we first need to define evaluation criteria. These should be derived from the goals we set for the model and from the actual business scenarios, and then applied to the answers the model produces. Following general testing practice, we build an evaluation set (the counterpart of test cases in software engineering) and run the evaluation against it.

The evaluation set needs to verify the following requirements (a minimal evaluation-loop sketch follows the list):

  • The system understands user questions.
  • It matches the correct knowledge-base content.
  • Its answers, based on the user question and the matched content, are comprehensive and accurate.
  • Whether the final answer contains information beyond the knowledge base (in short, whether the system draws on outside or Internet knowledge when it should not).
  • Whether the final answer is stable (compare outputs across multiple evaluation rounds and check that they do not diverge too much).
  • It supports context-aware follow-up questions, and the answers to follow-ups must also meet the requirements above.
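
As a minimal illustration of running such an evaluation set (plain Python, not LangChain's own evaluation tooling), the sketch below replays each case through a hypothetical `rag_answer` helper built from the query-time code above and applies a crude keyword check; a real evaluation would use stronger accuracy and stability metrics across multiple rounds.

```python
# Evaluation-loop sketch: replay each case and apply a simple pass/fail check.
eval_set = [  # hypothetical cases; build yours from real business scenarios
    {"question": "What is RAG?", "must_contain": ["retrieval", "generation"]},
    {"question": "Which formats can be uploaded?", "must_contain": ["pdf"]},
]

def rag_answer(question: str) -> str:
    """Hypothetical helper wrapping query-time steps 4-7 from the sketch above."""
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)
    msgs = prompt.format_messages(context=context, question=question)
    return llm.invoke(msgs).content

passed = 0
for case in eval_set:
    answer = rag_answer(case["question"]).lower()
    # Crude accuracy check: every required key phrase must appear in the answer.
    if all(kw in answer for kw in case["must_contain"]):
        passed += 1
print(f"{passed}/{len(eval_set)} cases passed")
```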

Many factors affect the quality of a large model's output. This continuous train-and-evaluate loop is precisely the process of tuning the model; only a model that has passed multiple rounds of evaluation and met the bar should be applied in real work scenarios.