LangChain Advanced Guide: RAG Practice Summary

Explore RAG's breakthrough progress in improving the accuracy and reliability of language models.
Core content:
1. The principle and advantages of RAG in solving the "hallucination" problem of LLMs
2. Analysis of RAG's model architecture and workflow
3. The potential and challenges of RAG in practical applications
Why use RAG?
RAG and LLMs are complementary. RAG can be seen as an LLM extended with an additional retrieval step that lets the model draw on external knowledge. Combining the two gives better results on many complex natural language processing tasks and opens up new possibilities for building more capable NLP systems.
A RAG workflow has three steps:
Retrieval: When a user query arrives, it is converted into a vector, and a similarity search against the vector database returns the stored chunks that best match it.
Augmentation: The retrieved chunks and the user query are inserted into a custom prompt template.
Generation: The augmented prompt is passed to the LLM, which generates the final answer from it.
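The three steps above can be sketched in plain Python. This is only an illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `generate` is a hypothetical stub instead of an LLM call, and all function names are made up for this sketch.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Retrieval: rank documents by similarity to the query, keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def augment(query, context):
    """Augmentation: put retrieved context and the query into one prompt."""
    return "Context:\n" + "\n".join(context) + "\nQuestion: " + query + "\nAnswer:"

def generate(prompt):
    """Generation: hypothetical stub that echoes the first context line."""
    return prompt.splitlines()[1]

docs = [
    "Weaviate is a vector database.",
    "LangChain orchestrates LLM pipelines.",
]
query = "What is Weaviate?"
context = retrieve(query, docs)
prompt = augment(query, context)
answer = generate(prompt)
print(answer)
```

In a real pipeline, the embedding model and the LLM replace the two stubs, and the vector database takes over the brute-force ranking.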
Before anything can be retrieved, the external data has to be indexed:
Load: Load the specified data. Different document loaders handle different file types.
Split: Split the content into smaller chunks using a text splitter. This helps both for indexing the data and for passing it to a model, because large chunks are harder to search and may not fit into a model's limited context window.
Embedding: An embedding maps data such as text, images, and audio to vectors in a lower-dimensional space. A vector is a list of numbers that gives the position of a point in a multidimensional space, so items with similar meaning end up close together.
Storage: A vector database stores and indexes the chunk vectors so that they can be retrieved quickly later.
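To make the embedding and storage steps concrete, here is a minimal sketch of an in-memory vector store with brute-force nearest-neighbour search. The class name `InMemoryVectorStore` and the three-dimensional vectors are hypothetical; a real embedding model produces far more dimensions, and a real vector database uses an index rather than a full scan.

```python
import math

class InMemoryVectorStore:
    """Toy vector store: keeps (vector, text) pairs and scans them all on search."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((vector, text))

    def search(self, query_vector, k=1):
        # Rank stored items by Euclidean distance to the query vector
        ranked = sorted(self._items, key=lambda item: math.dist(item[0], query_vector))
        return [text for _, text in ranked[:k]]

store = InMemoryVectorStore()
# Hand-written vectors (hypothetical values chosen for illustration)
store.add([0.9, 0.1, 0.0], "chunk about cats")
store.add([0.1, 0.9, 0.0], "chunk about databases")
store.add([0.0, 0.2, 0.9], "chunk about cooking")

# A query vector that lies closest to the "databases" chunk
nearest = store.search([0.2, 0.8, 0.1], k=2)
print(nearest)
```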
Retrieval-augmented generation with LangChain
In this section, we will show how to use Python with OpenAI's large language model, Weaviate's vector database, and OpenAI's embedding model to implement a retrieval-augmented generation (RAG) pipeline. In this process, we will use LangChain for overall orchestration.
Part 1 Writing external knowledge into the vector database
i. Load the data
ii. Split the documents into chunks
iii. Embed the chunked content and store the chunks
First, load the specified data. Many file types are supported, including text, PDF, Word, Excel, CSV, and HTML. The following example uses TextLoader to load a text file. Other loaders cover other file types: CSVLoader loads CSV files, WebBaseLoader loads web pages from a URL, and PDF has several loaders, PDFMinerLoader being one of them.
import requests
from langchain.document_loaders import TextLoader

# Download the sample text and save it locally
url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/modules/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)

# Load the saved file as LangChain documents
loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()
Second, the document needs to be chunked. Chunking breaks a large text into smaller pieces, which helps improve the accuracy of the content recalled from the vector database. LangChain provides many text splitters. This example uses CharacterTextSplitter, with chunk_size set to 500 and chunk_overlap set to 50. By default both values are measured in characters, not tokens, and the overlap keeps neighboring chunks coherent.
from langchain.text_splitter import CharacterTextSplitter

# Split the loaded documents into overlapping chunks
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
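To show what chunk size and overlap mean, here is a simplified sliding-window sketch. It is not how CharacterTextSplitter works internally (that splitter first splits on a separator and then merges the pieces); the function name `split_with_overlap` is hypothetical and only illustrates the overlap idea.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window always adds new characters
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears with some of its surrounding context in the next chunk.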
Finally, the chunks are embedded and stored. For the LLM to make use of the file's content, each chunk has to be converted into a vector and saved. There are many embedding options. This example uses OpenAI's embedding model and stores the vectors in the Weaviate vector database. In LangChain, calling .from_documents() embeds the chunks and fills the vector database with them automatically.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions

# Start an embedded Weaviate instance
client = weaviate.Client(
    embedded_options=EmbeddedOptions()
)

# Embed the chunks and write them to the vector database
vectorstore = Weaviate.from_documents(
    client=client,
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    by_text=False,
)
Part 2 Answering user questions with the data in the vector database
Retrieval: Now that the file has been converted to vector data and written into the vector database, it can be set up as a retrieval component. This component can retrieve additional contextual information based on the semantic similarity between the user query and the embedded text block.
retriever = vectorstore.as_retriever()
Augmentation: Prepare a prompt template that augments the original prompt with preset instructions and the retrieved context.
from langchain.prompts import ChatPromptTemplate

# The {question} and {context} placeholders match the keys the RAG chain passes in
template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Question: {question}
Context: {context}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
print(prompt)
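What filling such a template amounts to can be sketched with plain string formatting. The `Doc` class, the `build_prompt` helper, and the shortened template text are hypothetical stand-ins for this illustration, not LangChain's own objects.

```python
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""

    def __init__(self, page_content):
        self.page_content = page_content

TEMPLATE = (
    "Use the following context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)

def build_prompt(question, docs):
    # Join the retrieved chunks into one context string, then fill both placeholders
    context = "\n\n".join(doc.page_content for doc in docs)
    return TEMPLATE.format(question=question, context=context)

filled = build_prompt(
    "Who was nominated?",
    [Doc("Chunk one."), Doc("Chunk two.")],
)
print(filled)
```

Joining the chunk texts explicitly keeps the context readable for the model; passing a raw list of document objects would insert their string representation instead.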
Generation: A RAG chain combines the retriever, the prompt template, and the LLM. The chain below first runs a vector search for the user's question, then inserts the retrieved data into the prompt template, and finally has the LLM generate an answer from the filled-in prompt.
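The LangChain chain for this step composes its parts with the `|` operator. As a rough illustration of how such pipe composition can work, here is a toy version in plain Python. The `Step` class and the four stand-in steps are hypothetical and are not LangChain's actual implementation.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Hypothetical stand-ins for the retriever, prompt, LLM, and output parser
gather = Step(lambda q: {"context": "retrieved text for: " + q, "question": q})
fill_prompt = Step(lambda d: "Context: {context}\nQuestion: {question}".format(**d))
fake_llm = Step(lambda p: {"content": "ANSWER based on -> " + p})
parser = Step(lambda message: message["content"])

chain = gather | fill_prompt | fake_llm | parser
result = chain.invoke("What is RAG?")
print(result)
```

Each step only needs to accept the previous step's output, which is why a retriever, a prompt, a model, and a parser can be swapped independently.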
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Retrieve context, fill the prompt, call the LLM, and parse the reply to a string
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

query = "What did the president say about Justice Breyer"
rag_chain.invoke(query)
"The President also mentioned that he has nominated Judge Ketanji Brown Jackson to succeed Justice Breyer and continue her distinguished legacy."
Summary
RAG helps ensure that the LLM answers with up-to-date, accurate content. Users can also see the sources behind the model's answer, so its claims can be checked for accuracy and ultimately trusted.
Because the LLM is grounded in an external, verifiable set of facts, it has fewer opportunities to fall back on information baked into its parameters. This reduces the chances of the LLM leaking sensitive data or "hallucinating" incorrect or misleading information.
RAG also reduces the need for users to continuously train models based on new data and update training parameters as the data changes. In this way, enterprises can reduce related financial costs.