Exploring a multi-knowledge-source RAG solution based on the wildly popular deepseek

Exploring a multi-knowledge-source RAG solution to improve question-handling capability and accuracy.
Core content:
1. The core process and limitations of a RAG system
2. Ideas for constructing a multi-knowledge-source RAG solution
3. Practical exploration based on llama_index and deepseek
As we all know, the core of RAG is retrieval and augmentation. The process is straightforward: when we ask a simple RAG system a question, it searches its database according to the characteristics of the question, obtains relevant fragments, and then passes them to the large model to generate the answer.
Such a RAG system retrieves relevant fragments well when dealing with specific questions, but it is much less suitable when the questions become broad.
For example, suppose we have an article with several chapters. When we ask about a knowledge point in a specific chapter, a traditional RAG system can, ideally, lock onto the relevant text fragments. But when the question is not so specific, such as "What is the content of Chapter 2 of this article?" or "What is the content of this article?", the original retrieval logic no longer fits, because the granularity is no longer a fragment but something more macroscopic. At this point we have to introduce a RAG process with multiple knowledge sources!
Pain points of a simple RAG process
From the example in the introduction, we can see that a simple RAG process usually consists of two parts: retrieval and generation. It relies on external knowledge sources to provide the LLM with domain-specific context and reduce hallucinations. However, a simple RAG process with a single knowledge source has several limitations:
1. Single knowledge source: a simple RAG system usually considers only one external knowledge source and cannot meet complex and diverse business needs;
2. Single retrieval method: different business needs call for different retrieval methods, for example some are suited to vector search while others require keyword matching;
3. Rough handling of questions: with only a single knowledge source and a single retrieval method, the system naturally lacks the ability to analyze questions in depth.
To achieve the effect of multiple knowledge sources, the conventional approach is to add an intent-recognition module in front of the retrieval module to distribute tasks. Below, this idea is explored with llama_index and the recently popular deepseek.
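To make the idea concrete before turning to llama_index, here is a minimal hand-rolled sketch of intent-based routing (the helper names classify_intent and answer are hypothetical, not part of any library); the RouterQueryEngine used later automates exactly this pattern with an LLM-based selector.
# Hypothetical sketch: classify the question first, then dispatch it to the matching knowledge source
def classify_intent(question: str) -> str:
    # In practice this step would call the LLM; a naive keyword rule stands in here
    broad_markers = ["summary", "main content", "overview", "whole article"]
    return "summary" if any(m in question.lower() for m in broad_markers) else "detail"

def answer(question: str, summary_engine, vector_engine):
    # Task distribution: broad questions go to the summary engine, specific ones to vector retrieval
    if classify_intent(question) == "summary":
        return summary_engine.query(question)
    return vector_engine.query(question)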
Constructing a RAG process with multiple knowledge sources
Process
1. The user inputs a question;
2. The routing query engine uses deepseek to analyze the question type;
3. If it is a broad question, summary_query_engine is queried; if it is a detail question, vector_query_engine is queried;
4. The fragments returned by the query are passed to deepseek, which summarizes them and produces the answer.
Specific implementation
First, load the document. Here we take o1-preview-system-card-20240917.pdf released by OpenAI as an example:
from llama_index.core import SimpleDirectoryReader

# Load the PDF; input_files expects a list of paths
documents = SimpleDirectoryReader(input_files=["./data/o1-preview-system-card-20240917.pdf"]).load_data()
Slice the documents and generate nodes
from llama_index.core.node_parser import SentenceSplitter

# Use SentenceSplitter to split documents into nodes
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
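As a quick sanity check (the output depends on the PDF), we can look at how many nodes were produced and preview the first chunk:
# Inspect the number of nodes and a preview of the first chunk's text
print(len(nodes))
print(nodes[0].get_content()[:200])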
Set up deepseek and the embedding model. Because deepseek supports the OpenAI-style calling convention, it can be connected in llama_index through the OpenAILike class; see the code below for the specifics:
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use Hugging Face's embedding model
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-large-zh"
)

# Connect deepseek through its OpenAI-compatible API
Settings.llm = OpenAILike(
    model="deepseek-chat",
    api_base="https://api.deepseek.com",
    api_key="****",
    is_chat_model=True,
)
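A quick smoke test (assuming the API key is valid and the embedding model has been downloaded) confirms both models respond before any index is built:
# Smoke test: one completion from deepseek and one embedding from bge-large-zh
print(Settings.llm.complete("Hello, introduce yourself in one sentence."))
print(len(Settings.embed_model.get_text_embedding("test sentence")))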
Creating indexes and query engines
SummaryIndex keeps all nodes in a list, and at query time every node is sent to the LLM; VectorStoreIndex, by contrast, retrieves by vector similarity and returns only the top_k most similar fragments.
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool

# Create SummaryIndex and VectorStoreIndex
summary_index = SummaryIndex(nodes, show_progress=True)
vector_index = VectorStoreIndex(nodes)

# Create the query engines
summary_query_engine = summary_index.as_query_engine(response_mode="simple_summarize")
vector_query_engine = vector_index.as_query_engine(similarity_top_k=2)

# Wrap each engine in a QueryEngineTool; the descriptions guide the router's choice
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Can be used to summarize the article",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Can be used to retrieve specific information in the article",
)
The most critical step is creating the routing query engine:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# The LLM-based selector reads the tool descriptions and picks which engine to route to
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)
Verification
response = query_engine.query("What is the main content of the article?")
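With verbose=True, the router prints which tool it selects: a broad question like the one above should be routed to summary_tool. A more specific follow-up (a hypothetical example question about the same PDF) should be routed to vector_tool instead:
# A detail-oriented question should be routed to vector_query_engine by the selector
response = query_engine.query("What safety evaluations were performed on o1-preview?")
print(str(response))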
Summary
This article uses a routing query engine together with llama_index and deepseek to implement a multi-knowledge-source RAG solution. The results also show that VectorStoreIndex handles detail questions better, while SummaryIndex handles broad, summary-style questions better. Later, we can enrich the RouterQueryEngine configuration to further extend the capabilities of the RAG system.
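As one example of such enrichment (a sketch only, assuming keyword lookup is useful for the data), an additional knowledge source can be registered on the same router as an extra QueryEngineTool:
# Sketch: register a third knowledge source (keyword matching) on the same router
from llama_index.core import SimpleKeywordTableIndex
from llama_index.core.tools import QueryEngineTool

keyword_index = SimpleKeywordTableIndex(nodes)
keyword_tool = QueryEngineTool.from_defaults(
    query_engine=keyword_index.as_query_engine(),
    description="Can be used to look up passages that mention specific terms",
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool, keyword_tool],
    verbose=True,
)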