Exploring a multi-knowledge-source RAG solution based on the wildly popular deepseek

Exploring a multi-knowledge-source RAG solution to improve question-handling capability and accuracy.
Core content:
1. The core process and limitations of a RAG system
2. Ideas for constructing a multi-knowledge-source RAG solution
3. Practical exploration based on llama_index and deepseek
As we all know, the core of RAG is retrieval and augmentation. The process is straightforward: when we ask a simple RAG system a question, it searches its database according to the characteristics of the question, obtains relevant fragments, and then passes them to the large model to generate the answer.
Such a RAG system retrieves relevant fragments well when dealing with specific questions, but it is much less suitable when the questions become broad.
For example, suppose we have an article with several chapters. When we ask about a knowledge point in a specific chapter, a traditional RAG system can, ideally, lock onto the relevant text fragments. But when the question is not so specific, such as "What is the content of Chapter 2 of this article?" or "What is the content of this article?", the original retrieval logic no longer fits, because the granularity is no longer a fragment but something more macroscopic. At this point we have to introduce a RAG process with multiple knowledge sources!
Pain points of a simple RAG process
From the example in the introduction, we can see that a simple RAG process usually consists of two parts: retrieval and generation. It relies on external knowledge sources to provide the LLM with domain-specific context and reduce hallucinations. However, a simple RAG process with a single knowledge source has several limitations:
1. Single knowledge source: a simple RAG system usually considers only one external knowledge source and cannot meet complex and diverse business needs;
2. Single retrieval method: different business needs call for different retrieval methods, for example some are suited to vector search while others require keyword matching;
3. Rough handling of questions: with only a single knowledge source and a single retrieval method, the system naturally lacks the ability to analyze questions in depth.
To achieve the effect of multiple knowledge sources, the conventional approach is to add an intent-recognition module in front of the retrieval module to distribute tasks. Below, this idea is explored with llama_index and the recently popular deepseek.
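To make the idea concrete before turning to llama_index, here is a minimal hand-rolled sketch of intent-based routing (the helper names classify_intent and answer are hypothetical, not part of any library); the RouterQueryEngine used later automates exactly this pattern with an LLM-based selector.
# Hypothetical sketch: classify the question first, then dispatch it to the matching knowledge source
def classify_intent(question: str) -> str:
    # In practice this step would call the LLM; a naive keyword rule stands in here
    broad_markers = ["summary", "main content", "overview", "whole article"]
    return "summary" if any(m in question.lower() for m in broad_markers) else "detail"

def answer(question: str, summary_engine, vector_engine):
    # Task distribution: broad questions go to the summary engine, specific ones to vector retrieval
    if classify_intent(question) == "summary":
        return summary_engine.query(question)
    return vector_engine.query(question)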
Constructing a RAG process with multiple knowledge sources
Process
1. The user inputs a question;
2. The routing query engine uses deepseek to analyze the question type;
3. If it is a broad question, summary_query_engine is queried; if it is a detail question, vector_query_engine is queried;
4. The fragments returned by the query are passed to deepseek, which summarizes them and produces the answer.
Specific implementation
First, load the document. Here we take o1-preview-system-card-20240917.pdf released by OpenAI as an example:
from llama_index.core import SimpleDirectoryReader

# Load the PDF; input_files expects a list of paths
documents = SimpleDirectoryReader(input_files=["./data/o1-preview-system-card-20240917.pdf"]).load_data()
Slice the documents and generate nodes
from llama_index.core.node_parser import SentenceSplitter

# Use SentenceSplitter to split documents into nodes
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
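As a quick sanity check (the output depends on the PDF), we can look at how many nodes were produced and preview the first chunk:
# Inspect the number of nodes and a preview of the first chunk's text
print(len(nodes))
print(nodes[0].get_content()[:200])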
Set up deepseek and the embedding model. Because deepseek supports the OpenAI-style calling convention, it can be connected in llama_index through the OpenAILike class; see the code below for the specifics:
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use Hugging Face's embedding model
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-large-zh"
)

# Connect deepseek through its OpenAI-compatible API
Settings.llm = OpenAILike(
    model="deepseek-chat",
    api_base="https://api.deepseek.com",
    api_key="****",
    is_chat_model=True,
)
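A quick smoke test (assuming the API key is valid and the embedding model has been downloaded) confirms both models respond before any index is built:
# Smoke test: one completion from deepseek and one embedding from bge-large-zh
print(Settings.llm.complete("Hello, introduce yourself in one sentence."))
print(len(Settings.embed_model.get_text_embedding("test sentence")))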
Creating indexes and query engines
SummaryIndex keeps all nodes in a list, and at query time every node is sent to the LLM; VectorStoreIndex, by contrast, retrieves by vector similarity and returns only the top_k most similar fragments.
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool

# Create SummaryIndex and VectorStoreIndex
summary_index = SummaryIndex(nodes, show_progress=True)
vector_index = VectorStoreIndex(nodes)

# Create the query engines
summary_query_engine = summary_index.as_query_engine(response_mode="simple_summarize")
vector_query_engine = vector_index.as_query_engine(similarity_top_k=2)

# Wrap each engine in a QueryEngineTool; the descriptions guide the router's choice
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Can be used to summarize the article",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Can be used to retrieve specific information in the article",
)
The most critical step is creating the routing query engine:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# The LLM-based selector reads the tool descriptions and picks which engine to route to
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)
Verification
response = query_engine.query("What is the main content of the article?")
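With verbose=True, the router prints which tool it selects: a broad question like the one above should be routed to summary_tool. A more specific follow-up (a hypothetical example question about the same PDF) should be routed to vector_tool instead:
# A detail-oriented question should be routed to vector_query_engine by the selector
response = query_engine.query("What safety evaluations were performed on o1-preview?")
print(str(response))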
Summary
This article uses a routing query engine together with llama_index and deepseek to implement a multi-knowledge-source RAG solution. The results also show that VectorStoreIndex handles detail questions better, while SummaryIndex handles broad, summary-style questions better. Later, we can enrich the RouterQueryEngine configuration to further extend the capabilities of the RAG system.
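As one example of such enrichment (a sketch only, assuming keyword lookup is useful for the data), an additional knowledge source can be registered on the same router as an extra QueryEngineTool:
# Sketch: register a third knowledge source (keyword matching) on the same router
from llama_index.core import SimpleKeywordTableIndex
from llama_index.core.tools import QueryEngineTool

keyword_index = SimpleKeywordTableIndex(nodes)
keyword_tool = QueryEngineTool.from_defaults(
    query_engine=keyword_index.as_query_engine(),
    description="Can be used to look up passages that mention specific terms",
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool, keyword_tool],
    verbose=True,
)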