RAG is not working? Try MCP, the "Knowledge Base Optimization Master"

How to efficiently manage and utilize internal knowledge assets during enterprise digital transformation? MCP helps you build a high-performance enterprise RAG system and realize intelligent knowledge base applications.
Core content:
1. Challenges and limitations of RAG technology in enterprise knowledge management
2. Advantages of MCP solutions and enterprise knowledge management requirements
3. Design and implementation goals of enterprise RAG system based on MCP
In the wave of enterprise digital transformation, effectively managing and utilizing internal knowledge assets has become a key challenge. As large language model (LLM) technology matures, retrieval-augmented generation (RAG) applications are becoming an important bridge between enterprise knowledge and AI capabilities. However, traditional RAG implementations often suffer from pain points such as poor retrieval quality and difficulty with real-time updates.
This article will use actual cases to detail how to build a high-performance enterprise RAG system based on the Model Context Protocol (MCP) to help enterprises quickly build intelligent knowledge base applications.
Advantages of MCP compared with traditional RAG
Limitations of traditional RAG solutions
Traditional RAG implementations usually adopt a simple "embedding + retrieval + LLM generation" architecture, which has the following limitations:
1. Tightly coupled architecture: retrieval logic is tightly coupled with LLM calls, making it difficult to optimize each part independently
2. Single retrieval strategy: usually only vector search is used, without combining multiple retrieval methods
3. Lack of standardized interfaces: interfaces vary greatly between implementations, making functional reuse difficult
4. High maintenance cost: system upgrades require modifying a large amount of underlying code
Advantages of MCP Solutions
The MCP-based RAG system decouples the knowledge retrieval service into an independent module through a standardized protocol, which brings the following advantages:
1. Standardized tool calling: MCP provides unified interface specifications, reducing integration costs (see the sketch after this list)
2. Decoupled design: model calls are separated from business logic, making independent upgrades and maintenance easier
3. Flexible expansion: new data sources and functional modules (hybrid retrieval, multimodal content, etc.) can be added easily
4. Engineering friendly: conforms to software engineering best practices and is easy for teams to develop collaboratively
Image from dailydoseofds
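To make the "standardized tool calling" point concrete, here is a minimal sketch of how a retrieval capability could be exposed as an MCP tool using the official Python SDK's FastMCP helper. This is an illustration, not the project's actual server; the tool name searchKnowledge mirrors the tool list later in this article, and retrieve_from_milvus is a hypothetical placeholder.
from mcp.server.fastmcp import FastMCP

# Hypothetical sketch: expose knowledge retrieval as a standardized MCP tool.
mcp = FastMCP("knowledge-base")

@mcp.tool()
def searchKnowledge(query: str, top_k: int = 5) -> list[dict]:
    """Search the knowledge base for documents similar to the query."""
    # Placeholder for the actual vector search logic.
    return retrieve_from_milvus(query, top_k)

def retrieve_from_milvus(query: str, top_k: int) -> list[dict]:
    # A real server would embed the query and search a Milvus collection here.
    return []

if __name__ == "__main__":
    # stdio transport lets any MCP-compatible client connect to this server.
    mcp.run(transport="stdio")
Because the tool interface is declared once on the server, any MCP client can discover and call it without custom integration code, which is exactly the decoupling advantage described above.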
Project background and requirements
The knowledge management challenges faced by modern enterprises are mainly manifested in the following aspects:
• Knowledge dispersion: enterprise documents are scattered across multiple systems with no unified search portal
• Low retrieval efficiency: traditional keyword retrieval cannot understand semantics, making it hard to accurately locate the required information
• Slow knowledge updates: knowledge base updates rely on manual compilation and cannot reflect the latest situation in time
• High barrier to use: professional terminology and complex query syntax make it difficult for ordinary employees to use
To address these issues, we need to design a system that meets the following core requirements:
1. Intelligent retrieval: support natural language questions and understand the intent and context of each question
2. Automated knowledge processing: intelligent document splitting and automatic FAQ extraction
3. Flexible expansion: support for multiple data sources and model integrations
4. Easy to deploy and maintain: a simple architecture that the technical team can master and iterate on
Project Objectives
This project aims to build an enterprise RAG system based on MCP and achieve the following specific goals:
1. Technical goals
• Build a knowledge base service and client that support the MCP protocol
• Implement intelligent document segmentation and automatic FAQ extraction
• Support decomposition of complex questions and hybrid retrieval strategies
2. Application goals
• Provide a unified knowledge base management and retrieval portal
• Significantly improve the accuracy of internal knowledge retrieval (target: over 90%)
• Reduce knowledge base maintenance workload by 70%
• Support intelligent processing and retrieval of all kinds of enterprise documents
Project system design and implementation
The system design of this project is based on alibabacloud-tablestore-mcp-server [1]. That project uses Tablestore for storage and implements its MCP Server in Java, which is not convenient for later expansion and iteration. This project switches to Milvus for storage and Python for both the MCP Server and the MCP Client, and all of the code was rewritten (Cursor helped a lot). The designs and processes below come from alibabacloud-tablestore-mcp-server; thanks to @xjtushilei for open-sourcing it.
The MCP-based RAG system we built consists of three core parts:
1. Knowledge base service (MCP Server): a backend service based on the Milvus vector database, responsible for document storage and retrieval
2. Client tool (MCP Client): the client that communicates with the MCP Server and implements the knowledge base construction and retrieval functions
3. Large model integration: the LLM implements core functions such as document segmentation, FAQ extraction, question decomposition, and answer generation
The workflow is mainly divided into two parts: knowledge base construction and knowledge retrieval. Compared with traditional naive RAG, several common optimizations are applied to both, including chunk segmentation optimization, FAQ extraction, query rewriting, and hybrid retrieval.
Process
1. Knowledge base construction
   1. Text segmentation: segment the text; each chunk must preserve textual and semantic integrity.
   2. FAQ extraction: extract FAQs from the text content as a supplement to knowledge base retrieval, improving retrieval results.
   3. Knowledge base import: embed the text chunks and FAQs into vectors and import them into the knowledge base.
2. Knowledge retrieval (RAG)
   1. Question decomposition: decompose and rewrite the input question into more atomic sub-questions.
   2. Retrieval: retrieve relevant text and FAQs for each sub-question; vector retrieval is used for text, and hybrid full-text and vector retrieval is used for FAQs.
   3. Content screening: filter the retrieved content and keep only the content most relevant to the question as references for the answer.
The overall architecture of this agent is divided into three parts:
1. Knowledge base: contains a Knowledge Store and an FAQ Store, which hold text content and FAQ content respectively, and supports hybrid vector and full-text retrieval.
2. MCP Server: provides read and write operations on the Knowledge Store and the FAQ Store, exposed as a total of 4 tools.
3. Function implementation: the knowledge base import, retrieval, and question-answering functions are implemented entirely through Prompt + LLM.
Project Structure
The project is divided into two parts:
1. milvus-mcp-client: the client, implemented in Python; it interacts with the large model, obtains tools through the MCP Client, and calls tools based on the model's feedback. The three main functions of knowledge base construction, retrieval, and question answering are implemented through prompts.
2. milvus-mcp-server: the server, implemented in Python on the MCP framework; it provides the interface to the Milvus vector database and supports the knowledge base's storage and retrieval functions.
Project Practice: Building an MCP-RAG System from Scratch
Next, we will walk through how to build a RAG system based on MCP, from environment setup and service deployment to functional testing.
Environment Preparation
First, make sure you meet the following system requirements:
• Docker and Docker Compose
• At least 4 CPUs, 4 GB of RAM, and 20 GB of disk space
• Clone the code: git clone -b rag_0.1.1 https://github.com/FlyAIBox/mcp-in-action.git
Deploy MCP Server
MCP Server is based on the Milvus vector database and provides storage and retrieval functions for the knowledge base.
For scenarios that require development or debugging, you can choose local deployment:
# Enter the project directory
cd mcp-rag
# Start Milvus and dependent services first
docker compose up -d etcd minio standalone
# Create a Python virtual environment
python -m venv env-mcp-rag
source env-mcp-rag/bin/activate
# Install dependencies
pip install -r requirements.txt
# Start the service
python -m app.main
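After the containers are up, it can be useful to confirm that the Milvus instance is actually reachable before exercising the MCP Server. A quick check, assuming the default Milvus port 19530 and the pymilvus client library (this snippet is not part of the project code):
from pymilvus import MilvusClient

# Connect to the local Milvus standalone instance started by docker compose.
client = MilvusClient(uri="http://localhost:19530")

# Listing collections is a cheap way to verify connectivity;
# on a fresh deployment the list will simply be empty.
print(client.list_collections())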
Core API of MCP Server
MCP Server provides four core tools to support the reading and writing of the knowledge base:
1. storeKnowledge: store a document in the knowledge base
2. searchKnowledge: search for similar documents in the knowledge base
3. storeFAQ: store an FAQ in the FAQ library
4. searchFAQ: search for similar question-answer pairs in the FAQ library
Let's see how these APIs are actually implemented:
async def store_knowledge(self, content: str, metadata: Dict[str, Any] = None) -> Dict[str, Any]:
    """Store knowledge content in Milvus"""
    # Ensure the service is ready
    await self.ready_for_connections()
    try:
        knowledge_content = KnowledgeContent(
            content=content,
            metadata=metadata or {}
        )
        self.milvus_service.store_knowledge(knowledge_content)
        return {"status": "success", "message": "Knowledge stored successfully"}
    except Exception as e:
        logger.error(f"Error storing knowledge: {e}")
        return {"status": "error", "message": str(e)}
This code shows the storeKnowledge tool implementation: it receives text content and metadata, creates a knowledge content object, and stores it in the vector database through the Milvus service.
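The article only shows the write path; for completeness, here is a rough sketch of what the searchKnowledge counterpart might look like. This is an assumption based on the described design, using pymilvus' MilvusClient; the embed method, collection name, and field names are illustrative, not the project's actual code.
async def search_knowledge(self, query: str, top_k: int = 5) -> Dict[str, Any]:
    """Sketch: search the knowledge collection for chunks similar to the query."""
    await self.ready_for_connections()
    try:
        # self.milvus_service.embed and the collection name are assumptions for illustration.
        query_vector = self.milvus_service.embed(query)
        hits = self.milvus_service.client.search(
            collection_name="knowledge_store",
            data=[query_vector],
            limit=top_k,
            output_fields=["content", "metadata"],
        )
        # MilvusClient.search returns one result list per query vector.
        results = [
            {"content": hit["entity"]["content"], "score": hit["distance"]}
            for hit in hits[0]
        ]
        return {"status": "success", "results": results}
    except Exception as e:
        logger.error(f"Error searching knowledge: {e}")
        return {"status": "error", "message": str(e)}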
Implementing RAG client based on MCP Client
Next, we need to implement a RAG client that communicates with the server over the MCP protocol and provides the knowledge base construction and query functions.
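Before the construction and query logic, the client first has to establish an MCP session with the server. A minimal connection sketch based on the official MCP Python SDK; the server launch command and the tool arguments are illustrative assumptions, and the actual entry point depends on the project layout.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the MCP server as a subprocess and talk to it over stdio.
    # The script path here is a placeholder, not the project's real entry point.
    server = StdioServerParameters(command="python", args=["path/to/mcp_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools exposed by the server (storeKnowledge, searchKnowledge, ...).
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Call a tool by name with a JSON-serializable argument dict.
            result = await session.call_tool("searchKnowledge", {"query": "What is RAG?"})
            print(result)

if __name__ == "__main__":
    asyncio.run(main())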
1. Knowledge base construction
• Text segmentation: intelligently segment long texts while preserving semantic integrity
• FAQ extraction: automatically generate FAQ pairs from documents
• Vectorized storage: convert text chunks and FAQs into vectors and store them in Milvus
Text segmentation code example:
def _chunk_text(self, text: str) -> List[str]:
    """Split the text into chunks while preserving semantic integrity"""
    chunks = []
    # Handle the simple case where the text is smaller than chunk_size
    if len(text) <= self.chunk_size:
        chunks.append(text)
        return chunks
    # Split the text using an overlapping sliding window
    start = 0
    while start < len(text):
        # Tentative end position of the chunk
        end = start + self.chunk_size
        # Adjust the end position to avoid cutting off in the middle of a sentence
        if end < len(text):
            # Find the last sentence boundary (period, question mark, exclamation mark)
            sentence_end = max(
                text.rfind('.', start, end),
                text.rfind('?', start, end),
                text.rfind('!', start, end)
            )
            # If a sentence end is found, use it as the end of the chunk
            if sentence_end > start:
                end = sentence_end + 1  # include the punctuation mark
        # Add the chunk
        chunks.append(text[start:min(end, len(text))])
        # Move the start position to the next chunk, taking the overlap into account
        next_start = end - self.chunk_overlap
        # Ensure forward progress so the loop cannot stall or move backwards
        if next_start <= start:
            next_start = end
        start = next_start
        if start >= len(text):
            break
    return chunks
FAQ extraction, implemented through LLM:
async def _extract_faqs(self, text: str) -> List[Dict[str, str]]:
    """Extract FAQs from text"""
    # Split long text into chunks and extract FAQs from each chunk
    if len(text) > 8000:
        chunks = self._chunk_text(text)
        faqs = []
        for chunk in chunks:
            chunk_faqs = await self._extract_faqs(chunk)
            faqs.extend(chunk_faqs)
        return faqs
    # Prompt template for FAQ extraction
    system_prompt = """You are a professional knowledge extraction expert. Your task is to extract possible Frequently Asked Questions (FAQs) from text.
These questions should be natural questions that a user might ask about the content of the text, and the answers should be found in the text.
The extracted FAQs should cover the most important concepts and information in the text.
Please follow these rules:
1. Each FAQ consists of a question and an answer
2. Questions should be short and to the point
3. Answers should be comprehensive but concise, providing relevant information from the text
4. The number of FAQs extracted should be based on the length and content richness of the text, usually no more than 10
5. Ensure that the extracted FAQs are not repeated
6. Sort by importance; the most important questions come first
The output format must be a JSON array, where each FAQ is an object containing "question" and "answer" fields, for example:
[
    {
        "question": "Question 1?",
        "answer": "Answer 1"
    },
    {
        "question": "Question 2?",
        "answer": "Answer 2"
    }
]
Only output JSON, without any other text."""
    user_prompt = f"""Extract FAQs from the following text:
```
{text}
```
Please extract the most relevant and valuable FAQs and return them in JSON format."""
    # Extract FAQs using the LLM
    response = self.llm_client.sync_generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=0.3
    )
    # Parse the LLM response to get the FAQ list
    # ...
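The response-parsing step is elided above ("# ..."). Here is a minimal sketch of how the JSON array might be parsed defensively, assuming the LLM sometimes wraps its output in Markdown code fences; this is an illustrative helper rather than the project's actual code.
import json
import re
from typing import Dict, List

def parse_faq_response(response: str) -> List[Dict[str, str]]:
    """Parse an LLM response that is expected to contain a JSON array of FAQs."""
    # Strip optional ```json ... ``` fences that models often add around JSON.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", response.strip())
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to no FAQs rather than failing the whole import.
        return []
    # Keep only well-formed entries with both fields present.
    return [
        {"question": item["question"], "answer": item["answer"]}
        for item in data
        if isinstance(item, dict) and "question" in item and "answer" in item
    ]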
2. Knowledge retrieval optimization
Unlike traditional RAG, we introduce three optimization mechanisms in the retrieval process: question decomposition, hybrid retrieval, and result filtering.
• Question decomposition: break complex questions down into multiple sub-questions
• Hybrid retrieval: search the text library and the FAQ library at the same time to improve recall (a sketch of this step follows the decomposition example below)
• Result filtering: sort and filter the retrieved results, prioritizing high-quality content
Example of question decomposition:
async def _decompose_question(self, question: str) -> List[str]:
    """Decompose a complex question into simpler sub-questions"""
    system_prompt = """You are an expert in question analysis. Your task is to decompose complex questions into simpler sub-questions in order to better retrieve relevant information.
Please follow these rules:
1. Analyze the user's question and identify the different aspects or concepts it contains
2. Break complex questions into simpler, more specific sub-questions
3. Make sure the sub-questions cover all key aspects of the original question
4. Provide 2-4 sub-questions, depending on the complexity of the original question
5. Sub-questions should be clear and targeted
6. Avoid duplication between sub-questions
The output format must be a JSON array containing all sub-questions as strings, for example:
["Sub-question 1", "Sub-question 2", "Sub-question 3"]
If the original question is simple enough and does not need to be decomposed, return a JSON array containing only the original question:
["Original question"]
Only output JSON, without any other text."""
    user_prompt = f"""Please break down the following question into simpler sub-questions for easier retrieval: {question}"""
    # Generate sub-questions using the LLM
    response = self.llm_client.sync_generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=0.3
    )
    # Parse the response to get the list of sub-questions
    # ...
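Between decomposition and result filtering sits the retrieval step itself, which is not shown in the article. A rough sketch of how each sub-question might be retrieved against both stores through the MCP tools; the session object, result shapes, and argument names are assumptions, and the typing imports are taken from the surrounding client module.
async def _retrieve_for_sub_questions(self, session, sub_questions: List[str]) -> List[Dict[str, Any]]:
    """Sketch: run hybrid retrieval (knowledge + FAQ) for every sub-question."""
    context_items = []
    for sub_question in sub_questions:
        # Vector retrieval over the document chunks.
        knowledge = await session.call_tool("searchKnowledge", {"query": sub_question})
        # Hybrid full-text + vector retrieval over the FAQ store.
        faqs = await session.call_tool("searchFAQ", {"query": sub_question})
        # Tag each hit with its source so the filtering step can prioritize FAQs.
        context_items.extend({"type": "knowledge", "result": r} for r in knowledge.content)
        context_items.extend({"type": "faq", "result": r} for r in faqs.content)
    return context_items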
Key code for result screening and answer generation:
async def _filter_context(self, question: str, context_items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter context items by relevance to the question"""
    # Simple filtering: deduplication and truncation
    seen_contents = set()
    filtered_items = []
    # Prioritize FAQ items over plain knowledge chunks
    faq_items = [item for item in context_items if item["type"] == "faq"]
    knowledge_items = [item for item in context_items if item["type"] == "knowledge"]
    # Process FAQ items first
    for item in faq_items:
        # Deduplication
        # ...
    # Then process knowledge items
    for item in knowledge_items:
        # Deduplication
        # ...
    # Limit the total number of context items
    max_context_items = 6
    if len(filtered_items) > max_context_items:
        filtered_items = filtered_items[:max_context_items]
    return filtered_items
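The section heading mentions answer generation, but that step is not shown above. A minimal sketch of how the filtered context might be assembled into the final prompt, assuming the same llm_client used earlier; the prompt wording and item fields are illustrative assumptions.
async def _generate_answer(self, question: str, filtered_items: List[Dict[str, Any]]) -> str:
    """Sketch: generate the final answer from the filtered context items."""
    # Concatenate the retained context into a numbered reference block.
    # The "content" field is an assumption about the item structure.
    context_block = "\n\n".join(
        f"[{i + 1}] ({item['type']}) {item.get('content', '')}"
        for i, item in enumerate(filtered_items)
    )
    system_prompt = (
        "You are an enterprise knowledge base assistant. Answer the user's question "
        "using only the reference material provided. If the references are insufficient, say so."
    )
    user_prompt = f"References:\n{context_block}\n\nQuestion: {question}"
    return self.llm_client.sync_generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=0.3
    )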
Actual Results
After the deployment is complete, let's take a look at the actual operation of the system:
1. Knowledge base construction
python -m app.main build --file test.md --title "Basic Introduction to RAG" --author "Enterprise Knowledge Base" --tags "LLM,RAG,Knowledge Base"
Execution Result:
2025-05-11 14:50:16 | INFO | app.knowledge_builder:build_from_text:52 - Split text into 2 chunks
2025-05-11 14:50:59 | INFO | app.knowledge_builder:build_from_text:72 - Extracted 8 FAQs from text
2025-05-11 14:51:00 | INFO | __main__:build_knowledge_base:48 - Stored 2/2 chunks to knowledge base
2025-05-11 14:51:00 | INFO | __main__:build_knowledge_base:50 - Extracted and stored 8 FAQs
2. Knowledge retrieval and question answering
python -m app.main query --question "What are the advantages and disadvantages of RAG compared to the traditional knowledge base of the enterprise?"
Execution Result:
2025-05-11 15:01:46 | INFO | app.knowledge_retriever:query:39 - Decomposed question into 4 sub-questions
2025-05-11 15:01:47 | INFO | app.knowledge_retriever:query:67 - Filtered 28 context items to 6
================================================================================
Question: What are the advantages and disadvantages of RAG compared to the traditional knowledge base of enterprises?
--------------------------------------------------------------------------------
Answer: Retrieval-augmented generation (RAG) is a technique that optimizes the output of large language models (LLMs) by integrating authoritative knowledge bases outside of training data. The core of RAG is to allow LLMs to dynamically access internal knowledge bases of specific domains or organizations before generating responses, such as real-time data sources, documents, or professional databases, without retraining the model itself. This approach significantly improves the relevance, accuracy, and usefulness of generated content by introducing external information, while retaining the flexibility and generalization capabilities of LLMs.
================================================================================
Implementation Recommendations and Best Practices
Based on actual project experience, we have summarized the following best practices:
1. Documentation Strategy
• Set a reasonable chunk size (1000-1500 characters) and overlap (200-300 characters); a configuration sketch follows this list
• Adjust the segmentation strategy according to the document type; treat technical documents and narrative documents differently
• Keep the document's original formatting information as metadata to improve retrieval accuracy
2. Retrieval strategy
• Use hybrid retrieval (semantic + keyword) to improve recall
• Set a reasonable number of sub-questions (2-4) in the question decomposition stage
• Limit the total number of context items (5-8) to avoid information overload
3. System operation and maintenance
• Choose an appropriate vector (embedding) model
• Design an incremental indexing strategy for real-time update requirements
• Add monitoring and logging to detect and resolve issues promptly
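To make these recommendations concrete, here is a small configuration sketch that collects the suggested parameter ranges in one place; the class and field names are illustrative and not part of the project.
from dataclasses import dataclass

@dataclass
class RagConfig:
    """Illustrative defaults based on the recommendations above."""
    chunk_size: int = 1200        # 1000-1500 characters per chunk
    chunk_overlap: int = 250      # 200-300 characters of overlap
    max_sub_questions: int = 4    # 2-4 sub-questions per query
    max_context_items: int = 6    # 5-8 context items passed to the LLM
    temperature: float = 0.3      # low temperature for extraction and decomposition

config = RagConfig()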
Summary and Outlook
The MCP-based RAG system represents a new direction for knowledge base construction. Through the Model Context Protocol, we not only resolve many pain points of traditional RAG systems but also give enterprises a low-cost, efficient knowledge management solution.
In the future, as large model technology advances and MCP standards improve, we can expect more innovative features to emerge:
• Support for multimodal content (images, audio, video, etc.)
• More accurate real-time knowledge update mechanisms
• Adaptive retrieval optimization based on user feedback
For enterprises, now is the best time to start exploring and applying this technology. Through MCP-RAG, enterprises can fully tap the value of their own knowledge assets and provide smarter and more accurate information services to employees and customers.