RAG is not working? Try MCP, the "Knowledge Base Optimization Master"

How to efficiently manage and utilize internal knowledge assets during enterprise digital transformation? MCP helps you build a high-performance enterprise RAG system and realize intelligent knowledge base applications.
Core content:
1. Challenges and limitations of RAG technology in enterprise knowledge management
2. Advantages of MCP solutions and enterprise knowledge management requirements
3. Design and implementation goals of enterprise RAG system based on MCP
In the wave of enterprise digital transformation, effectively managing and utilizing internal knowledge assets has become a key challenge. As large language model (LLM) technology matures, retrieval-augmented generation (RAG) applications are becoming an important bridge between enterprise knowledge and AI capabilities. However, traditional RAG implementations often suffer from pain points such as poor retrieval quality and difficulty with real-time updates.
This article will use actual cases to detail how to build a high-performance enterprise RAG system based on the Model Context Protocol (MCP) to help enterprises quickly build intelligent knowledge base applications.
Advantages of MCP compared with traditional RAG
Limitations of traditional RAG solutions
Traditional RAG implementations usually adopt a simple "embedding + retrieval + LLM generation" architecture, which has the following limitations:
1. Tightly coupled architecture: retrieval logic is tightly coupled with LLM calls, making it difficult to optimize each part independently
2. Single retrieval strategy: usually only vector search is used, without combining multiple retrieval methods
3. Lack of standardized interfaces: interfaces vary greatly between implementations, making functional reuse difficult
4. High maintenance cost: system upgrades require modifying a large amount of underlying code
Advantages of MCP Solutions
The MCP-based RAG system decouples the knowledge retrieval service into an independent module through a standardized protocol, which brings the following advantages:
1. Standardized tool calling: MCP provides unified interface specifications, reducing integration costs (see the sketch after this list)
2. Decoupled design: model calls are separated from business logic, making independent upgrades and maintenance easier
3. Flexible expansion: new data sources and functional modules (hybrid retrieval, multimodal content, etc.) can be added easily
4. Engineering friendly: conforms to software engineering best practices and is easy for teams to develop collaboratively
Image from dailydoseofds
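To make the "standardized tool calling" point concrete, here is a minimal sketch of how a retrieval capability could be exposed as an MCP tool using the official Python SDK's FastMCP helper. This is an illustration, not the project's actual server; the tool name searchKnowledge mirrors the tool list later in this article, and retrieve_from_milvus is a hypothetical placeholder.
from mcp.server.fastmcp import FastMCP

# Hypothetical sketch: expose knowledge retrieval as a standardized MCP tool.
mcp = FastMCP("knowledge-base")

@mcp.tool()
def searchKnowledge(query: str, top_k: int = 5) -> list[dict]:
    """Search the knowledge base for documents similar to the query."""
    # Placeholder for the actual vector search logic.
    return retrieve_from_milvus(query, top_k)

def retrieve_from_milvus(query: str, top_k: int) -> list[dict]:
    # A real server would embed the query and search a Milvus collection here.
    return []

if __name__ == "__main__":
    # stdio transport lets any MCP-compatible client connect to this server.
    mcp.run(transport="stdio")
Because the tool interface is declared once on the server, any MCP client can discover and call it without custom integration code, which is exactly the decoupling advantage described above.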
Project background and requirements
The knowledge management challenges faced by modern enterprises are mainly manifested in the following aspects:
• Knowledge dispersion: enterprise documents are scattered across multiple systems with no unified search portal
• Low retrieval efficiency: traditional keyword retrieval cannot understand semantics, making it hard to accurately locate the required information
• Slow knowledge updates: knowledge base updates rely on manual compilation and cannot reflect the latest situation in time
• High barrier to use: professional terminology and complex query syntax make it difficult for ordinary employees to use
To address these issues, we need to design a system that meets the following core requirements:
1. Intelligent retrieval: support natural language questions and understand the intent and context of each question
2. Automated knowledge processing: intelligent document splitting and automatic FAQ extraction
3. Flexible expansion: support for multiple data sources and model integrations
4. Easy to deploy and maintain: a simple architecture that the technical team can master and iterate on
Project Objectives
This project aims to build an enterprise RAG system based on MCP and achieve the following specific goals:
1. Technical goals
• Build a knowledge base service and client that support the MCP protocol
• Implement intelligent document segmentation and automatic FAQ extraction
• Support decomposition of complex questions and hybrid retrieval strategies
2. Application goals
• Provide a unified knowledge base management and retrieval portal
• Significantly improve the accuracy of internal knowledge retrieval (target: over 90%)
• Reduce knowledge base maintenance workload by 70%
• Support intelligent processing and retrieval of all kinds of enterprise documents
Project system design and implementation
The system design of this project is based on alibabacloud-tablestore-mcp-server [1]. That project uses Tablestore for storage and implements its MCP Server in Java, which is not convenient for later expansion and iteration. This project switches to Milvus for storage and Python for both the MCP Server and the MCP Client, and all of the code was rewritten (Cursor helped a lot). The designs and processes below come from alibabacloud-tablestore-mcp-server; thanks to @xjtushilei for open-sourcing it.
The MCP-based RAG system we built consists of three core parts:
1. Knowledge base service (MCP Server): a backend service based on the Milvus vector database, responsible for document storage and retrieval
2. Client tool (MCP Client): the client that communicates with the MCP Server and implements the knowledge base construction and retrieval functions
3. Large model integration: the LLM implements core functions such as document segmentation, FAQ extraction, question decomposition, and answer generation
The workflow is mainly divided into two parts: knowledge base construction and knowledge retrieval. Compared with traditional naive RAG, several common optimizations are applied to both, including chunk segmentation optimization, FAQ extraction, query rewriting, and hybrid retrieval.
Process
1. Knowledge base construction
   1. Text segmentation: segment the text; each chunk must preserve textual and semantic integrity.
   2. FAQ extraction: extract FAQs from the text content as a supplement to knowledge base retrieval, improving retrieval results.
   3. Knowledge base import: embed the text chunks and FAQs into vectors and import them into the knowledge base.
2. Knowledge retrieval (RAG)
   1. Question decomposition: decompose and rewrite the input question into more atomic sub-questions.
   2. Retrieval: retrieve relevant text and FAQs for each sub-question; vector retrieval is used for text, and hybrid full-text and vector retrieval is used for FAQs.
   3. Content screening: filter the retrieved content and keep only the content most relevant to the question as references for the answer.
The overall architecture of this agent is divided into three parts:
1. Knowledge base: contains a Knowledge Store and an FAQ Store, which hold text content and FAQ content respectively, and supports hybrid vector and full-text retrieval.
2. MCP Server: provides read and write operations on the Knowledge Store and the FAQ Store, exposed as a total of 4 tools.
3. Function implementation: the knowledge base import, retrieval, and question-answering functions are implemented entirely through Prompt + LLM.
Project Structure
The project is divided into two parts:
1. milvus-mcp-client: the client, implemented in Python; it interacts with the large model, obtains tools through the MCP Client, and calls tools based on the model's feedback. The three main functions of knowledge base construction, retrieval, and question answering are implemented through prompts.
2. milvus-mcp-server: the server, implemented in Python on the MCP framework; it provides the interface to the Milvus vector database and supports the knowledge base's storage and retrieval functions.
Project Practice: Building an MCP-RAG System from Scratch
Next, we will walk through how to build a RAG system based on MCP, from environment setup and service deployment to functional testing.
Environment Preparation
First, make sure you meet the following system requirements:
• Docker and Docker Compose
• At least 4 CPUs, 4 GB of RAM, and 20 GB of disk space
• Clone the code: git clone -b rag_0.1.1 https://github.com/FlyAIBox/mcp-in-action.git
Deploy MCP Server
MCP Server is based on the Milvus vector database and provides storage and retrieval functions for the knowledge base.
For scenarios that require development or debugging, you can choose local deployment:
# Enter the project directory
cd mcp-rag
# Start Milvus and dependent services first
docker compose up -d etcd minio standalone
# Create a Python virtual environment
python -m venv env-mcp-rag
source env-mcp-rag/bin/activate
# Install dependencies
pip install -r requirements.txt
# Start the service
python -m app.main
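After the containers are up, it can be useful to confirm that the Milvus instance is actually reachable before exercising the MCP Server. A quick check, assuming the default Milvus port 19530 and the pymilvus client library (this snippet is not part of the project code):
from pymilvus import MilvusClient

# Connect to the local Milvus standalone instance started by docker compose.
client = MilvusClient(uri="http://localhost:19530")

# Listing collections is a cheap way to verify connectivity;
# on a fresh deployment the list will simply be empty.
print(client.list_collections())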
Core API of MCP Server
MCP Server provides four core tools to support the reading and writing of the knowledge base:
1. storeKnowledge: store a document in the knowledge base
2. searchKnowledge: search for similar documents in the knowledge base
3. storeFAQ: store an FAQ in the FAQ library
4. searchFAQ: search for similar question-answer pairs in the FAQ library
Let's see how these APIs are actually implemented:
async def store_knowledge(self, content: str, metadata: Dict[str, Any] = None) -> Dict[str, Any]:
    """Store knowledge content in Milvus"""
    # Ensure the service is ready
    await self.ready_for_connections()
    try:
        knowledge_content = KnowledgeContent(
            content=content,
            metadata=metadata or {}
        )
        self.milvus_service.store_knowledge(knowledge_content)
        return {"status": "success", "message": "Knowledge stored successfully"}
    except Exception as e:
        logger.error(f"Error storing knowledge: {e}")
        return {"status": "error", "message": str(e)}
This code shows the storeKnowledge tool implementation: it receives text content and metadata, creates a knowledge content object, and stores it in the vector database through the Milvus service.
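The article only shows the write path; for completeness, here is a rough sketch of what the searchKnowledge counterpart might look like. This is an assumption based on the described design, using pymilvus' MilvusClient; the embed method, collection name, and field names are illustrative, not the project's actual code.
async def search_knowledge(self, query: str, top_k: int = 5) -> Dict[str, Any]:
    """Sketch: search the knowledge collection for chunks similar to the query."""
    await self.ready_for_connections()
    try:
        # self.milvus_service.embed and the collection name are assumptions for illustration.
        query_vector = self.milvus_service.embed(query)
        hits = self.milvus_service.client.search(
            collection_name="knowledge_store",
            data=[query_vector],
            limit=top_k,
            output_fields=["content", "metadata"],
        )
        # MilvusClient.search returns one result list per query vector.
        results = [
            {"content": hit["entity"]["content"], "score": hit["distance"]}
            for hit in hits[0]
        ]
        return {"status": "success", "results": results}
    except Exception as e:
        logger.error(f"Error searching knowledge: {e}")
        return {"status": "error", "message": str(e)}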
Implementing RAG client based on MCP Client
Next, we need to implement a RAG client that communicates with the server over the MCP protocol and provides the knowledge base construction and query functions.
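Before the construction and query logic, the client first has to establish an MCP session with the server. A minimal connection sketch based on the official MCP Python SDK; the server launch command and the tool arguments are illustrative assumptions, and the actual entry point depends on the project layout.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the MCP server as a subprocess and talk to it over stdio.
    # The script path here is a placeholder, not the project's real entry point.
    server = StdioServerParameters(command="python", args=["path/to/mcp_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools exposed by the server (storeKnowledge, searchKnowledge, ...).
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Call a tool by name with a JSON-serializable argument dict.
            result = await session.call_tool("searchKnowledge", {"query": "What is RAG?"})
            print(result)

if __name__ == "__main__":
    asyncio.run(main())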
1. Knowledge base construction
• Text segmentation: intelligently segment long texts while preserving semantic integrity
• FAQ extraction: automatically generate FAQ pairs from documents
• Vectorized storage: convert text chunks and FAQs into vectors and store them in Milvus
Text segmentation code example:
def _chunk_text(self, text: str) -> List[str]:
    """Split the text into chunks while preserving semantic integrity"""
    chunks = []
    # Handle the simple case where the text is smaller than chunk_size
    if len(text) <= self.chunk_size:
        chunks.append(text)
        return chunks
    # Split the text using an overlapping sliding window
    start = 0
    while start < len(text):
        # Tentative end position of the chunk
        end = start + self.chunk_size
        # Adjust the end position to avoid cutting off in the middle of a sentence
        if end < len(text):
            # Find the last sentence boundary (period, question mark, exclamation mark)
            sentence_end = max(
                text.rfind('.', start, end),
                text.rfind('?', start, end),
                text.rfind('!', start, end)
            )
            # If a sentence end is found, use it as the end of the chunk
            if sentence_end > start:
                end = sentence_end + 1  # include the punctuation mark
        # Add the chunk
        chunks.append(text[start:min(end, len(text))])
        # Move the start position to the next chunk, taking the overlap into account
        next_start = end - self.chunk_overlap
        # Ensure forward progress so the loop cannot stall or move backwards
        if next_start <= start:
            next_start = end
        start = next_start
        if start >= len(text):
            break
    return chunks
FAQ extraction, implemented through LLM:
async def _extract_faqs(self, text: str) -> List[Dict[str, str]]:
    """Extract FAQs from text"""
    # Split long text into chunks and extract FAQs from each chunk
    if len(text) > 8000:
        chunks = self._chunk_text(text)
        faqs = []
        for chunk in chunks:
            chunk_faqs = await self._extract_faqs(chunk)
            faqs.extend(chunk_faqs)
        return faqs
    # Prompt template for FAQ extraction
    system_prompt = """You are a professional knowledge extraction expert. Your task is to extract possible Frequently Asked Questions (FAQs) from text.
These questions should be natural questions that a user might ask about the content of the text, and the answers should be found in the text.
The extracted FAQs should cover the most important concepts and information in the text.
Please follow these rules:
1. Each FAQ consists of a question and an answer
2. Questions should be short and to the point
3. Answers should be comprehensive but concise, providing relevant information from the text
4. The number of FAQs extracted should be based on the length and content richness of the text, usually no more than 10
5. Ensure that the extracted FAQs are not repeated
6. Sort by importance; the most important questions come first
The output format must be a JSON array, where each FAQ is an object containing "question" and "answer" fields, for example:
[
    {
        "question": "Question 1?",
        "answer": "Answer 1"
    },
    {
        "question": "Question 2?",
        "answer": "Answer 2"
    }
]
Only output JSON, without any other text."""
    user_prompt = f"""Extract FAQs from the following text:
```
{text}
```
Please extract the most relevant and valuable FAQs and return them in JSON format."""
    # Extract FAQs using the LLM
    response = self.llm_client.sync_generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=0.3
    )
    # Parse the LLM response to get the FAQ list
    # ...
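The response-parsing step is elided above ("# ..."). Here is a minimal sketch of how the JSON array might be parsed defensively, assuming the LLM sometimes wraps its output in Markdown code fences; this is an illustrative helper rather than the project's actual code.
import json
import re
from typing import Dict, List

def parse_faq_response(response: str) -> List[Dict[str, str]]:
    """Parse an LLM response that is expected to contain a JSON array of FAQs."""
    # Strip optional ```json ... ``` fences that models often add around JSON.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", response.strip())
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to no FAQs rather than failing the whole import.
        return []
    # Keep only well-formed entries with both fields present.
    return [
        {"question": item["question"], "answer": item["answer"]}
        for item in data
        if isinstance(item, dict) and "question" in item and "answer" in item
    ]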
2. Knowledge retrieval optimization
Unlike traditional RAG, we introduce three optimization mechanisms in the retrieval process: question decomposition, hybrid retrieval, and result filtering.
• Question decomposition: break complex questions down into multiple sub-questions
• Hybrid retrieval: search the text library and the FAQ library at the same time to improve recall (a sketch of this step follows the decomposition example below)
• Result filtering: sort and filter the retrieved results, prioritizing high-quality content
Example of question decomposition:
async def _decompose_question(self, question: str) -> List[str]:
    """Decompose a complex question into simpler sub-questions"""
    system_prompt = """You are an expert in question analysis. Your task is to decompose complex questions into simpler sub-questions in order to better retrieve relevant information.
Please follow these rules:
1. Analyze the user's question and identify the different aspects or concepts it contains
2. Break complex questions into simpler, more specific sub-questions
3. Make sure the sub-questions cover all key aspects of the original question
4. Provide 2-4 sub-questions, depending on the complexity of the original question
5. Sub-questions should be clear and targeted
6. Avoid duplication between sub-questions
The output format must be a JSON array containing all sub-questions as strings, for example:
["Sub-question 1", "Sub-question 2", "Sub-question 3"]
If the original question is simple enough and does not need to be decomposed, return a JSON array containing only the original question:
["Original question"]
Only output JSON, without any other text."""
    user_prompt = f"""Please break down the following question into simpler sub-questions for easier retrieval: {question}"""
    # Generate sub-questions using the LLM
    response = self.llm_client.sync_generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=0.3
    )
    # Parse the response to get the list of sub-questions
    # ...
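Between decomposition and result filtering sits the retrieval step itself, which is not shown in the article. A rough sketch of how each sub-question might be retrieved against both stores through the MCP tools; the session object, result shapes, and argument names are assumptions, and the typing imports are taken from the surrounding client module.
async def _retrieve_for_sub_questions(self, session, sub_questions: List[str]) -> List[Dict[str, Any]]:
    """Sketch: run hybrid retrieval (knowledge + FAQ) for every sub-question."""
    context_items = []
    for sub_question in sub_questions:
        # Vector retrieval over the document chunks.
        knowledge = await session.call_tool("searchKnowledge", {"query": sub_question})
        # Hybrid full-text + vector retrieval over the FAQ store.
        faqs = await session.call_tool("searchFAQ", {"query": sub_question})
        # Tag each hit with its source so the filtering step can prioritize FAQs.
        context_items.extend({"type": "knowledge", "result": r} for r in knowledge.content)
        context_items.extend({"type": "faq", "result": r} for r in faqs.content)
    return context_items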
Key code for result screening and answer generation:
async def _filter_context(self, question: str, context_items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter context items by relevance to the question"""
    # Simple filtering: deduplication and truncation
    seen_contents = set()
    filtered_items = []
    # Prioritize FAQ items over plain knowledge chunks
    faq_items = [item for item in context_items if item["type"] == "faq"]
    knowledge_items = [item for item in context_items if item["type"] == "knowledge"]
    # Process FAQ items first
    for item in faq_items:
        # Deduplication
        # ...
    # Then process knowledge items
    for item in knowledge_items:
        # Deduplication
        # ...
    # Limit the total number of context items
    max_context_items = 6
    if len(filtered_items) > max_context_items:
        filtered_items = filtered_items[:max_context_items]
    return filtered_items
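The section heading mentions answer generation, but that step is not shown above. A minimal sketch of how the filtered context might be assembled into the final prompt, assuming the same llm_client used earlier; the prompt wording and item fields are illustrative assumptions.
async def _generate_answer(self, question: str, filtered_items: List[Dict[str, Any]]) -> str:
    """Sketch: generate the final answer from the filtered context items."""
    # Concatenate the retained context into a numbered reference block.
    # The "content" field is an assumption about the item structure.
    context_block = "\n\n".join(
        f"[{i + 1}] ({item['type']}) {item.get('content', '')}"
        for i, item in enumerate(filtered_items)
    )
    system_prompt = (
        "You are an enterprise knowledge base assistant. Answer the user's question "
        "using only the reference material provided. If the references are insufficient, say so."
    )
    user_prompt = f"References:\n{context_block}\n\nQuestion: {question}"
    return self.llm_client.sync_generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=0.3
    )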
Actual Results
After the deployment is complete, let's take a look at the actual operation of the system:
1. Knowledge base construction
python -m app.main build --file test.md --title "Basic Introduction to RAG" --author "Enterprise Knowledge Base" --tags "LLM,RAG,Knowledge Base"
Execution Result:
2025-05-11 14:50:16 | INFO | app.knowledge_builder:build_from_text:52 - Split text into 2 chunks
2025-05-11 14:50:59 | INFO | app.knowledge_builder:build_from_text:72 - Extracted 8 FAQs from text
2025-05-11 14:51:00 | INFO | __main__:build_knowledge_base:48 - Stored 2/2 chunks to knowledge base
2025-05-11 14:51:00 | INFO | __main__:build_knowledge_base:50 - Extracted and stored 8 FAQs
2. Knowledge retrieval and question answering
python -m app.main query --question "What are the advantages and disadvantages of RAG compared to the traditional knowledge base of the enterprise?"
Execution Result:
2025-05-11 15:01:46 | INFO | app.knowledge_retriever:query:39 - Decomposed question into 4 sub-questions
2025-05-11 15:01:47 | INFO | app.knowledge_retriever:query:67 - Filtered 28 context items to 6
================================================================================
Question: What are the advantages and disadvantages of RAG compared to the traditional knowledge base of enterprises?
--------------------------------------------------------------------------------
Answer: Retrieval-augmented generation (RAG) is a technique that optimizes the output of large language models (LLMs) by integrating authoritative knowledge bases outside of training data. The core of RAG is to allow LLMs to dynamically access internal knowledge bases of specific domains or organizations before generating responses, such as real-time data sources, documents, or professional databases, without retraining the model itself. This approach significantly improves the relevance, accuracy, and usefulness of generated content by introducing external information, while retaining the flexibility and generalization capabilities of LLMs.
================================================================================
Implementation Recommendations and Best Practices
Based on actual project experience, we have summarized the following best practices:
1. Documentation Strategy
• Set a reasonable chunk size (1000-1500 characters) and overlap (200-300 characters); a configuration sketch follows this list
• Adjust the segmentation strategy according to the document type; treat technical documents and narrative documents differently
• Keep the document's original formatting information as metadata to improve retrieval accuracy
2. Retrieval strategy
• Use hybrid retrieval (semantic + keyword) to improve recall
• Set a reasonable number of sub-questions (2-4) in the question decomposition stage
• Limit the total number of context items (5-8) to avoid information overload
3. System operation and maintenance
• Choose an appropriate vector (embedding) model
• Design an incremental indexing strategy for real-time update requirements
• Add monitoring and logging to detect and resolve issues promptly
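To make these recommendations concrete, here is a small configuration sketch that collects the suggested parameter ranges in one place; the class and field names are illustrative and not part of the project.
from dataclasses import dataclass

@dataclass
class RagConfig:
    """Illustrative defaults based on the recommendations above."""
    chunk_size: int = 1200        # 1000-1500 characters per chunk
    chunk_overlap: int = 250      # 200-300 characters of overlap
    max_sub_questions: int = 4    # 2-4 sub-questions per query
    max_context_items: int = 6    # 5-8 context items passed to the LLM
    temperature: float = 0.3      # low temperature for extraction and decomposition

config = RagConfig()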
Summary and Outlook
The MCP-based RAG system represents a new direction for knowledge base construction. Through the Model Context Protocol, we not only resolve many pain points of traditional RAG systems but also give enterprises a low-cost, efficient knowledge management solution.
In the future, as large model technology advances and MCP standards improve, we can expect more innovative features to emerge:
• Support for multimodal content (images, audio, video, etc.)
• More accurate real-time knowledge update mechanisms
• Adaptive retrieval optimization based on user feedback
For enterprises, now is the best time to start exploring and applying this technology. Through MCP-RAG, enterprises can fully tap the value of their own knowledge assets and provide smarter and more accurate information services to employees and customers.