Knowledge Agent Retrieval: Five Architecture Transition Points that Make RAG Burst with Wisdom

Written by
Clara Bennett
Updated on: July 15, 2025

An in-depth look at the five key points of upgrading a RAG system architecture, revealing the engineering wisdom behind intelligent agents.

Core content:
1. The challenges and shortcomings of the RAG system in dealing with complex problems
2. Five important steps to upgrade the knowledge retrieval architecture
3. Specific technical implementation and case analysis, showing the art of problem decomposition

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

As a veteran who has built enterprise-level RAG systems from scratch, I know the confusion developers feel when facing complex problems: knowing the system should be optimized, but not knowing where to start. This article uses the plainest language possible to walk through the necessary path from traditional RAG to an intelligent Agent. By the end, you will see that those seemingly profound concepts are simply the crystallization of hard-won engineering practice.


1. What is the problem? Let's start with a real failure

Last year we took on a case from an e-commerce client. When their customer-service system used RAG to handle user inquiries, it ran into questions like this one:

"Compare the differences in waterproof performance and sports modes between the smart watches recommended for Nike and Puma"

Traditional RAG behaves like an honest but rigid student:

  1. Throw the whole question into the search engine
  2. Catch 20 product manuals
  3. Generate a high-level feature comparison

As a result, users complained that the answers "read like product manuals, with no business insight." What went wrong?

This exposes three major weaknesses of traditional architecture:

  1. The more complex the question, the worse the retrieval accuracy (our tests show that when a question contains more than 3 entities, accuracy drops by 57%)
  2. Lack of a verification mechanism: wrong documents contaminate the final answer like a virus
  3. Response speed and quality cannot both be achieved: adding verification slows the response, while chasing speed leads to distortion


2. Five Steps to Upgrade Knowledge Retrieval Architecture


Step 1: Problem deconstruction - the art of breaking down the whole into parts

Imagine you are writing a paper. Producing the final draft in one pass is hard; the smart way is to outline first and write chapter by chapter. Complex questions should be broken down the same way:

Original question  →  List of sub-questions :

  1. Nike Customized Core Parameter Requirements
  2. Testing standards for the Puma partnership project
  3. Sales channel characteristics of the two customers
  4. The industry benchmark for waterproof performance
  5. Market feedback on sports mode

Technical implementation :

  • Use an LLM for "question triage", much as a doctor asks follow-up questions during a consultation
  • Search each sub-question independently to avoid conceptual confusion
  • Weight allocation mechanism: important sub-questions are processed first

# Pseudocode example: dynamic question splitting
def decompose_question(question):
    prompt = f"""
    Please break down the following question into 3-5 independent sub-questions:
    Original question: {question}
    Output format: JSON array
    """
    return call_llm(prompt)

Effect verification: in one customer case, question decomposition raised the document hit rate from 31% to 68%.
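The pseudocode above glosses over output parsing. Below is a minimal runnable sketch with `call_llm` stubbed out for illustration; a real system would call a model API, and the response format can vary, so the fallback path matters:

```python
import json

def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; returns a canned JSON array.
    Replace with your model client in practice."""
    return json.dumps([
        "Nike customized core parameter requirements",
        "Puma partnership testing standards",
        "Industry benchmark for waterproof performance",
    ])

def decompose_question(question: str) -> list[str]:
    prompt = f"""
    Please break down the following question into 3-5 independent sub-questions:
    Original question: {question}
    Output format: JSON array
    """
    raw = call_llm(prompt)
    try:
        subs = json.loads(raw)
    except json.JSONDecodeError:
        return [question]  # model ignored the format: fall back to the original question
    # Keep only non-empty strings so downstream retrieval gets clean queries
    cleaned = [s.strip() for s in subs if isinstance(s, str) and s.strip()]
    return cleaned or [question]
```

The fallback ensures a malformed model response degrades to plain single-query RAG instead of crashing the pipeline.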


Step 2: Parallel Verification - The Wisdom of Multithreading

Suppose you are a restaurant owner and a table of guests comes in and orders 10 dishes. There are two ways to do it:

  • Have one chef cook them in order (traditional RAG)
  • Assign them to several chefs working simultaneously (parallel verification)

Obviously the second one is faster. In engineering, we do this:

  1. Each sub-problem has an independent processing thread
  2. In each thread:
    1. Query expansion (synonyms, related terms)
    2. Multi-way recall (vector search + keyword search)
    3. Document credibility scoring

Tips to avoid pitfalls :

  • Control the number of concurrent connections to avoid overwhelming the database
  • Set a timeout mechanism to prevent a single sub-problem from blocking the entire process
  • Use memory sharing to avoid repeated retrieval
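The chef analogy maps directly onto a thread pool with bounded concurrency, per-task timeouts, and a shared cache. A minimal sketch, with `retrieve` as a placeholder for real multi-way recall:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Shared cache so identical sub-queries are retrieved only once
_doc_cache: dict[str, list[str]] = {}

def retrieve(sub_question: str) -> list[str]:
    """Placeholder retriever; swap in vector + keyword search in practice."""
    if sub_question in _doc_cache:
        return _doc_cache[sub_question]
    docs = [f"doc about {sub_question}"]  # pretend we found something
    _doc_cache[sub_question] = docs
    return docs

def parallel_retrieve(sub_questions, max_workers=4, timeout_s=5.0):
    results = {}
    # Bounded worker count keeps concurrency from overwhelming the database
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(retrieve, q): q for q in sub_questions}
        for fut in as_completed(futures, timeout=timeout_s * len(sub_questions)):
            q = futures[fut]
            try:
                results[q] = fut.result(timeout=timeout_s)  # per-task timeout
            except Exception:
                results[q] = []  # a failed sub-question must not block the rest
    return results
```

Swallowing the exception into an empty result is a deliberate choice here: one slow or broken sub-question degrades that branch of the answer rather than the whole response.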


Step 3: State Management - The Secret to Avoiding Chaos

Imagine you are playing a strategy game and operating multiple battlefields at the same time:

  • Main Base Status (Original Question)
  • Progress of each sub-battlefield (sub-problem processing status)
  • Global technology tree (domain knowledge graph)

In the code we implement this:


class BattleState:
    main_question: str    # main question
    sub_questions: dict   # sub-question status pool
    knowledge_graph: dict # dynamic knowledge graph

class SubQuestion:
    query: str       # current query
    docs: list       # retrieved documents
    validation: dict # validation results

Design points :

  • Hierarchical isolation: sub-problems do not communicate directly with each other
  • Incremental update: Like automatic game archiving, every step can be traced back
  • Garbage collection: automatically clean up the memory occupied by completed tasks
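The state classes above can be fleshed out with Python dataclasses. The sketch below (an illustration, not the author's actual implementation) shows how the incremental-update and garbage-collection points translate into code:

```python
from dataclasses import dataclass, field

@dataclass
class SubQuestion:
    query: str
    docs: list = field(default_factory=list)        # retrieved documents
    validation: dict = field(default_factory=dict)  # validation results
    done: bool = False

@dataclass
class BattleState:
    main_question: str
    sub_questions: dict = field(default_factory=dict)    # sub-question status pool
    knowledge_graph: dict = field(default_factory=dict)  # dynamic knowledge graph
    history: list = field(default_factory=list)          # incremental "save points"

    def add_sub(self, query: str) -> None:
        # Hierarchical isolation: sub-questions only live inside the pool
        self.sub_questions[query] = SubQuestion(query=query)

    def complete_sub(self, query: str, docs: list) -> None:
        sq = self.sub_questions[query]
        sq.docs, sq.done = docs, True
        self.history.append(("done", query))  # traceable, like a game autosave

    def gc_finished(self) -> None:
        """Garbage collection: drop document payloads of completed sub-questions."""
        for sq in self.sub_questions.values():
            if sq.done:
                sq.docs = []
```

The `history` list is what makes every step traceable; in a real system it would be persisted rather than kept in memory.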


Step 4: Streaming output - let users know the progress

Think back to downloading a file. Why does the progress bar matter? Because it:

  1. Proves the system is working
  2. Manages user expectations
  3. Gives the user a basis for interrupting

In the knowledge agent, we design three levels of streaming feedback:

  1. Instant confirmation (within 200 ms):
    • "Analyzing the demand differences between Nike and Puma..."
  2. Process display:
    • "Found 3 Nike technical documents and 2 Puma test reports"
  3. Progressive generation:
    • "First, let's look at waterproof performance: Nike requires 5ATM vs Puma's 3ATM..."

Technical implementation :

  • WebSocket long-lived connection
  • Message priority queue
  • Result cache prefetching
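In production these messages ride on a WebSocket connection; for illustration only, the three feedback levels can be modeled as a plain generator that a transport layer would push to the client (the message kinds and sample texts below are made up):

```python
def stream_answer(question: str):
    """Yield the three levels of feedback in order; a real system would push
    each message over a WebSocket instead of yielding it."""
    # Level 1: instant confirmation (target: under 200 ms)
    yield ("ack", f"Analyzing: {question} ...")
    # Level 2: process display, emitted as retrieval completes
    yield ("progress", "Found 3 Nike technical documents and 2 Puma test reports")
    # Level 3: progressive generation, streamed in sentence chunks
    for chunk in ["Waterproofing: ", "Nike requires 5ATM ", "vs Puma's 3ATM."]:
        yield ("answer", chunk)

messages = list(stream_answer("Nike vs Puma"))
```

Tagging each message with its level is what lets the client render an acknowledgment, a progress line, and a growing answer in three distinct UI areas.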


Step 5: Self-evolution - the secret of getting smarter the more you use it

We added a "wrong question book" mechanism to the system:

  1. Automatic evaluation after each question and answer session:
    1. Did the user ask follow-up questions?
    2. Was the answer accepted?
    3. What was the manual score?
  2. Classification and storage of problem case library
  3. Automatically fine-tune the model every week
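The "wrong question book" can start as a simple feedback log. A sketch, with a deliberately crude failure heuristic and classification key; both are placeholders for the real signals a production system would use:

```python
from collections import Counter

class FeedbackLog:
    """'Wrong question book': collect per-session signals for later fine-tuning."""

    def __init__(self):
        self.cases = []

    def record(self, question, answer, followed_up, accepted, score=None):
        # Crude heuristic: a follow-up or a rejected answer marks the case as a failure
        label = "bad" if (followed_up or not accepted) else "good"
        self.cases.append({"q": question, "a": answer,
                           "score": score, "label": label})

    def failure_buckets(self):
        """Classify failures for the case library (here: by the question's first word)."""
        return Counter(c["q"].split()[0].lower()
                       for c in self.cases if c["label"] == "bad")
```

A weekly fine-tuning job would then sample from the "bad" buckets rather than from all traffic, concentrating training signal where the system actually failed.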

After applying this mechanism in the medical field, the quarterly average accuracy rate increased by 7.3%.


3. Practical suggestions for developers


1. Don’t over-design

  • Implement the core link first, then gradually optimize
  • Each sub-module evaluates ROI (return on investment) separately
  • Case: early on we ran deep verification on every document, then found that verifying only the top 3 documents covered 80% of the need


2. Monitoring is more important than algorithms

Four core indicators must be established:

| Indicator | Calculation method | Warning threshold |
| --- | --- | --- |
| Sub-question timeout rate | timed-out tasks / total tasks | >5% |
| Document contamination rate | share of answers degraded by wrong documents | >10% |
| Streaming interruption rate | share of sessions with incomplete transfer | >2% |
| Knowledge update delay | time for a new document to take effect | >1 hour |
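These four indicators wire naturally into a simple threshold check; the metric names below are illustrative, not taken from the original system:

```python
# Warning thresholds from the monitoring table (illustrative metric names)
THRESHOLDS = {
    "sub_question_timeout_rate": 0.05,     # >5%
    "document_contamination_rate": 0.10,   # >10%
    "streaming_interruption_rate": 0.02,   # >2%
    "knowledge_update_delay_hours": 1.0,   # >1 hour
}

def check_alerts(metrics: dict) -> list[str]:
    """Return the names of indicators that breach their warning threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0.0) > limit]

alerts = check_alerts({
    "sub_question_timeout_rate": 3 / 40,   # timed-out tasks / total tasks
    "document_contamination_rate": 0.04,
    "streaming_interruption_rate": 0.01,
    "knowledge_update_delay_hours": 0.5,
})
```

In this example only the timeout rate (7.5%) breaches its 5% threshold, so `alerts` contains just that indicator.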


3. Choose the right framework

Taking LangGraph as an example, its three major advantages are:

  • Visual debugging : turning abstract state flows into visible flowcharts
  • Atomic rollback : failure of a sub-problem does not affect the whole
  • Ecosystem integration : seamless integration with the LangChain toolchain

But be careful:

  • The learning curve is steep; it is best to start with submodules and replace them gradually
  • Deep customization requires reading source code
  • The quality of community plugins varies and needs to be rigorously evaluated


4. Future battlefield: smarter knowledge processing

The current architecture can solve 80% of complex problems, but the real challenges lie in:

  • Ambiguous intent handling: when the user doesn't even know what they are asking
  • Cross-document reasoning: connecting hidden information across multiple documents
  • Real-time knowledge updates: making new knowledge effective within 1 minute

What we are exploring:

  1. Hybrid retrieval : combining semantic search with graph traversal algorithms
  2. Cognitive chain verification : Make each reasoning step explainable and verifiable
  3. Edge computing deployment : running lightweight agents locally on user devices
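As a concrete taste of hybrid retrieval, one widely used way to fuse a semantic ranking with a keyword- or graph-based ranking is reciprocal rank fusion (RRF). A sketch with made-up document IDs:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Documents appearing high in several rankings rise to the top
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]  # e.g. from vector search
keyword  = ["doc_b", "doc_d", "doc_a"]  # e.g. from BM25 or graph traversal
fused = rrf_fuse([semantic, keyword])
```

RRF needs no score calibration between the two retrievers, only their rank orders, which is why it is a common first choice for fusing heterogeneous retrieval paths.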


Conclusion: The true meaning of an architect

Good architecture is not about pursuing technological fashion, but about accurately grasping "where to be complex". The essence of the five transition points is to translate human thinking patterns into machine-executable processes. The next time you face a complex system, you might as well ask yourself:

"If I were facing this problem, how would I solve it?"

This may be the starting point of intelligent design.