Breaking through the limitations of traditional RAG: how can three lightweight agents work together to make a question-answering system more accurate?

The AI question-answering system gets a substantial upgrade: lightweight agents working together to respond accurately to diverse questions.
Core content:
1. Diversified and complex scenario challenges faced by the question-answering system
2. Limitations of traditional RAG technology and improvement solutions
3. Division of labor and cooperation mechanism of three lightweight agents
Recently, the company has raised its requirements for the accuracy and quality of answers in AI-related projects, especially for question-answering systems. User questions are not limited to basic queries; they may involve diverse and complex scenarios. Just before the end of the workday last Friday, the project manager told me that a customer supported by a previous project team, besides querying the private knowledge base, also asked the following types of questions when using the question-answering system:
- Questions that the large language model (LLM) can answer directly: general knowledge or common-sense questions.
- Conversational expressions unrelated to the knowledge base: social phrases such as "thank you" and "goodbye".
- Complex questions that require multi-step reasoning: for example, questions involving logical reasoning or contextual association.
However, the RAG pipeline used by the previous team has clear limitations: whatever the user asks, the system always retrieves from the knowledge base first and then hands the results to the LLM to generate an answer. For non-knowledge-base questions (such as conversational expressions), the team set fixed responses at the prompt level. This partially works, but the design is inelegant and cannot handle complex questions that require multi-step reasoning. The project manager asked me for a better solution. I proposed a lightweight multi-agent scheme that optimizes the whole RAG process without modifying the retriever or the LLM.
Architecture diagram
Define three agents
We designed three specialized agents - the Reasoning Router, the Information Filter, and the Decision Maker - to mediate between the retriever and the large language model (LLM). Each lightweight agent plays a distinct role, and together they manage the RAG (Retrieval-Augmented Generation) pipeline through targeted instructions. This unified design keeps coordination efficient while remaining simple enough for edge deployment. The three agents are responsible, respectively, for deciding whether retrieval is needed, generating effective queries, and selecting information suitable for the LLM.
Reasoning Router
The Reasoning Router determines the best reasoning strategy for a given question from a high-level perspective. Based on the current state (i.e., the question), it selects an action in at most two steps:
1. Determine the necessity of retrieval: if the output is [No Retrieval], the question is handled directly by the LLM, which answers from its internal knowledge.
2. Determine question complexity: if the output is [Retrieval], the router also evaluates whether the question requires complex reasoning. For simple questions, it outputs [Retrieval] together with a search query, triggering a single retrieval-filter pass. For complex questions, it outputs [Planning], which triggers a multi-step reasoning strategy coordinated among the agents.
The following are some examples :
Example 1 (no retrieval required):
Input question: q = "What is the capital of France?"
Output: [No Retrieval]
Explanation: this is a common-sense question that the LLM can answer directly, without retrieval.

Example 2 (simple question, retrieval required):
Input question: q = "What is the population of Paris in 2023?"
Output: [Retrieval] <population of Paris 2023>
Explanation: this question requires retrieving up-to-date data from an external knowledge base.

Example 3 (complex question, multi-step reasoning required):
Input question: q = "How does the economic policy of Country A affect its trade relations with Country B?"
Output: [Planning]
Explanation: this question involves multi-step reasoning and must be broken down into sub-goals and solved step by step.
Information Filter
The Information Filter processes and filters the retrieved information to identify content suitable for the LLM. Its state space includes the question, the retrieved documents, and the current reasoning objective (when running in [Planning] mode).
Decision Maker
The Decision Maker determines the best action under the [Planning] strategy based on the current state. Its state space includes the question, the roadmap generated by the LLM, and the documents accumulated over the reasoning history. Based on the current state, it evaluates progress and decides the next action.
How do agents collaborate?
The Direct Answering Strategy and the Single-pass Strategy were introduced in the definition of the Reasoning Router; they correspond to the [No Retrieval] and [Retrieval] outputs respectively.
The Multi-Step Reasoning Strategy corresponds to the Reasoning Router's [Planning] output. It is designed for complex problems, requiring the LLM to generate a high-level roadmap and multiple retrieval-filter cycles, and implements iterative information collection and reasoning in the following three stages:
1. Generate a roadmap: the LLM breaks the complex problem down into a series of structured sub-goals, providing high-level guidance to the agents. These sub-goals define the steps needed to solve the problem and the type of information required.
2. Iterative retrieval and filtering: guided by the roadmap, the agents gradually collect relevant information over multiple retrieval-filter cycles. In each cycle, the Reasoning Router determines whether the current sub-goal needs retrieval, the Information Filter extracts relevant documents, and the Decision Maker evaluates progress and decides the next action.
3. Synthesis and answer generation: after all sub-goals are completed, the LLM synthesizes the collected information to generate the final answer, ensuring complex questions are answered comprehensively and accurately.
Through these three strategies, our multi-agent system can adaptively handle problems of different complexity:
- Direct Answering Strategy: for general-knowledge questions; provides an immediate response.
- Single-pass Strategy: efficiently handles fact-based questions, obtaining the answer through a single retrieve-filter cycle.
- Multi-Step Strategy: solves complex problems through guided iterative reasoning.
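The three strategies can be sketched as a single dispatch function. This is a minimal illustration rather than the article's actual implementation; `route`, `llm`, `retrieve`, `filter_docs`, and `plan_and_solve` are hypothetical callables standing in for the agents, the model, and the retriever:

```python
def answer(question, route, llm, retrieve, filter_docs, plan_and_solve):
    """Dispatch a question to one of the three strategies based on the
    Reasoning Router's output string (hypothetical callables throughout)."""
    action = route(question)  # e.g. "[No Retrieval]", "[Retrieval] <...>", "[Planning]"
    if action.startswith("[No Retrieval]"):
        # Direct Answering Strategy: the LLM answers from internal knowledge.
        return llm(question)
    if action.startswith("[Retrieval]"):
        # Single-pass Strategy: one retrieve-filter cycle, then generate.
        query = action[len("[Retrieval]"):].strip(" <>'")
        docs = filter_docs(question, retrieve(query))
        return llm(question, context=docs)
    # Multi-Step Strategy: roadmap plus iterative retrieve-filter cycles.
    return plan_and_solve(question)
```

The key design point is that the retriever and the LLM themselves are untouched; only the control flow around them changes.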
How to design the prompts
Reasoning Router
The Reasoning Router evaluates the question and outputs the corresponding action in the following steps:
1. Assess the question: analyze its specificity, complexity, and clarity to determine whether retrieval or planning is required, and determine whether it falls within the LLM's existing knowledge or needs external information.
2. Decide the category:
- If the question is complex and requires planning, output [Planning].
- If the question requires specific information (such as recent events or a niche area outside the scope of the LLM's knowledge), output [Retrieval] 'YOUR QUERY HERE'.
- If the question can be answered directly by the LLM, output [No Retrieval].
3. Output format:
- No retrieval required: [No Retrieval]
- Retrieval required: [Retrieval] <query>
- Planning required: [Planning] (for complex questions)

The state space of the Information Filter depends on the strategy in use:
- Single-pass Strategy: S2 = {q, retrieved documents}, where q is the input question and the retrieved documents come from the retrieval system.
- Multi-step Reasoning Strategy: S2 = {q, retrieved documents, current objective}, where the current objective is the current reasoning goal.

Information Filter example 1 (single-pass strategy):
Question: What is the population of Paris in 2023?
Retrieved documents: Multiple documents containing demographic data for Paris.
Information Filter output:
Thought: Document 1 contains the most recent population data for Paris (2023). Action: [Document 1]
Information Filter example 2 (multi-step reasoning strategy):
Question: How does the economic policy of Country A affect its trade relations with Country B?
Current objective: Analyze the economic policies of Country A.
Retrieved documents: Multiple documents about Country A's economic policies.

The state space of the Decision Maker is S3 = {q, Accumulated Documents, Roadmap}, where q is the input question, the Accumulated Documents come from previous retrieve-filter cycles, and the Roadmap is the reasoning roadmap generated by the LLM in the multi-step reasoning strategy. Its action space contains two operations:
- [Retrieval]: request an additional retrieve-filter cycle, generating a subquery to obtain more relevant information. Applicable when the accumulated documents are not yet sufficient to solve all sub-goals.
- [LLM]: pass all documents accumulated so far to the LLM to generate the final answer. Applicable when the accumulated documents are sufficient to solve all sub-goals.

Decision Maker example 1 (further retrieval required):
Question: How does the economic policy of Country A affect its trade relations with Country B?
Current status:
Roadmap: The next step is to analyze Country B's trade policy.
Accumulated Documents: Contains documents on Country A's economic policies.
Decision Maker output:
Thought: Additional information on Country B's trade policy is required to complete the analysis. Action: [Retrieval]<trade policy of Country B>
Decision Maker example 2 (generate the final answer):
Question: How does the economic policy of Country A affect its trade relations with Country B?
Current status:
Accumulated Documents: Contains documents on Country A's economic policies and Country B's trade policies.
Roadmap: All sub-goals have been completed.
Decision Maker output:
Thought: Sufficient information has been accumulated to generate the final answer. Action: [LLM]
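Putting these examples together, the multi-step reasoning strategy can be sketched as a loop driven by the Decision Maker. This is a hypothetical skeleton under the assumptions above; `generate_roadmap`, `decide`, `retrieve`, `filter_docs`, and `synthesize` stand in for the LLM and agent calls described in the text:

```python
def multi_step_answer(question, generate_roadmap, decide, retrieve,
                      filter_docs, synthesize, max_cycles=5):
    """Iterative retrieve-filter loop driven by the Decision Maker.
    All arguments are hypothetical callables wrapping LLM/agent calls."""
    roadmap = generate_roadmap(question)      # stage 1: structured sub-goals
    accumulated = []                          # documents gathered so far
    for _ in range(max_cycles):               # stage 2: retrieve-filter cycles
        action = decide(question, accumulated, roadmap)
        if action.startswith("[LLM]"):        # enough information collected
            break
        subquery = action[len("[Retrieval]"):].strip(" <>")
        accumulated += filter_docs(question, retrieve(subquery))
    return synthesize(question, accumulated)  # stage 3: final answer
```

The `max_cycles` cap is an assumption added here to guard against a Decision Maker that never emits [LLM].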
The corresponding Reasoning Router prompt can be designed as follows:
You are an intelligent assistant responsible for evaluating whether a given question requires more information through retrieval or planning to arrive at an accurate answer. You have access to a large language model (LLM) for planning or answering questions, and a retrieval system to provide information relevant to the query.
Instructions:
1. **Evaluate the question**: Evaluate whether the LLM's existing knowledge can provide a precise answer. Consider the specificity, complexity, and clarity of the question.
2. **Decision categories**:
- If the question is complex and requires a planning phase prior to searching, your response should be:
[Planning]
- If the question requests specific information that you believe the LLM does not possess, or concerns recent events or a niche topic that is beyond the scope of the LLM's knowledge, please respond in the following format:
[Retrieval] 'YOUR QUERY HERE'
- If you think the LLM can answer the question without additional information, please respond:
[No Retrieval]
3. **Focus on evaluation**: Avoid answering questions directly. Focus only on determining if retrieval or planning is required.
Reasoning Router state:
Now, please process the following question:
Question: {question}
All possible action outputs of the Reasoning Router:
% For situations where no retrieval is required
[No Retrieval]
% For situations where retrieval is required
[Retrieval] <query content> (for simple questions)
[Planning] (for complex problems)
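Assuming the model follows the prompt's output format exactly, the router's raw text can be parsed with a small helper like the one below; real model outputs may need more defensive handling:

```python
import re

def parse_router_output(text):
    """Map the Reasoning Router's raw output to an (action, query) pair."""
    text = text.strip()
    if text.startswith("[No Retrieval]"):
        return ("no_retrieval", None)
    if text.startswith("[Planning]"):
        return ("planning", None)
    # Accept both quoted and angle-bracketed query forms from the prompt.
    m = re.match(r"\[Retrieval\]\s*['<]?(.*?)['>]?$", text)
    if m:
        return ("retrieval", m.group(1).strip())
    raise ValueError(f"Unrecognized router output: {text!r}")
```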
Information Filter
The state space of the information filter varies with the strategy currently in use: {q, retrieved documents} in the single-pass strategy, plus the current objective in [Planning] mode. The filter parses each document and outputs its decision in the following format:
Thought: <analysis of each document>
Action: [<selected document ID>]
The corresponding prompt for the multi-step reasoning strategy can be designed as follows:
You are an intelligent assistant responsible for analyzing the retrieved documents based on the given question and the goal of the current step. Your role is to determine the relevance of each document to the question and the specified goal.
Instructions:
Analyze relevance: Evaluate whether each document meets the goal of the current retrieval step and contains a direct answer to the question.
Thinking process: Provide a brief analysis for each document, taking into account both the answer content and the retrieval goal.
Filter documents: After your thought process, generate an index list of documents indicating which documents to keep.
Information Filter state:
Now, please process the following:
Objective of the current step: {objective} (only applicable in [Planning] mode)
Question: {question}
Documents: {documents}
Information Filter output:
Thought: <analysis of each document>
Action: [<selected document ID>]
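If the filter follows this output format, the selected document IDs can be recovered with a sketch like the following; it assumes the Action line lists IDs such as "Document 1, Document 3":

```python
import re

def parse_filter_output(text):
    """Extract the kept document indices from the Information Filter's output.
    Assumes an 'Action: [...]' line listing IDs like 'Document 1, Document 3'."""
    m = re.search(r"Action:\s*\[(.*?)\]", text)
    if not m:
        return []  # no Action line: keep nothing
    return [int(n) for n in re.findall(r"\d+", m.group(1))]
```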
Decision Maker
The decision maker’s state space is: S3 = {q, Accumulated Documents, Roadmap}
Its main responsibility is to evaluate reasoning progress based on the current state and decide whether further retrieval is needed or the final answer can be generated. Its action space includes two operations: [Retrieval], which requests another retrieve-filter cycle with a generated subquery, and [LLM], which passes all accumulated documents to the LLM to produce the final answer. Its output format is as follows:
Thought: <Analysis of current progress and goals>
Action: [Retrieval] <subquery content> or [LLM]
Through the dynamic evaluation and decision-making of the decision maker, the system can flexibly adjust the retrieval and generation processes in a multi-step reasoning strategy to ensure that complex questions are answered comprehensively and accurately.
The corresponding prompt is as follows:
You are an intelligent assistant responsible for determining the next appropriate action based on the existing documents, plan, and question provided. You have access to a large language model (LLM) to answer questions, and a retrieval system to collect additional documents. Your goal is to decide whether to write a query to retrieve relevant documents, or to use the LLM to generate a comprehensive answer based on the existing documents and plan.
Instructions:
1. **Evaluate existing documents**: Assess whether the existing documents are sufficient to answer the question.
2. **Follow the Plan**: Understand the next steps outlined in the plan.
3. **Decision categories**:
- If the existing documents are insufficient and additional searches are required, please respond:
[Retrieval] 'YOUR QUERY HERE'
- If the existing documents are sufficient to answer the question, please respond:
[LLM]
4. **Focus on action**: Don’t answer questions directly; focus on identifying the appropriate next actions based on existing documents, plans, and questions.
Decision Maker state:
Now, please process the following:
Existing documents: {accumulated documents}
Roadmap: {roadmap}
Question: {question}
Decision Maker output:
Thought: <your analysis of the current situation (whether additional information must be retrieved or the LLM can answer)>
Action: [Retrieval] <subquery content> or [LLM], based on your analysis
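Assuming the Decision Maker follows this format, its output can be mapped back to an action with a helper along these lines (a sketch, not the article's implementation):

```python
import re

def parse_decision(text):
    """Map the Decision Maker's output to ('llm', None) or ('retrieval', subquery).
    Assumes the 'Action:' line follows the format described above."""
    m = re.search(r"Action:\s*(\[LLM\]|\[Retrieval\]\s*<([^>]*)>)", text)
    if not m:
        raise ValueError(f"Unrecognized decision output: {text!r}")
    if m.group(1).startswith("[LLM]"):
        return ("llm", None)
    return ("retrieval", m.group(2).strip())
```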
Summary
By introducing a multi-agent collaborative RAG optimization solution, we resolved the limitations of traditional RAG in handling diverse questions. However, this design depends heavily on model capability, especially the accuracy of the Reasoning Router: once the router makes an incorrect judgment, the final result may not be as expected. Therefore, if resources permit, consider fine-tuning a small 7B model to improve the Reasoning Router's performance.