Deep learning! Building a RAG multi-agent research tool based on LangGraph.

Written by Iris Vance
Updated on: June 27, 2025

In today's era of information explosion, how can we acquire knowledge quickly and accurately? This article introduces a RAG multi-agent tool built on LangGraph that can efficiently handle complex questions, integrate multi-source information, and arrive at accurate answers through iterative steps.

Core content:
1. The limitations of traditional question-answering systems and the advantages of multi-agent RAG
2. Functions of multi-agent RAG: routing and tool use, planning sub-steps, reflection and error correction, sharing global state
3. System architecture and document processing, building the "brain" of intelligent question answering


In today's era of information explosion, acquiring knowledge quickly and accurately is especially important. Traditional question-answering systems can handle simple questions, but they often fall short when faced with complex ones. To address this pain point, we developed a LangGraph-based RAG multi-agent tool that can efficiently handle complex questions, integrate multi-source information, and arrive at accurate answers through iterative steps. Let's take a deeper look at this powerful tool.

1. Introduction: From simple RAG to intelligent multi-agent RAG

At the beginning of the project, we found that the traditional "simple RAG" approach has several shortcomings. Simple RAG cannot decompose complex problems: it processes queries at a single level only, and cannot analyze each step in depth and draw a unified conclusion. It has no way to handle hallucinations (i.e., the model generating incorrect information) or errors, because it lacks a verification step to correct them. In addition, a simple RAG system cannot dynamically use tools, call external APIs, or interact with databases according to workflow conditions.

To address these issues, we introduced a multi-agent RAG research system. The agent-based framework enables the following functions:

  • Routing and tool use: The routing agent classifies the user's query and directs the process to the appropriate node or tool. For example, it can determine whether the document needs a full summary, whether more detailed information is needed, or whether the question is out of scope.
  • Planning sub-steps: Complex queries often need to be broken down into smaller, more manageable steps. Starting from a query, the system can generate a series of execution steps that explore different aspects of the query and lead to a conclusion. For example, if the query requires comparing two different parts of a document, an agent-based approach can recognize this comparison requirement, retrieve the two sources separately, and merge them into a comparative analysis in the final response.
  • Reflection and error correction: Beyond simple response generation, the agent-based approach can add a validation step to account for potential hallucinations, errors, or responses that fail to accurately answer the user's query. It can also add a human-in-the-loop self-correction mechanism that brings human input into the automated process. This makes agent-based RAG systems a more robust and reliable solution for enterprise applications, where reliability is critical.
  • Shared global state: Agent workflows share a global state, which simplifies state management across multiple steps. This shared state is critical for maintaining consistency across the different stages of the multi-agent process.

2. Project Overview: Building the “Brain” of Intelligent Question Answering

(I) System architecture diagram

Our system consists of two core parts: the researcher subgraph, which is responsible for generating different queries for retrieving and re-ranking the top-k documents in the vector database; and the main graph, which contains the main workflow, such as analyzing user queries, generating the steps required to complete the task, generating responses, and checking hallucinations through a human involvement mechanism.

(II) Document processing and vector database construction

1. Document parsing

For complex PDF documents, especially those with complex layouts, it is crucial to choose the right parsing tool. Many libraries are not accurate enough when dealing with PDFs with complex page layouts or table structures. To this end, we adopted Docling, an open source library that can efficiently parse documents and export the content to the required format. Docling supports reading and exporting Markdown and JSON formats from multiple common document formats such as PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc and Markdown. It has a comprehensive understanding of PDF documents, including table structure, reading order and page layout, and also supports OCR for scanned PDFs.

The following is a code example of using Docling to convert PDF to Markdown format:

import logging

from docling.document_converter import DocumentConverter

logger = logging.getLogger(__name__)
source = "report.pdf"  # example path; any PDF/DOCX/PPTX/HTML source works

logger.info("Starting document processing.")
converter = DocumentConverter()
markdown_document = converter.convert(source).document.export_to_markdown()
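
The article does not show how the parsed Markdown becomes the docs_list that is embedded below; the snippet that follows is a minimal sketch of that step, assuming LangChain's RecursiveCharacterTextSplitter and hypothetical chunk sizes.

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical chunking parameters; tune them to the document and embedding model.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# Wrap the Markdown string in a Document and split it into the chunks
# that will be embedded and stored in the vector database.
docs_list = text_splitter.split_documents(
    [Document(page_content=markdown_document, metadata={"source": source})]
)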

2. Vector database construction

We use Chroma to build the vector database, storing the document chunks as vector embeddings and searching over them. The persistent database is stored in the local directory "db_vector". Using OpenAI's embedding model, we convert the document list into vectors and store them in Chroma.

The following is the code for building a vector database:

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embd = OpenAIEmbeddings()

# docs_list holds the chunked documents produced from the parsed Markdown.
vectorstore_from_documents = Chroma.from_documents(
    documents=docs_list,
    collection_name="rag-chroma-google-v1",
    embedding=embd,
    persist_directory="db_vector",
)
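
As a quick sanity check (not part of the main workflow), the freshly built collection can be queried directly with a similarity search; the sample question below is only an illustration.

# Retrieve the three chunks most similar to a sample question.
results = vectorstore_from_documents.similarity_search(
    "What was the data center PUE in Singapore in 2022?", k=3
)
for doc in results:
    print(doc.metadata.get("source"), doc.page_content[:200])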

(III) Main graph construction and state management

One of the core concepts of LangGraph is state. Each graph execution creates a state that is passed between nodes as the graph executes, and each node updates this internal state with its return value after it runs.

We define two classes: Router and GradeHallucinations, which are used to store the classification results of user queries and the presence or absence of hallucinations in responses, respectively. Based on these states, we construct the input state (InputState) and the agent state (AgentState), where AgentState contains the classification of the user query, the list of steps in the research plan, the list of retrieved documents that the agent can refer to, and the binary score of hallucinations.

The following is the definition code of the state class:

from pydantic import BaseModel, Field
from typing import Literal, TypedDict


class Router(TypedDict):
    """Classify user query."""
    logic: str
    type: Literal["more-info", "environmental", "general"]


class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in generation answer."""
    binary_score: str = Field(description="Answer is grounded in the facts, '1' or '0'")
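
The article references InputState and AgentState but does not reproduce their definitions. The sketch below shows one plausible shape for them, reusing the Router and GradeHallucinations classes above; add_messages is LangGraph's standard message accumulator, while reduce_docs is a hypothetical reducer that interprets the "delete" marker returned later by create_research_plan.

from dataclasses import dataclass, field
from typing import Annotated, Optional

from langchain_core.documents import Document
from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


def reduce_docs(existing: list[Document], new) -> list[Document]:
    """Hypothetical reducer: replace the stored documents, or clear them on "delete"."""
    if new == "delete":
        return []
    return list(new)


@dataclass(kw_only=True)
class InputState:
    """What the caller provides: the conversation so far."""
    messages: Annotated[list[AnyMessage], add_messages] = field(default_factory=list)


@dataclass(kw_only=True)
class AgentState(InputState):
    """Shared global state passed between the nodes of the main graph."""
    router: Optional[Router] = None                       # classification of the user query
    steps: list[str] = field(default_factory=list)        # remaining research-plan steps
    documents: Annotated[list[Document], reduce_docs] = field(default_factory=list)
    hallucination: Optional[GradeHallucinations] = None   # binary grounding score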

(IV) Detailed explanation of the workflow

1. Step 1: Analyze and route queries

This step updates the Router object in the agent state with a type variable containing one of the values “more-info”, “environmental”, or “general”. Based on this information, the workflow is routed to the appropriate node, such as “create_research_plan”, “ask_for_more_info”, or “respond_to_general_query”.

Here is the implementation code:

async def analyze_and_route_query(
    state: AgentState, *, config: RunnableConfig
) -> dict[str, Router]:
    """Analyze the user's query and determine the appropriate routing."""
    model = ChatOpenAI(model=GPT_4o, temperature=TEMPERATURE, streaming=True)
    messages = [
        {"role": "system", "content": ROUTER_SYSTEM_PROMPT}
    ] + state.messages
    logging.info("---ANALYZE AND ROUTE QUERY---")
    response = cast(
        Router, await model.with_structured_output(Router).ainvoke(messages)
    )
    return {"router": response}
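
The type returned by the router then drives a conditional edge in the main graph. Below is a small sketch of what that edge function could look like (the name route_query is an assumption; the target node names are the ones listed above). It is wired into the graph in the assembly sketch at the end of this section.

def route_query(state: AgentState) -> str:
    """Map the router classification to the next node of the main graph."""
    query_type = state.router["type"]
    if query_type == "environmental":
        return "create_research_plan"
    if query_type == "more-info":
        return "ask_for_more_info"
    return "respond_to_general_query"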

2. Step 2: Create a research plan

If the query classification returns "environmental", the user's request is relevant to the document and the workflow reaches the "create_research_plan" node, whose function is to create a step-by-step research plan for answering environmental-related queries.

Here is the implementation code:

async def create_research_plan(
    state: AgentState, *, config: RunnableConfig
) -> dict[str, list[str] | str]:
    """Create a step-by-step research plan for answering an environmental-related query."""

    class Plan(TypedDict):
        """Generate research plan."""
        steps: list[str]

    model = ChatOpenAI(model=GPT_4o_MINI, temperature=TEMPERATURE, streaming=True)
    messages = [
        {"role": "system", "content": RESEARCH_PLAN_SYSTEM_PROMPT}
    ] + state.messages
    logging.info("---PLAN GENERATION---")
    response = cast(Plan, await model.with_structured_output(Plan).ainvoke(messages))
    return {"steps": response["steps"], "documents": "delete"}

3. Step 3: Conduct research

This step will take the first step from the research plan and call the researcher subgraph to execute the research. The researcher subgraph will return a series of document fragments, which we will further process in the subsequent steps.

Here is the implementation code:

async def conduct_research(state: AgentState) -> dict[str, Any]:
    """Execute the first step of the research plan."""
    result = await researcher_graph.ainvoke({"question": state.steps[0]})  # call the researcher subgraph directly
    docs = result["documents"]
    step = state.steps[0]
    logging.info(f"\n{len(docs)} documents retrieved in total for the step: {step}.")
    return {"documents": result["documents"], "steps": state.steps[1:]}

4. Step 4: Researcher subgraph construction

The researcher subgraph includes two key steps: query generation and document retrieval. The query generation step generates multiple search queries based on the questions in the research plan to help answer the questions. The document retrieval step uses hybrid search and Cohere re-ranking technology to retrieve relevant documents from the vector database.

Here is the query generation code:

async def generate_queries(
    state: ResearcherState, *, config: RunnableConfig
) -> dict[str, list[str]]:
    """Generate search queries based on the question."""

    class Response(TypedDict):
        queries: list[str]

    logger.info("---GENERATE QUERIES---")
    model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0)
    messages = [
        {"role": "system", "content": GENERATE_QUERIES_SYSTEM_PROMPT},
        {"role": "human", "content": state.question},
    ]
    response = cast(Response, await model.with_structured_output(Response).ainvoke(messages))
    queries = response["queries"]
    queries.append(state.question)  # also search with the original question itself
    logger.info(f"Queries: {queries}")
    return {"queries": queries}

Here is the code for document retrieval and re-ranking:

# Imports for the hybrid retrieval / re-ranking setup
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain_cohere import CohereRerank
from langchain_community.retrievers import BM25Retriever


def _setup_vectorstore() -> Chroma:
    """Set up and return the Chroma vector store instance."""
    embeddings = OpenAIEmbeddings()
    return Chroma(
        collection_name=VECTORSTORE_COLLECTION,
        embedding_function=embeddings,
        persist_directory=VECTORSTORE_DIRECTORY,
    )

# Create base retrievers: BM25 (keyword), vanilla similarity, and MMR over the vector store
retriever_bm25 = BM25Retriever.from_documents(documents, search_kwargs={"k": TOP_K})
retriever_vanilla = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": TOP_K})
retriever_mmr = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": TOP_K})

# Hybrid search: combine the three retrievers with configurable weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[retriever_vanilla, retriever_mmr, retriever_bm25],
    weights=ENSEMBLE_WEIGHTS,
)

# Set up Cohere re-ranking on top of the hybrid results
compressor = CohereRerank(top_n=2, model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=ensemble_retriever,
)
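
Putting the two researcher steps together, the subgraph used by conduct_research can be assembled roughly as follows. This is a sketch: ResearcherState, the combine_docs reducer, and the sequential retrieve_documents node are assumptions (the actual implementation may fan the queries out in parallel), while generate_queries and compression_retriever are the ones shown above.

from dataclasses import dataclass, field
from typing import Annotated

from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END


def combine_docs(existing: list[Document], new: list[Document]) -> list[Document]:
    """Reducer: accumulate documents returned by the retrieval node."""
    return list(existing) + list(new)


@dataclass(kw_only=True)
class ResearcherState:
    question: str
    queries: list[str] = field(default_factory=list)
    documents: Annotated[list[Document], combine_docs] = field(default_factory=list)


async def retrieve_documents(state: ResearcherState) -> dict:
    """Run the hybrid retriever + Cohere re-ranker for every generated query."""
    documents: list[Document] = []
    for query in state.queries:
        documents.extend(compression_retriever.invoke(query))
    return {"documents": documents}


researcher_builder = StateGraph(ResearcherState)
researcher_builder.add_node("generate_queries", generate_queries)
researcher_builder.add_node("retrieve_documents", retrieve_documents)
researcher_builder.add_edge(START, "generate_queries")
researcher_builder.add_edge("generate_queries", "retrieve_documents")
researcher_builder.add_edge("retrieve_documents", END)
researcher_graph = researcher_builder.compile()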

5. Step 5: Check if it is completed

This step determines whether the research process is complete by checking if there are any remaining steps in the research plan. If there are, the workflow will return to the "conduct_research" node to continue execution; if there are no remaining steps, it will enter the "respond" node to generate the final response.

Here is the implementation code:

def check_finished(state: AgentState) -> Literal["respond", "conduct_research"]:
    """Determine if the research process is complete."""
    if len(state.steps or []) > 0:
        return "conduct_research"
    else:
        return "respond"

6. Step 6: Generate Response

This step generates the final response to the user's query based on the documents retrieved during the research process and the conversation history. It uses language models to integrate all relevant information into a comprehensive answer.

Here is the implementation code:

async def respond(
    state: AgentState, *, config: RunnableConfig
) -> dict[str, list[BaseMessage]]:
    """Generate the final response to the user's query."""
    model = ChatOpenAI(model="gpt-4o-2024-08-06", temperature=0)
    context = format_docs(state.documents)
    prompt = RESPONSE_SYSTEM_PROMPT.format(context=context)
    messages = [{"role": "system", "content": prompt}] + state.messages
    response = await model.ainvoke(messages)
    return {"messages": [response]}

7. Step 7: Check for hallucinations

This step analyzes the response generated by the language model to determine whether it is supported by the retrieved document facts and gives a binary score. If the score indicates that the response may contain hallucinations, the workflow will be interrupted and the user will be prompted to decide whether to regenerate the response or end the process.

Here is the implementation code:

async def check_hallucinations(
    state: AgentState, *, config: RunnableConfig
) -> dict[str, Any]:
    """Analyze the response for hallucinations."""
    model = ChatOpenAI(model=GPT_4o_MINI, temperature=TEMPERATURE, streaming=True)
    system_prompt = CHECK_HALLUCINATIONS.format(
        documents=state.documents,
        generation=state.messages[-1]
    )
    messages = [
        {"role": "system", "content": system_prompt}
    ] + state.messages
    logging.info("---CHECK HALLUCINATIONS---")
    response = cast(GradeHallucinations, await model.with_structured_output(GradeHallucinations).ainvoke(messages))
    return {"hallucination": response}

8. Step 8: Manual Approval (Human Participation)

If the response of the language model is not supported by facts and may contain hallucinations, the workflow will pause and hand control to the user. The user can choose to re-execute only the last generation step without restarting the entire workflow, or choose to end the process. This human involvement mechanism ensures that the user is in control of the entire process and avoids unnecessary loops or unexpected operations.

Here is the implementation code:

from langgraph.types import interrupt


def human_approval(state: AgentState):
    _binary_score = state.hallucination.binary_score
    if _binary_score == "1":
        # The answer is grounded in the retrieved documents: finish.
        return "END"
    else:
        # Possible hallucination: pause the graph and ask the user what to do.
        retry_generation = interrupt(
            {
                "question": "Is this correct?",
                "llm_output": state.messages[-1]
            }
        )
        if retry_generation == "y":
            print("Continue with retry...")
            return "respond"
        else:
            return "END"
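
To tie the workflow together, the main graph can be assembled roughly as shown below. The edge wiring, the in-memory checkpointer, and the resume call are assumptions about how the pieces fit (the ask_for_more_info and respond_to_general_query nodes are omitted for brevity); the key point is that interrupt() only works when the graph is compiled with a checkpointer, and a paused run is resumed with a Command carrying the user's decision.

import asyncio

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command

builder = StateGraph(AgentState)
builder.add_node("analyze_and_route_query", analyze_and_route_query)
builder.add_node("create_research_plan", create_research_plan)
builder.add_node("conduct_research", conduct_research)
builder.add_node("respond", respond)
builder.add_node("check_hallucinations", check_hallucinations)

builder.add_edge(START, "analyze_and_route_query")
builder.add_conditional_edges("analyze_and_route_query", route_query)   # Step 1
builder.add_edge("create_research_plan", "conduct_research")
builder.add_conditional_edges("conduct_research", check_finished)       # Step 5
builder.add_edge("respond", "check_hallucinations")
# human_approval is used here as a routing function after the hallucination check;
# add the ask_for_more_info / respond_to_general_query nodes before handling non-environmental queries.
builder.add_conditional_edges(
    "check_hallucinations", human_approval, {"respond": "respond", "END": END}
)

# interrupt() requires a checkpointer so the run can be paused and resumed.
graph = builder.compile(checkpointer=MemorySaver())


async def main():
    config = {"configurable": {"thread_id": "1"}}
    question = "What was the regional average CFE for Asia Pacific in 2023?"
    await graph.ainvoke({"messages": [("user", question)]}, config)
    # If the run paused at human_approval, resume it with the user's decision ("y" retries).
    await graph.ainvoke(Command(resume="y"), config)

asyncio.run(main())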

3. Practical test: the powerful capabilities of multi-agent RAG

To verify the performance of the system, we used an annual report on Google's environmental sustainability strategy for testing. This report contains rich data and complex table structures, which is very suitable for testing the system's multi-step processing capabilities and document parsing functions.

1. Complex Question Test

We asked a complex question: "Retrieve the PUE efficiency values for the second data center in Singapore in 2019 and 2022, and the regional average CFE value for Asia Pacific in 2023."

The system successfully broke this question down into multiple steps and generated the corresponding queries:

  • “Find the PUE efficiency values for our second data center in Singapore in 2019 and 2022.”
  • “Find the regional average CFE value for Asia Pacific in 2023.”

After retrieving and re-ranking the documents, the system gave an accurate answer:

  • The PUE efficiency value for the second data center in Singapore in 2019 was not provided, but the PUE in 2022 was 1.21.
  • The regional average CFE for Asia Pacific is 12% in 2023.

2. Comparative test with ChatGPT

To further verify the reliability of the system, we submitted the same question to ChatGPT. The values returned by ChatGPT were wrong, and hallucinations were clearly present. This shows that a plain language model may generate inaccurate information when dealing with complex questions, whereas our multi-agent RAG system can effectively avoid this through the hallucination-check step.

4. Technical Challenges and Prospects: The Future of Multi-agent RAG

Although multi-agent RAG has achieved significant performance improvements, it still faces some challenges in practical applications:

  • Latency issues : As the complexity of agent interactions increases, response times may become longer. How to strike a balance between speed and accuracy is a key challenge.
  • Evaluation and Observability : As multi-agent RAG systems become more complex, continuous evaluation and observability become essential.

Overall, multi-agent RAG is a major breakthrough in the field of artificial intelligence. It combines the power of large language models with autonomous reasoning and information retrieval, introducing a new standard of intelligence and flexibility. As artificial intelligence continues to advance, multi-agent RAG will play a fundamental role in various industries and revolutionize the way we use technology.