DO-RAG: A Domain-Specific Question Answering Framework Using Knowledge Graph-Enhanced Retrieval-Augmented Generation - Tsinghua University et al.

Written by
Silas Grey
Updated on: June 18, 2025
Recommendation

The latest research from Tsinghua University and collaborating institutions addresses the accuracy and recall limitations of domain-specific question-answering systems.

Core content:
1. DO-RAG framework: combines multi-level knowledge graphs with semantic vector retrieval to improve domain question-answering performance
2. Agentic chain-of-thought architecture: extracts structured relations from multimodal documents and builds a dynamic knowledge graph
3. Experimental evaluation: achieves high recall and answer relevance in the database and electrical-engineering domains, with significant performance improvements

 
Summary
Domain-specific question answering systems require not only generative fluency but also high factual accuracy grounded in structured expert knowledge. Although recent retrieval-augmented generation (RAG) frameworks improve contextual recall, they struggle to integrate heterogeneous data and maintain reasoning consistency. To address these challenges, we propose DO-RAG, a scalable and customizable hybrid question-answering framework that combines multi-level knowledge graph construction with semantic vector retrieval. Our system adopts a novel agentic chain-of-thought architecture to extract structured relations from unstructured, multimodal documents and builds a dynamic knowledge graph to improve retrieval accuracy. At query time, DO-RAG fuses graph and vector retrieval results to generate context-aware responses, then mitigates hallucinations through grounded refinement. Experimental evaluations in the database and electrical-engineering domains show near-perfect recall and over 94% answer relevance, with DO-RAG outperforming baseline frameworks by up to 33.38%. By combining traceability, adaptability, and performance efficiency, DO-RAG provides a reliable foundation for multi-domain, high-accuracy question answering at scale.

Core Overview

Background

  1. Research questions: How to build an efficient and accurate question-answering system in specific domains such as databases and electrical engineering. Although existing retrieval-augmented generation (RAG) frameworks improve contextual recall, they struggle to integrate heterogeneous data and maintain reasoning consistency.
  2. Research difficulties: Capturing the complex relationships between entities in technical manuals and multimodal resources, manually building and maintaining high-quality domain-specific knowledge graphs, and reducing factual errors during generation.
  3. Related work: Early rule-based methods, later approaches that integrate structured knowledge, large language models (LLMs) applied in general domains, and RAG frameworks for improving factual consistency. Existing methods, however, still fall short on complex domain-specific queries.

Research Methods

This paper proposes DO-RAG, a framework for domain-specific question answering that addresses the shortcomings of existing methods through knowledge-graph-enhanced retrieval and generation. Specifically:

  1. Multi-level knowledge graph construction: First, we design and implement a hierarchical agentic extraction pipeline that processes text, tables, code snippets, and images to automatically build and update a knowledge graph capturing entities, relations, and attributes.

  2. Hybrid retrieval fusion: We develop a unified mechanism that combines graph traversal with semantic search at query time, ensuring that all relevant structured information informs the prompt given to the large language model (LLM).

  3. Fact-grounded hallucination mitigation: We introduce a post-generation refinement step that cross-validates the initial LLM output against the knowledge graph and iteratively corrects inconsistencies, significantly reducing factual errors.

  4. Modular design: The framework supports seamless swapping of LLMs and retrieval modules and direct extension to new domains without retraining (a minimal interface sketch follows this list).
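
A minimal sketch of this plug-and-play design, assuming illustrative interface names (LLM, Retriever, and DoRagPipeline are mine, not the paper's):

```python
from typing import Protocol

class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...

class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...

class DoRagPipeline:
    """Components are injected, so swapping an LLM or adding a retriever
    for a new domain changes only the constructor arguments -- no retraining."""
    def __init__(self, llm: LLM, retrievers: list[Retriever]):
        self.llm = llm
        self.retrievers = retrievers  # e.g., a graph retriever plus a vector retriever

    def answer(self, question: str) -> str:
        # Gather evidence from every configured retriever, then prompt the LLM.
        context = [c for r in self.retrievers for c in r.retrieve(question)]
        prompt = ("Answer using only the context below.\n\n"
                  + "\n".join(context)
                  + f"\n\nQuestion: {question}")
        return self.llm.generate(prompt)
```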

Experimental design

To evaluate the DO-RAG framework, SunDB (a distributed relational database management system) was chosen as the primary expert domain. The experimental design covers the following aspects:

  1. Hardware and software: The experiments ran on a high-performance Ubuntu workstation with 64 GB RAM, an NVIDIA A100 80 GB GPU, and 1 TB of disk. The software stack includes Langfuse, Redis, MinIO, and ClickHouse.
  2. Datasets: Two datasets are used for evaluation: the primary SunDB dataset and a secondary Electrical dataset. Each includes 245 expert-curated questions with ground-truth answers, annotated with source locations for precise verification.
  3. Metrics and tools: The evaluation focuses on four core metrics: answer relevance (AR), context recall (CR), context precision (CP), and faithfulness (F), computed using RAGAS, DeepEval, and Langfuse. A minimal scoring sketch follows this list.
  4. Baseline comparison: External baselines include FastGPT, TiDB.AI, and Dify.AI; internal baselines compare DeepSeek-R1 and DeepSeek-V3 with and without knowledge graph integration.
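
A minimal sketch of scoring these four metrics with RAGAS; the sample row is illustrative, and exact column names and APIs may differ across RAGAS versions:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,   # AR
    context_recall,     # CR
    context_precision,  # CP
    faithfulness,       # F
)

# One illustrative row; the paper's benchmarks contain 245 expert-curated
# question/ground-truth pairs per domain.
data = Dataset.from_dict({
    "question": ["How does SunDB handle replica failover?"],  # hypothetical
    "answer": ["..."],        # system output under evaluation
    "contexts": [["..."]],    # retrieved evidence chunks
    "ground_truth": ["..."],  # expert-annotated reference answer
})

scores = evaluate(data, metrics=[answer_relevancy, context_recall,
                                 context_precision, faithfulness])
print(scores)
```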

Results and Analysis

  1. External baseline comparison: SunDB.AI (the DO-RAG deployment for the SunDB domain) performs best among all baselines, outperforming FastGPT, TiDB.AI, and Dify.AI by 1.70%, 24.02%, and 17.72%, respectively.

     

  2. Internal baseline comparison: Knowledge graph integration has a significant impact on both DeepSeek-R1 and DeepSeek-V3. With the knowledge graph, contextual recall for both models reaches 1.000, and DeepSeek-V3's answer relevance and contextual precision improve by 5.7% and 2.6%, respectively.

  3. Domain-specific performance: In both the SunDB and Electrical domains, contextual recall is at or near 1.0. Variations in answer relevance, contextual precision, and faithfulness reveal each model's specific strengths.

Overall conclusion

This paper introduces DO-RAG, a retrieval-augmented generation framework for domain-specific question answering. DO-RAG transforms unstructured, multimodal domain data into a dynamic multi-level knowledge graph through an agentic chain-of-thought extraction pipeline, and combines graph traversal with semantic vector search to retrieve rich contextual information. A post-generation refinement step further enhances factual accuracy. Empirical results in the database and electrical-engineering domains show that DO-RAG performs well on context recall and answer relevance, improving by up to 33.38% over existing baseline frameworks. These findings demonstrate DO-RAG's effectiveness in delivering robust, high-precision question answering in specific domains, unifying structured knowledge representation with generative reasoning, and providing a solid foundation for scalable, adaptive information systems.

Paper Evaluation

Advantages and innovations

  1. Multi-level knowledge graph construction: a hierarchical, agent-based extraction pipeline that automatically builds and updates a dynamic knowledge graph capturing entities, relations, and attributes.
  2. Hybrid retrieval fusion: a unified mechanism that combines graph traversal with semantic search at query time, ensuring that all relevant structured information informs the prompt given to the large language model (LLM).
  3. Fact-grounded hallucination mitigation: a post-generation refinement step that significantly reduces factual errors by cross-validating the initial LLM output against the knowledge graph and iteratively correcting inconsistencies.
  4. Plug-and-play modularity: seamless swapping of LLMs and retrieval modules, plus direct extension to new domains without retraining.
  5. High accuracy: on expert-curated benchmarks in the database and electrical-engineering domains, DO-RAG achieves near-perfect context recall and over 94% answer relevance, outperforming existing RAG platforms by up to 33.38%.

Shortcomings and reflections

  1. Reliance on language models: the framework depends on its underlying LLMs; despite knowledge-graph grounding, creative models such as DeepSeek-R1 occasionally introduce hallucinations.
  2. Dataset limitations: each domain's dataset contains only 245 questions and may not capture rare or edge-case queries, limiting generalization.
  3. Computational overhead: the cost of multi-agent extraction and hybrid retrieval, although optimized, remains significant for real-time updates in large-scale deployments.
  4. Future work: enhancing hallucination mitigation through more rigorous prompt engineering that prioritizes factual consistency; expanding the dataset with diverse and edge-case queries to improve robustness; and exploring distributed processing and adaptive caching to improve scalability and reduce latency.

Key questions and answers

Question 1: What is distinctive about how the DO-RAG framework builds its knowledge graph?

The DO-RAG framework designs a hierarchical agentic extraction pipeline to construct its multi-level knowledge graph. Specifically, the pipeline includes four specialized agents that operate at different levels of abstraction:

  • High-level agent: identifies structural elements (such as chapters, sections, and paragraphs).
  • Mid-level agent: extracts domain-specific entities such as system components, APIs, and parameters.
  • Low-level agent: captures fine-grained operational relationships, such as thread behavior or error propagation.
  • Covariate agent: attaches attributes to existing nodes (such as default values and performance impact).

This multi-level agentic extraction ensures dynamic construction and updating of the knowledge graph, captures entities, relations, and attributes at multiple granularities, avoids redundancy, and simplifies the graph structure by synthesizing summary nodes. A schematic sketch follows.
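
A schematic sketch of this four-tier pipeline (not the paper's code), modeling each agent as an LLM call with a level-specific instruction; llm_extract and graph.upsert_triple are hypothetical helpers:

```python
AGENT_PROMPTS = {
    "high":      "Identify structural elements (chapters, sections, paragraphs).",
    "mid":       "Extract domain entities (system components, APIs, parameters).",
    "low":       "Capture operational relations (thread behavior, error propagation).",
    "covariate": "Attach attributes to known entities (default values, performance impact).",
}

def build_knowledge_graph(chunks: list[str], graph) -> None:
    """Run each agent over each document chunk and merge the results into one graph."""
    for chunk in chunks:
        for level, instruction in AGENT_PROMPTS.items():
            # llm_extract (hypothetical) returns (head, relation, tail) triples.
            for head, relation, tail in llm_extract(instruction, chunk):
                # Upserting keeps the graph dynamic: re-processing updated
                # documents refreshes nodes in place instead of duplicating them.
                graph.upsert_triple(head, relation, tail, level=level)
```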

Question 2: How does the DO-RAG framework combine graph traversal and semantic vector search to perform hybrid retrieval?

  1. Graph traversal: After the user submits a question, DO-RAG first uses an LLM-based intent analyzer to decompose it structurally and generate subqueries that guide retrieval from the knowledge graph. It then retrieves related nodes by semantic similarity and performs multi-hop traversal to expand the retrieval scope and produce structured, domain-specific context.

  2. Semantic vector search: The context obtained from graph traversal is used to rewrite and disambiguate the question, yielding a more specific, unambiguous query. This query is encoded into a dense vector and used to retrieve semantically similar text chunks from a vector database.

  3. Result integration: Finally, DO-RAG integrates all relevant information sources (the original user query, its rewritten version, the knowledge-graph context, the retrieved text chunks, and the user interaction history) into a unified prompt structure and passes it to the generation pipeline.

This approach ensures that all relevant, structured information informs the LLM prompt, improving retrieval accuracy and the contextual richness of generation. A condensed sketch of this flow appears below.
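
Condensed into code, the three steps might look like the following sketch; analyze_intent, rewrite, embed, and the graph and vector stores are hypothetical stand-ins for the components described above:

```python
def hybrid_retrieve(question: str, llm, graph, vectors, history: list[str]) -> str:
    # Step 1: LLM-based intent analysis decomposes the question into
    # subqueries, each used as an entry point into the knowledge graph.
    graph_ctx = []
    for subquery in analyze_intent(llm, question):
        seeds = graph.match_entities(subquery)       # entry nodes by similarity
        graph_ctx += graph.multi_hop(seeds, hops=2)  # expand structured context

    # Step 2: the graph context disambiguates the question; the rewritten,
    # more specific query is embedded and run against the vector store.
    rewritten = rewrite(llm, question, graph_ctx)
    chunks = vectors.search(embed(rewritten), top_k=8)

    # Step 3: fuse every evidence source into a single unified prompt.
    return "\n\n".join([
        f"Original question: {question}",
        f"Rewritten question: {rewritten}",
        "Graph context:\n" + "\n".join(map(str, graph_ctx)),
        "Retrieved chunks:\n" + "\n".join(c.text for c in chunks),
        "Interaction history:\n" + "\n".join(history),
    ])
```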

Question 3: How does the DO-RAG framework perform fact-based hallucination mitigation when generating answers?

The DO-RAG framework introduces a post-generation refinement step for fact-grounded hallucination mitigation:

  1. Initial generation: An initial prompt guides the LLM to generate an answer while explicitly avoiding unsupported content.

  2. Refinement prompt: The generated answer is reconstructed and verified through a refinement prompt, ensuring factual consistency and clarity.

  3. Condensation stage: The refined answer passes through a compression stage that keeps it coherent and concise.

  4. Follow-up question generation: To enhance user engagement and simulate expert guidance, DO-RAG also generates follow-up questions based on the refined answer.

Additionally, if the system cannot find sufficient evidence, the model returns "I don't know" to maintain reliability and prevent hallucination. This refinement step significantly reduces factual errors and improves answer accuracy and reliability. A minimal sketch of the loop follows.
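
A minimal sketch of this generate-refine-condense loop, with paraphrased prompts and a hypothetical verify_against_graph fact-checker:

```python
def answer_with_refinement(llm, graph, prompt: str, max_rounds: int = 2) -> str:
    # Initial generation: instruct the model to avoid unsupported content.
    draft = llm.generate(prompt + "\nAnswer only from the given context; "
                                  "reply \"I don't know\" if evidence is missing.")

    for _ in range(max_rounds):
        # Refinement: cross-validate claims in the draft against the knowledge
        # graph (verify_against_graph is a hypothetical fact-checker).
        issues = verify_against_graph(graph, draft)
        if not issues:
            break
        draft = llm.generate("Revise the answer to fix these factual issues:\n"
                             f"{issues}\n\nAnswer:\n{draft}")

    # Condensation: compress the verified answer for coherence and brevity.
    final = llm.generate(f"Condense the following answer, keeping all facts:\n{draft}")

    # Follow-up questions simulate expert guidance and sustain engagement.
    followups = llm.generate(f"Suggest two follow-up questions for:\n{final}")
    return final + "\n\nFollow-up questions:\n" + followups
```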