HiRAG: High-precision RAG based on hierarchical knowledge indexing and retrieval

Explore how the HiRAG framework improves the RAG system and its performance on domain tasks.
Core content:
1. Challenges of combining the RAG system with the knowledge graph
2. HiRAG framework's hierarchical knowledge indexing and retrieval
3. Application of HiRAG in improving semantic association and knowledge coherence
Retrieval Augmented Generation (RAG) enhances the domain task capabilities of large language models (LLMs) by retrieving external knowledge. The naive RAG method retrieves text blocks related to the query, which serve as references for large language models to generate responses to alleviate the "hallucination" problem (such as generating inaccurate content). However, the naive RAG method only retrieves text fragments and ignores the associations between entities (such as the relationship between "Amazon" and "AWS"), resulting in context fragmentation. To this end, researchers proposed a graph-based RAG system that models the relationships between entities in the input document by constructing knowledge graphs (KGs). Although this type of method has shown excellent performance in many tasks, it still has obvious defects.
Taking GraphRAG as an example, this method uses the Leiden algorithm to identify community structures in the indexing stage, but these communities can only reflect the structural proximity of entities in the KG and cannot capture deep semantic associations. Although KAG uses a hierarchical knowledge representation, its hierarchical structure relies too much on manual annotations and domain knowledge, making it difficult to generalize the method to general tasks. LightRAG uses a two-level retrieval mechanism to obtain local and global knowledge as query context, but fails to solve the knowledge gap between the two - local knowledge (such as specific entity details) may lack semantic association with global knowledge (such as community summaries), causing the model to generate incoherent answers.
This paper points out that current graph-structured RAG systems face two key challenges: (1) semantically similar entities are structurally distant in KGs; (2) there is a gap between local and global knowledge. Taking real cases in public datasets as an example, although "big data" and "recommendation systems" are semantically related under the concept of "data mining", they are far apart in KGs due to document-driven structural constraints. This inconsistency between semantic relevance and structural proximity is prevalent in KGs, which seriously affects the contextual coherence of RAG systems.
The second challenge is that existing methods (such as LightRAG and GraphRAG) usually retrieve global or local knowledge independently, but fail to reconcile the inherent differences between the two. For example, for the query "Please introduce Amazon", the global context emphasizes its involvement in the field of technology (such as big data, cloud computing), while the local context retrieves entities directly related to Amazon (such as subsidiaries, leadership). If these two knowledge layers are directly input into the LLM, the model may have difficulty reconciling their different scopes, resulting in logical breaks, incomplete answers, or even contradictions. This highlights the need for new methods to bridge hierarchical knowledge layers to ensure the coherence of reasoning in RAG systems.
To address these challenges, this paper proposes a hierarchical knowledge-based retrieval-augmented generation framework (HiRAG) to integrate hierarchical knowledge into the indexing and retrieval process. Hierarchical knowledge is a natural concept in graph structures and human cognition, but existing methods have not fully explored its potential. To address challenge (1), this paper proposes a hierarchical index (HiIndex), which constructs KG hierarchically so that high-level entities can summarize the semantic clusters of low-level entities, thereby enhancing the connectivity between semantically similar entities. For example, after introducing the summary entity "data mining", the association between "big data" and "recommendation system" is strengthened. To address challenge (2), this paper designs a hierarchical retrieval (HiRetrieval), which bridges the gap between entity description and community knowledge through bridging layer knowledge, providing LLM with three levels of context: global layer, bridging layer, and local layer, so that it can generate more comprehensive and accurate responses.
Contributions
1. For the first time, the structural distance between semantically similar entities and the gap between local and global knowledge in graph-structured RAG are systematically analyzed and addressed.
2. The HiRAG framework is proposed, advancing RAG technology through unsupervised hierarchical indexing and a novel bridging mechanism.
3. Extensive experiments verify the effectiveness and efficiency of the method, and ablation experiments confirm the contribution of each module.
Related Work
In recent years, research on graph-enhanced large language models has made significant progress, especially in the retrieval-augmented generation (RAG) method that combines graph structures. GNN-RAG retrieves entities related to the query through a reasoning mechanism based on a graph neural network (GNN), and builds a reasoning path by finding the shortest path between the retrieved entity and the candidate answer entity. LightRAG combines a two-level retrieval method with a graph-enhanced text index to speed up the adjustment process while reducing computational costs. GRAG uses a soft pruning method to minimize the impact of irrelevant entities in the retrieved subgraph, and introduces graph soft prompts to help the large language model understand the text and topological information in the subgraph. StructRAG identifies the most appropriate structure for each task, converts the initial document into this organizational structure, and generates a response based on this structure. Microsoft's GraphRAG first retrieves relevant communities, then lets the large language model generate responses based on the retrieved communities, and supports both global and local search query modes. KAG proposes a professional domain knowledge service framework that uses conceptual semantic reasoning for knowledge alignment to alleviate the noise problem in open information extraction (OpenIE), and builds domain expert knowledge through manually annotated patterns. These methods have improved the performance of RAG systems to varying degrees, but there are still limitations such as the distant structural relationship between semantically similar entities and the separation of global and local knowledge. HiRAG proposes innovative solutions to these problems.
Preliminary and Definitions
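This section gives a general formal definition of a RAG framework enhanced by graph structure. The RAG framework is written as

$$\mathcal{M} = \big(\mathrm{LLM},\ \mathcal{R}(\varphi, \psi)\big),$$

where $\mathrm{LLM}$ is the generation module, $\mathcal{R}$ represents the retrieval module, and $\varphi$ and $\psi$ are the graph indexer and graph retriever, respectively.
When answering a query $q$, the answer generated by the RAG system is denoted $a$, which is formally expressed as:

$$a = \arg\max_{a' \in A} \mathcal{M}(a' \mid q, \mathcal{G}),$$

where $A$ is the set of candidate answers and $\mathcal{G}$ is the graph database built by the indexer.
Using the total probability formula, $\mathcal{M}(a \mid q, \mathcal{G})$ can be decomposed into

$$\mathcal{M}(a \mid q, \mathcal{G}) = \sum_{G \subseteq \mathcal{G}} \mathrm{LLM}(a \mid q, G)\,\psi(G \mid q, \mathcal{G}).$$

In most cases, only the most relevant subgraph $G^{*}$ needs to be retrieved from the external graph database $\mathcal{G}$, so the sum is approximated by:

$$\mathcal{M}(a \mid q, \mathcal{G}) \approx \mathrm{LLM}(a \mid q, G^{*})\,\psi(G^{*} \mid q, \mathcal{G}).$$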
The HiRAG Framework
HiRAG consists of two modules, HiIndex and HiRetrieval. In the HiIndex module, a hierarchical KG with different knowledge granularity is constructed at different layers. Summary entities in higher layers represent more coarse-grained high-level knowledge, but they can enhance the connectivity between semantically similar entities in lower layers. In the HiRetrieval module, the most relevant entities are selected from each retrieved community, and the shortest path is found to connect them, which serves as bridge-level knowledge connecting local and global knowledge. LLM will then generate responses based on these three levels of knowledge.
Indexing with Hierarchical Knowledge
In the HiIndex module, the input documents are indexed into a hierarchical KG. First, entity-centric triple extraction is used to construct the basic KG. Specifically, the input documents are split into overlapping text blocks. These blocks are fed to the LLM through designed prompts to first extract the entity set $V_0$. Then, the LLM generates the relations (edges) $E_0$ between pairs of extracted entities based on the information in the corresponding text blocks. The basic KG can be expressed as:

$$G_0 = \{(h, r, t) \mid h, t \in V_0,\ r \in E_0\}$$
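The splitting of documents into overlapping text blocks can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `chunk_tokens`, the chunk size, the overlap, and the use of whitespace splitting in place of a real tokenizer are all assumptions made for the example.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into chunks of at most chunk_size tokens,
    where consecutive chunks share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


# Whitespace splitting stands in for a real tokenizer (assumption).
tokens = "Amazon founded AWS to offer cloud computing services worldwide".split()
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means an entity mentioned at a block boundary still appears with some of its surrounding context in the neighboring block.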
The basic KG is also the 0th layer of the hierarchical KG. In this paper, the entity (node) set of the i-th layer is denoted $L_i$, with $L_0 = V_0$. To construct the i-th layer of the hierarchical KG for i ≥ 1, the embedding representations of the entities in the (i-1)-th layer are first obtained to capture their semantic similarity, denoted as:

$$Z_{i-1} = \{\mathrm{Embedding}(v) \mid v \in L_{i-1}\}$$

Then, a Gaussian mixture model (GMM) is applied to $Z_{i-1}$ to cluster the entities, yielding the clustering:

$$C_{i-1} = \mathrm{GMM}(Z_{i-1}) = \{S_1, \dots, S_c\}$$
For each cluster $S_x$ produced by the clustering step, the descriptions of the entities in the cluster are fed to the LLM to generate summary entities (for example, "DATA MINING" summarizes "BIG DATA" and "RECOMMENDATION SYSTEM"). The summary entity set $L_i$ is the union of the entities generated for all clusters.
A set of meta summary entities $\mathcal{X}$ (such as "technology" and "organization") guides the LLM in generating high-level concepts. $\mathcal{X}$ is a top-level set of guiding concepts, but its members are not nodes of the hierarchical KG.
For each cluster $S_x$, relations (such as "belongs to" or "is summarized as") are created between the entities in $S_x$ and the generated summary entities. For example, if a cluster contains "BIG DATA" and "RECOMMENDATION SYSTEM" and the generated summary entity is "DATA MINING", then triples such as (BIG DATA, subclass_of, DATA MINING) are created.
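The step that turns clusters into a new layer of summary entities and cross-layer relations can be sketched as follows. This is only an illustration: `build_summary_layer` is a hypothetical helper, and the LLM call is replaced by a hard-coded lookup table (`FAKE_LLM`) so that the example runs on its own.

```python
def build_summary_layer(clusters, summarize, relation="subclass_of"):
    """Return (summary_entities, edges) for one new layer.

    clusters:  list of lists of entity names from the layer below.
    summarize: callable mapping a cluster to a list of summary entity
               names (a stand-in for the LLM call).
    """
    summary_entities = []
    edges = []
    for cluster in clusters:
        for summary in summarize(cluster):
            if summary not in summary_entities:
                summary_entities.append(summary)
            # Link every entity in the cluster to its summary entity.
            for entity in cluster:
                edges.append((entity, relation, summary))
    return summary_entities, edges


# Hypothetical lookup table replacing the LLM summarizer.
FAKE_LLM = {
    ("BIG DATA", "RECOMMENDATION SYSTEM"): ["DATA MINING"],
    ("CARDIOLOGIST", "NEUROLOGIST"): ["MEDICAL EXPERT"],
}


def fake_summarize(cluster):
    return FAKE_LLM[tuple(cluster)]


clusters = [["BIG DATA", "RECOMMENDATION SYSTEM"], ["CARDIOLOGIST", "NEUROLOGIST"]]
layer_1, edges_0_1 = build_summary_layer(clusters, fake_summarize)
print(layer_1)
print(edges_0_1)
```

After this step, "BIG DATA" and "RECOMMENDATION SYSTEM" are two hops apart through "DATA MINING", regardless of how far apart they were in the basic KG.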
For community detection and semantic reporting, the Leiden algorithm is applied to the hierarchical KG. A community may span multiple layers (containing entities at different levels), and an entity can belong to multiple communities. Traditional methods (such as GraphRAG) divide communities based only on the topological structure of the underlying KG, while the hierarchical structure of HiRAG enables communities to reflect both local structure and high-level semantics (for example, the "medical expert" community contains "cardiologists" and "neurologists"). For each community, the LLM then generates an interpretable semantic report (such as "this community describes Amazon's business in e-commerce and cloud computing"), which serves as global knowledge for subsequent retrieval.
Retrieval with Hierarchical Knowledge
Based on the hierarchical knowledge graph, a three-stage knowledge retrieval framework is proposed to integrate local fine-grained knowledge, global community knowledge, and bridging-layer semantic associations.
Local knowledge retrieval uses vector similarity to select, from the entity set $V$ of the knowledge graph, the n entities most semantically relevant to the query q:

$$\hat{V} = \mathrm{TopN}_{v \in V}\big(\mathrm{Sim}(q, v),\ n\big)$$

The semantic similarity function $\mathrm{Sim}(q, v)$ is calculated by comparing the embedding vectors of the query and the entity. The default setting n=20 ensures that the core relevant entities are captured.
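A minimal sketch of local retrieval is shown below. The text only says that similarity is computed from the query and entity embeddings, so the use of cosine similarity, the function names, and the two-dimensional toy vectors are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_entities(query_vec, entity_vecs, n=20):
    """Return the names of the n entities most similar to the query."""
    ranked = sorted(
        entity_vecs,
        key=lambda name: cosine(query_vec, entity_vecs[name]),
        reverse=True,
    )
    return ranked[:n]


# Toy embeddings (hypothetical values).
entity_vecs = {"AMAZON": [0.9, 0.1], "AWS": [0.8, 0.3], "TOMATO": [0.0, 1.0]}
local_entities = top_n_entities([1.0, 0.0], entity_vecs, n=2)
print(local_entities)
```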
Global knowledge retrieval locates the communities related to the local entities, based on the community partition $\mathcal{P}$ built in the indexing phase:

$$\hat{\mathcal{C}} = \{p \in \mathcal{P} \mid p \cap \hat{V} \neq \emptyset\}$$

where $\hat{V}$ is the set of retrieved local entities. Each community contains a semantic community report generated by the LLM, which represents the macro-knowledge characteristics of the community. Compared with community division on a traditional flat graph, this method achieves multi-granularity semantic aggregation through hierarchical abstraction, so that each community is both structurally compact and semantically coherent.
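Selecting the communities that contain at least one retrieved local entity can be sketched as follows. The community identifiers and the dictionary layout are hypothetical, chosen only for the example.

```python
def related_communities(communities, local_entities):
    """Return the ids of communities that share at least one member
    with the retrieved local entities."""
    local = set(local_entities)
    return [cid for cid, members in communities.items() if local & set(members)]


# Hypothetical community partition produced at indexing time.
communities = {
    "c_ecommerce": ["AMAZON", "BEZOS"],
    "c_cloud": ["AWS", "CLOUD COMPUTING"],
    "c_farming": ["TOMATO", "CROP"],
}
related = related_communities(communities, ["AMAZON", "AWS"])
print(related)
```

The reports attached to the returned communities would then serve as the global context.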
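To close the knowledge gap, a bridging mechanism based on reasoning paths is designed to construct bridge-layer knowledge. The first step is key entity extraction, which selects the top m query-related entities from each community in the set of relevant communities $\hat{\mathcal{C}}$:

$$\hat{V}_{\hat{\mathcal{C}}} = \bigcup_{p \in \hat{\mathcal{C}}} \mathrm{TopN}_{v \in p}\big(\mathrm{Sim}(q, v),\ m\big)$$

The second step is shortest path discovery, which builds semantic pathways between consecutive key entities in the hierarchical KG $\mathcal{G}$:

$$\hat{R} = \bigcup_{j=1}^{|\hat{V}_{\hat{\mathcal{C}}}|-1} \mathrm{ShortestPath}_{\mathcal{G}}\big(\hat{V}_{\hat{\mathcal{C}}}[j],\ \hat{V}_{\hat{\mathcal{C}}}[j+1]\big)$$

The third step is bridge subgraph generation, which keeps the relations among the entities on these paths:

$$\hat{G}_{\mathrm{bridge}} = \{(h, r, t) \in \mathcal{G} \mid h, t \in \hat{R}\}$$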
This mechanism builds a cross-community semantic bridge through abstract nodes in the hierarchical graph (such as the concept of "data mining"), effectively connecting local entity descriptions with global community knowledge. For example, the bridge path successfully links the dual roles of "Amazon" in e-commerce (local) and cloud computing (global), eliminating the logical contradiction caused by knowledge gaps.
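The three bridging steps (key entity extraction, shortest path discovery, bridge subgraph generation) can be sketched with a breadth-first search over an undirected view of the KG. This is a simplified illustration under stated assumptions: the helper names, the precomputed relevance `scores`, and the choice to connect key entities in consecutive order are not taken from a reference implementation.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """BFS shortest path on an unweighted graph; returns [] if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return []


def bridge_subgraph(triples, communities, scores, m=1):
    """Keep the triples whose endpoints lie on shortest paths between
    the top-m entities of consecutive communities."""
    adj = {}
    for h, _, t in triples:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)
    key_entities = []
    for members in communities:
        top = sorted(members, key=lambda e: scores.get(e, 0.0), reverse=True)[:m]
        key_entities += [e for e in top if e not in key_entities]
    on_path = set()
    for a, b in zip(key_entities, key_entities[1:]):
        on_path.update(shortest_path(adj, a, b))
    return [(h, r, t) for h, r, t in triples if h in on_path and t in on_path]


# Toy KG and hypothetical query-relevance scores.
triples = [
    ("AMAZON", "owns", "AWS"),
    ("AWS", "provides", "CLOUD COMPUTING"),
    ("AMAZON", "founded_by", "BEZOS"),
    ("TOMATO", "is_a", "CROP"),
]
communities = [["AMAZON", "BEZOS"], ["CLOUD COMPUTING", "AWS"]]
scores = {"AMAZON": 0.9, "BEZOS": 0.4, "CLOUD COMPUTING": 0.7, "AWS": 0.6}
bridge = bridge_subgraph(triples, communities, scores, m=1)
print(bridge)
```

In this toy graph, the bridge keeps the path from "AMAZON" through "AWS" to "CLOUD COMPUTING" and drops unrelated triples.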
The three levels of knowledge (local entity descriptions, global community reports, and the bridging subgraph) are jointly fed to the LLM. The prompt guides the model to fuse knowledge at multiple granularities and to generate an answer that is both accurate in detail and globally consistent.
Why is HiRAG effective?
The effectiveness of HiRAG stems from its hierarchical architecture design (hierarchical knowledge graph constructed by HiIndex and three-level knowledge retrieval mechanism implemented by HiRetrieval), which directly addresses the two key challenges mentioned above.
Solving Challenge (1): By introducing summary entities at higher layers, the hierarchical knowledge graph creates shortcuts between entities that are semantically related but structurally distant at the bottom layer. This design connects semantically related concepts that are scattered across the bottom layer of the knowledge graph because its structure is driven by the corpus, and it achieves efficient association without traversing fine-grained relationships. For example, although "cardiologist" and "neurologist" are separated in the bottom-layer graph because they lack a direct connection, the upper-layer abstraction "medical expert" lets the two share community membership at a higher layer.
Addressing Challenge (2): HiRetrieval constructs reasoning paths by connecting the top-n entities most relevant to the query with their associated communities. These paths represent the shortest knowledge connection between local entity descriptions and global community insights, ensuring that the reasoning process absorbs both fine-grained details and macro-contextual knowledge.
Comprehensive advantages: By integrating (i) semantically similar entities connected by hierarchical shortcuts, (ii) global community context, and (iii) optimizing the path connecting local and global knowledge, HiRAG achieves triple knowledge fusion: semantic association enhancement, global context awareness, and cross-level knowledge bridging. This multi-dimensional knowledge integration mechanism enables the system to generate context-aware answers with both deep details and wide-area associations, significantly improving the comprehensiveness and accuracy of the answers.
Experimental Evaluation
Baseline methods: The experiments compare against current mainstream retrieval-augmented generation methods: NaiveRAG (a traditional method based on text segmentation and vector retrieval), GraphRAG (a graph-enhanced method based on community retrieval), LightRAG (a two-level retrieval architecture), FastGraphRAG (graph retrieval with personalized PageRank), and KAG (a professional domain knowledge service framework guided by manually annotated schemas). All baselines use the parameter configurations recommended in their original papers.
Datasets and queries : The experiment uses four datasets from the UltraDomain benchmark (Mix, Computer Science, Legal, and Agriculture), each with a professionally constructed benchmark query set. The datasets vary significantly in the number of documents and text size, fully verifying the robustness of the method under different data density scenarios. All documents are standardized using the BPE tokenizer "cl100k_base".
Overall performance comparison : This paper adopts a multi-dimensional evaluation framework based on LLM, uses GPT-4o as the evaluation model, and conducts a win rate analysis from four dimensions: comprehensiveness, empowerment, diversity, and overall quality. The evaluation results show that HiRAG significantly outperforms the baseline methods in all dimensions of the four datasets, especially in the empowerment dimension (average win rate of 83.2%) and diversity dimension (average win rate of 82.3%). It is worth noting that on the Legal field dataset, HiRAG achieved an absolute performance improvement of 14.5% compared to the second-best baseline GraphRAG, verifying the method's advantages in modeling complex relationships in legal texts.
Verification of the effectiveness of hierarchical knowledge graphs : Through comparative experiments on building flat knowledge graphs (w/o HiIndex), an overall decline in performance indicators was observed. In the agricultural field dataset, removing the hierarchical index resulted in a 7.2% drop in the overall quality win rate, which verified the important role of the hierarchical structure in enhancing semantic associations. Visual analysis showed that the hierarchical index increased the graph clustering coefficient by 41%, effectively solving the problem of structural alienation of semantically similar entities.
Analysis of cross-level retrieval mechanism : The comparative experiment of removing the bridge layer knowledge (w/o Bridge) reveals the importance of cross-level connections. In the field of computer science, the lack of bridge knowledge leads to a 19.8% decrease in the coherence score of the answer. Case analysis shows that the bridge path effectively connects the theoretical concept of "distributed computing" with specific technical entities such as "MapReduce", proving the key role of this mechanism in bridging the knowledge gap.
Adaptive determination of the number of layers: Based on how cluster sparsity changes from layer to layer, this paper proposes a dynamic termination condition: when the sparsity change rate between two consecutive layers is less than 5%, no new layer is built. Experimental data show that the average number of layers built across the four datasets is 3.8. The Legal field requires 5 layers of abstraction due to the complexity of the text, while the agricultural field requires only 3, which supports the rationality of the adaptive mechanism.
Efficiency and cost analysis : This paper compares the resource consumption of each method in detail. Although HiRAG's index construction cost is relatively high (17,208 seconds for the Mix dataset), its retrieval phase achieves zero token consumption, which is significantly better than KAG (89k tokens per query on average) and LightRAG (4.2 API calls on average). It is worth noting that by parallelizing the indexing process, HiRAG's construction time can be compressed to 34% of the original time, showing the potential for engineering optimization.
Objective indicator verification : On the HotpotQA and 2WikiMultiHopQA standard test sets, HiRAG's EM values reached 37% and 46.2% respectively, achieving a 2-3 times performance improvement over traditional methods. In particular, in multi-hop reasoning tasks, HiRAG's F1 value (60.06%) is significantly better than FastGraphRAG (49.56%), proving the method's advantage in complex knowledge association.
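For readers unfamiliar with the two objective metrics, the following sketch shows how exact match (EM) and token-level F1 are commonly computed for a single question. The normalization here (lowercasing and whitespace splitting) is a simplifying assumption; benchmark scripts typically also strip punctuation and articles.

```python
from collections import Counter


def normalize(text):
    """Simplified normalization: lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(pred, gold):
    """True when prediction and gold answer are identical after normalization."""
    return normalize(pred) == normalize(gold)


def token_f1(pred, gold):
    """Harmonic mean of token precision and recall against the gold answer."""
    p, g = normalize(pred), normalize(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Jeff Bezos", "jeff bezos")
f1 = token_f1("founder Jeff Bezos", "Jeff Bezos")
print(em, f1)
```

Dataset-level EM and F1 are the averages of these per-question values.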
Overall Performance Comparison
Experimental setup : This paper follows the evaluation paradigm of recent research work and adopts a powerful multi-dimensional comparative analysis method. The performance advantage of different methods is measured by the win rate metric, which represents the proportion of instances where the evaluation model determines that the quality of the answer generated by a certain method is better than that of the comparison method. The experiment uses GPT-4o (Achiam et al., 2023) as the evaluation model and conducts a systematic evaluation from four dimensions: (1) Comprehensiveness : the completeness of the answer covering the relevant aspects and details of the question; (2) Empowerment : the effectiveness of the answer in providing actionable insights or solutions; (3) Diversity : the breadth of the answer integrating different perspectives, methods or solutions; (4) Overall performance : a comprehensive consideration of the overall quality of the above dimensions and other relevant factors. To ensure fairness, this paper randomly swaps the order of presentation of the answers of each method in the prompt and finally calculates the overall win rate of each method.
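The win-rate computation with randomized presentation order can be sketched as follows. The judge here is a deliberately trivial stand-in that prefers the longer answer; in the evaluation described above this role is played by GPT-4o, and the function names and sample answers are hypothetical.

```python
import random


def win_rate(pairs, judge, seed=0):
    """Fraction of pairs in which the judge prefers our answer.

    pairs: list of (our_answer, baseline_answer).
    judge: callable(first, second) returning 0 or 1, the position of
           the preferred answer.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    rng = random.Random(seed)
    wins = 0
    for ours, baseline in pairs:
        # Randomly swap presentation order to reduce position bias.
        if rng.random() < 0.5:
            if judge(ours, baseline) == 0:
                wins += 1
        else:
            if judge(baseline, ours) == 1:
                wins += 1
    return wins / len(pairs)


def length_judge(first, second):
    """Toy judge (assumption): prefers the longer answer."""
    return 0 if len(first) > len(second) else 1


pairs = [
    ("a long and detailed answer", "short"),
    ("another detailed answer here", "brief"),
    ("tiny", "a much longer baseline answer"),
    ("comprehensive multi-part answer", "ok"),
]
rate = win_rate(pairs, length_judge)
print(rate)
```

Because the toy judge ignores position, the result does not depend on the seed; with an LLM judge, the swap matters because such judges can favor whichever answer is shown first.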
Experimental results: The win rates of HiRAG against the five baseline methods are compared on four dimensions of four datasets. The results support the following findings:
1. Graph structure enhances the effectiveness of RAG systems: NaiveRAG lags behind the methods that use graph structure on all metrics, mainly because it cannot model the relationships between entities in the retrieved text blocks. In addition, its context processing is limited by the token capacity of large language models, highlighting the value of structured knowledge representation for robust retrieval and reasoning.
2. Global knowledge improves answer quality: The methods that integrate global knowledge (GraphRAG, LightRAG, KAG, HiRAG) perform significantly better than FastGraphRAG, which relies on personalized PageRank to obtain local knowledge. Answers that lack global context are often shallow and limited in diversity, confirming the role of overall knowledge integration in generating comprehensive responses.
3. HiRAG's superior performance: Among graph-enhanced RAG systems, HiRAG performs best on all four datasets, which cover multiple fields, and on all evaluation dimensions. This advantage is mainly due to two innovations: (1) enhancing the connectivity of distant but semantically similar entities in the knowledge graph through hierarchical indexing (HiIndex); and (2) bridging global concept abstraction and local entity description through hierarchical retrieval (HiRetrieval), so that the knowledge levels are coherently linked. The performance on the Legal dataset (54.5% overall win rate) further supports the advantage of this method in complex concept association tasks in the legal field.
Comparative analysis of methods : Experimental data show that HiRAG's performance advantage over the baseline method is universal. The performance difference between the Mix dataset (87.6% overall win rate) and the Agriculture dataset (71.5% overall win rate) reveals the method's adaptive ability between cross-domain knowledge integration and deep reasoning in professional fields. It is worth noting that on the CS dataset, HiRAG significantly surpasses GraphRAG (36.0%) with a 73.5% overall win rate, which verifies the ability of hierarchical knowledge representation to capture complex hierarchical relationships in the field of computer science.
Hierarchical KG vs. Flat KG
To verify the effectiveness of the hierarchical knowledge graph, this paper replaced the hierarchical knowledge graph with a flat knowledge graph (marked as w/o HiIndex) through ablation experiments. Compared with the full version of HiRAG, the win rate of all datasets and evaluation dimensions decreased significantly after removing the hierarchical index module. This ablation study confirms the key role of the hierarchical index mechanism in the quality of answer generation: through hierarchical semantic clustering and summary entity generation, the hierarchical knowledge graph plays a core function in enhancing the structural connectivity of semantically similar entities. It is worth noting that even when using a flat knowledge graph, the w/o HiIndex variant still outperforms baseline methods such as GraphRAG and LightRAG. This shows that the three-level knowledge retrieval mechanism (HiRetrieval) in this paper itself has significant advantages - when other baseline methods only rely on local entity descriptions and global community reports, w/o HiIndex achieves better knowledge integration effects by introducing bridging layer knowledge. This cross-level knowledge bridging mechanism can effectively make up for the shortcomings of traditional graph-enhanced RAG systems in local and global knowledge collaboration even without hierarchical index support.
HiRetrieval vs. Gapped Knowledge
To verify the effectiveness of HiRetrieval, this paper constructs another variant of HiRAG that removes the bridge-layer knowledge (denoted w/o Bridge). Without bridge-layer knowledge, the win rate drops significantly on every dataset and evaluation dimension. This confirms the core problem: unless it is explicitly bridged, a semantic gap remains between local-level knowledge and global-level knowledge.
Case study : The synergy of three-level knowledge in query response: For the query "Please introduce Amazon", the bridge layer knowledge forms a knowledge link through the community entity descriptions marked with different colors. This mechanism enables the large language model to effectively integrate the cross-domain information of "Amazon e-commerce business" (red community) and "Amazon cloud service" (blue community), successfully avoiding the common knowledge gap problem in traditional methods. Specifically, the path connection in the bridge layer (such as "Amazon→AWS→cloud computing" and "Amazon→Bezos→retail strategy") establishes a semantic bridge between local entities and global concepts, enabling the model to systematically sort out the business landscape of the enterprise and avoid logical contradictions or information omissions caused by the fragmentation of knowledge levels.
Determining the Number of Layers
In the process of constructing a hierarchical knowledge graph, this paper intelligently determines the number of layers by dynamically evaluating the clustering quality, rather than presetting a fixed number of layers. The specific mechanism includes three core elements:
1. Cluster sparsity (CS): This metric quantifies the degree of semantic aggregation of entities within a single layer. For layer i with entity set $L_i$ and clusters $C_i$, it is calculated as:

$$CS_i = 1 - \frac{\sum_{S \in C_i} |S|\,(|S| - 1)}{|L_i|\,(|L_i| - 1)}$$

When a large number of small clusters appear in a layer (in the extreme case, each cluster contains only one entity), the CS value approaches 1, indicating that the semantic aggregation ability has reached its upper limit.
2. Change-rate threshold: The relative change rate of the CS value between adjacent layers, ΔCS, is monitored against a threshold ε = 5%. When ΔCS < ε, the gain of a new layer in semantic aggregation is no longer significant, and the layering process is terminated immediately.
3. Semantic hub verification: In engineering practice, this paper also checks whether high-level entities effectively act as "semantic hubs". When more than 30% of the summary entities generated by a newly added layer cannot establish cross-level semantic associations, the process is terminated early even if the threshold has not been reached, to avoid generating invalid abstraction layers.
This dynamic adjustment mechanism shows strong adaptability on four benchmark datasets: legal texts have an average of 4.2 layers, technical documents have 3.5 layers, agricultural fields have 2.8 layers, and mixed datasets have 3.7 layers. Compared with the fixed 3-layer baseline model, dynamic layering improves the F1 value by 12.7% while reducing redundant calculations by 27%.
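The sparsity-based stopping rule can be sketched as follows. Two assumptions are made here and should be checked against the original paper: cluster sparsity is taken to be one minus the fraction of entity pairs that fall in the same cluster, and the change rate is taken to be the absolute change divided by the previous layer's value. The function names are hypothetical.

```python
def cluster_sparsity(cluster_sizes):
    """1 - (pairs sharing a cluster) / (all entity pairs) for one layer."""
    n = sum(cluster_sizes)
    if n < 2:
        return 1.0  # no pairs exist, so nothing can be aggregated
    same_cluster_pairs = sum(s * (s - 1) for s in cluster_sizes)
    return 1.0 - same_cluster_pairs / (n * (n - 1))


def relative_change(prev, cur):
    """Relative change of cluster sparsity between adjacent layers."""
    if prev == 0:
        return 0.0 if cur == 0 else float("inf")
    return abs(cur - prev) / prev


def should_stop(cs_prev, cs_cur, eps=0.05):
    """Stop adding layers once sparsity changes by less than eps."""
    return relative_change(cs_prev, cs_cur) < eps


print(cluster_sparsity([1, 1, 1, 1]))  # all singletons
print(cluster_sparsity([4]))           # one cluster holds everything
print(should_stop(0.90, 0.92))
print(should_stop(0.60, 0.90))
```

Under these assumptions, a layer made only of singleton clusters has sparsity exactly 1, which matches the description of the upper limit above.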
Efficiency and Costs Analysis
In order to comprehensively evaluate the system performance of HiRAG, this paper systematically compares the token consumption, number of API calls and time cost of each method in the index construction and retrieval stages. The resource consumption of the entire process is counted in the indexing stage, and the average cost of a single query is calculated in the retrieval stage. Experimental data show that although HiRAG requires more time and resources for index construction to achieve better performance (for example, the indexing of the Mix dataset takes 17,208 seconds, which is an increase from GraphRAG's 6,696 seconds), it is worth noting that the indexing process is an offline operation, and the total cost of building the Mix dataset knowledge base using DeepSeek-V3 is only about US$7.55. In terms of retrieval efficiency, HiRAG shows a significant advantage - compared with KAG (average single retrieval consumption of 89,746 tokens) and LightRAG (average 3.06 seconds/retrieval), HiRAG's retrieval process does not require token consumption at all, and the average response time is controlled within 2 seconds. This feature makes HiRAG particularly suitable for online retrieval scenarios that require fast response, and its efficient retrieval mechanism effectively avoids the performance bottleneck of traditional methods in real-time services.
The data further revealed that KAG has significant token consumption problems in the retrieval stage (for example, a single retrieval of the Legal dataset consumes an average of 97,683 tokens), which is mainly due to its complex logical form-guided reasoning mechanism. In contrast, HiRAG pre-organizes the hierarchical knowledge structure and moves computationally intensive operations to the offline indexing stage, thereby achieving efficient retrieval with zero token consumption during online services. This "offline deep processing + online lightweight retrieval" architectural design significantly reduces the operating costs of the real-time service stage while ensuring the quality of answers, providing feasibility for large-scale commercial deployment.
Conclusions
This paper proposes HiRAG, a retrieval-augmented generation framework enhanced with hierarchical knowledge. Through its hierarchical indexing (HiIndex) and hierarchical retrieval (HiRetrieval) mechanisms, it addresses the two core challenges of existing graph-structured RAG systems. Experiments show that, by constructing a knowledge graph with multi-layer semantic abstraction, HiRAG aligns semantic association with structural proximity in the indexing stage; through the global-bridging-local three-level retrieval mechanism, it bridges the semantic gap between knowledge levels during reasoning. Compared with the best existing baseline methods, HiRAG shows significant advantages in question-answering tasks across mixed domains, computer science, law, and agriculture, with clear improvements in the completeness, information content, and logical coherence of the generated answers. Although the current method is limited by the efficiency of index construction, the dynamic layer termination strategy and the semantic path bridging mechanism proposed in this paper provide theoretical support and a technical path for the next generation of knowledge-intensive RAG systems. Subsequent work will focus on optimizing parallel construction of the knowledge graph and exploring a more efficient paradigm for cross-level knowledge fusion.