Understanding GraphRAG (RAG + Knowledge Graph) in one article

Written by
Clara Bennett
Updated on: June 27, 2025
Recommendation

Explore how GraphRAG combines knowledge graphs with large language models to improve complex-query handling and multi-hop reasoning.

Core content:
1. GraphRAG technical definition and core advantages
2. Complex query processing: community clustering and cross-document topic analysis
3. Multi-hop reasoning case: graph path analysis and explanation generation

Yang Fangxian, Founder of 53A and Tencent Cloud Most Valuable Expert (TVP)

GraphRAG (Graph-based Retrieval-Augmented Generation) is an upgraded form of retrieval-augmented generation (RAG). By combining a knowledge graph with a large language model (LLM), it addresses the limitations of traditional RAG in handling complex queries, multi-hop reasoning, and cross-document semantic associations. Its core goal is to capture entities, relationships, and global semantics in the data through a structured knowledge graph representation, thereby improving the LLM's ability to understand and generate answers over private or previously unseen data.

1. GraphRAG

What is GraphRAG (Graph-based Retrieval-Augmented Generation)? GraphRAG is an advanced technique that combines knowledge graphs with retrieval-augmented generation (RAG). It aims to enhance the reasoning ability of large language models (LLMs) through structured knowledge and to address the limitations of traditional RAG in complex queries and multi-hop reasoning.
  • Complex queries: Use community clustering (such as the Leiden algorithm) to generate hierarchical summaries, supporting cross-document topic analysis (such as "AI research trends in the past five years") and global semantic understanding.
  • Multi-hop reasoning: Answer questions that require chaining multiple connections (e.g., "how does event A indirectly lead to result C?") by following graph paths.

Case 1: Complex query (community clustering + cross-document topic analysis)

Divide a large corpus of AI literature into highly cohesive, loosely coupled communities, each representing a research topic, then analyze the correlations and evolution trends between topics on top of the clustering. A minimal code sketch of this workflow follows the list below.

  • Input: Literature data from the AI field over the past five years.
  • Construct a literature network: Use papers as nodes and citation relationships as edges to build a weighted graph.
  • Community clustering: Use the Leiden clustering algorithm to output a hierarchical community structure.
    • First layer: basic technology (machine learning, deep learning)
    • Second layer: application areas (natural language processing, computer vision)
    • Third layer: specialized sub-directions (generative AI, multimodal learning)
  • Generate a hierarchical summary:
    • Extract high-frequency keywords and subject terms for each community and generate community-level summaries.
    • Aggregate community summaries to form a global hierarchical summary.
  • Topic association analysis: Calculate the topic similarity between communities (such as cosine similarity) and construct a topic association graph.
  • Trend forecasting: Identify the rise, fall, and convergence of themes based on time series analysis.
  • Output: "AI Research Trends in the Past Five Years" report
    • 2020-2021: Deep learning model optimization (such as Transformer improvements)
    • 2022-2023: Explosive growth of large language models (such as the GPT series)
    • 2024: The rise of multimodal AI and embodied AI
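
Below is a minimal sketch of this Case 1 pipeline, assuming the igraph, leidenalg, and scikit-learn packages are available; the paper titles and citation edges are invented purely for illustration.

```python
import igraph as ig
import leidenalg
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: paper titles standing in for five years of AI literature (invented).
papers = [
    "transformer attention language model",
    "bert pretraining language understanding",
    "gpt large language model scaling",
    "image classification convolutional network",
    "object detection vision transformer",
    "multimodal vision language pretraining",
]
# Citation edges between paper indices (also invented).
citations = [(1, 0), (2, 0), (2, 1), (4, 3), (5, 4), (5, 2)]

# 1) Literature network: papers as nodes, citations as weighted edges.
g = ig.Graph(n=len(papers), edges=citations, directed=False)
g.es["weight"] = [1.0] * len(citations)

# 2) Community clustering with the Leiden algorithm.
partition = leidenalg.find_partition(
    g, leidenalg.ModularityVertexPartition, weights="weight"
)
communities = [list(members) for members in partition]
print("communities:", communities)

# 3) Community-level summaries: top TF-IDF keywords per community.
community_docs = [" ".join(papers[i] for i in members) for members in communities]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(community_docs)
terms = vectorizer.get_feature_names_out()
for idx, row in enumerate(tfidf.toarray()):
    top_terms = [t for _, t in sorted(zip(row, terms), reverse=True)[:3]]
    print(f"community {idx} keywords: {top_terms}")

# 4) Topic association: cosine similarity between community summaries.
print("topic similarity matrix:\n", cosine_similarity(tfidf).round(2))
```
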
Case 2: Multi-hop reasoning (graph path analysis)

Answer the complex question "how does event A indirectly lead to result C?" by following paths in the knowledge graph. A minimal code sketch follows the steps below.

  • Question: “How did event A (the introduction of the Transformer architecture in 2017) indirectly lead to outcome C (the release of ChatGPT at the end of 2022)?”
  • Building a knowledge graph:
    • Nodes: events, technologies, fields, institutions, etc.
    • Edges: causal relationships, citation relationships, cooperative relationships, etc.
  • Graph Path Analysis:
    • Event A → Event B (Google released the BERT model in 2018, validating the effectiveness of the Transformer architecture).
    • Event B → Event C (OpenAI trained the Transformer-based GPT-3 in 2020 and released ChatGPT at the end of 2022).
  • Explanation generation: Convert the path into a natural language description, for example: The introduction of the Transformer architecture (A) promoted the development of pre-trained language models (B), and ultimately gave birth to ChatGPT (C).
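
A minimal sketch of this path-based reasoning, assuming the networkx package; the event nodes and relation labels below simply encode the chain described above and are illustrative, not a fixed schema.

```python
import networkx as nx

# Knowledge graph: events as nodes, causal/derivation relationships as labeled edges.
kg = nx.DiGraph()
kg.add_edge(
    "Transformer architecture introduced (2017, event A)",
    "Pre-trained language models such as BERT (2018, event B)",
    relation="enabled",
)
kg.add_edge(
    "Pre-trained language models such as BERT (2018, event B)",
    "GPT-3 and the release of ChatGPT (2020-2022, event C)",
    relation="led to",
)

# Graph path analysis: find the chain of events linking A to C.
source = "Transformer architecture introduced (2017, event A)"
target = "GPT-3 and the release of ChatGPT (2020-2022, event C)"
path = nx.shortest_path(kg, source=source, target=target)

# Explanation generation: turn the path into a natural-language description.
steps = [
    f"{u} {kg.edges[u, v]['relation']} {v}"
    for u, v in zip(path, path[1:])
]
print("Reasoning chain: " + "; ".join(steps) + ".")
```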

2. Knowledge Graph

How do you build a knowledge graph? The core of knowledge graph construction is to transform unstructured data into a semantic network, ultimately producing a knowledge graph that can be queried and reasoned over through entity recognition, relation extraction, and graph fusion. This process combines NLP techniques, graph databases, and domain knowledge, and is suited to scenarios such as intelligent question answering and enterprise decision support.

1. The core of knowledge graph construction: transforming unstructured text into structured knowledge network

The core task of knowledge graph construction is to convert massive amounts of unstructured text data (such as news, documents, and web content) into a structured knowledge graph. In this representation, nodes stand for entities (people, places, events, concepts, etc.) and edges stand for semantic relationships between entities (such as "diabetes → insulin → side effects"). Through this structured representation, the knowledge graph can clearly show how entities relate to one another and support downstream semantic reasoning, information retrieval, and intelligent question answering.

2. Knowledge Graph Construction Process: Entity Recognition, Relation Extraction and Graph Fusion

  1. Entity recognition: Identify key entities (such as "diabetes", "insulin", "side effects") from the text and use them as nodes in the knowledge graph.
    Example: Extract the entities "diabetes", "insulin", and "hypoglycemia" from "diabetic patients using insulin may cause hypoglycemia".
  2. Relationship extraction: Determine the semantic relationships between entities (such as "treat", "cause", "belong to", etc.) and use them as edges to connect related nodes.
    Example: Based on the above text, construct the relationships "diabetes → treatment → insulin" and "insulin → trigger → hypoglycemia".
  3. Graph fusion: Merge duplicate entities or relations from different texts to keep the graph consistent (see the sketch after this list).
    Example: If another text mentions "Side effects of insulin include hypoglycemia," merge it with the existing relationship to form a more complete graph.
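
A minimal, dictionary-based sketch of these three steps, assuming the networkx package; a real system would use NER and relation-extraction models, so the hard-coded alias table and relation patterns below are stand-ins for illustration only.

```python
import networkx as nx

# Surface form -> canonical entity (a toy stand-in for entity recognition/linking).
ALIASES = {
    "diabetic": "diabetes",
    "diabetes": "diabetes",
    "insulin": "insulin",
    "hypoglycemia": "hypoglycemia",
}
# (subject, relation, object) patterns standing in for a relation-extraction model.
RELATION_PATTERNS = [
    ("diabetes", "treatment", "insulin"),
    ("insulin", "trigger", "hypoglycemia"),
]

def extract_triples(text: str):
    """Steps 1-2: entity recognition followed by relation extraction."""
    lowered = text.lower()
    found = {canonical for surface, canonical in ALIASES.items() if surface in lowered}
    return [(s, r, o) for s, r, o in RELATION_PATTERNS if s in found and o in found]

def fuse(graph: nx.DiGraph, triples):
    """Step 3: graph fusion -- duplicate entities/relations collapse into one edge."""
    for s, r, o in triples:
        graph.add_edge(s, o, relation=r)
    return graph

kg = nx.DiGraph()
docs = [
    "Diabetic patients using insulin may cause hypoglycemia.",
    "Side effects of insulin include hypoglycemia.",  # duplicate relation, fused
]
for doc in docs:
    fuse(kg, extract_triples(doc))

print(list(kg.edges(data=True)))
# [('diabetes', 'insulin', {'relation': 'treatment'}),
#  ('insulin', 'hypoglycemia', {'relation': 'trigger'})]
```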

3. Typical Case of Knowledge Graph: Building a Diabetes Knowledge Graph

  1. Data sources: medical literature, encyclopedia entries, and patient forums.
  2. Entities: diabetes, insulin, hypoglycemia, blood sugar monitoring, dietary control.
  3. Relationships: diabetes → treatment → insulin, insulin → trigger → hypoglycemia, diabetes → management → blood sugar monitoring.
Through this structured representation, the knowledge graph can not only answer direct questions such as "What are the common treatments for diabetes?", but also support complex reasoning (such as "What factors may affect the blood sugar levels of diabetic patients?"), thereby improving the semantic understanding ability of intelligent systems.
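
As a closing sketch, the small graph below (again assuming the networkx package) shows how such a structure answers both kinds of questions: a direct lookup for treatments, and a two-hop traversal that surfaces indirect factors affecting blood sugar. The graph content mirrors the case above.

```python
import networkx as nx

# Diabetes knowledge graph from the case above: entities as nodes, labeled relations as edges.
kg = nx.DiGraph()
kg.add_edge("diabetes", "insulin", relation="treatment")
kg.add_edge("insulin", "hypoglycemia", relation="trigger")
kg.add_edge("diabetes", "blood sugar monitoring", relation="management")
kg.add_edge("diabetes", "dietary control", relation="management")

# Direct question: "What are the common treatments for diabetes?"
treatments = [
    target for _, target, data in kg.out_edges("diabetes", data=True)
    if data["relation"] == "treatment"
]
print("Treatments:", treatments)  # ['insulin']

# Complex reasoning: follow two-hop paths from "diabetes" to surface indirect factors,
# e.g. insulin (a treatment) can in turn trigger hypoglycemia.
for mid in kg.successors("diabetes"):
    for end in kg.successors(mid):
        r1 = kg.edges["diabetes", mid]["relation"]
        r2 = kg.edges[mid, end]["relation"]
        print(f"diabetes -[{r1}]-> {mid} -[{r2}]-> {end}")
```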