Walk&Retrieve: A Simple Framework for Tackling Hallucination in RAG Question Answering with a KG-Generated Corpus

Written by

Clara Bennett

Updated on: June 18, 2025

Walk&Retrieve builds on a knowledge graph (KG): it uses graph traversal and knowledge representation to generate the corpus for zero-shot RAG, with the goal of addressing the hallucination problem of RAG systems. The idea is relatively simple, and its core is the corpus generation step. Let's take a look at it for reference.

Corpus Generation

In the framework, corpus generation is the core step of the method. This stage extracts relevant information from the knowledge graph and converts it into a text format suitable for LLM processing. Corpus generation includes the following steps: graph-based traversal, knowledge representation, and indexing.

1. Graph-based traversal

Random walk: A random walk is a stochastic process that starts from a node and, at each step, moves to a neighbor of the current node chosen with uniform probability:
$$P(v_{i+1} \mid v_i) = \frac{1}{|N(v_i)|}, \quad v_{i+1} \in N(v_i)$$
where $|N(v_i)|$ is the number of neighbors of node $v_i$. For each node $v$, $n$ random walk paths of length $l$ are generated. The final corpus is the set of random walk paths of all nodes.

Advantages: Simple and easy to use, suitable for large-scale graphs.
Disadvantages: May generate duplicate paths and noise.
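The random walk step can be sketched in a few lines of Python. This is a minimal illustration rather than the framework's actual implementation: the toy graph, the function names, and the parameters `walks_per_node` and `length` are all assumptions made for this example.

```python
import random

# Toy knowledge graph as an adjacency list: node -> list of (relation, neighbor).
# Entities and relations are made up for illustration.
KG = {
    "Inception": [("directed_by", "Christopher Nolan"),
                  ("starred_actors", "Leonardo DiCaprio")],
    "Christopher Nolan": [("directed", "Inception"), ("directed", "Interstellar")],
    "Leonardo DiCaprio": [("acted_in", "Inception")],
    "Interstellar": [("directed_by", "Christopher Nolan")],
}


def random_walk(graph, start, length, rng):
    """Return one walk as [node, relation, node, ...] with at most `length` hops."""
    path = [start]
    current = start
    for _ in range(length):
        edges = graph.get(current, [])
        if not edges:  # dead end: stop the walk early
            break
        # Uniform choice over outgoing edges; this equals a uniform choice over
        # neighbors when each neighbor is reached by exactly one edge.
        relation, current = rng.choice(edges)
        path.extend([relation, current])
    return path


def random_walk_corpus(graph, walks_per_node, length, seed=0):
    """Generate `walks_per_node` walks for every node in the graph."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    return {
        node: [random_walk(graph, node, length, rng) for _ in range(walks_per_node)]
        for node in graph
    }


corpus = random_walk_corpus(KG, walks_per_node=3, length=2)
```

Because neighbors are sampled independently at every step, the same path can appear several times for one node, which is the duplication drawback noted above.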
Breadth-first search (BFS) walk: BFS is a graph traversal algorithm that starts from a root node and visits its neighbors layer by layer. For each root node $v_r$, a hierarchy of levels is built in which level $k$ contains the nodes whose shortest-path distance to the root is $k$:
$$L_k = \{\, u : \operatorname{dist}(v_r, u) = k \,\}, \quad k = 0, 1, \dots, d$$
where $d$ is the maximum depth. The levels are then traversed in order, which ensures that each node is visited only once.

Advantages: Avoids repeated paths and generates more diverse walking paths.
Disadvantages: High computational complexity, especially when traversing deep layers.
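One plausible reading of the BFS walk is sketched below: starting from a root, visit nodes level by level up to a maximum depth and keep, for every newly reached node, the path that led to it from the root. The toy graph, the function name `bfs_walks`, and the choice to emit one root-to-node path per visited node are assumptions for illustration, not details confirmed by the framework.

```python
from collections import deque

# Toy knowledge graph: node -> list of (relation, neighbor). Made up for illustration.
KG = {
    "Inception": [("directed_by", "Christopher Nolan"),
                  ("starred_actors", "Leonardo DiCaprio")],
    "Christopher Nolan": [("directed", "Inception"), ("directed", "Interstellar")],
    "Leonardo DiCaprio": [("acted_in", "Inception")],
    "Interstellar": [("directed_by", "Christopher Nolan")],
}


def bfs_walks(graph, root, max_depth):
    """Return one root-to-node path for each node reachable within `max_depth` hops."""
    paths = {root: [root]}  # doubles as the visited set
    depth = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand beyond the maximum depth
        for relation, neighbor in graph.get(current, []):
            if neighbor not in paths:  # each node is visited only once
                paths[neighbor] = paths[current] + [relation, neighbor]
                depth[neighbor] = depth[current] + 1
                queue.append(neighbor)
    # Drop the trivial zero-hop path that contains only the root.
    return [path for node, path in paths.items() if node != root]


walks = bfs_walks(KG, "Inception", max_depth=2)
```

Since a node is recorded the first time it is reached, every stored path is a shortest path from the root, and no path is produced twice.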

2. Knowledge Representation

LLMs require text input, so the extracted traversal paths must be converted into natural-language descriptions. Using predefined prompt templates, each walk path of a node is converted into a natural-language sentence. For example, a random walk path $(v_1, r_1, v_2, r_2, v_3)$ can be rendered as a sentence such as "$v_1$ is connected to $v_2$ through relation $r_1$, and $v_2$ is connected to $v_3$ through relation $r_2$."
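As a rough illustration of this verbalization step, the sketch below fills a fixed string template for each hop of a walk. The framework describes prompt templates, which may well involve an LLM; the plain string template, the function name `verbalize`, and the example path here are assumptions chosen to keep the example self-contained.

```python
def verbalize(path):
    """Turn [v1, r1, v2, r2, v3, ...] into one sentence using a fixed template."""
    if len(path) < 3 or len(path) % 2 == 0:
        raise ValueError("expected [node, relation, node, ...] with at least one hop")
    clauses = []
    for i in range(0, len(path) - 2, 2):
        head, relation, tail = path[i], path[i + 1], path[i + 2]
        readable = relation.replace("_", " ")  # e.g. "directed_by" -> "directed by"
        clauses.append(
            f"{head} is connected to {tail} through the relation '{readable}'"
        )
    return ", and ".join(clauses) + "."


# Hypothetical two-hop walk used only as an example.
sentence = verbalize(
    ["Inception", "directed_by", "Christopher Nolan", "directed", "Interstellar"]
)
```

For the walk above, `sentence` reads: Inception is connected to Christopher Nolan through the relation 'directed by', and Christopher Nolan is connected to Interstellar through the relation 'directed'.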

3. Indexing

Each verbalized walk path is converted into a vector representation, and the global representation of each node is computed as the concatenation of all its walk path vectors. The nodes and their corresponding walk path vectors are stored for fast retrieval during the inference phase.
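The indexing step might look roughly like the following. A real system would use a trained sentence encoder; the `embed` function here is only a toy stand-in (a hashed bag of words), and the dimension, the dictionary layout of the index, and the example sentences are assumptions for illustration.

```python
import hashlib
import math

DIM = 16  # toy embedding size, chosen arbitrarily


def embed(text, dim=DIM):
    """Toy stand-in for a sentence encoder: L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(node_to_sentences):
    """Store per-walk vectors and a concatenated global vector for every node."""
    index = {}
    for node, sentences in node_to_sentences.items():
        walk_vectors = [embed(s) for s in sentences]
        # Global node representation: concatenation of all its walk vectors.
        node_vector = [x for vec in walk_vectors for x in vec]
        index[node] = {
            "sentences": sentences,
            "walk_vectors": walk_vectors,
            "node_vector": node_vector,
        }
    return index


# Hypothetical verbalized walks for one node.
node_to_sentences = {
    "Inception": [
        "Inception is connected to Christopher Nolan through the relation 'directed by'.",
        "Inception is connected to Leonardo DiCaprio through the relation 'starred actors'.",
    ],
}
index = build_index(node_to_sentences)
```

With concatenation, the size of a node's global vector grows with its number of walks (here 2 × 16 = 32 values), so nodes with different walk counts end up with vectors of different lengths.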

Retrieval and Question Answering

This stage is not the focus of the framework and follows traditional RAG: query encoding, similarity retrieval (k-nearest-neighbor search), context integration, and answer generation.
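The retrieval side can be sketched as a brute-force k-nearest-neighbor search over sentence vectors, followed by prompt assembly. Everything here is an assumption for illustration: the toy `embed` function stands in for a real encoder, the prompt wording (including the instruction to say "I don't know") is hypothetical, and the final LLM call is left out.

```python
import hashlib
import math


def embed(text, dim=16):
    """Toy stand-in for a sentence encoder: L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def retrieve(query, sentences, k):
    """Return the k sentences with the highest cosine similarity to the query."""
    q = embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, embed(s))), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(query, context_sentences):
    """Integrate the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {s}" for s in context_sentences)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, reply \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical verbalized walks acting as the corpus.
corpus_sentences = [
    "Inception is connected to Christopher Nolan through the relation 'directed by'.",
    "Inception is connected to Leonardo DiCaprio through the relation 'starred actors'.",
    "Interstellar is connected to Christopher Nolan through the relation 'directed by'.",
]
question = "Who directed Inception?"
top = retrieve(question, corpus_sentences, k=2)
prompt = build_prompt(question, top)
```

The resulting `prompt` would then be passed to the LLM for answer generation. At scale, the linear scan in `retrieve` would normally be replaced by an approximate nearest-neighbor index.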

Experimental performance

Performance on MetaQA: Walk&Retrieve-BFS performs best in both answer accuracy and reduction of false answers, with a relative improvement of 38.64%. Other KG-based RAG systems reach high accuracy but produce more false answers. Walk&Retrieve-BFS also excels in truthfulness and in reducing non-responses on 1-hop, 2-hop, and 3-hop questions.

Performance on CRAG: The Walk&Retrieve variant outperforms LLM-only and text-based RAG in answer accuracy, while remaining comparable to them in false-answer and non-response rates. Because CRAG is more complex, the performance of Walk&Retrieve drops slightly, but it still shows good robustness.