GraphRAG indexing cost cut to about one-tenth: the KET-RAG multi-granularity indexing framework is open source

KET-RAG: a new framework that significantly reduces the indexing cost of graph-based knowledge retrieval while preserving retrieval and generation quality.
Core content:
1. The trade-off that existing Graph-RAG systems face between cost and retrieval quality
2. The three major innovations of the KET-RAG framework: knowledge graph skeleton, text-keyword bipartite graph, and dual-channel retrieval
3. The performance advantages and cost reduction effects of KET-RAG on real datasets
Research pain point: Existing graph-based retrieval-augmented generation (Graph-RAG) systems face a dilemma when processing large-scale documents.
On the one hand, the KNN graph method based on text block similarity is low-cost, but it cannot capture the entity relationships within the text, which leads to poor retrieval and generation quality.
On the other hand, the knowledge graph-based (KG-RAG) method improves retrieval quality by extracting entities and relationships, but its high indexing cost makes it difficult to apply at scale. For example, indexing 5GB of legal documents may cost as much as $33,000.
KET-RAG addresses this dilemma with three innovations:
Knowledge graph skeleton: Build the knowledge graph only from core text blocks, greatly reducing indexing costs.
Text-keyword bipartite graph: A lightweight alternative to the knowledge graph that achieves efficient retrieval by associating keywords with text blocks.
Dual-channel retrieval strategy: Combine the knowledge graph skeleton and the text-keyword bipartite graph to balance retrieval quality and cost.
The core of the KET-RAG framework is a multi-granularity index structure, which consists of the following parts:
Skeleton-RAG: Selects the important text blocks from the KNN graph using the PageRank algorithm and builds a knowledge graph only for these core text blocks, which reduces the indexing cost.
Text-Keyword Bipartite Graph (Keyword-RAG): Splits all text blocks into sub-blocks and constructs a bipartite graph that links keywords to sub-blocks. Keywords and their neighboring sub-blocks then serve as candidate entities and relations for lightweight retrieval.
Dual-channel retrieval: In the retrieval stage, KET-RAG draws on both the knowledge graph skeleton and the text-keyword bipartite graph, and balances the contributions of the two channels through a retrieval ratio parameter to improve retrieval quality.
Parameter optimization: Tuning parameters such as the input text block size (ℓ) and the number of segmentation levels further improves retrieval and generation performance.
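The Skeleton-RAG selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-similarity KNN graph over toy vectors, the damping factor, and the names knn_graph, pagerank, select_skeleton, and budget are all assumptions chosen for illustration. The idea it shows is only that text blocks are ranked by PageRank on the KNN graph and that the expensive knowledge graph extraction would then run on the top-ranked blocks alone.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_graph(vectors, k):
    """Undirected KNN graph: each block is linked to its k most similar blocks."""
    n = len(vectors)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(vectors[i], vectors[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def pagerank(adj, damping=0.85, iters=50):
    """Plain power-iteration PageRank on an adjacency dict {node: set(neighbors)}."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            if nbrs:
                share = damping * rank[v] / len(nbrs)
                for u in nbrs:
                    nxt[u] += share
            else:
                # A node without neighbors spreads its rank evenly.
                for u in adj:
                    nxt[u] += damping * rank[v] / n
        rank = nxt
    return rank


def select_skeleton(vectors, k, budget):
    """Return the ids of the top-ranked blocks.

    `budget` is a hypothetical fraction of blocks that would be passed on
    to (costly) entity and relation extraction.
    """
    rank = pagerank(knn_graph(vectors, k))
    keep = max(1, int(budget * len(vectors)))
    return sorted(rank, key=rank.get, reverse=True)[:keep]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real text block embeddings.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
    print(select_skeleton(toy, k=2, budget=0.5))
```

In this sketch the budget directly controls how many blocks reach the extraction step, which is where the cost saving comes from: blocks outside the skeleton are never sent to the LLM for entity extraction.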
Through this multi-granularity indexing and dual-channel retrieval strategy, KET-RAG significantly reduces the indexing cost while ensuring the retrieval quality, providing an efficient and low-cost solution for large-scale knowledge retrieval and generation tasks.
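The text-keyword bipartite graph and the dual-channel split can be sketched in the same spirit. Everything here is an assumption made for illustration rather than the paper's method: the word-count-based sub-block splitting, the tiny stopword list, scoring by number of shared keywords, and the names build_bipartite, keyword_channel, dual_channel, and ratio. In particular, treating ratio as the share of the context given to the keyword channel is one possible reading of the retrieval ratio parameter.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def build_bipartite(chunks, size=8):
    """Split chunks into sub-blocks of `size` words and link keywords to them.

    Returns (blocks, keyword_to_blocks), where keyword_to_blocks maps each
    keyword to the ids of the sub-blocks it occurs in.
    """
    blocks = []
    keyword_to_blocks = defaultdict(set)
    for chunk in chunks:
        words = chunk.split()
        for start in range(0, len(words), size):
            block_id = len(blocks)
            blocks.append(" ".join(words[start:start + size]))
            for word in tokenize(blocks[block_id]):
                keyword_to_blocks[word].add(block_id)
    return blocks, keyword_to_blocks


def keyword_channel(query, blocks, keyword_to_blocks, limit):
    """Rank sub-blocks by how many query keywords they are linked to."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for block_id in keyword_to_blocks.get(word, ()):
            scores[block_id] += 1
    ranked = sorted(scores, key=lambda b: (-scores[b], b))
    return [blocks[b] for b in ranked[:limit]]


def dual_channel(skeleton_hits, keyword_hits, total, ratio):
    """Fill a context of `total` items from both channels.

    `ratio` (hypothetical) is the fraction taken from the keyword channel;
    the remainder comes from the knowledge graph skeleton channel.
    """
    n_keyword = int(total * ratio)
    n_skeleton = total - n_keyword
    return skeleton_hits[:n_skeleton] + keyword_hits[:n_keyword]


if __name__ == "__main__":
    chunks = [
        "Alice founded Acme in Berlin and later hired Bob as chief engineer",
        "Bob designed the Falcon engine for Acme",
    ]
    blocks, index = build_bipartite(chunks, size=6)
    kw_hits = keyword_channel("Who designed the Falcon engine?", blocks, index, limit=2)
    # Skeleton hits would come from the knowledge graph; placeholders here.
    print(dual_channel(["<skeleton fact 1>", "<skeleton fact 2>"], kw_hits, total=2, ratio=0.5))
```

Building this index needs no LLM calls, which is why the keyword channel is cheap; the ratio then decides how much of the final context relies on it versus the skeleton.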
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG
https://arxiv.org/pdf/2502.09304