HippoRAG 2 Released: GraphRAG Steps Aside

Written by Silas Grey
Updated on: July 14, 2025
Recommendation

HippoRAG 2 marks a new breakthrough in RAG systems, taking the simulation of human long-term memory a step further.

Core content:
1. The innovation of the HippoRAG 2 framework and its improvement over existing RAG systems
2. Evaluation and performance of HippoRAG 2 in key dimensions
3. Comparison of HippoRAG 2 with baseline methods and details of performance improvement

To address the limitations of existing retrieval-augmented generation (RAG) systems in simulating the dynamic and associative nature of human long-term memory, a new framework, HippoRAG 2, is proposed and will be open-sourced.
Continual learning capability is evaluated along three key dimensions: factual memory, sense-making, and associativity. HippoRAG 2 outperforms the other methods (RAPTOR, GraphRAG, LightRAG, HippoRAG) in all benchmark categories, bringing it closer to a true long-term memory system.
The core idea of the HippoRAG 2 framework: HippoRAG 2 builds on HippoRAG's Personalized PageRank algorithm and pushes the RAG system closer to the effectiveness of human long-term memory through deeper passage integration and more effective online use of the LLM.

Offline indexing:

  • Use an LLM to extract triples from passages and integrate them into an open knowledge graph (KG).
  • Detect synonyms with an embedding model and add synonym edges to the KG.
  • Combine the original passages with the KG, yielding an open KG that holds both concepts and contextual information (a minimal code sketch follows this list).
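
Below is a minimal sketch of this indexing stage, assuming an OpenAI-style chat client and a sentence-transformers embedding model. The prompt, model names, the "triples" JSON key, and the 0.9 synonym threshold are illustrative assumptions, not the paper's actual configuration.

```python
import itertools
import json

import networkx as nx
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def extract_triples(passage: str) -> list[tuple[str, str, str]]:
    """Ask the LLM for (subject, relation, object) triples as JSON."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; the paper uses its own prompt/model
        messages=[{
            "role": "user",
            "content": 'Extract facts from the passage. Return JSON like '
                       '{"triples": [["subject", "relation", "object"], ...]}\n\n'
                       + passage,
        }],
        response_format={"type": "json_object"},
    )
    return [tuple(t) for t in json.loads(resp.choices[0].message.content)["triples"]]

def build_kg(passages: list[str], syn_threshold: float = 0.9) -> nx.Graph:
    kg = nx.Graph()
    for pid, text in enumerate(passages):
        kg.add_node(("passage", pid), text=text)  # keep the original context in the graph
        for subj, rel, obj in extract_triples(text):
            kg.add_edge(subj, obj, relation=rel)   # concept-concept edge
            kg.add_edge(subj, ("passage", pid))    # concept-context edges
            kg.add_edge(obj, ("passage", pid))
    # Synonym detection: connect phrase nodes whose embeddings nearly coincide.
    phrases = [n for n in kg.nodes if isinstance(n, str)]
    emb = embedder.encode(phrases, convert_to_tensor=True, normalize_embeddings=True)
    sims = util.cos_sim(emb, emb)
    for i, j in itertools.combinations(range(len(phrases)), 2):
        if sims[i, j] >= syn_threshold:
            kg.add_edge(phrases[i], phrases[j], relation="synonym")
    return kg
```

Keeping passage nodes linked to their concepts is what lets the later graph search return full paragraphs rather than bare triples.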
Online search:
  • Use an embedding model to link the query to triples and passages in the KG, determining seed nodes for the graph search.
  • Filter the retrieved triples with the LLM so that only the relevant ones are kept.
  • Apply the Personalized PageRank algorithm for context-aware retrieval, ultimately surfacing the most relevant passages for the downstream question-answering task (see the retrieval sketch after this list).
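
A minimal sketch of the retrieval side, reusing `embedder` and the KG from the indexing sketch above; the LLM triple-filtering step is elided, and the seed count and weighting scheme are illustrative assumptions.

```python
import networkx as nx
from sentence_transformers import util

def retrieve(query: str, kg: nx.Graph, top_k: int = 5) -> list[str]:
    """Rank passages by Personalized PageRank seeded from query-similar phrases."""
    phrases = [n for n in kg.nodes if isinstance(n, str)]
    q_emb = embedder.encode(query, convert_to_tensor=True, normalize_embeddings=True)
    p_emb = embedder.encode(phrases, convert_to_tensor=True, normalize_embeddings=True)
    scores = util.cos_sim(q_emb, p_emb)[0]
    # Seed the walk with the phrases closest to the query (weights kept positive).
    k = min(10, len(phrases))
    seeds = {phrases[i]: max(float(scores[i]), 1e-6)
             for i in scores.topk(k).indices.tolist()}
    ppr = nx.pagerank(kg, personalization=seeds)
    # Read off the passage nodes with the highest stationary probability.
    passage_nodes = [n for n in kg.nodes if isinstance(n, tuple) and n[0] == "passage"]
    ranked = sorted(passage_nodes, key=lambda n: ppr.get(n, 0.0), reverse=True)
    return [kg.nodes[n]["text"] for n in ranked[:top_k]]
```

Because the walk restarts at the seed phrases, probability mass flows through synonym and concept-context edges into the passages that best cover the query's entities, which is what makes the retrieval context-aware rather than purely lexical.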
  • Baseline methods: classic retrievers (BM25, Contriever, GTR), large embedding models (GTE-Qwen2-7B-Instruct, GritLM-7B, NV-Embed-v2), and structure-augmented RAG methods (RAPTOR, GraphRAG, LightRAG, HippoRAG).
  • Evaluation metrics: the question-answering tasks use the F1 score, and the retrieval tasks use passage recall@5 (both illustrated in the sketch after this list).
  • Performance improvement: HippoRAG 2 outperforms the other methods in all benchmark categories, with an average F1 score about 7 percentage points above standard RAG, most markedly on the associative memory tasks.
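
For concreteness, here are straightforward implementations of the two reported metrics; they follow the usual definitions (token-overlap F1 and recall among the top five retrieved passages), not the paper's exact evaluation code.

```python
from collections import Counter

def qa_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and the gold answer."""
    pred_toks, gold_toks = prediction.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def recall_at_5(retrieved: list[str], gold_passages: set[str]) -> float:
    """Fraction of gold passages found among the top five retrieved results."""
    hits = sum(1 for p in retrieved[:5] if p in gold_passages)
    return hits / len(gold_passages)

# Example: one of two gold passages appears in the top five -> recall@5 = 0.5.
print(recall_at_5(["p3", "p1", "p7"], {"p1", "p2"}))  # 0.5
```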
A HippoRAG 2 pipeline example

Code: https://github.com/OSU-NLP-Group/HippoRAG
Paper: From RAG to Memory: Non-Parametric Continual Learning for Large Language Models, https://arxiv.org/pdf/2502.14802