Rejecting fragmented RAG, Google DeepMind launches ReadAgent: simulating human reading of long texts. Is this the technology underlying NotebookLM?

Written by
Audrey Miles
Updated on: June 13, 2025
Recommendation

Google DeepMind has launched ReadAgent, which simulates how humans read long texts. It offers a new approach for LLMs to process long documents, and may be the underlying technology of NotebookLM.

Core content:
1. The complexity and limitations of the current RAG framework
2. The innovative design of ReadAgent to simulate human reading behavior
3. The technical implementation of system structure and segmentation strategy


In recent years, the RAG framework has evolved from its initial simple design into increasingly complex variants such as GraphRAG and HippoRAG. As a practitioner, I have felt the growing weight of this complexity. RAG began as a simple retrieval add-on to work around limited context windows; it has since turned into a systems-engineering exercise of carefully building indexes, splitting chunks, and designing routing and re-ranking mechanisms. This makes me wonder: have we climbed the wrong branch of the technology tree? Why does none of it feel like a genuine breakthrough?

After all, when humans learn, we don't cut information into "pieces" to memorize or understand it. We tend to read page by page and summarize chapter by chapter, building a systematic cognitive structure. Along the way, we extract key concepts, update old knowledge, and form cross-chapter, cross-topic links in long-term memory.

However, current large language models (LLMs) are still limited by context windows and computing resources when processing very long documents (novels, papers, meeting minutes, and so on), making it difficult to simulate such a continuous, systematic, and traceable reading process. A recent and very interesting work from the Google DeepMind team, "A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts" [1], attempts to build a reading agent with "gist memory" and "recall capability" inspired by human reading behavior, offering a new direction for how LLMs can process long texts efficiently.

It fits my own thinking perfectly. I wonder whether the remarkably capable Google NotebookLM is built on something like this, because I can find no trace of it using RAG.

Why we need human-inspired reading agents

When processing texts of tens of thousands of tokens, today's mainstream large models still struggle to deliver satisfactory results, even when the architecture nominally supports long contexts. Common workarounds include truncation, sliding windows, summary generation, and chunk-embed-retrieve RAG, but these methods typically face three problems:

  1. Fixed segmentation can cut across semantic units (such as an argument or a storyline), causing information loss;
  2. Even if the full text is fed into the model, sparse attention distribution causes the model to "lose focus" on key details;
  3. Embeddings generated from slices can introduce semantic-matching errors.

The core idea of ReadAgent is to mimic human behavior when reading long texts: first read through parts of the content to form a gist memory, then go back to the original text as questions demand. By building this "summarize, then look back" reading loop, it avoids the twin bottlenecks of efficiency and quality that come from processing the entire long text directly.

ReadAgent System Architecture Overview

ReadAgent is not an independent model, but a modular reading agent built around an existing large model. The whole process consists of three core components:

1. Episode Pagination

Unlike fixed windows or manual segmentation, ReadAgent segments interactively: as the model reads, it autonomously decides when to "stop reading" and close off an episode. This strategy is closer to the human habit of understanding text in semantic blocks, which helps improve the quality of the subsequent memory compression.

  • Read a portion of the document (several paragraphs or pages) continuously;

  • Decide whether it is time to stop and summarize;

  • Generate a "gist memory" of the content just read;

  • Store the memory and proceed to the next reading (the next episode).

This is just like a human reading a book: pause after a chapter or a few paragraphs and ask, "What did I just read? What is the core idea?", then continue with the next part. In the paper, this decision is driven by a dedicated pagination prompt.
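To make the loop concrete, here is a minimal Python sketch of episode pagination. It assumes a generic llm(prompt) -> str completion function, and the yes/no prompt is an illustrative stand-in rather than the paper's exact wording:

```python
# Minimal episode-pagination sketch. `llm` is any text-completion callable;
# the pause prompt below is illustrative, not the paper's actual prompt.
def paginate(paragraphs, llm, max_window=8):
    """Let the model choose natural break points instead of fixed-size chunks."""
    episodes, current = [], []
    for para in paragraphs:
        current.append(para)
        if len(current) >= max_window:  # hard cap as a safety net
            episodes.append(current)
            current = []
            continue
        prompt = (
            "You are reading a document in order.\n"
            "Text read so far in this episode:\n"
            + "\n".join(current)
            + "\n\nIs this a natural place to pause and summarize? Answer YES or NO."
        )
        if llm(prompt).strip().upper().startswith("YES"):
            episodes.append(current)
            current = []
    if current:
        episodes.append(current)
    return episodes
```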

2. Memory Gisting

Each episode is then compressed by the model into a gist memory, a key summary of that part's content. These gists differ from traditional summaries: they are shorter and more structured, emphasizing "remembering the main idea" rather than retelling the text. The compression is performed by the LLM itself, guided by specially designed prompts that extract the core information of each episode.
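A minimal gisting sketch, under the same assumptions as above (a generic llm callable; the prompt is an illustrative stand-in):

```python
# Compress each episode into a short gist; `llm` is any text-completion callable.
def gist_episode(episode, llm):
    prompt = (
        "Please shorten the following passage. Keep only the essential points,\n"
        "drop details and repetition, and write it as a brief gist:\n\n"
        + "\n".join(episode)
    )
    return llm(prompt).strip()

def build_gist_memory(episodes, llm):
    # Gist memory is an ordered list of (page id, gist) pairs.
    return [(i, gist_episode(ep, llm)) for i, ep in enumerate(episodes)]
```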

3. Interactive Look-up

When answering user questions or performing downstream tasks, if gist memory alone cannot provide sufficient support, ReadAgent goes back to the original text for "intensive reading," using the memory as a clue to locate specific details. This look-up behavior is not defined in advance; it is triggered dynamically by the question and the memory, preserving efficiency while ensuring detail completeness. The paper studies two look-up methods, parallel and serial. ReadAgent-P prompts the model over all pages in parallel, emphasizing efficiency: when the model flags a gist as relevant, the program swaps that gist for the corresponding full page and then attempts to answer the question.
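A sketch of the parallel look-up (ReadAgent-P) flow. The llm callable and the page-number parsing are simplifying assumptions, and the prompts are illustrative:

```python
import re

# Parallel look-up sketch: show all gists at once, let the model pick pages,
# then answer with the chosen gists replaced by their full pages.
def answer_parallel(question, gist_memory, episodes, llm, max_pages=5):
    menu = "\n".join(f"Page {i}: {g}" for i, g in gist_memory)
    ask = (
        f"Gist memory of a long document:\n{menu}\n\n"
        f"Question: {question}\n"
        "Which pages should be re-read to answer? Reply with page numbers."
    )
    pages = [int(n) for n in re.findall(r"\d+", llm(ask))][:max_pages]
    context = "\n".join(
        "\n".join(episodes[i]) if i in pages else g for i, g in gist_memory
    )
    return llm(f"{context}\n\nQuestion: {question}\nAnswer:")
```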

ReadAgent-S, by contrast, queries serially: the model requests one page of the document at a time, up to a fixed limit, and can see the content of previously expanded pages before deciding which page to expand next. Compared with parallel querying, this gives the model more information and may yield better performance on some tasks. However, it significantly increases the number of model interactions and hence the computational overhead, so it should be reserved for tasks that clearly benefit from it.
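The serial variant can be sketched in the same style; again, llm and the reply parsing are assumptions made for illustration:

```python
import re

# Serial look-up sketch (ReadAgent-S): expand one page per round, letting the
# model see already-expanded pages before choosing the next one.
def answer_serial(question, gist_memory, episodes, llm, max_steps=3):
    opened = set()
    for _ in range(max_steps):
        menu = "\n".join(
            f"Page {i}: " + ("\n".join(episodes[i]) if i in opened else g)
            for i, g in gist_memory
        )
        reply = llm(
            f"{menu}\n\nQuestion: {question}\n"
            "Reply with ONE more page number to expand, or DONE if ready to answer."
        )
        if "DONE" in reply.upper():
            break
        match = re.search(r"\d+", reply)
        if not match:
            break
        opened.add(int(match.group()))
    context = "\n".join(
        "\n".join(episodes[i]) if i in opened else g for i, g in gist_memory
    )
    return llm(f"{context}\n\nQuestion: {question}\nAnswer:")
```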

Experimental Verification and Performance

The authors evaluated ReadAgent in several realistic scenarios, covering typical long-text tasks such as question answering (QuALITY, NarrativeQA) and summarization (QMSum), comparing against the following baselines:

  • Feeding the full text in directly
  • Retrieval-augmented LLM
  • Using gist memory alone, without the look-up mechanism

In these comparisons, ReadAgent shows higher accuracy, recall, or summary quality. On NarrativeQA and QMSum in particular, it improves the model's grasp of complex structures such as event sequences and causal relationships. The authors also report that ReadAgent extends the effective context length of an LLM to roughly 3 to 20 times the original.

Theoretical basis and inspiration

The design of ReadAgent is not a castle in the air. Its theoretical basis comes from the psychological research on the human memory system. According to classical theory, humans mainly use two memory methods when processing information:

  • Verbatim memory: remembering information word for word, emphasizing detail;
  • Gist memory: extracting the main points, retaining key content, and discarding redundancy.

ReadAgent is built on gist memory: structured prompts guide the model to form gists, and verbatim memory is reactivated when the task requires it. Introducing this dual-track memory mechanism into LLMs provides a new paradigm for simulating the human reading and comprehension process.

Practical value and future direction

From the perspective of engineering and application, ReadAgent has several obvious advantages:

  • No need to modify the model structure: it can be implemented entirely through prompt engineering, making it easy to integrate into existing systems.
  • Composable modules: the pagination, gisting, and look-up modules can each be enabled or skipped according to task requirements.
  • Strong cross-task adaptability: beyond standard QA and summarization, it applies to a variety of context-driven tasks such as web navigation and search guidance.

Of course, this method still has some problems that need to be optimized:

  • Gist quality is unstable; over-aggressive compression can drop key information;
  • Segmentation currently relies on rules or weak supervision; mechanisms such as reinforcement learning could later optimize the segmentation policy;
  • For extremely long texts (such as archives of millions of words), look-up cost and latency still need to be balanced further.

In my view, the biggest weakness of this family of methods is the lack of an efficient storage and retrieval layer: every query has to be matched against the gists of the entire text. I wonder whether embedding the gist memories could cut retrieval time at the cost of a small amount of accuracy. Even as it stands, though, the approach may suit scenarios that are insensitive to latency but demand high accuracy.
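As a hedged sketch of that idea: embed each gist once, then use vector similarity to shortlist candidate pages instead of showing the model every gist. sentence-transformers is used here purely as one possible embedding backend:

```python
from sentence_transformers import SentenceTransformer, util

# Embed gists once, then shortlist pages by similarity to the question.
model = SentenceTransformer("all-MiniLM-L6-v2")

def shortlist_pages(question, gists, top_k=3):
    gist_vecs = model.encode(gists, convert_to_tensor=True)
    q_vec = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_vec, gist_vecs, top_k=top_k)[0]
    return [h["corpus_id"] for h in hits]
```

The shortlisted pages could then be fed to the parallel or serial look-up sketched above, trading a little recall for much cheaper retrieval.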

Conclusion

ReadAgent demonstrates a long-text understanding method grounded in human cognitive mechanisms. It successfully transfers the classic learning strategy of "remember the gist, consult on demand" to LLM systems, achieving performance gains across multiple tasks. As demand for long-text processing keeps rising, this kind of structured, modular, cognitively inspired method may become an important direction for large models; combined with today's popular retrieval augmentation, it could even yield a stronger new form of RAG. If you are exploring how to make language models handle complex sources such as papers, novels, or reports more intelligently, ReadAgent offers a solution worth learning from.

The paper's code has been open-sourced, though only as a demo:

https://github.com/read-agent/read-agent.github.io/blob/main/assets/read_agent_demo.ipynb

There is also a Hugging Face demo available for testing:

https://huggingface.co/spaces/ReadAgent/read-agent

Finally, there is an open-source project called PageIndex that appears to replicate the work of Google DeepMind's team, but it only implements the gist-construction half and lacks the look-up part.

Open-source address: https://github.com/VectifyAI/PageIndex