Deconstructing the capabilities and composition of intelligent systems: we need reliable AI systems, not just agents

In-depth analysis of the intelligent agent system architecture, revealing the evolution of RAG and GraphRAG and their key role in intelligent agents.
Core content:
1. The core concept of Retrieval Augmented Generation (RAG) and its application in knowledge base search systems
2. The introduction of GraphRAG and its advantages in retrieval based on data point relationships
3. Analysis of the challenges and limitations faced by GraphRAG
What follows is a comprehensive discussion of Retrieval-Augmented Generation (RAG), its evolution into GraphRAG, the key role of memory in intelligent agents, and architectural patterns for building complex agent systems.
1. Retrieval-Augmented Generation (RAG) and GraphRAG
- Core concepts of RAG: RAG addresses the limitation of LLMs that they are not omniscient by providing them with relevant context from a specific dataset before generating a response.
- "Retrieval-augmented generation (RAG) is an effective way to have an AI extract information from a specific dataset that you want it to use. The idea is relatively simple - while generative LLMs are great in their domain, they are not omniscient. So if we want an LLM to generate a response based on specific information in a document, we must first provide it with that information (context)."
- Popularity of Naive RAG: Basic RAG, often referred to as “Naive RAG”, relies primarily on semantic (vector) search and has become the standard for knowledge base search systems.
- "RAG solves this problem and has become almost universally the solution for the vast majority of knowledge base search systems we see today."
- Vector search as the basis for Naive RAG: The “R” (retrieval) in most RAG systems is based on vector search, which uses embedding models to encode queries and data to achieve similarity-based retrieval.
- "There are many ways to retrieve context, but by far the most common approach is to perform semantic search (vector search) on a given dataset. This leads to the term 'naive RAG', which is just a basic question answering system that uses vector-based search for retrieval."
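The retrieval step described above can be sketched in a few lines. This is a minimal, illustrative example: the corpus, the hand-made 3-d vectors (standing in for a real embedding model), and `top_k` are all assumptions for demonstration, not anything from a real system.

```python
import math

# Toy corpus: each entry is independent and represented only by its own vector.
# A real system would compute these with an embedding model; they are made up here.
corpus = {
    "Cats are mammals.":       [0.9, 0.1, 0.0],
    "Python is a language.":   [0.0, 0.9, 0.1],
    "Dogs are loyal animals.": [0.8, 0.2, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, top_k=2):
    """Rank entries by similarity to the query vector (the 'R' in RAG)."""
    ranked = sorted(corpus, key=lambda t: cosine(query_vec, corpus[t]), reverse=True)
    return ranked[:top_k]

# A query vector pointing toward the "animal" direction surfaces the animal entries,
# which are then placed into the prompt as context for the generative LLM.
context = retrieve([1.0, 0.0, 0.0])
prompt = "Answer using this context:\n" + "\n".join(context)
```

Note that the ranking uses nothing but per-entry similarity, which is exactly the limitation discussed next: no relationship between entries is represented.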
- Limitations of Naive RAG: Naive RAG treats data items as independent and lacks explicit representation of the relationships between them (beyond semantic proximity).
- "It is important to note that each entry is independent. Each entry has a meaning that can be represented by a vector (embedding). Therefore, Naive RAG only has access to the independent vector information of each entry. This data representation cannot represent any relationship between data points beyond their meaningful proximity in the semantic space."
- Introduction of GraphRAG: GraphRAG utilizes knowledge graphs in the retrieval process to achieve contextual retrieval based on the relationships between data points.
- "For example, Graph RAG allows us to retrieve context based on the relationships between data points in the database. With a hybrid RAG system that combines vector-search-based RAG and Graph RAG, we can return results not only based on their contextual meaning, but also based on the relationships within the data."
- Advantages of GraphRAG: GraphRAG provides a more cohesive and organized representation of entities, relationships, and communities, allowing retrieval based on contextual meaning and data relationships. It supports hybrid retrieval methods that combine graph search and vector search.
- Hybrid Retrieval in GraphRAG: GraphRAG facilitates hybrid retrieval by first identifying related entities using vector search and then discovering connection information using graph traversal.
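The two-step hybrid retrieval just described (vector search to seed an entity, graph traversal to expand) can be sketched as follows. The graph, the entity vectors, and the one-hop expansion are illustrative assumptions, not a real GraphRAG implementation.

```python
import math
from collections import deque

# Hypothetical knowledge graph as an adjacency list: entities and their relations.
graph = {
    "Contract_A": ["Supplier_X", "Clause_7"],
    "Supplier_X": ["Contract_A", "Contract_B"],
    "Contract_B": ["Supplier_X"],
    "Clause_7":   ["Contract_A"],
}

# Toy entity embeddings (a real system would use an embedding model).
vectors = {
    "Contract_A": [0.9, 0.1],
    "Supplier_X": [0.2, 0.9],
    "Contract_B": [0.8, 0.2],
    "Clause_7":   [0.1, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def hybrid_retrieve(query_vec, hops=1):
    # Step 1: vector search finds the semantically closest entity (the seed).
    seed = max(vectors, key=lambda e: cosine(query_vec, vectors[e]))
    # Step 2: graph traversal (BFS) pulls in entities connected to the seed.
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seed, seen

seed, related = hybrid_retrieve([1.0, 0.0])
```

The result includes entities like `Clause_7` that vector similarity alone would rank low, but that are relevant because of their relationship to the seed entity.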
- Limitations of GraphRAG: These include the cost and effort of initial knowledge graph construction, potential scalability issues with highly connected nodes, and the need for regular full re-indexing to incorporate new data, which makes updates less efficient than traditional RAG.
- "Compared to traditional RAG block-based approaches, GraphRAG provides more entity-centric indexing and retrieval, providing richer descriptions of entities and communities. However, it faces the challenge of static LLM-generated summaries, which require regular full re-indexing to capture updates as new data comes in. This indexing pipeline can incur significant token costs."
- When to use GraphRAG: Particularly valuable when data has important relationships and interdependencies, such as contracts, research papers, or organizational records.
- "It is particularly powerful when the data is full of connections and interdependencies, such as contracts, research papers, or organizational records."
2. Memory in Agents
True intelligence and autonomy for AI agents require a memory that persists across interactions, enabling them to learn, evolve, and maintain context. This goes beyond the limitations of context windows and stateless RAG.
- Current AI lacks true memory: While many systems create the “illusion of memory” through context windows, most AI systems today are stateless and cannot remember past interactions or adjust over time.
- "Unfortunately, this is exactly how most AI systems behave today. They are smart, yes, but they lack one crucial thing: memory."
- "This illusion of memory created by context windows and clever prompt engineering leads many to believe that agents already 'remember'. In reality, most agents today are stateless and cannot learn from past interactions or adjust over time."
- The importance of persistent internal state: True memory involves building a persistent internal state that evolves and affects all future interactions of the agent.
- “Memory is not just about storing chat logs or stuffing more tokens in the prompt. It’s about building a persistent internal state that evolves and affects every interaction the agent has, even weeks or months apart.”
- The three pillars of memory: state (knowing what is happening in the present), persistence (retaining knowledge across sessions), and selection (deciding what is worth remembering) define memory in an agent, thus enabling continuity.
- "The three pillars that define memory in an agent are: state: knowing what is happening right now, persistence: retaining knowledge across sessions, and selection: deciding what is worth remembering. Together they enable something we never had before: continuity."
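The three pillars can be made concrete with a toy class. Everything here (the class name, the JSON file format, the `important` flag) is an illustrative assumption; real memory systems use vector stores and learned selection rather than a flag.

```python
import json
import os
import tempfile

class AgentMemory:
    """Toy memory illustrating the three pillars: state, persistence, selection."""

    def __init__(self, path):
        self.path = path
        self.state = {}                      # pillar 1: what is happening now
        if os.path.exists(path):             # pillar 2: retained across sessions
            with open(path) as f:
                self.state = json.load(f)

    def remember(self, key, value, important):
        if important:                        # pillar 3: select what to keep
            self.state[key] = value

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.state, f)

path = os.path.join(tempfile.mkdtemp(), "memory.json")
session1 = AgentMemory(path)
session1.remember("user_prefers", "concise answers", important=True)
session1.remember("small_talk", "weather chat", important=False)
session1.save()

# A new "session" reloads the persisted state: continuity across interactions.
session2 = AgentMemory(path)
```

The second session sees the retained preference but not the discarded small talk, which is the continuity that a context window alone cannot provide.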
- Limitations of context windows: While useful, large context windows are temporary, flat (no prioritization), expensive (more tokens mean higher cost and latency), limited to proximity-based recall, reactive, and provide no personalization.
- "But this approach falls short due to certain limitations. One of the main drawbacks of calling LLMs with more context is that they can be expensive: more tokens = higher cost and latency."
- Tables are provided that clearly contrast the differences between context windows and memory on multiple functions (retention, scope, extension, latency, recall, behavior, personalization).
- Difference between RAG and memory: RAG retrieves external knowledge in the current interaction to provide better answers and is stateless. Memory provides continuity by capturing user preferences, past interactions, and outcomes to shape future behavior.
- "While both RAG (retrieval-augmented generation) and memory systems retrieve information to support LLMs, they solve very different problems."
- "RAG helps agents answer questions better. Memory helps agents act smarter."
- Main differences at the system level: Unlike stateless RAG, memory tracks time, maintains state across sessions, learns user models, and adapts based on past experience.
- The table provided highlights these differences in terms of time awareness, statefulness, user modeling, and adaptability.
- Types of memory: Agents can have short-term (working memory for conversational coherence) and long-term memory, which can be further divided into factual memory (preferences, style), episodic memory (specific past interactions), and semantic memory (generalized knowledge).
- "At a basic level, memory in AI agents comes in two forms: Short-term memory: maintaining the immediate context within an interaction. Long-term memory: retaining knowledge across multiple interactions and sessions."
- Tables detailing working memory, factual memory, episodic memory, and semantic memory provide clear examples of each type.
- The importance of intelligent filtering and dynamic forgetting: An effective memory system requires prioritizing and selectively retaining important information while forgetting irrelevant information over time to avoid information overload and maintain focus.
- "Intelligent filtering: Not all information is worth remembering... Dynamic forgetting: Good memory systems require efficient forgetting."
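One common way to implement dynamic forgetting is time-decayed scoring. The scoring formula, the half-life, and the threshold below are illustrative assumptions; the article does not prescribe a particular forgetting mechanism.

```python
import math

def retention_score(importance, age_hours, access_count, half_life=24.0):
    """Score a memory entry: importance decays with age, reinforced by usage."""
    decay = math.exp(-math.log(2) * age_hours / half_life)  # halves every half_life
    return importance * decay * (1 + math.log(1 + access_count))

def prune(memories, threshold=0.3):
    """Dynamic forgetting: drop entries whose score falls below the threshold."""
    return [
        m for m in memories
        if retention_score(m["importance"], m["age_hours"], m["accesses"]) >= threshold
    ]

memories = [
    {"text": "user's name is Ada",  "importance": 0.9, "age_hours": 12, "accesses": 12},
    {"text": "asked about weather", "importance": 0.2, "age_hours": 12, "accesses": 0},
]
kept = prune(memories)
```

A frequently accessed, important fact survives pruning while low-importance chatter of the same age is forgotten, keeping the memory store focused.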
- Memory consolidation and cross-session continuity: High-level memory systems transfer information between short-term and long-term storage and maintain context across sessions and time periods.
- “Memory consolidation: We move information between short-term and long-term memory stores based on usage patterns, recency, and importance… Continuity across sessions: Most agents reset at the end of a session.”
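A consolidation pass can be sketched as a promotion rule from short-term to long-term storage. The thresholds and entry fields below are illustrative assumptions standing in for the usage-pattern, recency, and importance signals the quote mentions.

```python
def consolidate(short_term, long_term, min_accesses=3, min_importance=0.5):
    """Promote heavily used or important short-term entries to long-term memory."""
    remaining = []
    for entry in short_term:
        if entry["accesses"] >= min_accesses or entry["importance"] >= min_importance:
            long_term.append(entry)   # promoted: survives the end of the session
        else:
            remaining.append(entry)   # stays short-term (and may later be forgotten)
    return remaining, long_term

short_term = [
    {"text": "prefers metric units", "accesses": 4, "importance": 0.4},
    {"text": "current draft open",   "accesses": 1, "importance": 0.2},
]
short_term, long_term = consolidate(short_term, [])
```

After consolidation, the repeatedly used preference persists into future sessions while transient working context stays short-term, which is what distinguishes a continuous agent from one that resets at session end.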
- Memory as a differentiator: In the future, memory will become a key differentiator for AI agents, transforming them from disposable tools into persistent teammates.
- “In a world where all agents have access to the same models and tools, memory will become the differentiator. It is not the agent that merely responds, but the one that remembers, learns, and grows with you, that will win.”
3. Agent Architecture
Agent architectures go beyond simple LLM interactions by integrating memory, tools, and reasoning to create autonomous systems capable of performing complex tasks. These architectures can be single-agent or multi-agent, with various design patterns for collaboration.
- Definition of Agent Architecture: It consists of one or more Agents with memory and tool access, capable of making autonomous decisions.
- “The Agent architecture consists of one or more Agents with memory and tool access.”
- AI Agent components: typically include an LLM (for reasoning), memory (short-term and long-term), and access to tools (e.g. vector search, web search, APIs). Reasoning includes planning and reflection.
- The diagram showing the components of an AI Agent clearly shows the LLM at the center, interacting with memory and tools.
- Vector database in Agent architecture: used as a tool (external knowledge source) and memory (storing past interactions for semantic retrieval) for the RAG pipeline.
- "Vector databases can be used for different purposes in agent architectures...Vector databases are most often used as a tool for agents as part of a retrieval-augmented generation (RAG) pipeline...Vector databases can also be used for memory in agent architectures."
- Single-agent vs. multi-agent architectures: Single-agent systems use one LLM and tools to solve a task. Multi-agent systems involve multiple specialized agents collaborating. The choice depends on task complexity and requirements.
- "When building an agent system, you can build a single-agent or multi-agent architecture... Typically, we may still have one agent (LLM) who is the head of the entire operation: the master agent."
-Patterns in multi-agent systems: There are a variety of design patterns for building multi-agent collaborations, including cyclic (iterative improvement), parallel (working simultaneously), sequential (the output of one agent serves as the input of the next), router (central agent directs), aggregator/synthesizer (collecting and synthesizing outputs), hierarchical/vertical (supervisor-subordinate), and network/horizontal (peer-to-peer communication).
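Two of the patterns above, sequential and router, can be sketched with agents as plain functions. In a real system each function would wrap an LLM call; the agent names and routing rule here are illustrative assumptions.

```python
# Toy "agents": in a real system these would be LLM-backed workers.
def summarizer(text):
    return "summary: " + text[:20]

def translator(text):
    return "translated: " + text

def sequential(task, agents):
    """Sequential pattern: the output of one agent is the input of the next."""
    for agent in agents:
        task = agent(task)
    return task

def router(task):
    """Router pattern: a central agent directs the task to one specialist."""
    agent = translator if task.startswith("FR:") else summarizer
    return agent(task)

out = sequential("long report text ...", [summarizer, translator])
```

The other patterns compose the same pieces differently: parallel runs agents concurrently on the same input, an aggregator merges their outputs, and hierarchical places a supervisor agent above the router.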
- Naive RAG as a non-agent baseline: Naive RAG is a one-shot process that cannot refine queries, verify relevance, or retrieve iteratively.
- "A naive (non-agent) RAG architecture typically consists of an embedding model, a vector database, and a generative LLM. This non-agent naive approach is a one-shot solution that directly uses the user query to retrieve additional information, and then uses the retrieved information directly in the prompt."
- Single-Agent RAG Architecture: Improves on the naive RAG by introducing an agent that is able to reason about the query, decompose it, route it to the appropriate knowledge sources, transform the query to get better retrieval results, and evaluate the retrieved information before generating an answer. Memory can also be integrated.
- Diagram illustrating the single-agent RAG workflow showing the steps of checking memory, decomposing the query, routing, transforming, and evaluating the retrieval.
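The single-agent RAG workflow above can be sketched as a pipeline. The decomposition rule, routing keyword, and knowledge-source names (`docs_db`, `web_search`) are stubs standing in for LLM-driven reasoning and real retrievers; they are assumptions for illustration only.

```python
def decompose(query):
    # An LLM would split a complex query into sub-queries; stubbed as a split.
    return query.split(" and ")

def route(sub_query):
    # Route each sub-query to an assumed knowledge source.
    return "docs_db" if "policy" in sub_query else "web_search"

def retrieve(source, sub_query):
    # Stub retriever: a real one would query a vector database or search API.
    return f"[{source} results for '{sub_query}']"

def evaluate(context):
    # The agent evaluates retrieved information before answering; stubbed here
    # as a non-emptiness check.
    return len(context) > 0

def agentic_rag(query):
    contexts = []
    for sub in decompose(query):          # reason about and decompose the query
        ctx = retrieve(route(sub), sub)   # route to the appropriate source
        if evaluate(ctx):                 # keep only context judged relevant
            contexts.append(ctx)
    return "Answer based on: " + "; ".join(contexts)

answer = agentic_rag("refund policy and latest release date")
```

Unlike the one-shot naive pipeline, each step here is a decision point where an agent can transform the query, pick a different source, or reject poor retrievals before generating.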
- Multi-agent RAG architecture: Multiple specialized agents collaborate to handle complex tasks. Examples include hierarchical (a supervisor coordinates query agents), sequential (the output of one agent serves as input to the next), human-in-the-loop (requires human input), and systems with shared tools.
- Memory transformation through tool usage: Agents can be designed to transform the data in memory using specific tools (e.g., summarizing past interactions or enriching user data).
- "Since past interactions can be stored in a vector database that acts as a memory, the data transformation agent can also be used for memory. This can be useful if you want to, for example, summarize past interactions."
- Flexibility and customization: The agent architecture provides endless design possibilities based on the needs of specific use cases.
- In conclusion, there are many ways to build an Agent architecture... the possibilities are endless.
This discussion highlights the evolution from basic RAG to more sophisticated approaches that exploit data relationships, such as GraphRAG. The key role of memory in building truly intelligent and adaptive AI agents is emphasized, and basic concepts and architectural patterns for designing single-agent and multi-agent systems for retrieval-intensive applications are introduced. The integration of vector databases as knowledge sources and memory stores is a recurring theme throughout these concepts.