90% of AI conversations are stupid, and the core reason is memory

Written by
Silas Grey
Updated on: June 28, 2025
Recommendation

Explore the core obstacles of AI conversational intelligence and uncover the secrets of the memory module.

Core content:
1. Why AI conversations seem "stupid": the problem of mixing models and domain knowledge
2. Two major requirements for AI clones: consistency of thinking and consistency of style
3. The importance of memory modules: parameter memory and contextual unstructured memory

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

In fact, whether it is a conventional AI application or the Agent frameworks everyone is talking about now, one problem has always been hard to solve: how to blend models with domain knowledge (personal knowledge).

Most companies use models crudely: they simply feed in prompts, as in the opinion-generation example from the article "Why AI multi-round conversations are so stupid".

Opinions generated this way draw only on the model's own knowledge, so the result cannot be called a qualified avatar. For example, my AI avatar once said:

Your hyena philosophy is quite good! But the Huawei plan was successful because the top management broke the monopoly of the veterans. Forcing the directors to decentralize power but allowing the board of directors to form cliques is no different from asking a cripple to run a marathon.

Logically, I would never say such a thing. The core reason: I am not familiar with Huawei, and all my cases come from my daily work. That is why readers immediately sense something is off.

The point is that every time the AI speaks, it must meet expectations: it must carry my knowledge and my habits. That imposes two requirements:

  1. When generating opinions, they are consistent with my thinking;
  2. When expressing opinions, they should be consistent with my style;

All of this actually comes down to one thing: the model must have memory...

LLM Memory

Memory has always been the focus of research in the Agent era, and it is also a stumbling block that is difficult to overcome in current AI applications.

In fact, many companies short on traffic are happy about this: AI-era applications hold few technical secrets, so data assets may be their last moat.

On the other hand, every model release these days may upend some startups. For example, GPT's releases have brought many text-to-image teams to a standstill.

But memory is a little different: hallucination is a problem the model itself can hardly solve, so building RAG on top of a knowledge base is definitely not a wrong bet.

To gain a deeper understanding of memory issues, we can start from two perspectives:

  1. First, storage: how should a large model's data (memory) be stored?
  2. Second, application: how to enhance the model's contextual understanding through data (memory)? Beyond basic remembering, this also involves updating, forgetting, and the completeness of memory.

Another paper then gave a basic classification of memory, which I think is pretty good and can be used directly:

https://arxiv.org/pdf/2505.00675

1. Parameter memory

The so-called parametric memory is the model's built-in memory, what we often call the model's own knowledge base. It is formed through pre-training and fine-tuning, including RL.

This built-in knowledge is an immediate, long-term, persistent memory that enables rapid, context-free retrieval of facts and common-sense knowledge.

In other words: a fine-tuned model generalizes better than prompting alone.

However, the problems are also clear: first, knowledge lags in time, and more importantly, domain knowledge of all kinds is missing. With parametric memory alone, the model is like an employee still on probation.

2. Contextual Unstructured Memory

Contextual Unstructured Memory can be understood as multimodal information, including text, images, audio and video.

It gives the model the ability to read, see, and hear, and exists precisely to address agents' perception needs.

3. Contextual Structured Memory

Contextual Structured Memory is the knowledge structure we see most often:

knowledge graphs, relational tables, or ontologies that remain easy to query. These structures support symbolic reasoning and precise querying, and often complement the associative capabilities of pre-trained language models.

PS: Although it can be used directly, the nutrition of AI papers is really low now ...

About the Processing of Knowledge

| Operation | One-sentence definition | Typical scenario | Representative work/products |
|---|---|---|---|
| Consolidation | Extract and summarize short-term memory and transfer it to long-term storage, ensuring sustainable later access | A chatbot writes the highlights of a day's conversation into the user profile | MemoryBank: automatically summarizes key information at the end of a conversation window and writes it to persistent storage, decaying weights along the Ebbinghaus curve ([arXiv][1]) |
| Indexing | Generate a structured "catalog" or relationship graph over memory to improve retrieval efficiency and explainability | Multi-hop question answering first locates related facts in the knowledge graph | HippoRAG: inspired by hippocampal indexing theory, transforms documents into knowledge-graph nodes, then uses PageRank to find multi-hop paths ([arXiv][2]) |
| Updating | Add, delete, and modify memory based on new information to keep it current and consistent | Automatically replace expired statistics when republishing podcast summaries | NLI-transfer: detects contradictions with natural language inference (NLI) in long-term multi-turn dialogue and partially rewrites memory ([ACL Anthology][3]) |
| Forgetting | Selectively remove or erase content that is no longer needed or is subject to the "right to be forgotten" | A company receives a GDPR request asking the model to forget a customer's data | "When Machine Unlearning Meets RAG": deletes and refits vector-library entries while simultaneously updating model parameters via logarithmic-difference regularization ([arXiv][4]) |
| Retrieval | Find the entries in memory most relevant to the current query, for generation or reasoning | A travel assistant recalls the type of restaurant the user liked six months ago | LoCoMo benchmark: converts conversations into an "event graph" to support temporally and causally constrained retrieval, evaluating whether the LLM can still answer accurately after 35 sessions ([arXiv][5]) |
| Compression | Summarize or vectorize context/retrieval results before and after inference to save context window | Compress a 20-page financial report into one page of key information and feed it to the model | xRAG: feeds the retrieved document embedding directly into the model as a single "modality token", achieving a 3.5× FLOPs reduction and a ~10% performance gain ([arXiv][6]) |
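As a minimal illustration of the Consolidation operation, here is a sketch in Python; the record fields, function names, and decay constant are my own assumptions, not MemoryBank's actual API:

```python
import math
import time

def consolidate(turns, summarize, store, now=None):
    """Summarize a finished conversation window and write it to long-term storage."""
    now = time.time() if now is None else now
    summary = summarize(turns)  # in practice this would be an LLM summarization call
    store.append({"summary": summary, "created_at": now, "strength": 1.0})

def decayed_strength(record, now, tau=7 * 86400):
    """Ebbinghaus-style exponential forgetting over a time constant tau (seconds)."""
    age = now - record["created_at"]
    return record["strength"] * math.exp(-age / tau)

# Usage: consolidate one conversation, then keep only memories still "fresh" a day later.
store = []
consolidate(["A: hi", "B: I prefer OKRs over KPIs"],
            summarize=lambda t: "; ".join(t), store=store, now=0)
fresh = [m for m in store if decayed_strength(m, now=86400) > 0.5]
```

The decay threshold is where forgetting policy lives: lowering `tau` makes the store forget faster.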

I won't dissect the paper point by point; I'll interpret it as I understand it. A memory operation converts volatile short-term context into persistent long-term memory. Its core difficulties are:

  1. Which content to keep?
  2. In what format to save it?
  3. How to make the LLM actually "remember" it later?

For example, I wrote a 40-lesson management course, and now I want to build an AI clone. How should I consolidate this knowledge? How can I make the LLM both find and "remember" my content with the least work?

1. What content is stored?

The whole process is divided into three layers, in order of value: an external RAG layer → a structured layer → a light fine-tuning layer. The first step is content selection, which can be done as follows:

| Priority | Selection rule | Target token share | Example (from the management course) |
|---|---|---|---|
| ★★★ Key concepts & frameworks | Titles, chapter names, five-step models, formulas, chart titles | 20% | OKR cycle, 5W2H, Drucker's five tasks |
| ★★ High-frequency QA & golden sentences | The most frequently asked/liked Q&As or one-sentence quotes | 30% | "Goals come before means, not KPIs." |
| ★ Writing details & scenario examples | Long cases, story details | 50% | Huawei department-wall case, Netflix culture story |

Every time a lesson is uploaded, run an extraction-and-summary script that writes the three tiers of content from the table above into a layered database, with a small amount of manual proofreading to keep key concepts accurate.
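A sketch of that extraction-and-summary script, assuming hypothetical matching rules (the patterns and tier names below are mine, not the actual course pipeline):

```python
import re

# Illustrative tiering rules for one uploaded lesson; the patterns are assumptions.
TIER_RULES = [
    ("concept", re.compile(r"^(#|Chapter|Model|Formula)", re.I)),  # key concepts & frameworks
    ("qa",      re.compile(r"^(Q:|Quote:)", re.I)),                # high-frequency QA & quotes
]

def tier_lesson(lines):
    """Split raw lesson lines into three layers for a layered knowledge database."""
    layered = {"concept": [], "qa": [], "detail": []}
    for line in lines:
        for tier, pattern in TIER_RULES:
            if pattern.match(line):
                layered[tier].append(line)
                break
        else:
            layered["detail"].append(line)  # long cases and story details
    return layered

lesson = ["Chapter 17: OKR Cycle", "Q: Is OKR a KPI?", "Huawei once built a department wall..."]
layers = tier_lesson(lesson)
```

In practice the regexes would be replaced by an LLM classification call, but the layered output shape stays the same.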

2. What format to save

This is actually quite simple: just combine a structured knowledge base with RAG. A processed record looks like this:

{
  "id": "L17-okr-loop",
  "type": "concept",
  "title": "OKR Cycle",
  "summary": "Set goals → Key results → Align → Check → Review",
  "keywords": ["goal management", "OKR", "cycle"],
  "lesson": 17,
  "timestamp": "2025-05-10T12:00:00Z",
  "importance": 0.9
}

In practice, this is where the structured knowledge base directly introduces the knowledge graph.

3. Making the LLM remember

The goal here is high recall, and there are many strategies. For example, first filter the relevant material across the 40 lessons using the explicit terms in the question (narrowing the vector-search range and cutting latency by 40-60%).

In other words, use the model to rewrite the question, extract keywords, and then search.

Secondly, keep only knowledge items (≤ 500 of them) with a hit frequency above 30% whose answer can be explained in one step.

That is, use such strategies to discard most of the unnecessary returns.

This is standard RAG practice and fairly simple to explain, so I'll leave the details to you...
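The keyword-first narrowing described above can be sketched like this; the overlap-count scoring is a toy stand-in for real vector similarity, and all names are illustrative:

```python
def extract_keywords(question, vocab):
    """Step 1: pull explicit course terms out of the question to narrow the search space."""
    return {w for w in vocab if w.lower() in question.lower()}

def retrieve(question, records, vocab, top_k=3):
    """Step 2: rank only records sharing a keyword, instead of scanning all 40 lessons."""
    keywords = extract_keywords(question, vocab)
    candidates = [r for r in records if keywords & set(r["keywords"])]
    # Toy relevance score: keyword-overlap count; swap in vector similarity in practice.
    candidates.sort(key=lambda r: len(keywords & set(r["keywords"])), reverse=True)
    return candidates[:top_k]

records = [
    {"id": "L17-okr-loop", "keywords": ["OKR", "goal management", "cycle"]},
    {"id": "L08-power",    "keywords": ["power", "leadership"]},
]
hits = retrieve("How does the OKR cycle work?", records, vocab=["OKR", "KPI", "power"])
```

Because non-matching records are filtered out before scoring, the expensive similarity step only touches a small candidate set.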

Finally, let’s talk about the issue of knowledge graphs.

Knowledge Graph

Regarding how knowledge graphs can enhance large models, there have been previous articles introducing this: Knowledge Graphs

Today we continue with the earlier case: building my own AI avatar. The most critical challenge is how to make the model truly "inherit" my knowledge system and way of thinking.

Below I show how to turn the 40 management lessons into a knowledge graph, integrate it deeply with a large model, and create an AI avatar that truly "understands you".

1. Knowledge Extraction

Converting 40 management lessons into a knowledge graph is not simple text conversion; it requires a three-level knowledge representation of a concept layer, a relationship layer, and a case layer:

  1. Conceptual layer: extract the core management theories, methodologies and tool frameworks in the course
    1. Node examples: OKR cycle, 5W2H analysis method, Drucker's five tasks
    2. Attributes include: definition, proposer, applicable scenarios, advantages and disadvantages
  2. Relationship layer: Establish multi-dimensional associations between concepts
    1. "OKR cycle" → "derived from" → "MBO theory"
    2. "5W2H analysis method" → "can be used for" → "problem diagnosis scenario"
  3. Case layer: connecting abstract theory with concrete practice
    1. "Huawei Department Wall Case" → "Verification" → "Barriers to Cross-Departmental Collaboration"
    2. "Netflix Culture Change" → "Embody" → "Situational Leadership Theory"

In fact, knowledge organization directly determines the quality of subsequent model answers, so it is worth spending a lot of effort here!
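To make the three layers concrete, here is a minimal hand-rolled extractor over a single sentence; real pipelines would use an LLM or an information-extraction model, and the relation patterns below are assumptions:

```python
# Minimal concept/relation extraction for the three-layer scheme.
# The pattern list is a hand-written assumption, not a real extraction model.
RELATION_PATTERNS = [
    ("derived from", "derived_from"),
    ("can be used for", "can_be_used_for"),
    ("verifies", "verifies"),
]

def extract_triples(sentence):
    """Return (head, relation, tail) triples when a known pattern splits the sentence."""
    triples = []
    for phrase, relation in RELATION_PATTERNS:
        if phrase in sentence:
            head, tail = sentence.split(phrase, 1)
            triples.append((head.strip(" ."), relation, tail.strip(" .")))
    return triples

triples = extract_triples("OKR cycle derived from MBO theory.")
```

Each extracted triple becomes one edge in the relationship layer, with the head and tail as concept-layer nodes.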

2. Graph Construction

There are many frameworks for graph construction; I'll keep it brief here. Taking the "goal management" module as an example, its knowledge-graph fragment might include:

{
  "nodes": [
    {
      "id": "MBO",
      "type": "concept",
      "label": "Management by Objectives (MBO)",
      "properties": {
        "definition": "a goal-oriented management approach proposed by Peter Drucker in 1954",
        "core_principles": ["goal setting", "self-control", "results orientation"],
        "lesson_reference": ["L03", "L17"]
      }
    },
    {
      "id": "OKR",
      "type": "concept",
      "label": "OKR goal management method",
      "properties": {
        "derived_from": ["MBO", "SMART principle"],
        "implementation_steps": ["Goal setting", "Key result definition", "Regular review"],
        "case_studies": ["Google 2018 OKR Implementation", "ByteDance Bimonthly OKR"]
      }
    }
  ],
  "edges": [
    {
      "source": "OKR",
      "target": "MBO",
      "type": "derived_from",
      "weight": 0.9
    },
    {
      "source": "OKR",
      "target": "SMART",
      "type": "enhanced_by",
      "weight": 0.7
    }
  ]
}

This structured representation makes knowledge traceable (each conclusion has a course source) and composable (different concepts can be freely associated).
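A fragment like this can already be queried without a graph database; the following sketch uses plain dicts and assumes the node/edge field names shown above:

```python
# The "derived_from" edge from the fragment, reduced to plain Python dicts.
graph = {
    "nodes": {
        "MBO": {"label": "Management by Objectives (MBO)"},
        "OKR": {"label": "OKR goal management method"},
    },
    "edges": [
        {"source": "OKR", "target": "MBO", "type": "derived_from", "weight": 0.9},
    ],
}

def neighbors(graph, node_id, edge_type=None):
    """Follow outgoing edges from node_id, optionally filtered by relation type."""
    return [e["target"] for e in graph["edges"]
            if e["source"] == node_id and (edge_type is None or e["type"] == edge_type)]

parents = neighbors(graph, "OKR", edge_type="derived_from")  # trace OKR back to its origin
```

Filtering by `edge_type` is what turns the same store into different queries: provenance, application scenarios, or case links.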

3. Search

Next comes the key retrieval enhancement phase. When the AI ​​avatar needs to answer user questions, it uses a three-stage process of graph retrieval → vector screening → context construction :

1. Graph pattern matching: Convert natural language questions into graph queries , such as:

Question: "How do OKR and KPI work together?" → Match the "OKR" and "KPI" nodes and the paths between them

2. Subgraph extraction and vector screening: in essence, extract all knowledge closely related to the matched concepts:

(OKR Basic Principles)
 ├── Includes: [Challenging goal setting] (weight 0.9)
 ├── Conflicts with: [Feasibility assessment] (needs balancing)
 └── Application: [Google's 2014 OKR reform]

(Goal Setting Theory)
 ├── Source: [Drucker's MBO theory]
 └── Tool: [SMART principle]

3. Context Construction: Converting Search Results into Natural Language Prompts

This is relatively simple, for example: "According to courses L17 and L23: 1) OKR focuses on goal orientation, KPI focuses on indicator measurement... 2) Huawei's practice shows that..."
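Stringing the three stages together, a toy end-to-end pass might look like this (all function and field names are illustrative):

```python
def match_nodes(question, graph):
    """Stage 1: map question terms onto graph node ids."""
    return [n for n in graph["nodes"] if n.lower() in question.lower()]

def extract_subgraph(graph, node_ids):
    """Stage 2: keep edges touching any matched node."""
    hit = set(node_ids)
    return [e for e in graph["edges"] if e["source"] in hit or e["target"] in hit]

def build_context(edges):
    """Stage 3: turn retrieved edges into a natural-language prompt fragment."""
    lines = [f'{e["source"]} {e["type"].replace("_", " ")} {e["target"]}' for e in edges]
    return "According to the course: " + "; ".join(lines)

graph = {
    "nodes": {"OKR": {}, "KPI": {}, "MBO": {}},
    "edges": [{"source": "OKR", "target": "MBO", "type": "derived_from"},
              {"source": "KPI", "target": "OKR", "type": "complements"}],
}
context = build_context(extract_subgraph(graph, match_nodes("How do OKR and KPI work together?", graph)))
```

A production version would add the vector-screening step between stages 2 and 3 to prune the subgraph before verbalizing it.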

4. Strengthening the thinking chain

When generating responses, the AI ​​avatar simulates the thought process of a professional consultant:

  1. Concept positioning: This question involves the comparison between OKR and KPI in the field of goal management
  2. Knowledge extraction: Three relevant cases and two theoretical frameworks in the recall course
  3. Integration of viewpoints: My typical analysis angle is to first distinguish the applicable scenarios and then discuss the integration methods
  4. Style adaptation: Use the three-part expression of problem-root cause-solution
The corresponding prompt template:

You are a senior management consultant. Please answer according to the following structure:
1. The essence of the problem: use one sentence to point out the core contradiction
2. Theoretical basis: cite 2-3 key concepts from the course
3. Practical case: briefly describe a relevant business case
4. Personal opinion: Use "I think" to express a clear position

Current question: {User question}
Related knowledge: {retrieved subgraph information}
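Filling such a template programmatically is just string formatting; a minimal sketch, with the question and facts as placeholders:

```python
# A trimmed version of the consultant template; the exact wording is illustrative.
TEMPLATE = """You are a senior management consultant. Answer in the following structure:
1. The essence of the problem: state the core contradiction in one sentence
2. Theoretical basis: cite 2-3 key concepts from the course
3. Practical case: briefly describe a relevant business case
4. Personal opinion: state a clear position starting with "I think"

Current question: {question}
Related knowledge: {knowledge}"""

def build_prompt(question, subgraph_facts):
    """Join retrieved facts and drop them into the consultant template."""
    return TEMPLATE.format(question=question, knowledge="; ".join(subgraph_facts))

prompt = build_prompt("How do OKR and KPI work together?",
                      ["OKR derived_from MBO", "KPI complements OKR"])
```

Keeping the structure in the template, rather than in the retrieved knowledge, is what enforces the consistent answering style.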

5. Case

Finally, one more example. Say a fan asks a question today: how can a parachuted-in executive quickly establish authority with the team?

The AI clone's processing flow should be:

First, knowledge retrieval

  1. Matched the "a new official lights three fires" case (L12)
  2. Relating "Sources of Power Theory" (L08) and "Situational Leadership Model" (L15)
  3. Recalling the "Three-month Survival Rule" I mentioned in internal training

Second, opinion generation

1. The essence of the problem: This is about the balance between leadership legitimacy and change management

2. Theoretical basis:
   - According to course L08, sources of power include position power and personal power
   - Situational leadership theory emphasizes that different leadership styles are needed at different stages

3. Typical cases:
   - The Zhang Yong-joins-Alibaba case from course L12: he did only three things in his first 30 days...

4. My opinion:
   I believe parachuted-in leaders should avoid the "proving themselves" trap and should instead:
   - 70% of the time is spent listening and diagnosing
   - Solve 1-2 obvious pain points first to build trust
   - Build change capital through quick wins

Third, style adjustment

  1. Add a catchphrase: "Remember, management is a craft"
  2. Use common parallel sentences: "First, you must... Second, you must... Third, you must..."
  3. Maintain the iconic 70% theory + 30% case narrative ratio

Through this deep integration of knowledge graphs and large models, AI clones stop being simple repeaters and truly become digital twins with a consistent worldview and professional judgment...

Conclusion

Everything above revolves around model memory, and I trust it is clear by now: so-called model memory is the application of a knowledge base. External RAG provides short-term perception, the knowledge graph builds the long-term index, and lightweight fine-tuning implants the personality.