Why Is ChatGPT Getting Smarter? An Inside Look at Its Memory Mechanism

Written by
Audrey Miles
Updated on: June 8th, 2025

Recommendation
How does ChatGPT achieve "understanding you"? In-depth analysis of the upgrade and technical details of its memory mechanism.

Core content:
1. Upgrade of ChatGPT memory system: from "temporary chat companion" to "long-term companion"
2. Dual architecture of memory mechanism: "memory preservation" and "chat history"
3. How the memory mechanism reshapes the user experience: citing chat history and user insights

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

 

In April, OpenAI upgraded ChatGPT's memory system so that it can reference a user's entire past conversation history to provide more personalized responses. ChatGPT is no longer a "temporary chat partner" that starts from scratch every time and forgets everything; it is becoming a "long-term companion" that remembers who you are, understands your preferences, and recalls what you have said.

Eric Hayes, a software engineer, has reverse-engineered ChatGPT's memory, not only clarifying its dual memory architecture but also speculating on the implementation behind it and laying out a complete technical path to reproduce it.

This article is divided into three parts:

  • Deconstructing how ChatGPT's memory system works

  • Speculating on the possible technical implementations behind it

  • Exploring how the memory mechanism reshapes the user experience

How does ChatGPT's memory work?

ChatGPT's memory mechanism consists of two main systems:
  • One is "Saved Memory".
  • The second is "Chat History".
Saved Memory: You said it, I remember. 
The Saved Memory system is a simple, user-controlled mechanism for storing factual information about the user. This information is re-injected into the system prompt and becomes background knowledge the model can draw on when generating responses. The user updates this memory system through explicit commands such as "Remember that I...", and can view or delete the stored entries through a simple interface.
Before recording an entry, the system performs only the most basic checks: simple de-duplication and avoidance of obvious conflicts. Even highly related pieces of information are allowed to coexist as separate memory entries.
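Taken literally, that check can be sketched in a few lines. This is a minimal illustration only: normalized exact-match comparison stands in for whatever similarity check OpenAI actually uses, and the function names (`normalize`, `should_save`) are invented:

```rust
/// Normalize a fact for comparison (illustrative only; a real system
/// would more likely compare embeddings than strings).
fn normalize(fact: &str) -> String {
    fact.trim().to_lowercase()
}

/// Reject a candidate only on exact (normalized) duplication; related
/// but distinct facts are allowed to coexist as separate entries,
/// matching the observed behavior.
fn should_save(existing: &[String], candidate: &str) -> bool {
    let c = normalize(candidate);
    !existing.iter().any(|f| normalize(f) == c)
}
```

Under this sketch, "Likes Rust" blocks a later "likes rust", but a related fact such as "prefers async Rust" would still be saved alongside it.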
Referencing Chat History
Although ChatGPT's Chat History system is officially described as a single system, in the author's real-world testing it actually consists of three subsystems.
The structure of these three is considerably more complex than Saved Memory, and they are probably the key to the dramatic improvement in ChatGPT's response quality:
  • Current session history
  • Conversation history
  • User insights
Current Session History
This appears to be a simple log of the most recent messages the user has sent in other conversations. The record is very small, covering only the past day or less. The author believes that this system, like the conversation-level RAG (Retrieval-Augmented Generation) mechanism, likely injects the user's original words directly into the model's context, which makes the boundary between the two hard to distinguish cleanly.
In testing, this section typically contained fewer than 10 recent messages.
Conversation History
Relevant content from past conversations is included in the model's context. This is clearly observable in use: ChatGPT can quote verbatim messages the user sent in other conversations. However, it neither preserves message order accurately nor can it backtrack within a strict time frame; for example, it cannot satisfy "Please quote all the messages I've sent in the last hour".
However, as long as you can describe the content of a message or the topic of the conversation to which it belongs, ChatGPT is able to reference it correctly, which suggests that the retrieval process is based on a dual indexing of conversation summaries and message content.
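The dual indexing inferred here can be sketched as scoring a query against both a message-content embedding and a conversation-summary embedding, and treating a hit in either index as a match. Everything below is an assumption for illustration: toy `Vec<f32>` embeddings, an invented `IndexedMessage` record, and plain cosine similarity in place of a real vector store:

```rust
/// A message indexed two ways: by its own content and by a summary of
/// the conversation it belongs to (field names are guesses).
struct IndexedMessage {
    text: String,
    message_embedding: Vec<f32>,
    summary_embedding: Vec<f32>,
}

/// Standard cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// A message is retrievable if EITHER its content or its conversation
/// summary is close enough to the query.
fn retrieve<'a>(query: &[f32], index: &'a [IndexedMessage], threshold: f32) -> Vec<&'a str> {
    index
        .iter()
        .filter(|m| {
            cosine(query, &m.message_embedding)
                .max(cosine(query, &m.summary_embedding))
                >= threshold
        })
        .map(|m| m.text.as_str())
        .collect()
}
```

This would explain why describing either the message itself or the topic of its conversation is enough to surface it.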
In testing, ChatGPT was able to accurately quote historical messages up to two weeks old. Beyond that, it could still provide summarized descriptions of the content, though these were often presented as if they were direct quotes.
This could mean that either:
  • the complete conversation history of the last two weeks is embedded directly in the model context, or
  • messages older than two weeks are filtered out by the retrieval system.
However, the first possibility seems less plausible given that the full history did not appear in the context dumps in the other tests.
Regardless of the mechanism, the fact that ChatGPT can recall details over longer time spans suggests it also relies on an additional, inference-based information system: a "lightweight memory" constructed for old conversations that provides compressed cues and fuzzy context. Under this strategy, the model might generate a summary index for each old conversation and use it to store a list of summaries of the user's questions.
However, the author has not yet found a prompt that accurately recalls assistant replies from old conversations. Although it was possible to get ChatGPT to "mimic" some similar responses, these assistant replies were significantly less accurate than its reproductions of the user's messages. This may indicate that either:
  • the assistant's replies are not stored at all, and ChatGPT is improvising again, or
  • the assistant's replies are stored, but at a coarser granularity and a higher level of abstraction than the user's messages.
User Insights
The user insights system can be seen as an evolved form of Saved Memory: more implicit, more complex, and more intelligent.
If ChatGPT's retellings of them are accurate, these insights typically take the following form:

The user has extensive experience in Rust programming, with particular expertise in asynchronous operations, threaded processing, and streaming computation;

The user has asked in-depth questions about Rust's asynchronous mechanisms, Trait objects, Serde serialization implementations, custom error handling, and other topics on multiple occasions, spanning late 2024 to early 2025;

Confidence: High.

Reading through ChatGPT's multiple retellings of user insights, it is clear that these insights did not originate from a single, isolated conversation, but were synthesized across multiple conversation threads. Each insight has a distinct sense of boundary, often accompanied by a time span and a confidence level. The "confidence level" is not a throwaway label; it is more likely a model-generated heuristic indicating how similar and tightly clustered the message vectors behind the summary are.
These time spans are not of uniform length. Some are open-ended, labeled "since January 2025", while others are precise to a span of months, seemingly depending on content density.
The fact that some user insights, such as the example above, list multiple interrelated facts at once further supports the judgment that the data behind them is not retrieved piecemeal, but embedded, merged, and extracted through some kind of clustering heuristic.
In other words, the system is not remembering individual facts; it is recognizing what kind of user you are.

The following is the author's attempt to reconstruct the possible technical implementation behind ChatGPT's memory system, based on its observed behavior.
Saved Memories
ChatGPT's explicit memories appear to be handled by an internal tool called bio (you can verify this by prompting it to "use the bio tool").
{
  "type": "function",
  "function": {
    "name": "bio",
    "description": "persist information across conversations",
    "parameters": {
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "A user message containing information to save"
        }
      },
      "required": ["message"],
      "additionalProperties": false
    },
    "strict": true
  }
}
To replicate this mechanism in your own system, a close analog would be to define the tool as a call to a large language model (LLM): it receives a message from the user along with a list of pre-existing facts, and then either returns a new fact entry or rejects the update.
The following prompt is just a preliminary attempt and will need to be tested and iterated in practice to achieve the desired behavior.
const BIO_PROMPT: &'static str = r#"You are a tool that transforms user messages into useful user facts.
Your job is to first transform a user message into a list of distinct facts. Populate the facts array with these facts.
Next transform these facts into elliptical descriptive clauses prefaced with a predicate. Populate the clauses array with these.
Finally check these clauses against each other and against the clauses in your input for contradictions and similarity. If any clauses are overly similar or contradict, do NOT populate the output array with them. Otherwise populate the output array with the checked clauses."#;

async fn bio_transform(existing_facts: &[String], user_message: String) -> Result<Vec<String>>;

async fn update_user_bio(user: T, db: D, facts: Vec<String>) -> Result<()>;
OpenAI publicly defines the bio tool in the ChatGPT system prompt, which describes it roughly as follows:

The bio tool allows messages to "persist" between conversations. Simply send a message to bio and the model will "remember" what you want it to remember; in subsequent conversations, this information will appear in the model's context. But the prompt also sets explicit "memory boundaries": do not use the bio tool to store sensitive information. Sensitive information includes, but is not limited to, the user's race, ethnicity, religion, sexual orientation, political affiliation, sex life, criminal record, medical diagnoses, prescription medications, and union membership.

Do not store short-term information. Short-term information refers to things like a user's momentary interests, ongoing projects, and current desires or intentions.

Next, factual information about the user is injected into the system prompt each time a message is sent. To achieve the same functionality as ChatGPT, a simple user interface can also be built to view and delete these memory entries.
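That injection step can be sketched directly. A minimal version, assuming a plain-text header (the "Known facts about the user" wording is invented, not ChatGPT's actual format):

```rust
/// Append the saved facts to the base system prompt before each request.
fn build_system_prompt(base: &str, facts: &[String]) -> String {
    if facts.is_empty() {
        return base.to_string();
    }
    let mut prompt = String::from(base);
    prompt.push_str("\n\nKnown facts about the user:\n");
    for fact in facts {
        prompt.push_str("- ");
        prompt.push_str(fact);
        prompt.push('\n');
    }
    prompt
}
```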

Quoting Chat History

Current Session History

The implementation of this section is very straightforward: just filter the database for messages sent by the user (e.g. the ChatMessage table), sort them by time, and set a limit on the number of messages.
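An in-memory sketch of that query, with a `ChatMessage` struct standing in for a database row (a real system would run this as a SQL query against the table):

```rust
/// Stand-in for a row of the ChatMessage table.
struct ChatMessage {
    sender_is_user: bool,
    sent_at: u64, // unix seconds; a real schema would use a timestamp type
    text: String,
}

/// Most recent messages sent by the user, newest first, capped at `limit`.
fn recent_user_messages(table: &[ChatMessage], limit: usize) -> Vec<String> {
    let mut msgs: Vec<&ChatMessage> = table.iter().filter(|m| m.sender_is_user).collect();
    msgs.sort_by(|a, b| b.sent_at.cmp(&a.sent_at));
    msgs.into_iter().take(limit).map(|m| m.text.clone()).collect()
}
```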

Conversation History

Configure two vector spaces: the first indexed by message content, the second by conversation summary.

{
  embedding: message-content | conversation-summary,
  metadata: {
    message_content: string,
    conversation_title: string,
    date: Date
  }
}

Insert the messages in the order they were sent into a vector space indexed by "message_content". Once a conversation has been inactive for a long enough period of time (or when the user jumps to another session), add the user's messages from that conversation to the "conversation_summary" index space.

Configure the third vector space to be indexed by "summaries", with various types of summarized messages.

{
  embedding: conversation-summary,
  metadata: {
    message_summaries: string[],
    conversation_title: string,
    date: Date
  }
}

The conversation summaries and messages are inserted into this vector space two weeks after the session is created.

Whenever a user sends a message, it is embedded, and both vector spaces are queried simultaneously to retrieve similar items, with the retrieval limited to a two-week time frame and a reasonable upper bound on the returned results. Inject the retrieved results into the system prompt.

Whenever a user sends a message, it is also necessary to query the data in the summary space for the past two weeks to avoid repeated references. Relevant results are also injected into the system prompt.
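The two-week window and the cap on returned results can be sketched as a post-retrieval filter. The constants and the `Hit` shape below are guesses for illustration, not measured values:

```rust
const TWO_WEEKS_SECS: u64 = 14 * 24 * 60 * 60;

/// A raw retrieval result from either vector space.
struct Hit {
    text: String,
    sent_at: u64, // unix seconds
    score: f32,
}

/// Drop hits outside the two-week window, then keep the best-scoring
/// ones up to `max_results`.
fn filter_hits(mut hits: Vec<Hit>, now: u64, max_results: usize) -> Vec<String> {
    hits.retain(|h| now.saturating_sub(h.sent_at) <= TWO_WEEKS_SECS);
    // Scores are assumed non-NaN, so partial_cmp is safe to unwrap here.
    hits.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    hits.into_iter().take(max_results).map(|h| h.text).collect()
}
```

Note that a stale hit is dropped even if it scores higher than everything inside the window.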

User Insights

User Insights can be implemented in a variety of ways; the optimal approach is unclear at this time and requires further discussion and experimentation.

User insights are likely generated from one or more of the vector spaces used in the Chat History RAG implementation described above. They do not need to be generated in real time, so they would typically be produced in batches by a regularly scheduled cron job that periodically initiates update requests.

The trickiest part of user insights is how to continuously keep them updated in sync with user behavioral patterns without duplicating or creating inconsistencies. A simple but computationally expensive approach is to regenerate user insights for all active users on a weekly basis. This approach allows the insights to cover a longer time span than the regular scheduling cycle while maintaining the system's responsiveness to change.

  • Configure a Lambda function that runs once a week.

  • Query the ChatMessage table to find users who have sent messages in the last week.

  • For each active user, execute the insightUpdate Lambda function once.

insightUpdate Lambda

The goal of this algorithm is to generate unique user insights based on user queries. The number of insights generated should be large enough to be of practical value, but not so large that they cannot be used effectively in the LLM context. To determine the maximum number of available insights, some experimentation is required.

Given the constraints of the problem and the available data, this process can be modeled as a clustering optimization problem: we want to find a number of clusters k such that:

  • k is less than a preset maximum number of clusters (max_clusters);

  • The variance within each cluster is as small as possible (low internal dispersion);

  • and outliers are excluded.

// lower is better
fn eval_clusters(clusters: &Vec<Vec<&V>>) -> f64;

fn knn(k: u32, vectors: &Vec<V>) -> Vec<Vec<&V>>;

let mut best: f64 = 1.0;
let mut best_clustering: Vec<Vec<&V>> = Vec::new();
for k in 1..MAX_CLUSTERS {
    let clusters = knn(k, &vectors);
    let eval = eval_clusters(&clusters);
    if eval < best {
        best = eval;
        best_clustering = clusters;
    }
}

Once clustering is complete, the user's messages can be analyzed with a large language model, using carefully crafted prompts that guide it to generate insights similar to those ChatGPT exhibits. Timestamps can also be added in a deterministic manner.
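The deterministic timestamping could look like the sketch below, which collapses a cluster's (year, month) stamps into span labels matching the observed "since January 2025" / "late 2024 to early 2025" style. The exact wording is a guess, and months are assumed to be in the range 1-12:

```rust
/// Label a (year, month) pair, e.g. (2025, 1) becomes "January 2025".
fn label((year, month): (i32, u32)) -> String {
    const MONTHS: [&str; 12] = [
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December",
    ];
    format!("{} {}", MONTHS[(month - 1) as usize], year)
}

/// Collapse a cluster's timestamps into a deterministic span string.
fn format_date(mut stamps: Vec<(i32, u32)>) -> String {
    stamps.sort();
    match (stamps.first(), stamps.last()) {
        (Some(&first), Some(&last)) if first != last => {
            format!("{} to {}", label(first), label(last))
        }
        (Some(&only), _) => format!("since {}", label(only)),
        _ => String::from("date unknown"),
    }
}
```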

async fn generate_insights(clusters: Vec<Vec<&V>>) -> Result<Vec<Insight>> {
    let future_insights = clusters
        .into_iter()
        .map(|cluster| async move { generate_insight(cluster).await })
        .collect::<Vec<_>>();
    futures::future::join_all(future_insights)
        .await
        .into_iter()
        .collect()
}

async fn generate_insight(cluster: Vec<&V>) -> Result<Insight> {
    let (message_texts, dates): (Vec<_>, Vec<_>) = cluster
        .into_iter()
        .map(|vector| (vector.message_content, vector.date))
        .unzip();
    let message_text = message_texts.join("\n");
    let formatted_date: String = format_date(dates);

    let insight_text = ai::simple_completion()
        .system_prompt("Prompt to get similar insights to GPT".to_string())
        .user_message(message_text)
        .complete()
        .await?;

    Ok(Insight {
        text: insight_text,
        formatted_date,
    })
}

Ultimately, these insights can be organized into a simple table and attached to the model as context in a user conversation.
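That final assembly step might look like the following, with an invented plain-text layout (ChatGPT's actual context format is unknown):

```rust
/// An insight paired with its deterministic date span.
struct Insight {
    text: String,
    formatted_date: String,
}

/// Render insights as a simple table to append to the model context.
fn insights_block(insights: &[Insight]) -> String {
    let mut out = String::from("User insights:\n");
    for i in insights {
        out.push_str(&format!("| {} | {} |\n", i.formatted_date, i.text));
    }
    out
}
```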

User Experience

Using OpenAI models through ChatGPT feels superior to calling the API directly; this is not only intuitive to many people but also matches the author's own observation. While prompt engineering does play a role in shaping ChatGPT's "smartness", the memory system must also play an important part. Memory mechanisms may well affect the model's performance in evaluations, but the benchmarks the author has found so far were not run on the ChatGPT platform, and therefore do not reflect the benefits of these systems.

Perhaps more interesting than analyzing features or parameters is the fact that "ChatGPT" is on its way to becoming a verb, just as "Google" did. People are starting to say "I'll ChatGPT it", which is not only a shift in everyday speech but also a linguistic footnote to market dominance. While this phenomenon can be partly attributed to first-mover advantage, OpenAI's ability to keep standing against the tide of competition means it is delivering a product that is not merely holding its ground, but putting up a good fight, even offering something with a flavor of its own.

Among ChatGPT's many features, the memory mechanism has the most direct impact, since its content is shaped by users themselves. Users can set preferences through the system prompts to make ChatGPT's responses more to their taste. But there is a problem: the average user, who stands to benefit most from a customized experience, may not know how to express his or her preferences, let alone have them "remembered" by ChatGPT.

"The User Insights mechanism was created to solve this paradox. It transforms "who you are and what you like" from "you have to tell me" to "I can see it myself". It captures preferences in an automated way, avoids semantic ambiguity with subtle understanding, and reorganizes the expression of information according to the user's way of understanding. In my case, the system knows that I prefer technical principles to analogical storytelling, so it's less "it's like cooking" and more "it's because of the underlying call to this interface".

The actual impact of short-term conversational memory is hard to pinpoint, although theoretically it makes sense to imagine a chatbot learning about a user's recent behavior. In more advanced systems, such short-term memory could even allow the user to ask vague questions in a new conversation, while the bot would still be able to "make sense" of the previous interaction.

But at least in the author's own experience, ChatGPT didn't give him that "it remembers what I just said" feeling, and he couldn't come up with any concrete example of it calling up his previous round of conversation.

As for conversation history, it is more of an attempt to give chatbots a human-like memory of contextual continuity: just as when we talk to someone, we naturally expect them to remember what we talked about before. Shared conversational context avoids endless repetition, talking in circles, and self-contradiction. But the key to making this mechanism work is accurately extracting and using the useful bits of history, rather than simply piling up memories.

As for which mechanism contributes most to ChatGPT's "sense of intelligence", it is impossible to say without further experimentation. From his observations so far, however, the author tends to think the user insight system matters most, accounting for perhaps more than 80% of the effect. Although this judgment is not yet supported by rigorous data, his experiments show that this mechanism, built on detailed prompts, does effectively improve performance, and it does not rely on a complex retrieval process the way conversation history does.

At the end of the article, the author also includes some notes from the experimentation process, recording the fragments of thought and lines of exploration behind his conclusions. The content may not be rigorous or exhaustive, but it shows the process of continuous interaction and exploration between a technology user and a system. If you're interested in the details of the reasoning, the original article is worth a read.