Why Is ChatGPT Getting Smarter? An Inside Look at Its Memory Mechanism

Recommendation
How does ChatGPT achieve "understanding you"? In-depth analysis of the upgrade and technical details of its memory mechanism.
Core content:
1. Upgrade of ChatGPT memory system: from "temporary chat companion" to "long-term companion"
2. Dual architecture of memory mechanism: "memory preservation" and "chat history"
3. How the memory mechanism reshapes the user experience: citing chat history and user insights
Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)
In April, OpenAI upgraded ChatGPT's memory system so that it can reference a user's entire past conversation history to provide more personalized responses. ChatGPT is no longer a "temporary chat partner" that starts from scratch every time and forgets everything; it is becoming a "long-term companion" that actually remembers who you are, understands your preferences, and recalls what you've said.
Eric Hayes, a software engineer, reverse-engineered ChatGPT's memory system, not only clarifying its dual memory architecture but also speculating on the implementation behind it and laying out a complete path for technically reproducing it.
This article is divided into three parts:
- Deconstructing how ChatGPT's memory system works
- Speculating on the possible technical implementations behind it
- Exploring how the memory mechanism reshapes the user experience
-
One is "Saved Memory". -
The second is "Chat History".
-
Current session history -
Conversation history -
User insights
For example, one such user insight might read:
The user has extensive experience in Rust programming, with particular expertise in asynchronous operations, threading, and streaming computation;
The user has asked in-depth questions about Rust's asynchronous mechanisms, trait objects, Serde serialization, custom error handling, and other topics on multiple occasions, spanning late 2024 to early 2025;
Confidence: High.
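As a rough sketch, such an insight entry could be represented with a small data structure like the one below; the field names are assumptions chosen to line up with the code sketches later in the article, and the confidence level from the example could either be folded into the text or stored as an extra field.

// Hypothetical shape of a single user insight entry.
#[derive(Debug, Clone)]
pub struct Insight {
    // The insight text itself, e.g. "The user has extensive experience in Rust programming ...".
    pub text: String,
    // A human-readable time span the insight covers, e.g. "late 2024 to early 2025".
    pub formatted_date: String,
}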
{ "type": "function", "function": { "name": "bio", "description": "persist information across conversations", "parameters": { "type": "object", " properties": { "message": { "type": "string", "description": "A user message containing information to save" } }, "required": [ "message" ], " additionalProperties": False }, "strict": True }}
const BIO_PROMPT: &'static str = r#"
You are a tool that transforms user messages into useful user facts. Your job is
to first transform a user message into a list of distinct facts. Populate the facts
array with these facts.
Next transform these facts into elliptical descriptive clauses prefaced with a
predicate. Populate the clauses array with these.
Finally check these clauses against each other and against the clauses in your
input for contradictions and similarity. If any clauses are overly similar
or contradict, do NOT populate the output array with these. Otherwise populate the
output array with the checked clauses.
"#;
async fn bio_transform(existing_facts: &[String], user_message: String) -> Result<Vec<String>>;
async fn update_user_bio<T, D>(user: T, db: D, facts: Vec<String>) -> Result<()>;
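A minimal sketch of how these pieces might fit together when the model routes a message to the bio tool; the User and Db types and the load_user_facts helper are assumptions for illustration.

// Hypothetical glue code: when the model addresses a message to the bio tool,
// transform it into facts and persist them for this user.
async fn handle_bio_call(user: User, db: Db, user_message: String) -> Result<()> {
    // Facts already stored for this user, used to detect duplicates and contradictions.
    let existing_facts = db.load_user_facts(&user).await?;
    // Run the BIO_PROMPT transformation to get checked, de-duplicated clauses.
    let new_facts = bio_transform(&existing_facts, user_message).await?;
    // Persist the surviving facts so they can be injected into future system prompts.
    update_user_bio(user, db, new_facts).await
}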
The bio tool allows you to "persist" information between conversations. Simply address your message to=bio and the model will "remember" whatever you want it to remember; in subsequent conversations, this information appears in the model's context. But the system prompt also sets a few explicit "memory boundaries": do not use the bio tool to store sensitive information. Sensitive information includes, but is not limited to, the user's race, ethnicity, religion, sexual orientation, political affiliation, sex life, criminal record, medical diagnoses, prescription drugs, and union membership.
Do not store short-term information. Short-term information refers to things like a user's momentary interests, ongoing projects, and current desires or intentions.
Next, factual information about the user is injected into the system prompt each time a message is sent. To achieve the same functionality as ChatGPT, a simple user interface can also be built to view and delete these memory entries.
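A rough sketch of that injection step, reusing the hypothetical Db and load_user_facts helper from above:

// Hypothetical sketch: prepend the stored user facts to the system prompt
// before every request, mirroring how ChatGPT surfaces saved memories.
async fn build_system_prompt(base_prompt: &str, user: &User, db: &Db) -> Result<String> {
    let facts = db.load_user_facts(user).await?;
    if facts.is_empty() {
        return Ok(base_prompt.to_string());
    }
    // Render the facts as a simple bulleted section appended to the base prompt.
    let facts_block = facts
        .iter()
        .map(|f| format!("- {f}"))
        .collect::<Vec<_>>()
        .join("\n");
    Ok(format!("{base_prompt}\n\nKnown facts about the user:\n{facts_block}"))
}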
Referencing Chat History
Current Session History
The implementation of this section is very straightforward: just filter the database for messages sent by the user (e.g. the ChatMessage table), sort them by time, and set a limit on the number of messages.
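As a sketch, assuming a ChatMessage table accessible through a generic Db client (the field and method names are illustrative):

// Hypothetical lookup for the current session's history: filter for the user's
// messages in this conversation, order them by time, and cap the count.
async fn current_session_history(
    db: &Db,
    conversation_id: ConversationId,
    limit: usize,
) -> Result<Vec<ChatMessage>> {
    let mut messages: Vec<ChatMessage> = db
        .chat_messages(conversation_id) // all messages in this conversation
        .await?
        .into_iter()
        .filter(|m| m.sender == Sender::User) // keep only user-sent messages
        .collect();
    messages.sort_by_key(|m| std::cmp::Reverse(m.sent_at)); // newest first
    messages.truncate(limit); // keep only the most recent N
    messages.reverse(); // restore chronological order for the prompt
    Ok(messages)
}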
Conversation History
Configure two vector spaces: the first one indexed by message-content and the second one by conversation-summary.
{
  embedding: message-content | conversation-summary
  metadata: {
    message_content: string,
    conversation_title: string,
    date: Date
  }
}
Insert the messages in the order they were sent into a vector space indexed by "message_content". Once a conversation has been inactive for a long enough period of time (or when the user jumps to another session), add the user's messages from that conversation to the "conversation_summary" index space.
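A sketch of those two insertion paths; the VectorStore client, the embed and summarize helpers, and the Metadata struct are assumptions mirroring the schema above.

// Hypothetical insertion paths for the two vector spaces described above.
async fn index_user_message(store: &VectorStore, msg: &ChatMessage) -> Result<()> {
    // Index each user message by the embedding of its content.
    let embedding = embed(&msg.content).await?;
    let metadata = Metadata {
        message_content: msg.content.clone(),
        conversation_title: msg.conversation_title.clone(),
        date: msg.sent_at,
    };
    store.insert("message_content", embedding, metadata).await
}

async fn index_inactive_conversation(store: &VectorStore, convo: &Conversation) -> Result<()> {
    // Once a conversation has gone inactive, summarize its user messages
    // and index the summary in the second space.
    let summary = summarize(&convo.user_messages()).await?;
    let embedding = embed(&summary).await?;
    let metadata = Metadata {
        message_content: summary,
        conversation_title: convo.title.clone(),
        date: convo.last_active_at,
    };
    store.insert("conversation_summary", embedding, metadata).await
}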
Configure a third vector space, indexed by conversation summaries, whose entries carry the various message summaries in their metadata.
{
  embedding: conversation-summary,
  metadata: {
    message_summaries: string[],
    conversation_title: string,
    date: Date
  }
}
The conversation summaries and messages are inserted into this vector space two weeks after the session is created.
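A sketch of that aging step, again with hypothetical store and summarization helpers and a SummaryMetadata struct matching the schema above:

// Hypothetical aging job: two weeks after a conversation is created, move its
// summary (plus per-message summaries) into the long-term "summaries" space.
async fn archive_old_conversation(store: &VectorStore, convo: &Conversation) -> Result<()> {
    let conversation_summary = summarize(&convo.user_messages()).await?;
    let message_summaries = summarize_each(&convo.user_messages()).await?;
    let embedding = embed(&conversation_summary).await?;
    let metadata = SummaryMetadata {
        message_summaries,
        conversation_title: convo.title.clone(),
        date: convo.created_at,
    };
    store.insert("summaries", embedding, metadata).await
}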
Whenever a user sends a message, it is embedded, and both vector spaces are queried simultaneously to retrieve similar items, with the retrieval limited to a two-week time frame and a reasonable upper bound on the returned results. Inject the retrieved results into the system prompt.
Whenever a user sends a message, the summary space (which holds data older than two weeks) also needs to be queried so that the same conversations are not referenced twice. Relevant results are likewise injected into the system prompt.
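A compact sketch of the retrieval step on each incoming message; the query builder methods and result fields are assumptions.

// Hypothetical retrieval: on every user message, pull similar history from the
// live spaces (last two weeks) and the summary space (older data), then hand
// the hits back to be injected into the system prompt.
async fn retrieve_history_context(store: &VectorStore, user_message: &str) -> Result<String> {
    let query = embed(user_message).await?;
    let two_weeks_ago = chrono::Utc::now() - chrono::Duration::weeks(2);

    // Recent, fine-grained context from the message and conversation spaces.
    let recent = store
        .query(&["message_content", "conversation_summary"], &query)
        .newer_than(two_weeks_ago)
        .limit(10)
        .await?;

    // Older context that has already been condensed into summaries.
    let older = store
        .query(&["summaries"], &query)
        .older_than(two_weeks_ago)
        .limit(5)
        .await?;

    // Concatenate the retrieved snippets for injection into the system prompt.
    let snippets: Vec<String> = recent.into_iter().chain(older).map(|hit| hit.text).collect();
    Ok(snippets.join("\n"))
}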
User Insights
User Insights can be implemented in a variety of ways; the optimal approach is unclear at this point and would require further discussion and experimentation.
User insights are likely to be generated based on one or more of the vector spaces used in the Chat History RAG implementation described above. User insights are not required to be generated in real-time, so they are usually generated by batching in conjunction with some sort of regularly scheduled cron job, which periodically initiates requests for updates.
The trickiest part of user insights is how to continuously keep them updated in sync with user behavioral patterns without duplicating or creating inconsistencies. A simple but computationally expensive approach is to regenerate user insights for all active users on a weekly basis. This approach allows the insights to cover a longer time span than the regular scheduling cycle while maintaining the system's responsiveness to change.
- Configure a Lambda function that runs once a week.
- Query the ChatMessage table to find users who have sent messages in the last week.
- For each active user, execute the insightUpdate Lambda function once (a sketch of this driver follows below).
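A sketch of that weekly driver; the Db query and the LambdaClient wrapper are assumptions, and a real deployment would use the AWS SDK.

// Hypothetical weekly driver: find users active in the last week and fan out
// one insightUpdate invocation per user.
async fn weekly_insight_job(db: &Db, lambda: &LambdaClient) -> Result<()> {
    let since = chrono::Utc::now() - chrono::Duration::weeks(1);
    // Distinct users with at least one ChatMessage row in the last week.
    let active_users = db.users_with_messages_since(since).await?;
    for user_id in active_users {
        // Each invocation recomputes that user's insights from their clustered messages.
        lambda.invoke("insightUpdate", &user_id).await?;
    }
    Ok(())
}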
insightUpdate Lambda
The goal of this algorithm is to generate unique user insights based on user queries. The number of insights generated should be large enough to be of practical value, but not so large that they cannot be used effectively in the LLM context. To determine the maximum number of available insights, some experimentation is required.
Given the constraints of the problem and the available data, this process can be clearly modeled as a clustering optimization problem: we want to find a number of clusters k such that:
- k is less than a preset maximum number of clusters (max_clusters);
- the variance within each cluster is as small as possible (low internal dispersion);
- outliers are excluded.
// lower is better
fn eval_clusters(clusters: &Vec<Vec<&V>>) -> f64;
fn knn(k: u32, vectors: &Vec<V>) -> Vec<Vec<&V>>;

let mut best: f64 = 1.0;
let mut best_clustering: Vec<Vec<&V>> = Vec::new();
for k in 1..MAX_CLUSTERS {
    let clusters = knn(k, &vectors);
    let eval = eval_clusters(&clusters);
    if eval < best {
        best = eval;
        best_clustering = clusters;
    }
}
Once the clustering is complete, the user's messages can be analyzed by a large language model, with carefully crafted prompts that guide it to generate insights similar to those ChatGPT shows. Timestamps can also be added in a deterministic manner.
async fn generate_insights(clusters: Vec<Vec<&V>>) -> Result<Vec<Insight>> {
    let future_insights = clusters
        .into_iter()
        .map(|cluster| async move { generate_insight(cluster).await })
        .collect::<Vec<_>>();
    // Run one insight generation per cluster concurrently and collect the results.
    futures::future::join_all(future_insights)
        .await
        .into_iter()
        .collect()
}

async fn generate_insight(cluster: Vec<&V>) -> Result<Insight> {
    // Split each vector's metadata into its message text and its timestamp.
    let (message_texts, dates): (Vec<_>, Vec<_>) = cluster
        .into_iter()
        .map(|vector| (vector.message_content, vector.date))
        .unzip();
    let message_text = message_texts.join("\n");
    let formatted_date: String = format_date(dates);
    let insight_text = ai::simple_completion()
        .system_prompt("Prompt to get similar insights to GPT".to_string())
        .user_message(message_text)
        .complete()
        .await?;
    Ok(Insight {
        text: insight_text,
        formatted_date,
    })
}
Ultimately, these insights can be organized into a simple table and attached to the model as context in a user conversation.
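A final sketch of how those insights might be rendered into that context block; the layout is illustrative.

// Hypothetical rendering step: turn the stored insights into a compact block
// appended to the system prompt at the start of each conversation.
fn render_insights(insights: &[Insight]) -> String {
    insights
        .iter()
        .map(|i| format!("- {} ({})", i.text, i.formatted_date))
        .collect::<Vec<_>>()
        .join("\n")
}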
User Experience
The experience of using OpenAI models through ChatGPT is superior to calling the API directly. This is not only intuitive to many users; it matches the author's own observations. While prompt engineering certainly plays a role in shaping how "smart" ChatGPT feels, the memory system must also play an important part. Memory mechanisms may well affect a model's performance in evaluations, but the benchmarks the author has found so far were not run on the ChatGPT platform and therefore do not reflect the benefits of these systems.
Perhaps more interesting than analyzing features or parameters is the fact that "ChatGPT" is becoming a verb, just as "Google" did. People are starting to say "I'll ChatGPT it", which is not only a shift in everyday speech but also a linguistic footnote to market dominance. While this can partly be attributed to first-mover advantage, OpenAI's ability to keep holding its ground against wave after wave of competition means it is shipping a product that is not merely "not bad" but genuinely competitive, even distinctive.
Among ChatGPT's many features, the "memory" mechanism has the most direct impact, because its content is shaped by the users themselves. Users can set preferences through the system prompt to make ChatGPT's responses more to their taste. But there is a problem: the average user, who stands to benefit most from a customized experience, may not know how to articulate his or her preferences, let alone get ChatGPT to "remember" them.
"The User Insights mechanism was created to solve this paradox. It transforms "who you are and what you like" from "you have to tell me" to "I can see it myself". It captures preferences in an automated way, avoids semantic ambiguity with subtle understanding, and reorganizes the expression of information according to the user's way of understanding. In my case, the system knows that I prefer technical principles to analogical storytelling, so it's less "it's like cooking" and more "it's because of the underlying call to this interface".
The actual impact of short-term conversational memory is hard to pinpoint, although theoretically it makes sense to imagine a chatbot learning about a user's recent behavior. In more advanced systems, such short-term memory could even allow the user to ask vague questions in a new conversation, while the bot would still be able to "make sense" of the previous interaction.
But at least in the author's own experience, ChatGPT didn't give him that "it remembers what I just said" feeling, and he couldn't come up with any concrete examples of it calling up his previous round of conversation.
As for conversation history, it's more like an attempt to give chatbots the contextual continuity of "human memory": just as when we talk to someone, we naturally expect them to remember what we talked about before. Shared conversational context avoids endless repetition, talking in circles, or the model contradicting itself. But the key to making this mechanism work is accurately extracting and using the "useful bits" of history, rather than simply piling up memories.
As for which mechanism contributes most to ChatGPT's "sense of intelligence", it is impossible to say for sure without further experimentation. From the author's current observations, however, he is inclined to think the user insight system matters most, accounting for perhaps more than 80% of the effect. Although this judgment is not yet backed by rigorous data, his experiments suggest that this mechanism, built on detailed prompts, does effectively improve performance, and it does not depend on a complex retrieval process the way conversation history does.
At the end of the article, the author also includes notes from his experimentation process, recording fragments of thought and the threads he explored while working out these conclusions. The content may not be rigorous or exhaustive, but it shows the ongoing back-and-forth between a technical user and a system. If you're interested in the details of the reasoning behind it, the original article is worth a read.