Memory management problems and solutions in large model applications

Written by
Jasper Cole
Updated on: June 28, 2025
Recommendation

The challenges and practical solutions of large model memory management, to give you a deeper understanding of how context is maintained in AI conversations.

Core content:
1. Why memory management is necessary for large models in NLP tasks, and the challenges involved
2. Introduction to the memory management module of the LangChain framework
3. Practical applications of conversation buffer window memory and summary memory


"  Many functions of large model applications include memory management, which requires not only technical issues but also sufficient engineering capabilities to solve. "



As we all know, large models have no built-in memory, so memory management becomes an indispensable part of building applications on top of them. Although memory management sounds simple, many problems come up in actual operation.


For example: token costs rise as the stored history grows, the model's context window is limited, the conversation records have to be stored somewhere, and so on.


So today we will look at the memory function of large models in detail from the perspective of a real project; the examples are built on LangChain's memory management module.






Large model memory management issues




The memory management problem mainly appears in conversational NLP tasks. In essence it is the same context problem ordinary people face when chatting: if your friends are mid-conversation and you walk over, you first have to ask what they are talking about, or listen in for a while, before you know how to join in.


Large models themselves have no memory, which means every conversation is a brand-new exchange for the model; it is like a person with amnesia who forgets what was just said the moment the previous sentence is finished.


Memory management therefore came into being, and the idea is simple: record every exchange with the model, both what you said and what the model replied, then feed that record back into the model (spliced into the prompt) on every turn, so that the model can answer with the full context in view.
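
As a minimal sketch of this mechanism, without any framework (the history list and the call_llm stub below are illustrative names, not part of any library):

```python
history = []  # running transcript as (speaker, text) tuples


def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (OpenAI client, local model, etc.).
    return "..."


def chat(user_input: str) -> str:
    # Rebuild the context from the stored transcript and splice it into the prompt.
    context = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    prompt = f"{context}\nUser: {user_input}\nAssistant:"

    reply = call_llm(prompt)

    # Record both sides of the exchange so the next turn has this context.
    history.append(("User", user_input))
    history.append(("Assistant", reply))
    return reply
```

Every memory module discussed below is essentially a more structured version of this loop.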



LangChain encapsulates several modules with memory functions, mainly in-process conversation memory; there are also storage middleware options (relational databases, Redis caches, etc.) for distributed scenarios. In essence they all do the same thing: save the conversation records, load them on each turn, and splice them into the prompt.


Take the frequently used ConversationBufferMemory: it simply saves every conversation record in full. This brings problems of its own. As the number of turns grows, the record becomes longer and longer and token consumption rises; more importantly, the model's context window is limited, so an ever-growing record will eventually exceed the maximum length.
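
A small usage sketch, assuming the classic LangChain memory API (these classes live in langchain.memory in older releases; newer versions may deprecate or relocate them):

```python
from langchain.memory import ConversationBufferMemory

# Buffer memory keeps the full transcript verbatim.
memory = ConversationBufferMemory()
memory.save_context({"input": "Hi, I'm planning a trip to Kyoto."},
                    {"output": "Great! When are you going?"})
memory.save_context({"input": "Early April, for the cherry blossoms."},
                    {"output": "Good timing, but book hotels early."})

# Everything said so far is returned and would be spliced into the next prompt.
print(memory.load_memory_variables({})["history"])
```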



This is where the conversation buffer window memory, ConversationBufferWindowMemory, comes in. In essence it adds a value k and keeps only the most recent k turns of conversation. For example, even if a hundred exchanges have taken place, only the latest 10 are recorded, which addresses both the cost problem and the length problem.
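
A sketch under the same API assumptions, showing the effect of k:

```python
from langchain.memory import ConversationBufferWindowMemory

# Keep only the most recent k turns; older ones are silently dropped.
memory = ConversationBufferWindowMemory(k=2)
for i in range(5):
    memory.save_context({"input": f"question {i}"}, {"output": f"answer {i}"})

# Only the last two exchanges (questions 3 and 4) remain in the history.
print(memory.load_memory_variables({})["history"])
```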


But this creates a new problem: dropping earlier records can lose context. It is like two people drinking and chatting for two hours and suddenly circling back to something mentioned at the start; since those records have been discarded, the model can no longer recover the earlier context.




To solve this, the summary memory function was designed. Its principle is to feed the conversation record into another model and let that model summarize the key information, so that the context is preserved in compressed form rather than lost.
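
A sketch of summary memory, again assuming the classic LangChain API; the ChatOpenAI model and model name are placeholders for whatever chat model you actually use:

```python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI  # placeholder model provider

# A second model condenses the transcript into a running summary.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
memory = ConversationSummaryMemory(llm=llm)

memory.save_context({"input": "We need to migrate the billing service to Kubernetes."},
                    {"output": "Understood, let's start by containerizing it."})
memory.save_context({"input": "And keep downtime under five minutes."},
                    {"output": "Then a blue-green deployment would be safest."})

# A condensed summary, not the raw transcript, gets injected into the prompt.
print(memory.load_memory_variables({})["history"])
```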


However, summary memory requires extra calls to a model, and the summary itself keeps growing as the conversation goes on, so the context window limit can still be reached. In addition, large models tend to attend mainly to the beginning and end of a long text and handle the middle poorly, so context can still effectively be lost.


Judging from the problems above, memory management for large models is a complex process, and the appropriate method has to be chosen for each scenario.


Moreover, in a real system more than one person accesses the service, so user IDs or session IDs are needed to separate the memory data of different users. And because the conversation records live in process memory or in third-party storage middleware while a session only lasts so long, memory lifecycle management is also needed: once a user's conversation is finished, records that are no longer needed should be cleaned up at a suitable time, as in the sketch below.
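
A hedged sketch of such session-scoped memory (the SESSION_TTL value and the get_memory / cleanup_expired helpers are illustrative names, not from any library):

```python
import time

from langchain.memory import ConversationBufferWindowMemory

SESSION_TTL = 30 * 60  # assumed maximum idle time per session, in seconds
_sessions = {}  # session_id -> {"memory": ..., "last_active": ...}


def get_memory(session_id: str) -> ConversationBufferWindowMemory:
    """Return (or create) the memory object bound to this session id."""
    entry = _sessions.get(session_id)
    if entry is None:
        entry = {"memory": ConversationBufferWindowMemory(k=10),
                 "last_active": time.time()}
        _sessions[session_id] = entry
    entry["last_active"] = time.time()
    return entry["memory"]


def cleanup_expired() -> None:
    """Drop sessions that have been idle for longer than SESSION_TTL."""
    now = time.time()
    expired = [sid for sid, e in _sessions.items()
               if now - e["last_active"] > SESSION_TTL]
    for sid in expired:
        del _sessions[sid]
```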



As shown in the code above, a memory object is created for each user based on the session_id; this session id can be any unique identifier appropriate to the application scenario. In addition, a last_active timestamp is stored with each session, and when the maximum session lifetime is exceeded, the expired session records are cleaned up automatically.




Technically, this prevents memory leaks and running out of storage space.


Of course, the approach above only suits some scenarios. In a distributed environment, for example, in-process memory cannot be used and external storage such as Redis is required. In concrete applications you can selectively use these memory tools, or build a complete memory management layer of your own.
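
For the distributed case, here is a sketch using LangChain's Redis-backed chat history from langchain_community (class location, constructor arguments, and the connection URL below are assumptions that may vary by version):

```python
from langchain_community.chat_message_histories import RedisChatMessageHistory

# Conversation records live in Redis rather than process memory, so any
# instance in a distributed deployment can load the same session.
history = RedisChatMessageHistory(
    session_id="user-42",            # reuse the same session-id scheme as above
    url="redis://localhost:6379/0",  # placeholder connection string
    ttl=1800,                        # let Redis expire idle sessions automatically
)

history.add_user_message("Where did we leave off yesterday?")
history.add_ai_message("You asked about window memory and summaries.")
print(history.messages)
```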


In short, there is a gap between learning about large models and applying them in practice; some problems are solved by the technology itself, while others have to be avoided or solved through engineering.