Thanks for checking out the paper! Just to clarify in case there was any misunderstanding: recursive summarization is only one part of MemGPT's memory management. As you mentioned, MemGPT's conversation queue is managed via recursive summarization, just as in prior work (and many chatbot implementations). However, there is also a read/write "pinned" section of LLM memory that is unrelated to recursive summarization, which we call "working context" in the paper. So MemGPT has access to both recursive summaries (generated automatically) and working context, which MemGPT actively manages to keep up to date.
These are both separate from MemGPT's external context, which is pulled into the conversation queue via function calls. In all our examples, reads from external context are uncompressed (no summarization) and paginated. MemGPT receives a system alert when queue summarization is triggered, so if MemGPT needs to keep specific details from the conversation queue, it can write them to working context before they are erased or summarized.
In the conversational agent examples, working context (no summarization, and separate from the conversation queue) is used to store key facts about the user and agent to maintain a consistent conversation. Because the working context is always seen by the LLM, there's no need to retrieve it to see it. In doc QA, working context can be used to keep track of the current task/question and progress toward that task (for complex queries, this helps MemGPT keep track of details like the previous search, previous page request, etc.).
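To make the tiered design above concrete, here is a minimal sketch (illustrative names only, not the actual MemGPT API): a bounded conversation queue whose overflow is folded into a recursive summary, plus a pinned working context the agent can write to before eviction.

```python
# Hypothetical sketch of the memory tiers described above.
# Assumed names: TieredMemory, pin(), prompt_context() are illustrative.

class TieredMemory:
    def __init__(self, queue_limit=6):
        self.queue_limit = queue_limit
        self.queue = []            # recent messages, evicted via summarization
        self.summary = ""          # recursive summary of evicted messages
        self.working_context = {}  # pinned facts, always visible to the LLM

    def append(self, message):
        self.queue.append(message)
        if len(self.queue) > self.queue_limit:
            # The "system alert" moment: before the oldest half of the
            # queue is summarized away, the agent may pin() key details.
            cut = self.queue_limit // 2
            evicted, self.queue = self.queue[:cut], self.queue[cut:]
            self.summary = self._summarize(self.summary, evicted)

    def pin(self, key, value):
        # Called by the agent (via a function call) to keep a fact verbatim.
        self.working_context[key] = value

    def _summarize(self, prior_summary, messages):
        # Stand-in for an LLM summarization call: fold evicted messages
        # into the running summary.
        joined = "; ".join(messages)
        return (prior_summary + " | " + joined) if prior_summary else joined

    def prompt_context(self):
        # What the LLM actually sees on every turn.
        return {"summary": self.summary,
                "working_context": dict(self.working_context),
                "recent": list(self.queue)}
```

Because `working_context` is included in every prompt, nothing pinned there ever needs a retrieval step, matching the point above.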
We took a similar approach to MemGPT (working memory: summarized conversation with eviction), but our long-term memory is a graph we can operate on (add/remove/edit nodes and edges). We bring the top_k nodes and their neighbors into working memory.
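A minimal sketch of that graph long-memory idea (an assumed design, not the commenter's actual code): nodes and edges can be added, edited, and removed, and retrieval pulls the top_k scoring nodes plus their one-hop neighbors into working memory.

```python
# Hypothetical graph memory; the scoring function (e.g. embedding
# similarity to the current query) is supplied by the caller.
from collections import defaultdict

class GraphMemory:
    def __init__(self):
        self.nodes = {}                # node_id -> attribute dict
        self.edges = defaultdict(set)  # node_id -> set of neighbor ids

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def edit_node(self, node_id, **attrs):
        self.nodes[node_id].update(attrs)

    def remove_node(self, node_id):
        self.nodes.pop(node_id, None)
        for nbrs in self.edges.values():
            nbrs.discard(node_id)
        self.edges.pop(node_id, None)

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def retrieve(self, score_fn, top_k=2):
        # Rank nodes by relevance, keep the top_k, then expand to
        # their neighbors so related facts ride along into working memory.
        ranked = sorted(self.nodes, key=score_fn, reverse=True)[:top_k]
        hit = set(ranked)
        for n in ranked:
            hit |= self.edges[n] & self.nodes.keys()
        return {n: self.nodes[n] for n in hit}
```

Expanding to neighbors is the design choice that distinguishes this from a flat vector lookup: a low-scoring fact still surfaces if it is linked to a highly relevant one.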
> Just to clarify in case there was any misunderstanding
I am not confused.
It's good; it solves a specific set of problems with querying large datasets, the same as a vector search would.
...but the various memory zones you've created make absolutely no difference to the fundamental limitation of the LLM context length.
No matter how you spin it, this is just creative prompt engineering. You're packing the context with relevant information, but if you have too much relevant information, it won't work.