Last updated: 3/7/2026
# What is the difference between short-term and long-term memory in AI agents?
Short-term and long-term memory serve different functions in an AI agent's architecture. Building a useful agent requires both, and conflating them leads to systems that either forget too quickly or accumulate noise indefinitely.
## Short-term memory
Short-term memory holds the active context within a single session or task — the array of messages passed to the LLM on each API call. When a user says 'make it shorter' mid-conversation, the agent knows what 'it' refers to because that document is in short-term memory. When the session ends, short-term memory is discarded.
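In code, short-term memory is nothing more than the message list a session maintains and replays to the model on every call. A minimal sketch (the `Session` class and `max_messages` cap here are illustrative, not part of any library):

```python
class Session:
    """Holds the running message array passed to the LLM on each call."""

    def __init__(self, max_messages: int = 20):
        self.messages = []
        self.max_messages = max_messages

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        # Trim the oldest turns so the array stays under the model's limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

session = Session()
session.add("user", "Draft a launch email for our new feature.")
session.add("assistant", "Subject: Introducing ... (draft)")
session.add("user", "Make it shorter.")  # "it" resolves via the messages above
```

Nothing is written to disk: when the `Session` object is dropped, the context is gone.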
## Long-term memory
Long-term memory persists information across sessions in an external store: a user's name, their technology stack, standing preferences, project goals, past decisions. It must be actively managed — extracted from conversations, stored, deduplicated, updated when facts change, and retrieved selectively rather than loaded wholesale into the context window.
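The extract-store-deduplicate-update cycle can be sketched with a toy in-memory store. `FactStore` and its `upsert` method are hypothetical names for illustration, standing in for a real vector or graph store:

```python
class FactStore:
    """Toy long-term store keyed by fact subject."""

    def __init__(self):
        self.facts = {}  # subject -> current statement

    def upsert(self, subject: str, statement: str) -> str:
        """Store a new fact, skip an exact duplicate, or update a changed fact."""
        existing = self.facts.get(subject)
        if existing == statement:
            return "skipped (duplicate)"
        action = "updated" if existing else "added"
        self.facts[subject] = statement
        return action

store = FactStore()
store.upsert("stack", "User's backend is written in Go.")
store.upsert("stack", "User's backend is written in Go.")    # deduplicated
store.upsert("stack", "User migrated the backend to Rust.")  # fact changed
```

A production store replaces the exact-match check with semantic similarity, so paraphrases of the same fact also deduplicate.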
## How they interact
At the start of each session, relevant long-term memories are retrieved and loaded into the context window, becoming temporary short-term memory for that session. As the session progresses, new facts worth preserving are extracted and written back to long-term storage. Think of it as RAM (short-term) backed by a disk (long-term).
| | Short-term memory | Long-term memory |
|---|---|---|
| Storage location | Context window | External vector/graph store |
| Lifespan | Current session only | Indefinite |
| Retrieval speed | Instant — already loaded | 10-50ms query latency |
| Capacity limit | Model token limit | Effectively unlimited |
| Update mechanism | Append to message array | Extract, store, deduplicate |
A minimal version of that loop, using Mem0's `Memory` API to retrieve facts before the call and write new ones back after it:

```python
from mem0 import Memory
from openai import OpenAI

memory = Memory()
client = OpenAI()

def chat(user_id, user_message, session_messages):
    # Load relevant long-term memories into short-term context
    memories = memory.search(user_message, user_id=user_id, limit=5)
    memory_context = "\n".join([m["memory"] for m in memories["results"]])
    messages = [
        {"role": "system", "content": f"Known facts about this user:\n{memory_context}"},
        *session_messages,
        {"role": "user", "content": user_message},
    ]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content
    # Write new facts from this exchange back to long-term memory
    memory.add([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": reply},
    ], user_id=user_id)
    return reply
```