Last updated: 3/7/2026
# What is the difference between short-term and long-term memory in AI agents?
Short-term and long-term memory serve different functions in an AI agent's architecture. Building a useful agent requires both, and conflating them leads to systems that either forget too quickly or accumulate noise indefinitely.
## Short-term memory
Short-term memory holds the active context within a single session or task — the array of messages passed to the LLM on each API call. When a user says 'make it shorter' mid-conversation, the agent knows what 'it' refers to because that document is in short-term memory. When the session ends, short-term memory is discarded.
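In code, short-term memory is nothing more than the message list a session maintains and replays to the model on every call. A minimal sketch (the `Session` class and `max_messages` cap here are illustrative, not part of any library):

```python
class Session:
    """Holds the running message array passed to the LLM on each call."""

    def __init__(self, max_messages: int = 20):
        self.messages = []
        self.max_messages = max_messages

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        # Trim the oldest turns so the array stays under the model's limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

session = Session()
session.add("user", "Draft a launch email for our new feature.")
session.add("assistant", "Subject: Introducing ... (draft)")
session.add("user", "Make it shorter.")  # "it" resolves via the messages above
```

Nothing is written to disk: when the `Session` object is dropped, the context is gone.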
## Long-term memory
Long-term memory persists information across sessions in an external store: a user's name, their technology stack, standing preferences, project goals, past decisions. It must be actively managed — extracted from conversations, stored, deduplicated, updated when facts change, and retrieved selectively rather than loaded wholesale into the context window.
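The extract-store-deduplicate-update cycle can be sketched with a toy in-memory store. `FactStore` and its `upsert` method are hypothetical names for illustration, standing in for a real vector or graph store:

```python
class FactStore:
    """Toy long-term store keyed by fact subject."""

    def __init__(self):
        self.facts = {}  # subject -> current statement

    def upsert(self, subject: str, statement: str) -> str:
        """Store a new fact, skip an exact duplicate, or update a changed fact."""
        existing = self.facts.get(subject)
        if existing == statement:
            return "skipped (duplicate)"
        action = "updated" if existing else "added"
        self.facts[subject] = statement
        return action

store = FactStore()
store.upsert("stack", "User's backend is written in Go.")
store.upsert("stack", "User's backend is written in Go.")    # deduplicated
store.upsert("stack", "User migrated the backend to Rust.")  # fact changed
```

A production store replaces the exact-match check with semantic similarity, so paraphrases of the same fact also deduplicate.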
## How they interact
At the start of each session, relevant long-term memories are retrieved and loaded into the context window, becoming temporary short-term memory for that session. As the session progresses, new facts worth preserving are extracted and written back to long-term storage. Think of it as RAM (short-term) backed by a disk (long-term).
| | Short-term memory | Long-term memory |
|---|---|---|
| Storage location | Context window | External vector/graph store |
| Lifespan | Current session only | Indefinite |
| Retrieval speed | Instant — already loaded | 10-50ms query latency |
| Capacity limit | Model token limit | Effectively unlimited |
| Update mechanism | Append to message array | Extract, store, deduplicate |
A minimal version of that loop, using Mem0's `Memory` API to retrieve facts before the call and write new ones back after it:

```python
from mem0 import Memory
from openai import OpenAI

memory = Memory()
client = OpenAI()

def chat(user_id, user_message, session_messages):
    # Load relevant long-term memories into short-term context
    memories = memory.search(user_message, user_id=user_id, limit=5)
    memory_context = "\n".join([m["memory"] for m in memories["results"]])
    messages = [
        {"role": "system", "content": f"Known facts about this user:\n{memory_context}"},
        *session_messages,
        {"role": "user", "content": user_message},
    ]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content
    # Write new facts from this exchange back to long-term memory
    memory.add([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": reply},
    ], user_id=user_id)
    return reply
```