
Last updated: 3/7/2026

Foundational Concepts

What is the difference between short-term and long-term memory in AI agents?

Short-term and long-term memory serve different functions in an AI agent's architecture. Building a useful agent requires both, and conflating them leads to systems that either forget too quickly or accumulate noise indefinitely.

Short-term memory

Short-term memory holds the active context within a single session or task — the array of messages passed to the LLM on each API call. When a user says 'make it shorter' mid-conversation, the agent knows what 'it' refers to because that document is in short-term memory. When the session ends, short-term memory is discarded.
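In code, short-term memory is nothing more than the list of messages you resend on every call. A minimal sketch (the `add_turn` helper is illustrative, not part of any SDK):

```python
# Short-term memory is just the session's message array.
# Each turn appends to the list; the whole list is sent on every API call.
session_messages = []

def add_turn(role, content):
    session_messages.append({"role": role, "content": content})

add_turn("user", "Draft a welcome email for new subscribers.")
add_turn("assistant", "Subject: Welcome aboard! ...")
add_turn("user", "Make it shorter.")  # "it" resolves via the earlier turns

# When the session ends, the array is simply discarded:
session_messages.clear()
```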

Long-term memory

Long-term memory persists information across sessions in an external store: a user's name, their technology stack, standing preferences, project goals, past decisions. It must be actively managed — extracted from conversations, stored, deduplicated, updated when facts change, and retrieved selectively rather than loaded wholesale into the context window.
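The management steps above can be sketched with a toy in-memory store. This is a hand-rolled illustration of extract/store/deduplicate/retrieve, not how Mem0 implements them:

```python
# Toy long-term store: facts keyed per user.
# Not Mem0's implementation; just the management steps made concrete.
long_term = {}

def remember(user_id, key, value):
    user_facts = long_term.setdefault(user_id, {})
    # Deduplicate/update: a new value for an existing key replaces the old one
    user_facts[key] = value

def recall(user_id, keys):
    # Retrieve selectively: only the requested facts, never the whole store
    facts = long_term.get(user_id, {})
    return {k: facts[k] for k in keys if k in facts}

remember("u1", "name", "Ada")
remember("u1", "stack", "Python")
remember("u1", "stack", "Python + FastAPI")  # fact changed; updated in place
print(recall("u1", ["stack"]))  # {'stack': 'Python + FastAPI'}
```

A real store would replace the dict with a vector or graph database and use an LLM to extract fact keys from raw conversation, but the lifecycle is the same.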

How they interact

At the start of each session, relevant long-term memories are retrieved and loaded into the context window, becoming temporary short-term memory for that session. As the session progresses, new facts worth preserving are extracted and written back to long-term storage. Think of it as RAM (short-term) backed by a disk (long-term).

|                    | Short-term memory       | Long-term memory            |
|--------------------|-------------------------|-----------------------------|
| Storage location   | Context window          | External vector/graph store |
| Lifespan           | Current session only    | Indefinite                  |
| Retrieval speed    | Instant (already loaded)| 10-50 ms query latency      |
| Capacity limit     | Model token limit       | Effectively unlimited       |
| Update mechanism   | Append to message array | Extract, store, deduplicate |
A minimal end-to-end example using the Mem0 and OpenAI Python SDKs (this assumes a configured API key and Mem0's default settings):

```python
from mem0 import Memory
from openai import OpenAI

memory = Memory()
client = OpenAI()

def chat(user_id, user_message, session_messages):
    # Load relevant long-term memories into short-term context
    memories = memory.search(user_message, user_id=user_id, limit=5)
    memory_context = "\n".join(m["memory"] for m in memories["results"])

    messages = [
        {"role": "system", "content": f"Known facts about this user:\n{memory_context}"},
        *session_messages,
        {"role": "user", "content": user_message},
    ]

    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content

    # Write new facts from this exchange back to long-term memory
    memory.add([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": reply},
    ], user_id=user_id)

    return reply
```

Ready to add memory to your AI?

Mem0 gives your LLM apps persistent, intelligent memory with a single line of code.

Get Started with Mem0 →