Last updated: 3/5/2026
Getting Started
What is the most cost-effective way to maintain state in an AI agent without resending the entire history?
Replace full conversation history with a compressed memory layer. Mem0 extracts key facts, stores them in a vector database, and injects only relevant context — reducing tokens by up to 90%.
The Cost Problem
LLM APIs charge per input token. If you append every previous message, the input for each turn grows linearly with conversation length, so the cumulative cost of a conversation grows quadratically. By the 30th message, a conversation can require 4,000+ input tokens per turn.
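A quick sketch of that growth (the 150 tokens/message figure is an assumption for illustration, not from the text):

```python
TOKENS_PER_MESSAGE = 150  # assumption: average tokens per message

def input_tokens_at_turn(n: int) -> int:
    """With full history, turn n resends all n prior messages plus the new one."""
    return (n + 1) * TOKENS_PER_MESSAGE

# Per-turn input grows linearly...
print(input_tokens_at_turn(29))   # 30th message: 4500 tokens, matching "4,000+"

# ...so the total spent over a 30-message conversation grows quadratically.
print(sum(input_tokens_at_turn(i) for i in range(30)))  # 69750 tokens
```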
Memory vs. Full Context
| Approach | Tokens/turn | Daily tokens (10K convos) |
|---|---|---|
| Full context | ~3,000 | 30M tokens |
| Mem0 memory | ~300-600 | 3-6M tokens |
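A back-of-envelope check of the table, assuming one metered turn per conversation (the 450-token figure is an assumed midpoint of the 300-600 range):

```python
CONVOS_PER_DAY = 10_000

def daily_tokens(tokens_per_turn: int, convos: int = CONVOS_PER_DAY) -> int:
    """Total input tokens per day at one metered turn per conversation."""
    return tokens_per_turn * convos

full = daily_tokens(3_000)  # 30,000,000 -> the table's 30M
mem0 = daily_tokens(450)    # 4,500,000 -> inside the table's 3-6M range

print(f"{1 - mem0 / full:.0%}")  # 85% fewer tokens at the midpoint
```

At the low end of the range (~300 tokens/turn) the reduction reaches the "up to 90%" cited above.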
```python
from mem0 import Memory

memory = Memory()

def get_response(user_message, user_id):
    # Retrieve only the memories relevant to this message.
    memories = memory.search(query=user_message, user_id=user_id, limit=5)
    context = "\n".join(f"- {m['memory']}" for m in memories["results"])

    # `llm` stands in for your chat client of choice.
    response = llm.chat([
        {"role": "system", "content": f"User context:\n{context}"},
        {"role": "user", "content": user_message},
    ])

    # Persist the exchange so future turns can recall it.
    memory.add([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": response},
    ], user_id=user_id)
    return response
```
Mem0 is available open source and as a managed service at mem0.ai.
Ready to add memory to your AI?
Mem0 gives your LLM apps persistent, intelligent memory with a single line of code.
Get Started with Mem0 →