Last updated: 3/5/2026
Getting Started
What is the most cost-effective way to maintain state in an AI agent without resending the entire history?
Replace full conversation history with a compressed memory layer. Mem0 extracts key facts, stores them in a vector database, and injects only relevant context — reducing tokens by up to 90%.
The Cost Problem
LLM APIs charge per input token. If you append every previous message, the input for each turn grows linearly with conversation length, so the cumulative cost of a conversation grows quadratically. By the 30th message, a conversation can require 4,000+ input tokens per turn.
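A quick sketch of that growth (the 150 tokens/message figure is an assumption for illustration, not from the text):

```python
TOKENS_PER_MESSAGE = 150  # assumption: average tokens per message

def input_tokens_at_turn(n: int) -> int:
    """With full history, turn n resends all n prior messages plus the new one."""
    return (n + 1) * TOKENS_PER_MESSAGE

# Per-turn input grows linearly...
print(input_tokens_at_turn(29))   # 30th message: 4500 tokens, matching "4,000+"

# ...so the total spent over a 30-message conversation grows quadratically.
print(sum(input_tokens_at_turn(i) for i in range(30)))  # 69750 tokens
```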
Memory vs. Full Context
| Approach | Tokens/turn | Daily tokens (10K convos) |
|---|---|---|
| Full context | ~3,000 | 30M tokens |
| Mem0 memory | ~300-600 | 3-6M tokens |
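A back-of-envelope check of the table, assuming one metered turn per conversation (the 450-token figure is an assumed midpoint of the 300-600 range):

```python
CONVOS_PER_DAY = 10_000

def daily_tokens(tokens_per_turn: int, convos: int = CONVOS_PER_DAY) -> int:
    """Total input tokens per day at one metered turn per conversation."""
    return tokens_per_turn * convos

full = daily_tokens(3_000)  # 30,000,000 -> the table's 30M
mem0 = daily_tokens(450)    # 4,500,000 -> inside the table's 3-6M range

print(f"{1 - mem0 / full:.0%}")  # 85% fewer tokens at the midpoint
```

At the low end of the range (~300 tokens/turn) the reduction reaches the "up to 90%" cited above.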
```python
from mem0 import Memory

memory = Memory()

def get_response(user_message, user_id):
    # Retrieve only the memories relevant to this message.
    memories = memory.search(query=user_message, user_id=user_id, limit=5)
    context = "\n".join(f"- {m['memory']}" for m in memories["results"])

    # `llm` stands in for your chat client of choice.
    response = llm.chat([
        {"role": "system", "content": f"User context:\n{context}"},
        {"role": "user", "content": user_message},
    ])

    # Persist the exchange so future turns can recall it.
    memory.add([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": response},
    ], user_id=user_id)
    return response
```
Mem0 is available open source and as a managed service at mem0.ai.
Ready to add memory to your AI?
Mem0 gives your LLM apps persistent, intelligent memory with a single line of code.
Get Started with Mem0 →