Architecture
The mem0 Memory Service acts as the central memory layer for all OpenClaw agents. It receives diary data through a pipeline (real-time diary write via openclaw-plugin → digest → dream), distills it into semantic memories using AWS Bedrock, and serves relevant context back to agents on demand.
Deployment Architecture
Port mapping:
- `0.0.0.0:8230` → `container:8230`. The `cli.py` on the host connects via `localhost:8230`.

Volume mounts:
- `${OPENCLAW_BASE}:/openclaw`: pipeline reads host diary files written by openclaw-plugin (read-write)
- `./data:/app/data`: pipeline writes offset files and logs
- `${AWS_CONFIG_DIR}:/root/.aws`: AWS credentials (read-only, optional when using IAM Role)

AWS credentials: On EC2, both containers use the instance IAM Role via IMDS automatically. Outside EC2, set `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` in `.env`.
Component Responsibilities
| Component | Role |
|---|---|
| openclaw-plugin (agent_end hook) | Fires after each agent turn. Writes the conversation to the agent's daily diary file (~/.openclaw/workspace-{agentId}/memory/YYYY-MM-DD.md) in real-time via writeDiaryEntry. Filters noise (greetings, short exchanges) via isNoise and cleanContent. Routes diary entries by agentId to each agent's workspace. Does not write to mem0 directly; mem0 ingestion is handled entirely by auto_digest. |
| auto_digest.py --today | Runs every 15 minutes. Reads only the new bytes added since the last run (tracked via auto_digest_offset.json). Sends content to mem0 in section-aligned batches (up to ~50KB each, aligned to ## diary section boundaries) with infer=True — mem0 runs internal fact extraction to produce concise memories. Offset is persisted after each successful batch, enabling crash-safe resume. |
| memory_sync.py | Runs daily at UTC 01:00. Syncs each agent's MEMORY.md (curated knowledge) directly to mem0 long-term memory. Hash-based dedup skips unchanged files — zero LLM cost if nothing changed. |
| auto_dream.py / AutoDream | Runs daily at UTC 02:00. Step 1: reads yesterday's complete diary → mem0.add(infer=True, no run_id) → long-term memory. Step 2: for each 7-day-old short-term memory, calls mem0.add(infer=True, no run_id) — mem0 LLM compares against existing long-term memories and returns a decision: ADD (new knowledge, write), UPDATE (merge with existing), DELETE (redundant, skip write), or NONE (already covered). Regardless of decision, the original short-term entry is always deleted after processing. |
| mem0 Memory Service | Core service. Uses AWS Bedrock LLM for memory distillation/deduplication and Bedrock Embedding for vectorization. |
| Vector Store | Persists memory vectors. Supports S3 Vectors, OpenSearch, or pgvector as the backend. Score normalization is applied at the service layer — see Score Normalization Layer below. |
| SKILL.md → Retrieval | On new agent sessions, reads SKILL.md, queries mem0 for relevant memories, and injects them as context. |
Score Normalization Layer
Vector stores return scores with inconsistent semantics — OpenSearch returns similarity (higher = better), while pgvector and S3 Vectors return cosine distance (lower = better).
The service applies a normalization layer in server.py (_normalize_scores()) that converts all scores to a unified similarity scale [0, 1] before min_score filtering, time-decay blending, or returning results to callers. This abstraction ensures that upstream logic (ranking, filtering, audit logging) is vector-store-agnostic.
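A minimal sketch of what such a normalization step could look like. The function names, the backend labels, and the `1 - distance` conversion with clamping are assumptions for illustration; the actual `_normalize_scores()` in server.py may differ.

```python
from typing import Dict, List

# Assumption: these backends report cosine distance (lower = better).
DISTANCE_BACKENDS = {"pgvector", "s3vectors"}


def normalize_scores(results: List[Dict], backend: str) -> List[Dict]:
    """Return copies of `results` with `score` mapped to similarity in [0, 1]."""
    normalized = []
    for item in results:
        raw = float(item["score"])
        # Distance backends are converted; similarity backends pass through.
        sim = 1.0 - raw if backend in DISTANCE_BACKENDS else raw
        # Clamp so later min_score filtering always sees a value in [0, 1].
        sim = max(0.0, min(1.0, sim))
        normalized.append({**item, "score": sim})
    return normalized


def filter_by_min_score(results: List[Dict], min_score: float) -> List[Dict]:
    """Keep results at or above min_score, best first."""
    kept = [r for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)
```

Because filtering runs on the normalized values, the same `min_score` threshold means the same thing regardless of which vector store produced the hits.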
Pipeline Timeline (UTC)
Real-time openclaw-plugin — agent_end hook → diary files (no mem0 write)
Every 15 min auto_digest --today — diary new bytes → mem0 short-term (infer=True, fact extraction)
01:00 memory_sync — MEMORY.md → mem0 long-term (curated knowledge, instant)
02:00 auto_dream — Step1: yesterday diary → long-term (infer=True)
Step2: 7-day-old short-term → re-add (infer=True) + delete

Memory Tiering: Who Decides Long vs. Short-Term?
mem0 itself has no concept of short-term or long-term — it stores everything permanently by default. The distinction is entirely controlled by whether run_id is present when writing.
| | Short-term | Long-term |
|---|---|---|
| run_id | YYYY-MM-DD (date string) | absent |
| Written by | auto_digest.py --today (automated) | Agent explicitly, memory_sync.py, or auto_dream.py / AutoDream (consolidated via infer=True) |
| Lifetime | 7 days → consolidated by auto_dream | Permanent |
| Typical content | Daily discussions, task progress, temp decisions | Tech decisions, lessons learned, user preferences |
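The tiering rule in the table can be sketched as a small helper that builds the write arguments. The helper name `build_add_kwargs` is hypothetical, and the keyword names (`user_id`, `agent_id`, `run_id`, `infer`) are assumed to match what the service passes to mem0.

```python
from datetime import date
from typing import Any, Dict, Optional


def build_add_kwargs(user_id: str, agent_id: str,
                     day: Optional[date] = None) -> Dict[str, Any]:
    """Build keyword arguments for a memory write.

    Passing `day` yields a short-term write (run_id = YYYY-MM-DD);
    omitting it yields a long-term write (no run_id key at all).
    """
    kwargs: Dict[str, Any] = {"user_id": user_id, "agent_id": agent_id, "infer": True}
    if day is not None:
        kwargs["run_id"] = day.isoformat()
    return kwargs
```

The long-term case omits the key entirely rather than setting it to an empty string, which matches the "absent" wording in the table.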
Three paths to long-term memory
Path 1 — memory_sync.py (daily UTC 01:00, from MEMORY.md)
Each agent's MEMORY.md is the highest-quality memory source — curated directly by the agent during heartbeats. memory_sync.py syncs it to mem0 long-term memory every day at UTC 01:00, with hash-based dedup to avoid redundant LLM calls.
This is the fastest path: important decisions and lessons reach long-term memory the same day, without waiting for the 7-day archive cycle.
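One way hash-based dedup of this kind could work is sketched below. The state-file layout, the choice of SHA-256, and the `upload` callback are assumptions for illustration, not the actual memory_sync.py implementation.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def sync_if_changed(memory_file: Path, state_file: Path,
                    upload: Callable[[str], None]) -> bool:
    """Upload memory_file only when its SHA-256 differs from the stored one."""
    content = memory_file.read_text(encoding="utf-8")
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()

    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    key = str(memory_file)
    if state.get(key) == digest:
        return False  # unchanged: skip the upload, so no LLM call is made

    upload(content)
    # Record the hash only after the upload returned without raising.
    state[key] = digest
    state_file.write_text(json.dumps(state))
    return True
```

Storing the hash after the upload means a failed upload is retried on the next run instead of being silently skipped.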
Path 2 — auto_dream.py / AutoDream (daily UTC 02:00)
Runs two steps each night:
Step 1: Reads yesterday's complete diary and writes it to mem0 with `infer=True` (no `run_id`), directly into long-term memory with full-day context for high-quality extraction.

Step 2: For each 7-day-old short-term memory, calls `mem0.add(infer=True, no run_id)`. mem0's LLM compares the memory against existing long-term memories and returns one of four decisions:
- ADD: new knowledge → written as new long-term entry
- UPDATE: overlaps with existing → merged/updated
- DELETE: redundant or contradicted → write skipped
- NONE: already fully covered → write skipped
Regardless of the decision, the original short-term entry is always deleted after processing.
This leverages mem0's native intelligence instead of hand-written semantic search, eliminating thousands of redundant Bedrock API calls per run.
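The Step 2 loop can be sketched roughly as follows. `MemoryClient` is a hypothetical minimal interface rather than the real mem0 client, and the single `event` field in the response is an assumed simplification of whatever the service actually returns.

```python
from typing import Any, Dict, List, Protocol


class MemoryClient(Protocol):
    """Hypothetical minimal interface; not the real mem0 client class."""

    def add(self, text: str, **kwargs: Any) -> Dict[str, Any]: ...

    def delete(self, memory_id: str) -> None: ...


def consolidate(client: MemoryClient, short_term: List[Dict[str, str]],
                user_id: str, agent_id: str) -> Dict[str, int]:
    """Re-add each short-term memory without run_id, then delete the original."""
    counts = {"ADD": 0, "UPDATE": 0, "DELETE": 0, "NONE": 0}
    for mem in short_term:
        # No run_id: the write targets long-term memory and dedups against it.
        result = client.add(mem["text"], user_id=user_id, agent_id=agent_id, infer=True)
        decision = result.get("event", "NONE")  # assumed response field
        counts[decision] = counts.get(decision, 0) + 1
        # The short-term entry is removed whatever the decision was.
        client.delete(mem["id"])
    return counts
```

The returned counts give a cheap per-run summary of how many entries were promoted versus absorbed.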
Path 3 — Agent explicit write (on-demand)
Agents write directly to long-term memory by omitting run_id:
python3 cli.py add --user boss --agent agent1 \
--text "Decided to use S3 Vectors as the primary vector store" \
--metadata '{"category":"decision"}'

The run_id mechanism
run_id is mem0's native per-run isolation key. We repurpose it as a date-scoped namespace:
run_id = "2026-03-27" → short-term (today's entries)
run_id = absent → long-term (permanent)

Deduplication boundary: run_id scoping
This is the core design insight behind the two-tier system.
When mem0 receives a write with infer=True, it runs a semantic dedup search to decide ADD / UPDATE / DELETE / NONE. The search scope is bounded by run_id:
| Write | Dedup scope | Effect |
|---|---|---|
| auto_digest --today (with run_id=YYYY-MM-DD) | Only same-day entries | Today's short-term memories are stored with fact extraction (infer=True); mem0 deduplicates at write time |
| auto_dream consolidate (no run_id) | All long-term entries (no run_id) | 7-day-old short-term memories globally dedup against entire long-term history |
This boundary has two important consequences:
1. Short-term writes are safe and cheap. auto_digest runs every 15 minutes. Because each write carries a same-day run_id, mem0's fact extraction and dedup search compare only against that day's entries, which keeps every write fast and its cost low.
2. Global dedup happens exactly once, at promotion time. When auto_dream promotes a short-term memory to long-term (re-add with no run_id), mem0 searches across the entire long-term store. This is the moment where redundant, updated, or already-covered knowledge gets merged. The result: long-term memory stays compact and non-redundant, even after months of daily operation.
In practice, we observed 1,800 short-term entries consolidating down to ~78 long-term entries after the first few auto_dream cycles — a ~23× compression ratio, retaining the distilled knowledge while eliminating noise.
Design Philosophy
Diary-to-mem0 pipeline
auto_digest.py --today (every 15 min, incremental)
Runs every 15 minutes, reading only new diary content since the last run. Sends section-aligned batches (up to ~50KB each, aligned to ## diary section boundaries) to mem0 with infer=True and a dedicated DIGEST_EXTRACTION_PROMPT — a custom prompt that preserves technical identifiers (project names, cluster IDs, service names, port numbers, paths), performance data, work progress, key decisions, and pitfalls. Extraction threshold is 2000 bytes. Offset is saved after each successful batch — if the process is interrupted, the next run picks up where it left off.
This provides real-time cross-session memory: conversations from the last ~15 minutes are available for retrieval in other sessions of the same agent.
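The incremental read and section-aligned batching described above might be sketched like this. The function names, the JSON layout of the offset file, and the `send` callback are assumptions; the 2000-byte extraction threshold and the sleep between batches are not modeled.

```python
import json
from pathlib import Path
from typing import Callable, List

MAX_BATCH_BYTES = 50_000  # approximate batch ceiling taken from the text


def read_new_bytes(diary: Path, offset: int) -> bytes:
    """Return only the bytes appended to the diary since `offset`."""
    with diary.open("rb") as fh:
        fh.seek(offset)
        return fh.read()


def split_on_sections(data: bytes, max_bytes: int = MAX_BATCH_BYTES) -> List[bytes]:
    """Group '## ' sections into batches of at most max_bytes where possible.

    A single section larger than max_bytes is kept whole rather than cut.
    """
    sections: List[bytes] = []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"## ") or not sections:
            sections.append(b"")
        sections[-1] += line

    batches: List[bytes] = []
    for section in sections:
        if batches and len(batches[-1]) + len(section) <= max_bytes:
            batches[-1] += section
        else:
            batches.append(section)
    return batches


def digest(diary: Path, offset_file: Path, send: Callable[[str], None]) -> int:
    """Send new diary content batch by batch, saving the offset after each."""
    offsets = json.loads(offset_file.read_text()) if offset_file.exists() else {}
    offset = offsets.get(str(diary), 0)
    for batch in split_on_sections(read_new_bytes(diary, offset)):
        send(batch.decode("utf-8"))
        offset += len(batch)
        offsets[str(diary)] = offset
        offset_file.write_text(json.dumps(offsets))  # resume point after a crash
    return offset
```

Persisting the offset inside the loop means an interruption costs at most one batch of repeated work on the next run.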
auto_dream.py Step 1 (UTC 02:00, yesterday diary → long-term)
Runs once per day. Reads yesterday's complete diary and writes it to mem0 with infer=True (no run_id) — directly as long-term memory. With the full day's context available, mem0 produces higher-quality, deduplicated memories.
This provides high-quality nightly long-term memory from the full day's context.
Why the openclaw-plugin writes diary files, not mem0
The openclaw-plugin's agent_end hook writes conversation content to diary files rather than directly to mem0. This is a deliberate separation of concerns:
- Plugin → diary files only (fast, no external API calls, no LLM cost)
- auto_digest --today → mem0 writes (rate-controlled, section-aligned 50KB batches with sleep between)
Previously, session_snapshot.py polled OpenClaw's session API every 5 minutes and wrote to diary files. The new agent_end hook approach is superior: it fires in real-time after each conversation turn, eliminating the 5-minute polling delay and the need for a separate polling process. Diary entries are routed by agentId to each agent's workspace (~/.openclaw/workspace-{agentId}/memory/YYYY-MM-DD.md).
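The per-agent routing can be illustrated with a short Python sketch. This mirrors only the path layout given above; the function names and the `## HH:MM` entry heading are assumptions, and the real plugin's writeDiaryEntry may format entries differently.

```python
from datetime import date, datetime
from pathlib import Path


def diary_path(agent_id: str, day: date, home: Path) -> Path:
    """Resolve the daily diary file for one agent under the given home directory."""
    return home / ".openclaw" / f"workspace-{agent_id}" / "memory" / f"{day.isoformat()}.md"


def append_entry(agent_id: str, text: str, now: datetime, home: Path) -> Path:
    """Append one entry to the agent's diary file for the day of `now`."""
    path = diary_path(agent_id, now.date(), home)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # Assumed entry layout: a '## HH:MM' heading followed by the content.
        fh.write(f"## {now:%H:%M}\n{text.strip()}\n\n")
    return path
```

Appending to a local file involves no network call, which is what keeps the hook fast compared with writing to mem0 directly.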
Why MEMORY.md sync is a separate path
MEMORY.md is maintained by agents themselves during heartbeats — it's the distilled, curated essence of what the agent has learned. This is qualitatively different from diary-extracted short-term memories.
Routing MEMORY.md directly to long-term memory (bypassing the 7-day short-term → archive cycle) ensures that explicitly curated knowledge is available immediately in subsequent sessions.