The Real Problem Isn’t “One AI Forgetting”
AI’s biggest weakness: when the session ends, memory is wiped clean.
But worse than one AI forgetting is an entire AI team forgetting.
Our team has 6 AI Agents running on 4 different platforms. Every Agent wakes up with zero memory of what it did yesterday — let alone what its teammates are working on. Imagine six coworkers showing up to work every morning with complete amnesia. Yesterday’s specs, yesterday’s bugs, yesterday’s decisions — all need to be explained from scratch.
And these six coworkers don’t even speak the same language. Claude’s memory lives in text files, OpenClaw’s in a vector database, Gemini’s in session history, Dify’s in a knowledge base. Four completely different memory systems. How do you get them to talk to each other?
We spent months figuring this out, building a complete memory architecture through trial and error. This article takes apart every layer.
Meet the Team
| Role | Model | Function | Platform |
|---|---|---|---|
| J (COO / Tech Director) | Claude Opus | Task dispatch, code review, complex development, memory maintenance | Claude Code |
| Mimi (Marketing Manager) | MiniMax M2.7 | Market research, marketing content, translation, promotion | OpenClaw |
| Ada (Product Engineer) | MiniMax M2.7 | Frontend development, bug fixes, deployment, testing | OpenClaw |
| Lily (Content Director) | Claude Sonnet | QA review, document proofreading, style consistency | Claude Code CLI |
| Moongg (QA Researcher) | Gemini CLI | Quality assurance, research, frontend QA | Gemini + OpenClaw |
| Pipeline (Automation) | Gemini Flash | News summaries, data collection, content curation | Dify Workflows |
The human CEO does exactly two things: read reports and make decisions. Under 30 minutes a day. Everything else runs autonomously through the Agent team.
Let’s start from each Agent’s memory system and work our way up to the team-wide shared layer.
Claude Code Agent Memory (J, Lily)
Claude Code is our most memory-rich platform. J, as COO, carries 51 files totaling roughly 155KB of memory.
Layer 1: Auto Memory — The AI Remembers on Its Own
Claude Code has a built-in auto-memory feature. When the AI observes important facts, preferences, or decisions during a conversation, it automatically writes them to MEMORY.md.
The key characteristic is passive accumulation. You don’t need to say “remember this” — the AI decides what’s worth keeping. But capacity is limited: anything beyond 200 lines gets truncated. So this layer holds only compressed indexes and key pointers:
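A sketch of what such a compressed index might look like (every entry here is invented for illustration):

```markdown
# MEMORY.md (index only: pointers, not prose)
products | 24 active | details: memory/team-and-products.md
trading | position limits + strategy params | details: memory/judy-crypto.md
patrol | 5-phase daily routine | details: memory/coo-playbook.md
incident | fabricated test report, see GATE-6 | details: memory/iron-rules.md
```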
The pipe-delimited format (|) isn’t for aesthetics — it’s for AI parsing efficiency. Each line is a self-contained unit of context.
Layer 2: CLAUDE.md — Behavioral Rules
Placed in the project root, automatically loaded at every session start. This isn’t memory — it’s a behavioral gate. No matter what the AI wants to do, these rules always take priority.
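To make the idea concrete, a behavioral-rules file might read roughly like this (GATE-6 and GATE-9 are the team's real gates; the wording is reconstructed for illustration):

```markdown
# CLAUDE.md (behavioral gates, loaded at every session start)
GATE-6: Anti-fabrication. Every completion report must include
        verifiable evidence (command output, diff, or log excerpt).
GATE-9: No deflection language. Reports containing "probably fine"
        or "check manually" are automatically marked FAIL.
Rules in this file override anything the AI decides mid-task.
```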
GATE-6 was born from pain. One time an Agent reported “all tests passed” — but checking the logs revealed it never actually ran the tests. It just generated the words “passed.” After adding the anti-fabrication gate, every completion report must include verifiable evidence.
Layer 3: Memory Files — 28 Topic-Specific Files
Auto Memory has limited capacity. CLAUDE.md holds rules. The real deep memory is handled by Memory Files — 28 topic-specific files, each covering a distinct knowledge domain:
| Type | Example File | Content |
|---|---|---|
| Iron Rules | iron-rules.md (9.2KB) | Each rule tagged with its source event and lesson |
| Infrastructure | automation-infra.md (9.8KB) | Automation flows, scheduling, security architecture |
| Team Knowledge | team-and-products.md (2.8KB) | 24 products, pricing, owners |
| Trading Strategy | judy-crypto.md (7.4KB) | Position limits, strategy parameters, risk controls |
| Patrol Playbook | coo-playbook.md (5.0KB) | COO’s daily 5-phase patrol procedure |
| Connection Map | connections.md (6.8KB) | Agent ecosystem relationships, APIs, tool dependencies |
The design principle is load on demand. The AI doesn’t read every file on startup — it only reads judy-crypto.md when it needs trading strategy. The index file stays under 200 lines; deep knowledge is distributed across topic files.
Layer 4: Rules Layer Inheritance
Claude Code rules have a four-layer inheritance chain:
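The chain can be pictured like this (the file layout is a sketch; the article specifies base, language, and project layers, and we assume the project-level CLAUDE.md sits on top):

```text
rules/base/*.md       universal standards (security, testing, Git)
   overridden by
rules/language/*.md   Python / TypeScript specifics
   overridden by
rules/project/*.md    this team's gates and conventions
   overridden by
CLAUDE.md             session-level behavioral rules, always win
```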
We have 14 rule files covering security, performance, testing, coding style, and Git workflows. Base rules define universal standards, language rules add Python or TypeScript specifics, and project rules layer on this team’s gates.
Layer 5: Hooks — Auto-Triggered Guards
Claude Code’s Hooks system lets us auto-trigger scripts before and after tool execution:
- Pre Hook: Before executing a bash command, intercept dangerous operations (`rm -rf`, `git --force`)
- Post Hook: After writing a Python file, auto-run syntax checks; auto-scan blog content for security issues
- Stop Hook: Triggered before session end to run a learning evaluation — prompting the AI to review what it learned and extract reusable patterns into memory
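Hooks are configured in Claude Code's settings.json. A sketch along these lines wires up the three hooks described above (script paths are hypothetical; check the official Hooks docs for the exact event schema):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": "./scripts/block_dangerous.sh" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "./scripts/post_write_checks.sh" }]
      }
    ],
    "Stop": [
      { "hooks": [{ "type": "command", "command": "./scripts/learning_eval.sh" }] }
    ]
  }
}
```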
The Stop Hook closes the memory loop. Without it, the AI finishes its work and leaves — everything it learned vanishes with the session. With it, every session’s lessons have a chance to crystallize into permanent memory.
OpenClaw Agent Memory (Mimi, Ada)
Mimi and Ada run on the OpenClaw platform using the MiniMax M2.7 model. Their memory system is completely different from Claude Code.
SOUL.md — Agent Personality Definition
Each OpenClaw Agent has a SOUL file that defines its role, permissions, and behavioral boundaries. Similar in function to CLAUDE.md, but different in format:
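A SOUL file for Ada might look roughly like this (the boundary and reporting lines are invented for illustration; the role matches the team table above):

```markdown
# SOUL.md (Ada, Product Engineer)
role: frontend development, bug fixes, deployment, testing
boundaries:
  - no production deploys without COO review
  - output in Traditional Chinese only
reporting: concise status updates with evidence attached
```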
SOUL lets the same underlying model play completely different roles. Mimi and Ada are both MiniMax M2.7, but with different SOULs — one does marketing, the other does engineering.
MEMORY.md — Auto-Evolving Work Memory
Unlike Claude Code’s Auto Memory, OpenClaw’s MEMORY.md is updated by an external evolution system. At a fixed time each day, the evolution script scans the Agent’s work logs and writes:
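An evolution entry appended to MEMORY.md might look like this (contents are illustrative):

```markdown
## Evolution, 2025-01-15 (auto-generated)
done: 3 tasks (blog fix, pricing page, QA retest)
rejected: 1 (missing test evidence, GATE-6)
trend: rejection rate down vs. last week
```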
The Agent doesn’t need to remember “what did I do yesterday.” The evolution system remembers for it.
SQLite + FTS5 — Structured Memory Database
Each OpenClaw Agent has its own SQLite database storing vector embeddings and full-text search indexes:
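The full-text half of this setup can be sketched with Python's built-in sqlite3 module and an FTS5 virtual table (table schema and rows are invented for illustration; the real system also stores vector embeddings alongside this):

```python
import sqlite3

# In-memory stand-in for an agent's structured memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(topic, content)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?)",
    [
        ("qa", "regression test for checkout page failed on Safari"),
        ("marketing", "competitor launched a bundled pricing plan"),
    ],
)

# Keyword search over all memories; FTS5 ranks results with built-in bm25.
rows = conn.execute(
    "SELECT topic FROM memories WHERE memories MATCH ? ORDER BY rank",
    ("pricing",),
).fetchall()
print(rows)  # [('marketing',)]
```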
This lets Agents perform semantic search and keyword search over their own memory without reading every file each time.
LanceDB — Team-Level Vector Memory
Beyond SQLite, OpenClaw also has a LanceDB vector database as a shared memory layer:
- autoCapture: Key information from conversations is automatically vectorized and stored
- autoRecall: When relevant context appears, semantically similar memories are automatically recalled
The difference from text memory: vector memory uses semantic search. You don’t need the exact keyword — if the meaning is close enough, it gets recalled. If Mimi researched a competitor last month, that research resurfaces automatically the next time a similar product comes up.
LanceDB uses Apache Arrow format for storage, supporting transactionally safe updates and multi-Agent shared reads.
Session Logs — Complete Conversation History
Every time an Agent executes a task, the complete conversation is saved in JSONL format:
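A few lines of such a session log might look like this (field names and contents are illustrative):

```jsonl
{"ts": "2025-01-15T09:00:12Z", "role": "user", "content": "Fix the broken pricing page"}
{"ts": "2025-01-15T09:00:15Z", "role": "assistant", "content": "Reading MEMORY.md first..."}
{"ts": "2025-01-15T09:02:40Z", "role": "tool", "name": "bash", "content": "npm run build && npm test"}
```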
This isn’t for real-time Agent consumption (too large) — it’s for the COO to trace back after the fact. When an Agent has a problem, you can replay its complete thought process from the session logs.
Gemini CLI Agent Memory (Moongg)
Moongg uses Gemini CLI, with a memory mechanism different from both previous platforms. Her most distinctive trait: she operates independently, outside the central scheduler’s control.
Shared Components with OpenClaw Agents
Moongg also has SOUL.md, MEMORY.md, and IDENTITY.md, structured the same as OpenClaw Agents:
- SOUL.md (5.5KB) — Role definition, QA standards, security rules
- MEMORY.md (2.5KB) — Recent QA work, lessons, performance records (auto-updated daily)
- IDENTITY.md — Name, timezone, associated Telegram Bot
Gemini’s 30-Day Session Retention
Gemini CLI ships with built-in session history retention.
The full conversation history from the past 30 days is preserved locally. This means Moongg can recall conversation details from last month in a new session — a capability the other platforms don’t have.
LanceDB Vector Memory (Manual Access)
Moongg can also use the LanceDB vector database, but unlike OpenClaw’s automatic capture and recall, her vector memory is triggered manually.
Vectors are generated using the 768-dimensional nomic-embed-text model, with support for filtering by category (task_result, knowledge, plan, etc.).
Independent Operation Mode
Other Agents are woken up on schedule by the central scheduler (Agent Executor). Moongg is different — she runs as a standalone system service, receiving messages in real-time through a Telegram Bot.
This design is intentional: QA needs instant responses, not scheduled runs. When someone asks on Telegram “is this page broken?”, Moongg can reply immediately.
Dify Knowledge Base (Pipeline Agents)
The team includes several automated Pipeline Agents (Xiaojin, Yaya, Mengmeng) running on the Dify workflow platform with the Gemini Flash model. Their memory is unlike all three systems above — it relies on knowledge bases instead of the file system.
89 Documents, 824KB of Knowledge
The Pipeline Agents share a Dify knowledge base of 89 documents totaling 824KB, including the 23 SOPs covered below.
What the Knowledge Base Does
Dify Agents can’t freely read and write the file system like Claude Code. They query relevant documents through Dify’s knowledge retrieval mechanism within workflows.
For example, when Yaya (the news summary Agent) needs to judge whether a news item is worth reporting, the Dify workflow automatically retrieves relevant trading strategy documents from the knowledge base, helping it make judgments consistent with the team’s strategy.
SOPs Are the Team’s Collective Memory
The most important assets in the knowledge base are the 23 SOPs. These aren’t memory — they’re standardized processes:
- blog-pipeline.md: Complete Blog flow from topic selection to writing, QA, review, and deployment
- product-development.md: 9-stage product development SOP (research, spec, development, testing, review, launch)
- task-delegation.md: Task assignment rules (which types of tasks go to whom)
- monitoring-alerting.md: 5-layer monitoring system (from real-time alerts to daily summaries)
No Agent needs to remember “what’s the Blog process” — just check the SOP. This transforms individual memory into organizational knowledge.
Team-Wide Shared Memory Layer
Everything above covers each Agent’s “personal” memory. But team collaboration requires a shared layer that everyone can read and write.
SHARED_TASK_NOTES — The Team’s Brain
This is the core of the entire memory system: a single Markdown file shared by all Agents, recording the global work state.
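A slice of such a file might look like this (task IDs and contents are invented for illustration):

```markdown
# SHARED_TASK_NOTES (all agents read and write this)
## In Progress
- TASK-142 | Ada | pricing page redesign | blocked: waiting on Mimi's copy
## Done (last 24h)
- TASK-139 | Lily | blog QA pass | evidence: result_139.json
## Blockers
- staging deploy broken since 01-14 | owner: Ada
```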
800 lines, 50KB. It’s not pretty, but it works. Any Agent that wakes up and reads this file immediately knows: who’s doing what, what’s been finished, and what’s blocked.
bot_inbox — The Task Delivery System
Each Agent has its own inbox directory:
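The layout might look roughly like this (directory names are a sketch; the article confirms only the bot_inbox and done directories):

```text
bot_inbox/
├── j/          # COO: incoming results for review
├── mimi/
├── ada/
│   ├── task_20250115.json   # pending task (hypothetical name)
│   └── done/                # completed, scanned nightly by the evolution script
└── moongg/
```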
Tasks are stored as JSON files with full context:
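A task file might carry context like this (field names are illustrative, not the team's actual schema):

```json
{
  "task_id": "TASK-142",
  "assignee": "ada",
  "title": "Fix pricing page layout on mobile",
  "why": "Conversion dropped after last week's redesign",
  "previous_attempts": ["TASK-137: CSS-only fix, rejected in QA"],
  "memory_refs": ["memory/team-and-products.md"],
  "deadline": "2025-01-16"
}
```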
The task file itself is the context carrier. When an Agent reads a task, it doesn’t just know “what to do” — it knows why, what was tried before, and which memory files to reference.
ai-logs — Activity Records for Every Agent
Each Agent keeps its own monthly activity log in the ai-logs directory.
j_output_log.jsonl is the team-wide output ledger. Every Agent logs an entry upon completing a task:
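A ledger entry might look like this (field names are illustrative):

```jsonl
{"ts": "2025-01-15T18:30:00Z", "agent": "ada", "task_id": "TASK-142", "status": "done", "evidence": "result_142.json", "duration_min": 47}
```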
When the COO patrols, reading these logs provides a full picture. No need to ask each Agent “what did you do.”
Linear — Structured Task Memory
We use Linear for task management. It’s not just a tool — it’s also a form of memory:
- Each card records a task’s complete lifecycle (created, assigned, in progress, reviewed, completed)
- Card comments serve as the formal communication channel between Agents
- Labels mark owners and task types
- Status changes are timestamped, making the entire flow traceable
The Dispatcher (router) scans Linear every few minutes, automatically routing new cards to the appropriate Agent’s inbox. When done, the Agent writes a comment back to the card.
How Memory Flows Between Agents
With individual memory and a shared layer in place, the next question is: how does context get from Agent A to Agent B?
Memory Preamble — Mandatory Pre-Task Injection
The Agent Executor (central scheduler) injects a “memory preamble” before each Agent’s task:
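The idea can be sketched in a few lines of Python (the template text and file paths are assumptions, not the team's actual preamble):

```python
def build_preamble(agent: str, task_id: str) -> str:
    """Compose the text the scheduler prepends to every task prompt.

    A minimal sketch: force the agent to load its identity and the
    shared state before it starts working.
    """
    return (
        f"Before starting {task_id}, you MUST:\n"
        f"1. Read agents/{agent}/SOUL.md (who you are, your boundaries)\n"
        f"2. Read agents/{agent}/MEMORY.md (what you did recently)\n"
        f"3. Read SHARED_TASK_NOTES.md (what the team is doing)\n"
        f"Only then begin the task."
    )

preamble = build_preamble("ada", "TASK-142")
print(preamble.splitlines()[0])  # Before starting TASK-142, you MUST:
```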
This ensures that even if an Agent starts in a completely fresh session with zero context, it will first read its own SOUL.md and MEMORY.md before getting to work.
Result File — Mandatory Reporting
After completing a task, an Agent must write a result file. No result file = didn’t do it.
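A result file might look like this (field names are illustrative; the evidence field reflects the GATE-6 requirement described earlier):

```json
{
  "task_id": "TASK-142",
  "agent": "ada",
  "status": "PASS",
  "summary": "Fixed mobile layout; root cause was a missing viewport media query",
  "evidence": {
    "commands": ["npm test"],
    "output_excerpt": "42 passed, 0 failed"
  },
  "follow_ups": ["re-check Safari rendering next sprint"]
}
```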
The result file goes back to the COO’s inbox. After the COO reviews it, they update SHARED_TASK_NOTES and Linear. The next Agent picking up the thread can seamlessly continue.
The Complete Context Flow
Here’s how context flows through a typical task from creation to completion:
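Assembled from the mechanisms described above, the flow looks roughly like this:

```text
Linear card created
      |  Dispatcher scans Linear every few minutes
      v
bot_inbox/<agent>/task.json        (full context: why, history, memory refs)
      |  Agent Executor injects the memory preamble
      v
Agent reads SOUL.md + MEMORY.md + SHARED_TASK_NOTES, does the work
      |  writes the mandatory result file
      v
COO inbox -> review -> update SHARED_TASK_NOTES + Linear comment
      |  nightly evolution script scans inbox/done
      v
MEMORY.md updated, team report pushed to Notion
```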
Throughout this entire flow, context never “disappears.” It persists in the file system as JSON and Markdown, read and enriched by one Agent after another.
Automatic Memory Evolution
The biggest risk in any memory system is staleness. A wrong memory is more dangerous than no memory at all.
Daily Evolution System
At a fixed time each day, the evolution script runs automatically:
- Scans all Agents’ inbox/done directories (completed items from the past 24 hours)
- Categorizes tasks (product, Blog, QA, system maintenance, etc.)
- Evaluates each Agent’s performance (output volume, rejection count, quality trends)
- Updates each Agent’s MEMORY.md (auto-appends evolution records)
- Generates a team-wide evolution report and pushes it to Notion
Evolution report format:
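A report might look like this (the numbers and layout are invented for illustration):

```markdown
# Team Evolution Report, 2025-01-15
| Agent | Done | Rejected | Trend |
|---|---|---|---|
| Ada | 3 | 1 | rejection rate falling |
| Mimi | 2 | 0 | steady |
Highlights: GATE-6 caught one fabricated test report.
```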
This lets Judy spend just 5 minutes a day to grasp the entire team’s operational status.
Iron Rules — Rules Crystallized from Incidents
The most valuable part of the memory system isn’t “what to do” — it’s “what NOT to do.” Every pitfall gets written into an iron rule, tagged with the specific incident:
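An iron rule entry might look like this (the rule number is hypothetical; the incident is the fabricated-tests case described earlier):

```markdown
RULE-07 | Never report PASS without attached command output
        | lesson=2024-11 incident: agent claimed "all tests passed";
        |        logs showed no tests were ever run
```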
Every iron rule has a lesson= tag. It’s not “I think we should do this” — it’s “we got burned because we didn’t.” Rules crystallized from real incidents have far higher compliance rates than abstract best practices.
COO’s Memory Patrol
The COO Agent runs a fixed patrol sequence every time it starts up:
- Read memory (30 seconds) — MEMORY.md + SHARED_TASK_NOTES
- Cross-check reality — Does memory match the actual system state?
- Clean stale memory — Completed tasks, fixed bugs, changed architecture
- Write new lessons — Record this session’s decisions and discoveries back to Memory Files
If memory and reality don’t match (e.g., memory says “service X is running” but it’s actually stopped), the COO immediately updates memory. This prevents stale information from propagating through the team.
Cross-Platform Memory Format Design
Four platforms, four memory mechanisms. How do you make them interoperate?
Lowest Common Denominator Principle
| Platform | Native Memory | Strength | Limitation |
|---|---|---|---|
| Claude Code | Markdown text files | Transparent, human-readable, version-controlled | Plain text search, no semantics |
| OpenClaw | SQLite + LanceDB | Semantic search, auto-recall | Hard to audit manually |
| Gemini CLI | 30-day session history | Long-term conversation continuity | Can’t share across Agents |
| Dify | Knowledge Base | Visual management, workflow integration | Limited cross-session state |
The answer is straightforward: all cross-Agent communication uses Markdown + YAML.
No matter how advanced an Agent’s native memory is (vector search, session history, knowledge bases), when it needs to communicate with other Agents, it uses the most universal format. SHARED_TASK_NOTES is Markdown, inbox tasks are JSON, result files are JSON.
Any platform’s AI can read Markdown and JSON. That’s the lowest common denominator.
Written for AI ≠ Written for Humans
We discovered that AI processes structured data far more effectively than prose:
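A side-by-side sketch of the same fact in both forms (contents are illustrative):

```markdown
Prose (hard for a model to parse reliably):
  The anti-fabrication rule has been violated twice, most recently in November.

Structured (each line self-contained):
  GATE-6 | anti-fabrication | count: 2 | last: 2024-11 | status: active
```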
Pipe-delimited fields, key-value pairs, YAML-style entries: each line is a self-contained unit of context. The AI can parse a field like `count: 2` far more reliably than the sentence “this rule has been violated twice.”
Environment Isolation and Shared Layer
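Drawn from the mechanisms covered in this article, the two layers look roughly like this:

```text
+-- per-agent memory (isolated) ------------------------------+
| J, Lily:    MEMORY.md + 28 topic files    (Claude Code)     |
| Mimi, Ada:  SQLite + LanceDB + SOUL.md    (OpenClaw)        |
| Moongg:     30-day sessions + LanceDB     (Gemini CLI)      |
| Pipeline:   Dify knowledge base                             |
+-------------------------------------------------------------+
+-- shared layer (Markdown / JSON only) ----------------------+
| SHARED_TASK_NOTES . bot_inbox . ai-logs . Linear            |
+-------------------------------------------------------------+
```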
Each Agent has its own memory (upper layer), but everyone shares the same shared layer (lower layer). If one Agent goes down, the others are unaffected. The shared layer uses the most universal formats to ensure cross-platform interoperability.
Complete Memory Stack Overview
| Layer | Mechanism | Loaded When | Used By | Capacity |
|---|---|---|---|---|
| Auto Memory | MEMORY.md auto-injected | Every session start | Claude Code Agents | ~200 lines |
| Behavioral Rules | CLAUDE.md / SOUL.md | Every session start | All Agents | Unlimited |
| Topic Memory | Memory Files | Loaded on demand | Claude Code Agents | 28 files / 155KB |
| Rule Inheritance | 4-layer Rules override | Every session start | Claude Code Agents | 14 files |
| Vector Memory | LanceDB + SQLite | During semantic search | OpenClaw Agents | Unlimited |
| Session History | Gemini 30-day retention | Auto-loaded on new session | Gemini CLI Agent | 30 days |
| Knowledge Base | Dify Knowledge Base | During workflow queries | Pipeline Agents | 89 files / 824KB |
| Progress Sync | SHARED_TASK_NOTES | Every session start | All Agents | 800 lines / 50KB |
| Task Delivery | bot_inbox JSON | Agent Executor retrieval | All Agents | Unlimited |
| Activity Logs | ai-logs monthly journals | During COO patrol | All Agents | 55MB cumulative |
| Output Ledger | j_output_log.jsonl | On task completion | All Agents | Unlimited |
| Auto Evolution | j_team_evolution.py | Fixed daily schedule | All Agents | Updated daily |
Pitfalls We Hit
Pitfall 1: Agent Fabrication
Agents would generate “tests passed” without actually running any tests. Solution: GATE-6 anti-fabrication verification — the COO independently spot-checks, and any PASS without command output is automatically untrusted.
Pitfall 2: Memory Overwrites
Early on, two Agents would modify SHARED_TASK_NOTES simultaneously, overwriting each other’s changes. Solution: Added a locking mechanism in the Agent Executor — only one Agent can execute a task at a time.
Pitfall 3: Deflection Language
When uncertain, Agents would say “I suggest you check manually” or “it’s probably fine.” Solution: GATE-9 automatically marks any report containing deflection language as FAIL. Forces the Agent to figure out the truth itself.
Pitfall 4: Simplified Chinese Contamination
The MiniMax model sometimes outputs Simplified Chinese (we need Traditional Chinese). Solution: Built an 11,000-character Simplified-to-Traditional mapping into the Agent Executor — any output containing Simplified Chinese gets automatically rejected for a redo.
Pitfall 5: Stale Memory
A memory file stated “service X is running” but the service had already stopped. The Agent read stale memory and made wrong decisions. Solution: The COO cross-checks memory against actual system state on every startup, correcting inconsistencies immediately.
Pitfall 6: MEMORY.md Explosion
Without capacity control, MEMORY.md grows endlessly. Once it exceeds 200 lines and gets truncated, the AI only sees the first half. Solution: The index file holds only pointers; deep content is distributed across topic files.
Get Started in 10 Minutes
Want to try this in your own project? A minimum viable memory system needs just three files:
1. CLAUDE.md (project root)
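A minimal starting point might look like this (rules are examples drawn from this article, not a prescribed set):

```markdown
# CLAUDE.md (always loaded, always wins)
- Read memory/MEMORY.md before starting any task.
- Every completion report must include verifiable evidence
  (command output, diff, or log excerpt).
- When memory and reality disagree, trust reality and fix the memory.
```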
2. memory/MEMORY.md (memory index)
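The index file holds only pointers; a minimal sketch:

```markdown
# MEMORY.md (index only, keep under 200 lines)
team rules | see memory/team-rules.md
current focus | <one line: what matters this week>
open issues | <one line each, linking to topic files>
```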
3. memory/team-rules.md (topic memory)
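The topic file carries the deep content; a minimal sketch using the lesson-tag convention described earlier:

```markdown
# team-rules.md (topic memory, loaded on demand)
RULE-01 | <the rule> | lesson=<the incident that taught it>
RULE-02 | <the rule> | lesson=<the incident that taught it>
```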
Get it running first, then add layers based on real needs. Start with three files — don’t try to build a 155KB memory architecture on day one.
References
- Claude Code overview (official docs) — architecture overview, including the CLAUDE.md loading mechanism
- Claude Code memory (official docs) — Auto Memory and MEMORY.md
- Claude Code settings (official docs) — settings.json and Hooks configuration
- LanceDB documentation — vector database technical reference
- Dify Knowledge Base documentation — knowledge retrieval mechanism