I’ve been thinking about a question: why are AI Agents so smart, yet have such terrible memory?
You chat with it all day, establish context, explain your preferences — then you start a new session the next day and it’s forgotten everything. Like a goldfish. Worse, actually — at least a goldfish stays in the same fishbowl. An Agent doesn’t even remember where the fishbowl is.
It’s not the model being dumb. GPT-5, Claude — their reasoning capabilities are already very strong. But reasoning and memory are two different things. You can bring in a super genius to work, but if every day they walk into the office forgetting what they did yesterday, what their coworkers’ names are, where the project stands — even a genius would be useless.
ByteDance’s Volcano Engine team clearly got fed up with this problem too, so they built OpenViking.
The Fundamental Problem with Vector Search
Before talking about OpenViking, let’s discuss how people handle Agent memory nowadays.
The mainstream approach is RAG (Retrieval-Augmented Generation): chunk all information into small pieces, convert to vectors, store in a vector database, and let the Agent use semantic search to retrieve what’s needed. Sounds reasonable, right?
But running it in practice reveals many problems.
First, semantic search doesn’t equal precise search. You ask “the quote from that client last time,” and vector search might pull up everything semantically similar to “client” and “quote” — from three months ago, from a different project, even from a meeting transcript discussing pricing strategy. You wanted one number; you get a bunch of noise.
Second, all structure gets shattered. A complete technical document has titles, sections, hierarchical relationships — after RAG chunks it into 500-word pieces, all that structure disappears. The Agent gets fragments, not knowledge.
Third, token explosion. Because search isn’t precise, you can only grab more “just in case,” filling the context window with irrelevant stuff and squeezing out the truly important information.
It’s like you need to find a contract, but someone emptied your filing cabinet, mixed all the papers together in a big box, and said “just use keyword search.”
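The gap between semantic similarity and precise lookup is easy to reproduce with a toy bag-of-words retriever. The chunks and scoring below are made up for illustration (real systems use learned embeddings, not word counts, but the failure mode is the same):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag of lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "client quote for project Alpha: $12,000, sent last Tuesday",
    "meeting notes: discussed client pricing strategy and quote process",
    "old quote from a different client, project Beta, three months ago",
]

query = embed("the quote from that client last time")
ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
for c in ranked:
    # All three chunks score as "similar" — the one number you actually
    # wanted is buried among the pricing meeting and the stale quote.
    print(round(cosine(query, embed(c)), 2), c)
```

Every chunk mentions “client” and “quote,” so every chunk looks relevant to the retriever; similarity ranks them, but it cannot express “the most recent quote for this specific client.”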
No, what I want is a properly organized filesystem.
OpenViking’s Approach: Managing Memory Like a Filesystem
What OpenViking does, basically: don’t shatter, organize with directory structure.
It designed a viking:// protocol that divides all Agent context into three namespaces:
- viking://resources/ — external knowledge: documents, API specs, reference materials
- viking://user/ — user-related: preferences, conversation history, task records
- viking://agent/ — the Agent’s own state: what it has learned, current goals, ongoing plans
This isn’t throwing data into a database — it’s building a structured workspace for the Agent. Like folders on your computer — project documents in one place, personal settings in another, reference materials in a third. You don’t mix everything together and rely on search.
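Since viking:// URIs follow the standard scheme://namespace/path shape, resolving one is a few lines of Python. The helper name and validation rules here are my own sketch of the idea, not OpenViking’s actual code:

```python
from urllib.parse import urlparse

# The three namespaces from the viking:// design.
VALID_NAMESPACES = {"resources", "user", "agent"}

def parse_viking_uri(uri: str):
    """Split a viking:// URI into (namespace, path). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "viking":
        raise ValueError(f"not a viking URI: {uri}")
    if parsed.netloc not in VALID_NAMESPACES:
        raise ValueError(f"unknown namespace: {parsed.netloc}")
    return parsed.netloc, parsed.path.lstrip("/")

print(parse_viking_uri("viking://user/preferences/language"))
# ('user', 'preferences/language')
```

The point of a URI scheme over raw database keys is exactly the folder intuition: the namespace tells you what kind of context you are touching before you read a single byte of it.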
But the really clever part is its three-tier loading mechanism.
L0 / L1 / L2: Check the Directory First, Then Decide Whether to Open
This is OpenViking’s core design and the key to its token savings.
L0 (Summary Layer): Only loads directory structure and a one-sentence summary of each item. The Agent sees “what’s in this folder,” not the full contents of every file.
L1 (Overview Layer): If the Agent thinks something is relevant, it loads the overview — a few paragraphs worth, enough to judge “is this what I need.”
L2 (Full Layer): Only after confirming need does it load the complete content.
This logic is exactly how you normally use a computer. You don’t open all files on your hard drive at startup. You look at folder names, click in to see what’s there, find your target file, then open it.
The traditional RAG approach is like: boot up, pour all file contents onto the screen, then use Ctrl+F to search. No wonder token usage explodes.
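The L0/L1/L2 idea can be sketched in a few lines. This is a toy illustration, not OpenViking’s actual API; the `Entry` fields, method names, and document paths are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    uri: str       # e.g. "viking://resources/spec.md" (paths here are made up)
    summary: str   # L0: one-sentence summary, always visible
    overview: str  # L1: a few paragraphs, loaded when the entry looks relevant
    full: str      # L2: complete content, loaded only once relevance is confirmed

class TieredContext:
    def __init__(self, entries):
        self.entries = {e.uri: e for e in entries}

    def l0_listing(self):
        # Boot-time cost: just the directory view, one line per entry.
        return {uri: e.summary for uri, e in self.entries.items()}

    def l1_overview(self, uri):
        return self.entries[uri].overview

    def l2_full(self, uri):
        return self.entries[uri].full

docs = [
    Entry("viking://resources/spec.md", "API spec for the billing service",
          "Endpoints, auth flow, and error codes for billing.", "FULL TEXT " * 500),
    Entry("viking://user/history.md", "Past task records for this user",
          "Summaries of the last ten tasks.", "FULL TEXT " * 500),
]
ctx = TieredContext(docs)

listing = ctx.l0_listing()                        # cheap: two one-liners
full = ctx.l2_full("viking://resources/spec.md")  # expensive: only on demand
```

A full-dump approach pays for both `full` bodies up front, every session. Here the Agent pays for the summaries first and escalates to L1 or L2 only where a summary looks relevant.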
The Numbers Speak
ByteDance’s results on the LoCoMo10 benchmark — a long-term conversation memory dataset with 1,540 test cases, each conversation averaging 300 turns across 35 sessions:
- Task completion rate: 35.65% → 52.08% (+46%)
- Token consumption: 24.6M → 4.3M (-83%)
- Memory-core mode: 51.23% completion rate, tokens down to 2.1M, a 91% saving over the original method
Do better, spend less. This is rare in AI — performance improvements usually come with cost explosions.
What does saving 83% of tokens mean? If your Agent runs 1,000 tasks per day, saving 20,000 tokens each time, that’s 20 million tokens per day. At current mainstream model pricing, the API cost savings per day are significant.
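A quick back-of-the-envelope check of that claim (the per-million-token price below is a placeholder I chose for the arithmetic, not any specific provider’s rate):

```python
tasks_per_day = 1_000
tokens_saved_per_task = 20_000
price_per_million_tokens = 3.00  # assumed placeholder price in USD

daily_tokens_saved = tasks_per_day * tokens_saved_per_task
daily_dollars_saved = daily_tokens_saved / 1_000_000 * price_per_million_tokens
print(daily_tokens_saved)   # 20000000
print(daily_dollars_saved)  # 60.0
```

At that placeholder rate the saving is about $60 per day, and it scales linearly with task volume and with whatever your actual model pricing is.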
Self-Evolution: Agents Learn Across Sessions
One more OpenViking feature I find interesting: Agents can write what they’ve learned back to the viking://agent/ namespace.
This means what an Agent learns in Session A — “this user prefers Traditional Chinese,” “this project’s coding style uses 4-space indentation,” “method X failed last time, method Y worked” — these experiences persist, and Session B can use them directly.
Not starting from zero every time. Agents accumulate experience, they grow.
This is similar to what many Agent teams already do by hand: letting Agents accumulate experience across sessions rather than starting from scratch each time. The difference is that OpenViking systematizes it.
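A minimal sketch of the write-back idea, using a plain JSON file to stand in for the viking://agent/ namespace. The class, keys, and file layout are invented for illustration, not OpenViking’s real storage format:

```python
import json
import os
import tempfile

class AgentMemory:
    """Toy persistent store standing in for viking://agent/ (hypothetical layout)."""

    def __init__(self, root):
        self.path = os.path.join(root, "agent_learnings.json")

    def learn(self, key, fact):
        # Read-modify-write: merge the new fact into the persisted store.
        facts = self.recall_all()
        facts[key] = fact
        with open(self.path, "w") as f:
            json.dump(facts, f)

    def recall_all(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

root = tempfile.mkdtemp()

# Session A writes what it learned...
mem_a = AgentMemory(root)
mem_a.learn("user.language", "Traditional Chinese")
mem_a.learn("project.indent", "4 spaces")

# ...Session B, a fresh object simulating a new session, reads the same store.
mem_b = AgentMemory(root)
print(mem_b.recall_all()["user.language"])  # Traditional Chinese
```

The mechanics are trivial; the design decision is that the Agent itself decides what is worth writing back, and the store outlives any single session.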
Comparing with Other Solutions
The current landscape of Agent memory solutions roughly falls into three categories:
Vector RAG (Pinecone, Weaviate, Chroma): Semantic search, pros are simplicity and easy onboarding, cons are insufficient precision, lost structure, token waste.
Knowledge Graphs (Neo4j, self-built KG): Organize knowledge using nodes and edges, pros are strong relationship reasoning, cons are high construction cost, complex maintenance, requires specialized knowledge.
OpenViking (Context Database): Filesystem-style organization; pros are clear structure, high token efficiency, and a low learning curve; cons are that it’s a new project with an ecosystem still being built.
No solution is universally best. But if your Agent needs to handle structured work content — project documents, API specs, user preferences, task history — the filesystem logic is much more intuitive than vector search.
Why This Matters
Agent memory isn’t a “nice to have” — it’s the key to turning Agents from toys into productivity tools.
An Agent without memory makes every conversation a one-off. You have to repeat the background, re-explain your preferences, re-describe the project context. That’s not automation; that’s added work.
An Agent with a good memory system remembers who you are, what you’re working on, where you left off, which methods worked and which didn’t. It becomes a true collaboration partner, not a chatty search engine.
OpenViking’s 18,000+ GitHub stars and Apache 2.0 license show how much the community wanted this. Agent memory is one of the hottest tracks in open source right now, because everyone’s been fed up with “goldfish-brain” Agents.
If you’re doing Agent-related development, it’s worth spending time looking at OpenViking’s architecture. You don’t have to use it, but its filesystem approach — replacing search with structure, replacing full dump with tiered loading — is a direction worth thinking about.
Do better, spend less. That’s good engineering.