Graph + vector: how OpenWalrus agents remember
We replaced config files and journals with a temporal knowledge graph backed by LanceDB + lance-graph. Here's the architecture and the research behind it.
Most agent memory systems are bags of strings. Markdown files, JSONL journals, key-value stores with vector embeddings bolted on. They work for demos. They break when you need to answer questions like "what did the agent know last Tuesday?" or "why does it think I prefer tabs over spaces?"
We wanted something better. After surveying how five products handle persistent memory, we designed OpenWalrus's memory around a single idea: everything is a temporal knowledge graph.
No SOUL.md. No User.toml. No journal files. One embedded database. Six tools. And a schema that grows with the agent's capabilities — without touching framework code.
The case for graphs
Agent memory has structure. "User prefers async/await" isn't a string — it's a relationship between a user entity and a coding pattern entity. "The auth system uses JWT RS256" connects a system component to an implementation decision. "User is on vacation until March 15" has a temporal bound.
Flat files lose this structure. Vector databases lose it too — they can find semantically similar text, but they can't traverse relationships or answer temporal queries. The research is clear on where graphs win:
*Chart: Accuracy — Graph RAG vs Vector RAG by Task Type (%)*
The data comes from multiple sources: FalkorDB benchmarks for industry/healthcare/schema-bound, Zep's LongMemEval for temporal (+38.4% improvement), and a real-world migration case study for multi-hop (43% → 91%).
The pattern is consistent: graph RAG dominates on multi-hop, temporal, and relationship queries. Vector RAG is often sufficient for simple single-hop factual lookup — but agent memory is rarely simple. Agents need to traverse decisions, track changing preferences, and connect entities across sessions.
The critical finding from Zep's temporal knowledge graph paper: bi-temporal tracking (separating when a fact was recorded from when it was actually true) achieves 18.5% higher accuracy and ~90% lower latency compared to vector-only retrieval on temporal reasoning tasks.
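Bi-temporal tracking boils down to two independent time axes per fact — when it was true in the world versus when the agent recorded it. A minimal sketch, using a hypothetical `Fact` record rather than Zep's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    """A fact with two time axes: when it was true vs. when we recorded it."""
    value: str
    valid_from: datetime              # when the fact became true in the world
    valid_until: Optional[datetime]   # None = still true
    recorded_at: datetime             # when the agent learned it

def facts_true_at(facts, world_time, known_by):
    """Facts that were true at `world_time`, using only knowledge the
    agent had by `known_by` — an 'as-of' query over both axes."""
    return [
        f for f in facts
        if f.recorded_at <= known_by
        and f.valid_from <= world_time
        and (f.valid_until is None or world_time < f.valid_until)
    ]

# On Mar 1 the agent learns the user is on vacation until Mar 15.
vacation = Fact("user on vacation", datetime(2026, 3, 1),
                datetime(2026, 3, 15), recorded_at=datetime(2026, 3, 1))

assert facts_true_at([vacation], datetime(2026, 3, 10), datetime(2026, 4, 1))
# No longer true on Mar 20; not visible if we ask what was known in February:
assert not facts_true_at([vacation], datetime(2026, 3, 20), datetime(2026, 4, 1))
assert not facts_true_at([vacation], datetime(2026, 3, 10), datetime(2026, 2, 1))
```

Separating the two axes is what lets a system answer "what did the agent know last Tuesday?" honestly, even after a fact has expired or been corrected.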
The landscape of graph-based memory
Graph-based agent memory has gone from academic curiosity to funded infrastructure in 2025-2026. Here's where the major systems stand:
*Chart: Agent Memory Systems — Accuracy vs Retrieval Latency*
| System | Approach | Key metric | Scale |
|---|---|---|---|
| Graphiti (Zep) | Temporal KG, Neo4j backend, bi-temporal | 94.8% DMR accuracy, 300ms P95 | 20K+ stars |
| Mem0 | Vector + Graph variants, hierarchical | 68.4% LOCOMO, 0.48s P95 | $24M raised, 41K stars |
| Cognee | KG triplets + vector (LanceDB), CoT retrieval | 92.5% human-like correctness | $7.5M seed, 70+ companies |
| Microsoft GraphRAG | Hierarchical community summaries | 72-83% comprehensiveness | 29.8K stars |
| LightRAG | Lightweight graph RAG | ~30% latency reduction | EMNLP 2025 |
Two recent papers push the field further:
- MAGMA (Jan 2026) maintains four orthogonal graph views (semantic, temporal, causal, entity) and achieves +45.5% reasoning accuracy with 95%+ token reduction.
- SimpleMem (Jan 2026) uses LanceDB with semantic lossless compression, achieving 43.24% LOCOMO F1 with only 531 tokens/query — vs Mem0's 34.20 F1 with 973 tokens.
The trend is unmistakable. Gartner named Knowledge Graphs a "Critical Enabler" with immediate GenAI impact. The ICLR 2026 MemAgents Workshop is dedicated entirely to memory for LLM-based agentic systems. A comprehensive survey from DEEP-PolyU (Feb 2026) covers the full taxonomy of graph-based agent memory.
The important caveat
Graph RAG isn't universally better. A systematic evaluation (Feb 2025) found that "GraphRAG frequently underperforms vanilla RAG on many real-world tasks." The advantage concentrates in temporal reasoning, multi-hop inference, and relationship queries. For simple factual lookup, vector RAG often wins on both speed and accuracy.
GraphRAG also averages 2.4x higher latency than vector RAG. And the original Microsoft GraphRAG indexing cost ~$33K for large datasets — though LazyGraphRAG reduced this to 0.1% of that cost while winning all 96 comparisons on 5,590 AP news articles.
This is why we chose a hybrid approach — graph traversal for structural queries, vector similarity for semantic search, combined in a single pipeline.
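The hybrid shape is simple to sketch: graph traversal narrows the candidate set structurally, then vector similarity ranks what survives. This is toy in-memory data, not the lance-graph API:

```python
import math

# Toy stores: nodes with embeddings, plus directed edges between them.
nodes = {
    "user":  {"text": "the user",               "vec": [1.0, 0.0]},
    "tabs":  {"text": "prefers tabs",           "vec": [0.9, 0.1]},
    "jwt":   {"text": "auth uses JWT RS256",    "vec": [0.1, 0.9]},
    "rs256": {"text": "RS256 signing decision", "vec": [0.2, 0.8]},
}
edges = [("user", "tabs"), ("jwt", "rs256")]

def neighbors(node):
    """1-hop graph traversal: the structural filter."""
    return {dst for src, dst in edges if src == node} | \
           {src for src, dst in edges if dst == node}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def graph_then_rank(start, query_vec):
    """Graph filter first, then vector rank — the hybrid pipeline."""
    candidates = neighbors(start)
    return sorted(candidates,
                  key=lambda n: cosine(nodes[n]["vec"], query_vec),
                  reverse=True)

# "What does the user prefer?" — only user-connected nodes get ranked,
# so the semantically-close-but-unrelated JWT nodes never appear.
print(graph_then_rank("user", [1.0, 0.2]))
```

The ordering matters: filtering by structure first is what keeps semantically similar but structurally irrelevant entities out of the result set.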
Why LanceDB + lance-graph
Every graph memory system in the landscape above requires a separate server — Neo4j for Graphiti, various backends for Mem0, cloud services for Microsoft GraphRAG. That's a non-starter for a single-binary runtime.
We needed a graph database that:
- Embeds in-process (no separate server — walrus is a single binary)
- Is Rust-native (walrus is written in Rust)
- Supports both vector search and graph traversal
- Provides versioning at the storage layer
LanceDB + lance-graph checks every box.
*Chart: LanceDB Ecosystem (March 2026)*
LanceDB: battle-tested at scale
LanceDB isn't a toy. It's backed by $41M in funding (Series A led by Theory Ventures, June 2025) and used in production by Midjourney, Netflix, Uber, ByteDance, and Character.AI — handling billion-scale vector search.
| Metric | Value |
|---|---|
| GitHub stars | 15.2K combined (lancedb + lance format) |
| PyPI downloads | ~2.6M/month |
| Monthly contributors | 40+, from Uber, Netflix, Hugging Face, ByteDance |
| Production users | Midjourney, Netflix, Uber, Character.AI, Harvey |
| Vector search latency | 25ms typical, 3ms at 0.9 recall |
| File read throughput | 6-9x faster than Parquet |
| Storage IOPS | 1.5M peak on NVMe |
The Lance columnar format (SDK 1.0 since Dec 2025) provides the foundation: ACID transactions, zero-copy schema evolution, and immutable versioning at the storage layer. You can query "what did the agent know yesterday?" without journal files — Lance versioning handles it natively.
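Storage-level versioning means every write produces a new immutable snapshot, and reads can target any of them. A toy model of the time-travel read described above — not the Lance API:

```python
# Each commit appends an immutable snapshot; reads can target any version.
versions = []

def commit(table_state):
    """A write: snapshot the table as a new immutable version."""
    versions.append(dict(table_state))  # copy, so old versions never mutate

def checkout(version):
    """'What did the agent know at version N?' — a time-travel read."""
    return versions[version]

commit({"user": "unknown"})          # version 0: before the session
commit({"user": "prefers tabs"})     # version 1: after learning a preference

assert checkout(0) == {"user": "unknown"}        # yesterday's knowledge
assert checkout(1) == {"user": "prefers tabs"}   # today's
```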
LanceDB is the default vector database for AnythingLLM, powers local semantic search in Continue.dev (40% improvement in auto-completion relevance), and is the default vector store for Microsoft GraphRAG.
lance-graph: graph queries over columnar data
lance-graph adds a Cypher query engine on top — graph nodes and edges stored as Lance tables, with hybrid GraphRAG queries built in.
| Capability | Details |
|---|---|
| Query language | Cypher (read-only subset with MATCH, WHERE, WITH, aggregation) |
| Vector search | Native ANN, L2/Cosine/Dot metrics |
| GraphRAG | execute_with_vector_rerank() — graph filter then vector rank |
| Basic filter latency | ~680µs (100 items) to ~743µs (1M items) |
| Single-hop expand | ~3.70ms (1M nodes) |
| Two-hop expand | ~6.16ms (1M nodes) |
| Language | Rust 83.7%, Python bindings via PyO3 |
The sub-linear latency growth — 680µs to 743µs for a 10,000x data increase on basic filters — reflects DataFusion's columnar batch processing and predicate pushdown.
Honest assessment of maturity
lance-graph is young. v0.5.3, ~128 GitHub stars, still an incubating subproject under the Lance governance model. APIs may change without notice. No confirmed production deployments of lance-graph specifically (though LanceDB itself is heavily battle-tested).
The query engine is read-only — no CREATE, DELETE, or MERGE in
Cypher. Writes go through the LanceGraphStore Python/Rust API. For
an agent runtime that handles writes through its own tools (not Cypher),
this constraint doesn't matter. For other use cases, it might.
We're betting on a young project with a strong parent ecosystem. The risk is real, but the alternative — requiring users to run a Neo4j server alongside walrus — contradicts our single-binary philosophy.
Comparison with alternatives
| | LanceDB + lance-graph | SQLite + sqlite-vec | Neo4j + Graphiti |
|---|---|---|---|
| Deployment | Embedded, in-process | Embedded, in-process | Separate server |
| Graph queries | Cypher (read-only) | Manual SQL joins | Full Cypher |
| Vector search | Native ANN | sqlite-vec extension | Plugin |
| Temporal tracking | Lance versioning | Manual | Built-in |
| Rust native | Yes | Via bindings | No |
| Fits single binary | Yes | Yes | No |
| Maturity | Early (v0.5.3, $41M ecosystem) | Very mature | Very mature (13K+ stars) |
SQLite + sqlite-vec is the pragmatic alternative — mature, embedded, battle-tested. But graph queries require manual SQL joins, and there's no native hybrid GraphRAG pattern. For traversing relationships and searching semantically in a single query, the graph-native approach wins.
Neo4j with Graphiti is the most capable graph memory system available — 94.8% DMR accuracy, full Cypher, production-proven. But it requires a separate server — a non-starter for a single-binary runtime.
Everything is the graph
Agent identity, user preferences, conversation history, extracted facts, relationships between entities — all stored in two tables in one LanceDB database: `entities` and `relations`. Compacted conversation summaries live in a third table: `journals`.
The framework ships seven built-in entity types:
- `identity` — the agent's identity and personality. Queried at session start and injected into the system prompt. This is what SOUL.md used to be.
- `profile` — the user's profile. Also injected at session start. This replaces User.toml.
- `preference` — user preferences. Linked to `profile` via relations.
- `fact` — general facts the agent has learned.
- `person`, `event`, `concept` — structured entity types for people, events, and abstract concepts the agent encounters.
Entity types are configurable — you can add domain-specific types in `walrus.toml` without touching framework code.
Six tools, no queries
The agent interacts with memory through six tools. It never writes Cypher or SQL — the framework handles storage and retrieval internally.
| Tool | What it does |
|---|---|
| `remember` | Store a typed entity (type, key, value). Upserts with FTS indexing. |
| `recall` | Full-text search across entities, optionally filtered by type. Returns top-K matches. |
| `relate` | Create a directed edge between two existing entities. |
| `connections` | Traverse the graph from a given entity — 1-hop via lance-graph Cypher, optionally filtered by relation or direction. |
| `compact` | Trigger compaction: summarize the conversation, embed the summary, store as a journal entry. |
| `distill` | Semantic search over past journal summaries. Find relevant context from previous sessions. |
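To make the write-side tools concrete, here is a minimal in-memory sketch of three of the six (`remember`, `relate`, `connections`) — illustrative semantics only, not the walrus implementation:

```python
# Toy stores standing in for the entities and relations tables.
entities = {}   # (type, key) -> value
relations = []  # (src_key, relation, dst_key)

def remember(etype, key, value):
    """Store a typed entity; writing the same (type, key) again upserts."""
    entities[(etype, key)] = value

def relate(src, relation, dst):
    """Create a directed edge between two entities."""
    relations.append((src, relation, dst))

def connections(key, relation=None):
    """1-hop traversal from an entity, optionally filtered by relation."""
    return [(r, dst) for src, r, dst in relations
            if src == key and (relation is None or r == relation)]

remember("preference", "indentation", "prefers tabs over spaces")
remember("profile", "user", "works on the auth service")
relate("user", "has_preference", "indentation")

print(connections("user"))
```

The agent only ever sees these call shapes; the FTS indexing, Cypher traversal, and storage all happen behind them.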
Why not expose raw Cypher? Because Text2Cypher is unreliable even with frontier LLMs — they hallucinate syntax and miss schema constraints. The framework knows the schema because it created it.
Two retrieval paths
The six tools reflect two distinct retrieval mechanisms:
**Graph + FTS path** (`recall`, `connections`): Entity nodes are stored with full-text search indexing. `recall` runs FTS on key and value fields, agent-scoped. `connections` runs a 1-hop Cypher traversal via lance-graph. These are fast (sub-millisecond) and exact.

**Vector semantic path** (`distill`): Journal entries — compaction summaries — are embedded with all-MiniLM-L6-v2 (384 dimensions via fastembed). `distill` runs ANN search over these embeddings to find semantically similar past context.
This means different queries hit different stores: `recall` and `connections` hit the entity and relation tables via FTS and Cypher, while `distill` hits the journal embeddings via ANN.
At session start, walrus injects identity entities, profile entities, and the three most recent journal summaries into the system prompt — giving the agent its identity and continuity context before the first turn.
Temporal tracking
Every entity carries created_at and updated_at. Relations carry
created_at. Journal entries carry created_at and their 384-dim embedding.
This is enough to answer "what did the agent learn this session?" or "when
was this fact recorded?" — without separate journal files.
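"What did the agent learn this session?" then reduces to a `created_at` filter over the entity rows. A hedged sketch with toy rows, not the actual table schema:

```python
from datetime import datetime

# Toy entity rows carrying created_at, as described above (illustrative).
rows = [
    {"key": "indentation", "value": "prefers tabs",
     "created_at": datetime(2026, 3, 20, 9, 0)},
    {"key": "vacation", "value": "until Mar 15",
     "created_at": datetime(2026, 3, 1, 12, 0)},
]

def learned_since(rows, session_start):
    """'What did the agent learn this session?' as a timestamp filter."""
    return [r["key"] for r in rows if r["created_at"] >= session_start]

print(learned_since(rows, datetime(2026, 3, 20, 8, 0)))
```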
Full bi-temporal tracking (valid_from, valid_until) — the
Zep approach that achieves +38.4%
on temporal reasoning — is on the roadmap. The current implementation
tracks creation time; expiration is a future addition.
Extending the schema
The entity and relation types are open and configurable. The defaults give
you a working memory system. Your walrus.toml extends it without touching
framework code:
```toml
[memory]
entities = ["ticket", "sprint", "arch_decision"]
relations = ["blocks", "supersedes", "owned_by"]
```

These merge with the built-in types — the framework never loses the default `fact`, `preference`, `identity`, and `profile` types. Extensions only add.
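The merge behavior described above amounts to set union over type names — these are assumed semantics, not the actual config loader:

```python
# The seven built-in entity types shipped by the framework.
BUILTIN_ENTITIES = {"identity", "profile", "preference", "fact",
                    "person", "event", "concept"}

def merged_entity_types(user_types):
    """Extensions only add: user types never shadow or remove built-ins."""
    return BUILTIN_ENTITIES | set(user_types)

types = merged_entity_types(["ticket", "sprint", "arch_decision"])
assert BUILTIN_ENTITIES <= types   # the defaults always survive
assert "ticket" in types           # domain types are added alongside them
```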
This follows the same pattern as everything else in walrus (see less code, more skills): the framework ships a working default, skills and configuration extend it, MCP servers add entity types at runtime.
Compaction as memory formation
When the context window fills up, the agent calls `compact`. The runtime summarizes the full conversation history with the LLM, embeds the summary with all-MiniLM-L6-v2, stores it as a journal entry, and replaces the conversation history with just that summary. The agent continues with a clean context window and the summary as continuity.
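The compaction loop reduces to four steps: summarize, embed, journal, reset. A hedged sketch in which `summarize()` and `embed()` stand in for the LLM and all-MiniLM-L6-v2:

```python
# Sketch of the compaction flow — not the walrus runtime.
journal = []

def summarize(messages):
    """Stand-in for the LLM summarization call."""
    return "summary of %d messages" % len(messages)

def embed(text):
    """Stand-in for a 384-dim all-MiniLM-L6-v2 embedding."""
    return [float(len(text))]

def compact(history):
    summary = summarize(history)
    journal.append({"summary": summary, "vec": embed(summary)})  # memory formed
    return [{"role": "system", "content": summary}]  # fresh context window

history = [{"role": "user", "content": "hi"}] * 40
history = compact(history)
assert len(history) == 1   # the context is replaced by the summary
assert len(journal) == 1   # and the summary is searchable later via distill
```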
This turns compaction from a lossy operation into a memory-formation event.
Past journal entries are searchable via distill in future sessions —
semantic search over the embedding finds relevant past context even when the
agent doesn't remember the exact session it came from.
Mem0's research shows that smart compaction improves reasoning: 91% lower P95 latency and 90%+ token reduction compared to full-context approaches, while achieving a 26% relative improvement in LLM-as-a-Judge scores.
What you lose, what you gain
This design has a real tradeoff: you can't `cat SOUL.md` anymore.
What you lose:

- Human-readable files you can open in a text editor
- Git-diffable memory changes
- The simplicity of `echo "prefer tabs" >> CLAUDE.md`
What you gain:

- Queryable structure — `recall` and `connections` over a typed entity graph
- Temporal tracking — `created_at` on everything, full bi-temporal expiration on the roadmap
- Relationship-aware traversal — `connections` follows edges from decisions to reasons to related entities
- Semantic history — `distill` finds relevant past journal summaries across sessions
- Versioning via Lance — the storage layer tracks history at the columnar level
Every product in our memory survey that gained developer trust stores memory in human-readable formats. We're breaking that pattern. The bet is that queryable, structured memory is worth more than `cat`-ability — and that the `recall`, `connections`, and `distill` tools give the agent (and eventually the user) direct inspection paths.
References
- Zep temporal knowledge graph — bi-temporal tracking, LongMemEval benchmarks
- Mem0 production memory — LOCOMO benchmarks, graph vs vector comparison
- RAG vs GraphRAG: systematic evaluation — when graphs help and when they don't
- MAGMA: multi-graph agentic memory — four-view graph architecture
- SimpleMem: efficient memory with LanceDB — lossless compression, token efficiency
- Graph-based agent memory survey — comprehensive taxonomy (DEEP-PolyU)
- lance-graph on GitHub — the graph engine we're building on
- LanceDB docs — the vector database layer
- How AI agents remember — our survey of five products
- Less code, more skills — the design principle behind the three-layer extension model