Graph + vector: how OpenWalrus agents remember
We replaced config files and journals with a temporal knowledge graph backed by LanceDB + lance-graph. Here's the architecture and the research behind it.
Most agent memory systems are bags of strings. Markdown files, JSONL journals, key-value stores with vector embeddings bolted on. They work for demos. They break when you need to answer questions like "what did the agent know last Tuesday?" or "why does it think I prefer tabs over spaces?"
We wanted something better. After surveying how five products handle persistent memory, we designed OpenWalrus's memory around a single idea: everything is a temporal knowledge graph.
No SOUL.md. No User.toml. No journal files. One embedded database. Six tools. And a schema that grows with the agent's capabilities — without touching framework code.
The case for graphs
Agent memory has structure. "User prefers async/await" isn't a string — it's a relationship between a user entity and a coding pattern entity. "The auth system uses JWT RS256" connects a system component to an implementation decision. "User is on vacation until March 15" has a temporal bound.
Flat files lose this structure. Vector databases lose it too — they can find semantically similar text, but they can't traverse relationships or answer temporal queries. The research is clear on where graphs win:
*Chart: Accuracy — Graph RAG vs Vector RAG by Task Type (%)*
The data comes from multiple sources: FalkorDB benchmarks for industry/healthcare/schema-bound, Zep's LongMemEval for temporal (+38.4% improvement), and a real-world migration case study for multi-hop (43% → 91%).
The pattern is consistent: graph RAG dominates on multi-hop, temporal, and relationship queries. Vector RAG is often sufficient for simple single-hop factual lookup — but agent memory is rarely simple. Agents need to traverse decisions, track changing preferences, and connect entities across sessions.
The critical finding from Zep's temporal knowledge graph paper: bi-temporal tracking (separating when a fact was recorded from when it was actually true) achieves 18.5% higher accuracy and ~90% lower latency compared to vector-only retrieval on temporal reasoning tasks.
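Bi-temporal tracking boils down to two independent time axes per fact — when it was true in the world versus when the agent recorded it. A minimal sketch, using a hypothetical `Fact` record rather than Zep's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    """A fact with two time axes: when it was true vs. when we recorded it."""
    value: str
    valid_from: datetime              # when the fact became true in the world
    valid_until: Optional[datetime]   # None = still true
    recorded_at: datetime             # when the agent learned it

def facts_true_at(facts, world_time, known_by):
    """Facts that were true at `world_time`, using only knowledge the
    agent had by `known_by` — an 'as-of' query over both axes."""
    return [
        f for f in facts
        if f.recorded_at <= known_by
        and f.valid_from <= world_time
        and (f.valid_until is None or world_time < f.valid_until)
    ]

# On Mar 1 the agent learns the user is on vacation until Mar 15.
vacation = Fact("user on vacation", datetime(2026, 3, 1),
                datetime(2026, 3, 15), recorded_at=datetime(2026, 3, 1))

assert facts_true_at([vacation], datetime(2026, 3, 10), datetime(2026, 4, 1))
# No longer true on Mar 20; not visible if we ask what was known in February:
assert not facts_true_at([vacation], datetime(2026, 3, 20), datetime(2026, 4, 1))
assert not facts_true_at([vacation], datetime(2026, 3, 10), datetime(2026, 2, 1))
```

Separating the two axes is what lets a system answer "what did the agent know last Tuesday?" honestly, even after a fact has expired or been corrected.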
The landscape of graph-based memory
Graph-based agent memory has gone from academic curiosity to funded infrastructure in 2025-2026. Here's where the major systems stand:
*Chart: Agent Memory Systems — Accuracy vs Retrieval Latency*
| System | Approach | Key metric | Scale |
|---|---|---|---|
| Graphiti (Zep) | Temporal KG, Neo4j backend, bi-temporal | 94.8% DMR accuracy, 300ms P95 | 20K+ stars |
| Mem0 | Vector + Graph variants, hierarchical | 68.4% LOCOMO, 0.48s P95 | $24M raised, 41K stars |
| Cognee | KG triplets + vector (LanceDB), CoT retrieval | 92.5% human-like correctness | $7.5M seed, 70+ companies |
| Microsoft GraphRAG | Hierarchical community summaries | 72-83% comprehensiveness | 29.8K stars |
| LightRAG | Lightweight graph RAG | ~30% latency reduction | EMNLP 2025 |
Two recent papers push the field further:
- MAGMA (Jan 2026) maintains four orthogonal graph views (semantic, temporal, causal, entity) and achieves +45.5% reasoning accuracy with 95%+ token reduction.
- SimpleMem (Jan 2026) uses LanceDB with semantic lossless compression, achieving 43.24% LOCOMO F1 with only 531 tokens/query — vs Mem0's 34.20 F1 with 973 tokens.
The trend is unmistakable. Gartner named Knowledge Graphs a "Critical Enabler" with immediate GenAI impact. The ICLR 2026 MemAgents Workshop is dedicated entirely to memory for LLM-based agentic systems. A comprehensive survey from DEEP-PolyU (Feb 2026) covers the full taxonomy of graph-based agent memory.
The important caveat
Graph RAG isn't universally better. A systematic evaluation (Feb 2025) found that "GraphRAG frequently underperforms vanilla RAG on many real-world tasks." The advantage concentrates in temporal reasoning, multi-hop inference, and relationship queries. For simple factual lookup, vector RAG often wins on both speed and accuracy.
GraphRAG also averages 2.4x higher latency than vector RAG. And the original Microsoft GraphRAG indexing cost ~$33K for large datasets — though LazyGraphRAG reduced this to 0.1% of that cost while winning all 96 comparisons on 5,590 AP news articles.
This is why we chose a hybrid approach — graph traversal for structural queries, vector similarity for semantic search, combined in a single pipeline.
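The hybrid shape is simple to sketch: graph traversal narrows the candidate set structurally, then vector similarity ranks what survives. This is toy in-memory data, not the lance-graph API:

```python
import math

# Toy stores: nodes with embeddings, plus directed edges between them.
nodes = {
    "user":  {"text": "the user",               "vec": [1.0, 0.0]},
    "tabs":  {"text": "prefers tabs",           "vec": [0.9, 0.1]},
    "jwt":   {"text": "auth uses JWT RS256",    "vec": [0.1, 0.9]},
    "rs256": {"text": "RS256 signing decision", "vec": [0.2, 0.8]},
}
edges = [("user", "tabs"), ("jwt", "rs256")]

def neighbors(node):
    """1-hop graph traversal: the structural filter."""
    return {dst for src, dst in edges if src == node} | \
           {src for src, dst in edges if dst == node}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def graph_then_rank(start, query_vec):
    """Graph filter first, then vector rank — the hybrid pipeline."""
    candidates = neighbors(start)
    return sorted(candidates,
                  key=lambda n: cosine(nodes[n]["vec"], query_vec),
                  reverse=True)

# "What does the user prefer?" — only user-connected nodes get ranked,
# so the semantically-close-but-unrelated JWT nodes never appear.
print(graph_then_rank("user", [1.0, 0.2]))
```

The ordering matters: filtering by structure first is what keeps semantically similar but structurally irrelevant entities out of the result set.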
Why LanceDB + lance-graph
Every graph memory system in the landscape above requires a separate server — Neo4j for Graphiti, various backends for Mem0, cloud services for Microsoft GraphRAG. That's a non-starter for a single-binary runtime.
We needed a graph database that:
- Embeds in-process (no separate server — walrus is a single binary)
- Is Rust-native (walrus is written in Rust)
- Supports both vector search and graph traversal
- Provides versioning at the storage layer
LanceDB + lance-graph checks every box.
*Chart: LanceDB Ecosystem (March 2026)*
LanceDB: battle-tested at scale
LanceDB isn't a toy. It's backed by $41M in funding (Series A led by Theory Ventures, June 2025) and used in production by Midjourney, Netflix, Uber, ByteDance, and Character.AI — handling billion-scale vector search.
| Metric | Value |
|---|---|
| GitHub stars | 15.2K combined (lancedb + lance format) |
| PyPI downloads | ~2.6M/month |
| Monthly contributors | 40+, from Uber, Netflix, Hugging Face, ByteDance |
| Production users | Midjourney, Netflix, Uber, Character.AI, Harvey |
| Vector search latency | 25ms typical, 3ms at 0.9 recall |
| File read throughput | 6-9x faster than Parquet |
| Storage IOPS | 1.5M peak on NVMe |
The Lance columnar format (SDK 1.0 since Dec 2025) provides the foundation: ACID transactions, zero-copy schema evolution, and immutable versioning at the storage layer. You can query "what did the agent know yesterday?" without journal files — Lance versioning handles it natively.
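Storage-level versioning means every write produces a new immutable snapshot, and reads can target any of them. A toy model of the time-travel read described above — not the Lance API:

```python
# Each commit appends an immutable snapshot; reads can target any version.
versions = []

def commit(table_state):
    """A write: snapshot the table as a new immutable version."""
    versions.append(dict(table_state))  # copy, so old versions never mutate

def checkout(version):
    """'What did the agent know at version N?' — a time-travel read."""
    return versions[version]

commit({"user": "unknown"})          # version 0: before the session
commit({"user": "prefers tabs"})     # version 1: after learning a preference

assert checkout(0) == {"user": "unknown"}        # yesterday's knowledge
assert checkout(1) == {"user": "prefers tabs"}   # today's
```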
LanceDB is the default vector database for AnythingLLM, powers local semantic search in Continue.dev (40% improvement in auto-completion relevance), and is the default vector store for Microsoft GraphRAG.
lance-graph: graph queries over columnar data
lance-graph adds a Cypher query engine on top — graph nodes and edges stored as Lance tables, with hybrid GraphRAG queries built in.
| Capability | Details |
|---|---|
| Query language | Cypher (read-only subset with MATCH, WHERE, WITH, aggregation) |
| Vector search | Native ANN, L2/Cosine/Dot metrics |
| GraphRAG | execute_with_vector_rerank() — graph filter then vector rank |
| Basic filter latency | ~680µs (100 items) to ~743µs (1M items) |
| Single-hop expand | ~3.70ms (1M nodes) |
| Two-hop expand | ~6.16ms (1M nodes) |
| Language | Rust 83.7%, Python bindings via PyO3 |
The sub-linear latency growth — 680µs to 743µs for a 10,000x data increase on basic filters — reflects DataFusion's columnar batch processing and predicate pushdown.
Honest assessment of maturity
lance-graph is young. v0.5.3, ~128 GitHub stars, still an incubating subproject under the Lance governance model. APIs may change without notice. No confirmed production deployments of lance-graph specifically (though LanceDB itself is heavily battle-tested).
The query engine is read-only — no CREATE, DELETE, or MERGE in
Cypher. Writes go through the LanceGraphStore Python/Rust API. For
an agent runtime that handles writes through its own tools (not Cypher),
this constraint doesn't matter. For other use cases, it might.
We're betting on a young project with a strong parent ecosystem. The risk is real, but the alternative — requiring users to run a Neo4j server alongside walrus — contradicts our single-binary philosophy.
Comparison with alternatives
| | LanceDB + lance-graph | SQLite + sqlite-vec | Neo4j + Graphiti |
|---|---|---|---|
| Deployment | Embedded, in-process | Embedded, in-process | Separate server |
| Graph queries | Cypher (read-only) | Manual SQL joins | Full Cypher |
| Vector search | Native ANN | sqlite-vec extension | Plugin |
| Temporal tracking | Lance versioning | Manual | Built-in |
| Rust native | Yes | Via bindings | No |
| Fits single binary | Yes | Yes | No |
| Maturity | Early (v0.5.3, $41M ecosystem) | Very mature | Very mature (13K+ stars) |
SQLite + sqlite-vec is the pragmatic alternative — mature, embedded, battle-tested. But graph queries require manual SQL joins, and there's no native hybrid GraphRAG pattern. For traversing relationships and searching semantically in a single query, the graph-native approach wins.
Neo4j with Graphiti is the most capable graph memory system available — 94.8% DMR accuracy, full Cypher, production-proven. But it requires a separate server — a non-starter for a single-binary runtime.
Everything is the graph
Agent identity, user preferences, conversation history, extracted facts, relationships between entities — all stored in two tables in one LanceDB database: `entities` and `relations`. Compacted conversation summaries live in a third table: `journals`.
The framework ships seven built-in entity types:
- `identity` — the agent's identity and personality. Queried at session start and injected into the system prompt. This is what SOUL.md used to be.
- `profile` — the user's profile. Also injected at session start. This replaces User.toml.
- `preference` — user preferences. Linked to `profile` via relations.
- `fact` — general facts the agent has learned.
- `person`, `event`, `concept` — structured entity types for people, events, and abstract concepts the agent encounters.
Entity types are configurable — you can add domain-specific types in `walrus.toml` without touching framework code.
Six tools, no queries
The agent interacts with memory through six tools. It never writes Cypher or SQL — the framework handles storage and retrieval internally.
| Tool | What it does |
|---|---|
| `remember` | Store a typed entity (type, key, value). Upserts with FTS indexing. |
| `recall` | Full-text search across entities, optionally filtered by type. Returns top-K matches. |
| `relate` | Create a directed edge between two existing entities. |
| `connections` | Traverse the graph from a given entity — 1-hop via lance-graph Cypher, optionally filtered by relation or direction. |
| `compact` | Trigger compaction: summarize the conversation, embed the summary, store as a journal entry. |
| `distill` | Semantic search over past journal summaries. Find relevant context from previous sessions. |
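To make the write-side tools concrete, here is a minimal in-memory sketch of three of the six (`remember`, `relate`, `connections`) — illustrative semantics only, not the walrus implementation:

```python
# Toy stores standing in for the entities and relations tables.
entities = {}   # (type, key) -> value
relations = []  # (src_key, relation, dst_key)

def remember(etype, key, value):
    """Store a typed entity; writing the same (type, key) again upserts."""
    entities[(etype, key)] = value

def relate(src, relation, dst):
    """Create a directed edge between two entities."""
    relations.append((src, relation, dst))

def connections(key, relation=None):
    """1-hop traversal from an entity, optionally filtered by relation."""
    return [(r, dst) for src, r, dst in relations
            if src == key and (relation is None or r == relation)]

remember("preference", "indentation", "prefers tabs over spaces")
remember("profile", "user", "works on the auth service")
relate("user", "has_preference", "indentation")

print(connections("user"))
```

The agent only ever sees these call shapes; the FTS indexing, Cypher traversal, and storage all happen behind them.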
Why not expose raw Cypher? Because Text2Cypher is unreliable even with frontier LLMs — they hallucinate syntax and miss schema constraints. The framework knows the schema because it created it.
Two retrieval paths
The six tools reflect two distinct retrieval mechanisms:
**Graph + FTS path** (`recall`, `connections`): Entity nodes are stored with full-text search indexing. `recall` runs FTS on key and value fields, agent-scoped. `connections` runs a 1-hop Cypher traversal via lance-graph. These are fast (sub-millisecond) and exact.

**Vector semantic path** (`distill`): Journal entries — compaction summaries — are embedded with all-MiniLM-L6-v2 (384 dimensions via fastembed). `distill` runs ANN search over these embeddings to find semantically similar past context.
This means different queries hit different stores: `recall` and `connections` hit the entity and relation tables via FTS and Cypher, while `distill` hits the journal embeddings via ANN.
At session start, walrus injects identity entities, profile entities, and the three most recent journal summaries into the system prompt — giving the agent its identity and continuity context before the first turn.
Temporal tracking
Every entity carries created_at and updated_at. Relations carry
created_at. Journal entries carry created_at and their 384-dim embedding.
This is enough to answer "what did the agent learn this session?" or "when
was this fact recorded?" — without separate journal files.
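"What did the agent learn this session?" then reduces to a `created_at` filter over the entity rows. A hedged sketch with toy rows, not the actual table schema:

```python
from datetime import datetime

# Toy entity rows carrying created_at, as described above (illustrative).
rows = [
    {"key": "indentation", "value": "prefers tabs",
     "created_at": datetime(2026, 3, 20, 9, 0)},
    {"key": "vacation", "value": "until Mar 15",
     "created_at": datetime(2026, 3, 1, 12, 0)},
]

def learned_since(rows, session_start):
    """'What did the agent learn this session?' as a timestamp filter."""
    return [r["key"] for r in rows if r["created_at"] >= session_start]

print(learned_since(rows, datetime(2026, 3, 20, 8, 0)))
```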
Full bi-temporal tracking (valid_from, valid_until) — the
Zep approach that achieves +38.4%
on temporal reasoning — is on the roadmap. The current implementation
tracks creation time; expiration is a future addition.
Extending the schema
The entity and relation types are open and configurable. The defaults give
you a working memory system. Your walrus.toml extends it without touching
framework code:
```toml
[memory]
entities = ["ticket", "sprint", "arch_decision"]
relations = ["blocks", "supersedes", "owned_by"]
```

These merge with the built-in types — the framework never loses the default `fact`, `preference`, `identity`, and `profile` types. Extensions only add.
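The merge behavior described above amounts to set union over type names — these are assumed semantics, not the actual config loader:

```python
# The seven built-in entity types shipped by the framework.
BUILTIN_ENTITIES = {"identity", "profile", "preference", "fact",
                    "person", "event", "concept"}

def merged_entity_types(user_types):
    """Extensions only add: user types never shadow or remove built-ins."""
    return BUILTIN_ENTITIES | set(user_types)

types = merged_entity_types(["ticket", "sprint", "arch_decision"])
assert BUILTIN_ENTITIES <= types   # the defaults always survive
assert "ticket" in types           # domain types are added alongside them
```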
This follows the same pattern as everything else in walrus (see less code, more skills): the framework ships a working default, skills and configuration extend it, MCP servers add entity types at runtime.
Compaction as memory formation
When the context window fills up, the agent calls `compact`. The runtime summarizes the full conversation history with the LLM, embeds the summary with all-MiniLM-L6-v2, stores it as a journal entry, and replaces the conversation history with just that summary. The agent continues with a clean context window and the summary as continuity.
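The compaction loop reduces to four steps: summarize, embed, journal, reset. A hedged sketch in which `summarize()` and `embed()` stand in for the LLM and all-MiniLM-L6-v2:

```python
# Sketch of the compaction flow — not the walrus runtime.
journal = []

def summarize(messages):
    """Stand-in for the LLM summarization call."""
    return "summary of %d messages" % len(messages)

def embed(text):
    """Stand-in for a 384-dim all-MiniLM-L6-v2 embedding."""
    return [float(len(text))]

def compact(history):
    summary = summarize(history)
    journal.append({"summary": summary, "vec": embed(summary)})  # memory formed
    return [{"role": "system", "content": summary}]  # fresh context window

history = [{"role": "user", "content": "hi"}] * 40
history = compact(history)
assert len(history) == 1   # the context is replaced by the summary
assert len(journal) == 1   # and the summary is searchable later via distill
```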
This turns compaction from a lossy operation into a memory-formation event.
Past journal entries are searchable via distill in future sessions —
semantic search over the embedding finds relevant past context even when the
agent doesn't remember the exact session it came from.
Mem0's research shows that smart compaction improves reasoning: 91% lower P95 latency and 90%+ token reduction compared to full-context approaches, while achieving a 26% relative improvement in LLM-as-a-Judge scores.
What you lose, what you gain
This design has a real tradeoff: you can't `cat SOUL.md` anymore.
What you lose:

- Human-readable files you can open in a text editor
- Git-diffable memory changes
- The simplicity of `echo "prefer tabs" >> CLAUDE.md`
What you gain:

- Queryable structure — `recall` and `connections` over a typed entity graph
- Temporal tracking — `created_at` on everything, full bi-temporal expiration on the roadmap
- Relationship-aware traversal — `connections` follows edges from decisions to reasons to related entities
- Semantic history — `distill` finds relevant past journal summaries across sessions
- Versioning via Lance — the storage layer tracks history at the columnar level
Every product in our memory survey that gained developer trust stores memory in human-readable formats. We're breaking that pattern. The bet is that queryable, structured memory is worth more than `cat`-ability — and that the `recall`, `connections`, and `distill` tools give the agent (and eventually the user) direct inspection paths.
References
- Zep temporal knowledge graph — bi-temporal tracking, LongMemEval benchmarks
- Mem0 production memory — LOCOMO benchmarks, graph vs vector comparison
- RAG vs GraphRAG: systematic evaluation — when graphs help and when they don't
- MAGMA: multi-graph agentic memory — four-view graph architecture
- SimpleMem: efficient memory with LanceDB — lossless compression, token efficiency
- Graph-based agent memory survey — comprehensive taxonomy (DEEP-PolyU)
- lance-graph on GitHub — the graph engine we're building on
- LanceDB docs — the vector database layer
- How AI agents remember — our survey of five products
- Less code, more skills — the design principle behind the three-layer extension model