
Graph + vector: how OpenWalrus agents remember

We replaced config files and journals with a temporal knowledge graph backed by LanceDB + lance-graph. Here's the architecture and the research behind it.

design · OpenWalrus Team

Most agent memory systems are bags of strings. Markdown files, JSONL journals, key-value stores with vector embeddings bolted on. They work for demos. They break when you need to answer questions like "what did the agent know last Tuesday?" or "why does it think I prefer tabs over spaces?"

We wanted something better. After surveying how five products handle persistent memory, we designed OpenWalrus's memory around a single idea: everything is a temporal knowledge graph.

No SOUL.md. No User.toml. No journal files. One embedded database. Six tools. And a schema that grows with the agent's capabilities — without touching framework code.

The case for graphs

Agent memory has structure. "User prefers async/await" isn't a string — it's a relationship between a user entity and a coding pattern entity. "The auth system uses JWT RS256" connects a system component to an implementation decision. "User is on vacation until March 15" has a temporal bound.

Flat files lose this structure. Vector databases lose it too — they can find semantically similar text, but they can't traverse relationships or answer temporal queries. The research is clear on where graphs win:

[Chart] Accuracy: graph RAG vs vector RAG by task type (%)

The data comes from multiple sources: FalkorDB benchmarks for industry/healthcare/schema-bound, Zep's LongMemEval for temporal (+38.4% improvement), and a real-world migration case study for multi-hop (43% → 91%).

The pattern is consistent: graph RAG dominates on multi-hop, temporal, and relationship queries. Vector RAG is often sufficient for simple single-hop factual lookup — but agent memory is rarely simple. Agents need to traverse decisions, track changing preferences, and connect entities across sessions.
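
To make the multi-hop case concrete, here is a minimal sketch in plain Python of the kind of two-hop question flat retrieval struggles with. The entities and relation names are hypothetical, chosen to mirror the examples above:

```python
# Minimal in-memory graph: entities plus directed, labeled edges.
# Entity names and relations are illustrative assumptions, not walrus's schema.
edges = {
    ("user", "prefers"): ["async/await"],
    ("async/await", "motivated_by"): ["callback-hell incident"],
    ("auth-system", "uses"): ["JWT RS256"],
}

def traverse(start, relations):
    """Follow a chain of relations from a start entity (a multi-hop query)."""
    frontier = [start]
    for rel in relations:
        frontier = [dst for node in frontier for dst in edges.get((node, rel), [])]
    return frontier

# "Why does the agent think the user prefers async/await?"
# requires two hops: user -prefers-> pattern -motivated_by-> reason.
print(traverse("user", ["prefers", "motivated_by"]))
# → ['callback-hell incident']
```

A keyword or embedding lookup over flat strings can surface either edge in isolation, but it cannot chain them into an answer.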

The critical finding from Zep's temporal knowledge graph paper: bi-temporal tracking (separating when a fact was recorded from when it was actually true) achieves 18.5% higher accuracy and ~90% lower latency compared to vector-only retrieval on temporal reasoning tasks.

The landscape of graph-based memory

Graph-based agent memory has gone from academic curiosity to funded infrastructure in 2025-2026. Here's where the major systems stand:

[Chart] Agent memory systems: accuracy vs retrieval latency

| System | Approach | Key metric | Scale |
|---|---|---|---|
| Graphiti (Zep) | Temporal KG, Neo4j backend, bi-temporal | 94.8% DMR accuracy, 300ms P95 | 20K+ stars |
| Mem0 | Vector + graph variants, hierarchical | 68.4% LOCOMO, 0.48s P95 | $24M raised, 41K stars |
| Cognee | KG triplets + vector (LanceDB), CoT retrieval | 92.5% human-like correctness | $7.5M seed, 70+ companies |
| Microsoft GraphRAG | Hierarchical community summaries | 72-83% comprehensiveness | 29.8K stars |
| LightRAG | Lightweight graph RAG | ~30% latency reduction | EMNLP 2025 |

Two recent papers push the field further:

  • MAGMA (Jan 2026) maintains four orthogonal graph views (semantic, temporal, causal, entity) and achieves +45.5% reasoning accuracy with 95%+ token reduction.
  • SimpleMem (Jan 2026) uses LanceDB with semantic lossless compression, achieving 43.24% LOCOMO F1 with only 531 tokens/query — vs Mem0's 34.20 F1 with 973 tokens.

The trend is unmistakable. Gartner named Knowledge Graphs a "Critical Enabler" with immediate GenAI impact. The ICLR 2026 MemAgents Workshop is dedicated entirely to memory for LLM-based agentic systems. A comprehensive survey from DEEP-PolyU (Feb 2026) covers the full taxonomy of graph-based agent memory.

The important caveat

Graph RAG isn't universally better. A systematic evaluation (Feb 2025) found that "GraphRAG frequently underperforms vanilla RAG on many real-world tasks." The advantage concentrates in temporal reasoning, multi-hop inference, and relationship queries. For simple factual lookup, vector RAG often wins on both speed and accuracy.

GraphRAG also averages 2.4x higher latency than vector RAG. And the original Microsoft GraphRAG indexing cost ~$33K for large datasets — though LazyGraphRAG reduced this to 0.1% of that cost while winning all 96 comparisons on 5,590 AP news articles.

This is why we chose a hybrid approach — graph traversal for structural queries, vector similarity for semantic search, combined in a single pipeline.
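
That two-step pipeline, structural filter first, semantic rerank second, can be sketched in a few lines of pure Python. The toy entities and 2-dimensional embeddings are assumptions for illustration; the real system delegates both steps to lance-graph:

```python
import math

# Toy memory: each entity has a type (structural) and a toy embedding (semantic).
# Entities, vectors, and the query below are illustrative assumptions.
entities = [
    {"type": "preference", "value": "prefers tabs",        "vec": [0.9, 0.1]},
    {"type": "preference", "value": "prefers async/await", "vec": [0.2, 0.9]},
    {"type": "fact",       "value": "auth uses JWT RS256", "vec": [0.8, 0.2]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_query(entity_type, query_vec, k=1):
    """Graph/structural filter first, then vector rerank of the survivors."""
    candidates = [e for e in entities if e["type"] == entity_type]  # structural step
    candidates.sort(key=lambda e: cosine(e["vec"], query_vec), reverse=True)  # semantic step
    return [e["value"] for e in candidates[:k]]

print(hybrid_query("preference", [0.1, 1.0]))  # → ['prefers async/await']
```

Filtering on structure before ranking on similarity is what keeps the semantic step cheap: the ANN search only ever sees candidates the graph already vouched for.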

Why LanceDB + lance-graph

Every graph memory system in the landscape above requires a separate server — Neo4j for Graphiti, various backends for Mem0, cloud services for Microsoft GraphRAG. That's a non-starter for a single-binary runtime.

We needed a graph database that:

  • Embeds in-process (no separate server — walrus is a single binary)
  • Is Rust-native (walrus is written in Rust)
  • Supports both vector search and graph traversal
  • Provides versioning at the storage layer

LanceDB + lance-graph checks every box.

[Chart] LanceDB ecosystem (March 2026)

LanceDB: battle-tested at scale

LanceDB isn't a toy. It's backed by $41M in funding (Series A led by Theory Ventures, June 2025) and used in production by Midjourney, Netflix, Uber, ByteDance, and Character.AI — handling billion-scale vector search.

| Metric | Value |
|---|---|
| GitHub stars | 15.2K combined (lancedb + lance format) |
| PyPI downloads | ~2.6M/month |
| Monthly contributors | 40+, from Uber, Netflix, Hugging Face, ByteDance |
| Production users | Midjourney, Netflix, Uber, Character.AI, Harvey |
| Vector search latency | 25ms typical, 3ms at 0.9 recall |
| File read throughput | 6-9x faster than Parquet |
| Storage IOPS | 1.5M peak on NVMe |

The Lance columnar format (SDK 1.0 since Dec 2025) provides the foundation: ACID transactions, zero-copy schema evolution, and immutable versioning at the storage layer. You can query "what did the agent know yesterday?" without journal files — Lance versioning handles it natively.

LanceDB is the default vector database for AnythingLLM, powers local semantic search in Continue.dev (40% improvement in auto-completion relevance), and is the default vector store for Microsoft GraphRAG.

lance-graph: graph queries over columnar data

lance-graph adds a Cypher query engine on top — graph nodes and edges stored as Lance tables, with hybrid GraphRAG queries built in.

| Capability | Details |
|---|---|
| Query language | Cypher (read-only subset with MATCH, WHERE, WITH, aggregation) |
| Vector search | Native ANN, L2/Cosine/Dot metrics |
| GraphRAG | execute_with_vector_rerank() — graph filter then vector rank |
| Basic filter latency | ~680µs (100 items) to ~743µs (1M items) |
| Single-hop expand | ~3.70ms (1M nodes) |
| Two-hop expand | ~6.16ms (1M nodes) |
| Language | Rust 83.7%, Python bindings via PyO3 |

The sub-linear latency growth — 680µs to 743µs for a 10,000x data increase on basic filters — reflects DataFusion's columnar batch processing and predicate pushdown.

Honest assessment of maturity

lance-graph is young. v0.5.3, ~128 GitHub stars, still an incubating subproject under the Lance governance model. APIs may change without notice. No confirmed production deployments of lance-graph specifically (though LanceDB itself is heavily battle-tested).

The query engine is read-only — no CREATE, DELETE, or MERGE in Cypher. Writes go through the LanceGraphStore Python/Rust API. For an agent runtime that handles writes through its own tools (not Cypher), this constraint doesn't matter. For other use cases, it might.
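
For a feel of what the supported subset covers, a read-only query might look like the following. The node properties and relation shape here are hypothetical, invented for illustration; lance-graph's actual table schema and label conventions will differ:

```cypher
// Hypothetical: entities reachable from a given preference, with edge counts.
// Uses only the read-only subset named above: MATCH, WHERE, WITH, aggregation.
MATCH (p {type: 'preference'})-[r]->(e)
WHERE p.key = 'indentation'
WITH e, count(r) AS links
RETURN e.key, links
```

Note what is absent: no CREATE, DELETE, or MERGE clauses, consistent with writes going through the store API rather than Cypher.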

We're betting on a young project with a strong parent ecosystem. The risk is real, but the alternative — requiring users to run a Neo4j server alongside walrus — contradicts our single-binary philosophy.

Comparison with alternatives

| | LanceDB + lance-graph | SQLite + sqlite-vec | Neo4j + Graphiti |
|---|---|---|---|
| Deployment | Embedded, in-process | Embedded, in-process | Separate server |
| Graph queries | Cypher (read-only) | Manual SQL joins | Full Cypher |
| Vector search | Native ANN | sqlite-vec extension | Plugin |
| Temporal tracking | Lance versioning | Manual | Built-in |
| Rust native | Yes | Via bindings | No |
| Fits single binary | Yes | Yes | No |
| Maturity | Early (v0.5.3, $41M ecosystem) | Very mature | Very mature (13K+ stars) |

SQLite + sqlite-vec is the pragmatic alternative — mature, embedded, battle-tested. But graph queries require manual SQL joins, and there's no native hybrid GraphRAG pattern. For traversing relationships and searching semantically in a single query, the graph-native approach wins.

Neo4j with Graphiti is the most capable graph memory system available — 94.8% DMR accuracy, full Cypher, production-proven. But it requires a separate server — a non-starter for a single-binary runtime.

Everything is the graph

Agent identity, user preferences, conversation history, extracted facts, relationships between entities — all stored in two tables in one LanceDB database: entities and relations. Compacted conversation summaries live in a third table: journals.

The framework ships seven built-in entity types:

  • identity — the agent's identity and personality. Queried at session start and injected into the system prompt. This is what SOUL.md used to be.
  • profile — the user's profile. Also injected at session start. This replaces User.toml.
  • preference — user preferences. Linked to profile via relations.
  • fact — general facts the agent has learned.
  • person, event, concept — structured entity types for people, events, and abstract concepts the agent encounters.

Entity types are configurable — you can add domain-specific types in walrus.toml without touching framework code.

Six tools, no queries

The agent interacts with memory through six tools. It never writes Cypher or SQL — the framework handles storage and retrieval internally.

| Tool | What it does |
|---|---|
| remember | Store a typed entity (type, key, value). Upserts with FTS indexing. |
| recall | Full-text search across entities, optionally filtered by type. Returns top-K matches. |
| relate | Create a directed edge between two existing entities. |
| connections | Traverse the graph from a given entity — 1-hop via lance-graph Cypher, optionally filtered by relation or direction. |
| compact | Trigger compaction: summarize the conversation, embed the summary, store as a journal entry. |
| distill | Semantic search over past journal summaries. Find relevant context from previous sessions. |
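
A minimal in-memory sketch of that tool surface, with substring matching standing in for FTS and a plain list standing in for the relations table. Everything here is illustrative; the real tools persist to LanceDB:

```python
class MemoryStore:
    """Toy stand-in for the walrus memory tools (in-memory, substring search)."""

    def __init__(self):
        self.entities = {}   # (type, key) -> value
        self.relations = []  # (src_key, relation, dst_key)

    def remember(self, etype, key, value):
        self.entities[(etype, key)] = value  # upsert semantics

    def recall(self, query, etype=None, k=5):
        hits = [(t, key, v) for (t, key), v in self.entities.items()
                if (etype is None or t == etype) and query in v]
        return hits[:k]

    def relate(self, src, relation, dst):
        self.relations.append((src, relation, dst))

    def connections(self, key, relation=None):
        return [(r, dst) for (src, r, dst) in self.relations
                if src == key and (relation is None or r == relation)]

store = MemoryStore()
store.remember("preference", "indentation", "prefers tabs over spaces")
store.remember("fact", "auth", "auth system uses JWT RS256")
store.relate("indentation", "motivated_by", "auth")  # toy edge
print(store.recall("tabs"))              # → [('preference', 'indentation', 'prefers tabs over spaces')]
print(store.connections("indentation"))  # → [('motivated_by', 'auth')]
```

The point of the narrow surface is that the agent only ever composes these six calls; it never sees the storage layer.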

Why not expose raw Cypher? Because Text2Cypher is unreliable even with frontier LLMs — they hallucinate syntax and miss schema constraints. The framework knows the schema because it created it.

Two retrieval paths

The six tools reflect two distinct retrieval mechanisms:

Graph + FTS path (recall, connections): Entity nodes are stored with full-text search indexing. recall runs FTS on key and value fields, agent-scoped. connections runs a 1-hop Cypher traversal via lance-graph. These are fast (sub-millisecond) and exact.

Vector semantic path (distill): Journal entries — compaction summaries — are embedded with all-MiniLM-L6-v2 (384 dimensions via fastembed). distill runs ANN search over these embeddings to find semantically similar past context.

This means different queries hit different stores: recall and connections run against the FTS-indexed entity graph, while distill runs against the journal embedding index.

At session start, walrus injects identity entities, profile entities, and the three most recent journal summaries into the system prompt — giving the agent its identity and continuity context before the first turn.

Temporal tracking

Every entity carries created_at and updated_at. Relations carry created_at. Journal entries carry created_at and their 384-dim embedding. This is enough to answer "what did the agent learn this session?" or "when was this fact recorded?" — without separate journal files.

Full bi-temporal tracking (valid_from, valid_until) — the Zep approach that achieves +38.4% on temporal reasoning — is on the roadmap. The current implementation tracks creation time; expiration is a future addition.
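
A sketch of what those roadmap fields would buy. The valid_from and valid_until names come from the paragraph above; the query helper and the example facts are hypothetical:

```python
from datetime import date

# Bi-temporal record: created_at says when a fact was recorded,
# valid_from/valid_until say when it was actually true (roadmap fields).
facts = [
    {"value": "user is on vacation", "created_at": date(2026, 3, 1),
     "valid_from": date(2026, 3, 1), "valid_until": date(2026, 3, 15)},
    {"value": "user works from office", "created_at": date(2026, 3, 16),
     "valid_from": date(2026, 3, 16), "valid_until": None},
]

def facts_valid_on(day):
    """What was actually true on a given day, regardless of when it was recorded."""
    return [f["value"] for f in facts
            if f["valid_from"] <= day
            and (f["valid_until"] is None or day <= f["valid_until"])]

print(facts_valid_on(date(2026, 3, 10)))  # → ['user is on vacation']
print(facts_valid_on(date(2026, 3, 20)))  # → ['user works from office']
```

With only created_at, the vacation fact would linger as apparently current long after March 15; the validity interval is what lets stale facts expire without being deleted.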

Extending the schema

The entity and relation types are open and configurable. The defaults give you a working memory system. Your walrus.toml extends it without touching framework code:

```toml
[memory]
entities = ["ticket", "sprint", "arch_decision"]
relations = ["blocks", "supersedes", "owned_by"]
```

These merge with the built-in types — the framework never loses the default fact, preference, identity, and profile types. Extensions only add.
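
The merge rule is additive, which can be sketched as follows. The built-in list is the seven types from the section above; the helper name is hypothetical:

```python
# Built-in entity types shipped by the framework (see the list above).
BUILTIN_ENTITIES = ["identity", "profile", "preference", "fact",
                    "person", "event", "concept"]

def merge_entity_types(config_entities):
    """Config extends, never replaces: built-ins always survive, order preserved."""
    merged = list(BUILTIN_ENTITIES)
    for etype in config_entities:
        if etype not in merged:
            merged.append(etype)
    return merged

# Mirrors the walrus.toml example: entities = ["ticket", "sprint", "arch_decision"]
print(merge_entity_types(["ticket", "sprint", "arch_decision"]))
```

Because the merge only appends, a config that accidentally re-declares a built-in type like "fact" is a no-op rather than an override.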

This follows the same pattern as everything else in walrus (see less code, more skills): the framework ships a working default, skills and configuration extend it, MCP servers add entity types at runtime.

Compaction as memory formation

When the context window fills up, the agent calls compact. The runtime summarizes the full conversation history with the LLM, embeds the summary with all-MiniLM-L6-v2, stores it as a journal entry, and replaces the conversation history with just that summary. The agent continues with a clean context window and the summary as continuity.

This turns compaction from a lossy operation into a memory-formation event. Past journal entries are searchable via distill in future sessions — semantic search over the embedding finds relevant past context even when the agent doesn't remember the exact session it came from.
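
The compaction flow can be sketched with stubs. The summarizer and embedder below are fakes standing in for the LLM and all-MiniLM-L6-v2; the journal list stands in for the journals table:

```python
# Stubs standing in for the LLM summarizer and the fastembed embedder.
def summarize(messages):
    return "Summary: " + "; ".join(m["content"] for m in messages[-2:])

def embed(text):
    return [float(len(text) % 7), float(len(text) % 11)]  # toy 2-dim "embedding"

journals = []  # stand-in for the journals table: summaries + embeddings

def compact(history):
    """Summarize, embed, store a journal entry, return a fresh compacted history."""
    summary = summarize(history)
    journals.append({"summary": summary, "vec": embed(summary)})
    return [{"role": "system", "content": summary}]

history = [{"role": "user", "content": "set up auth"},
           {"role": "assistant", "content": "used JWT RS256"}]
history = compact(history)
print(len(history), len(journals))  # → 1 1
```

The key property is that the journal entry outlives the conversation: the history shrinks to one message, but the summary and its embedding persist for distill to find later.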

Mem0's research shows that smart compaction improves reasoning: 91% lower P95 latency and 90%+ token reduction compared to full-context approaches, while achieving a 26% relative improvement in LLM-as-a-Judge scores.

What you lose, what you gain

This design has a real tradeoff: you can't cat SOUL.md anymore.

What you lose:

  • Human-readable files you can open in a text editor
  • Git-diffable memory changes
  • The simplicity of echo "prefer tabs" >> CLAUDE.md

What you gain:

  • Queryable structure — recall and connections over a typed entity graph
  • Temporal tracking — created_at on everything, full bi-temporal expiration on the roadmap
  • Relationship-aware traversal — connections follows edges from decisions to reasons to related entities
  • Semantic history — distill finds relevant past journal summaries across sessions
  • Versioning via Lance — the storage layer tracks history at the columnar level

Every product in our memory survey that gained developer trust stores memory in human-readable formats. We're breaking that pattern. The bet is that queryable, structured memory is worth more than cat-ability — and that the recall, connections, and distill tools give the agent (and eventually the user) direct inspection paths.


Get started with OpenWalrus →