
Less code, more skills

How OpenWalrus stays compact while scaling to any use case — by pushing intelligence to skills, not framework code.

design · OpenWalrus Team

OpenWalrus is a single binary. No Docker, no microservices, no plugin runtime with a package manager. One cargo install, one process, and you have a fully autonomous AI agent runtime on your machine.

Keeping it that way while scaling to every possible use case is the central design tension of the project. And it's the same tension every agent framework faces: how do you stay small without becoming limited?

Our answer is a design principle we keep coming back to: less code, more skills.

The framework bloat trap

Agent frameworks grow fast. A team ships a coding agent. Users ask for web browsing, so they add a browser tool. Users ask for memory, so they add a memory subsystem. Users ask for RAG, so they bundle an embedding model. Users ask for customization, so they add configuration layers — CLAUDE.md, .cursorrules, AGENTS.md, TOOLS.md, MEMORY.md, memory banks, auto-generated observations, reflections, compressed histories.

Every feature request answered with framework code makes the repo bigger, the binary heavier, the surface area wider, and the maintenance burden steeper. Eventually the framework is doing so much that it becomes the bottleneck — slow to build, hard to debug, impossible to audit.

The system prompt suffers the same inflation. Instruction-following studies suggest frontier LLMs reliably track on the order of 150-200 simultaneous instructions. Past that, adherence degrades — and smaller models fall off sharply. Every feature that injects more context into the prompt makes the agent worse at everything else.

We've watched this happen. We hit the ceiling ourselves. And we stopped pushing through it.

The principle: small core, open surface

The walrus repo should stay compact. Not because we're lazy, but because a compact core is a correct core — easier to audit, easier to trust, easier to run on constrained hardware.

But a compact core only works if the surface area for extension is wide open. This is where skills come in.

The core handles what only the core can handle: LLM inference, agent lifecycle, tool dispatch, and a graph memory layer backed by LanceDB + lance-graph. Both stores are embedded, Rust-native, and compile into the walrus binary — no separate database server, no Docker. This is the code we maintain. It should be small, correct, and boring.
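To make "tool dispatch" concrete, here is a minimal sketch of what a core dispatcher can look like. The `Tool` trait, `Dispatcher` struct, and `EchoTool` are illustrative names, not OpenWalrus's actual API:

```rust
use std::collections::HashMap;

// Hypothetical sketch: trait and struct names are illustrative,
// not OpenWalrus's real interfaces.
trait Tool {
    fn name(&self) -> &'static str;
    fn call(&self, input: &str) -> Result<String, String>;
}

// Stand-in for a built-in like the shell or HTTP tool.
struct EchoTool;

impl Tool for EchoTool {
    fn name(&self) -> &'static str { "echo" }
    fn call(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

struct Dispatcher {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl Dispatcher {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }
    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name(), tool);
    }
    fn dispatch(&self, name: &str, input: &str) -> Result<String, String> {
        self.tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))?
            .call(input)
    }
}

fn main() {
    let mut d = Dispatcher::new();
    d.register(Box::new(EchoTool));
    println!("{}", d.dispatch("echo", "hello").unwrap());
}
```

The point of keeping this layer boring: every capability above it, whether built-in, skill-driven, or MCP-provided, flows through the same narrow dispatch path.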

Skills and MCP servers handle everything else. A skill is a behavioral template — instructions and patterns that tell an agent how to approach a domain, including which entity types and relationships to extract from conversations. MCP servers can register new entity types at runtime. The community writes them. Users mix and match them. The repo doesn't grow.

This is the Unix philosophy applied to agent runtimes. Small tools that compose, not monolithic systems that configure.

Three layers of extension

The "small core, open surface" idea plays out in a consistent three-layer model across every walrus subsystem — tools, memory, and entity types all follow the same pattern.

Layer 1 — Framework built-ins. The things only the core can provide. A filesystem tool, a shell tool, an HTTP client, four memory tools (remember, recall, relate, forget), and three base entity types (Agent, User, Episode). This is the floor — always available, always correct.

Layer 2 — Skills. Behavioral templates that tell the agent how to approach a domain. A coding skill declares entity types like File, TestFailure, ArchDecision and teaches the agent how to extract them. A research skill declares Paper, Topic, Citation. A DevOps skill teaches the agent to compose kubectl and terraform commands. Skills are a few hundred lines of behavioral description, not compiled code.
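A skill might look something like the sketch below. The manifest format and field names here are hypothetical, invented for illustration, but they capture the idea: declared entity types plus plain-language behavioral instructions, no compiled code.

```yaml
# Hypothetical coding-skill manifest; field names are illustrative,
# not OpenWalrus's actual skill format.
name: coding
description: Teaches the agent how to work in a source repository.
entity_types:
  - File          # source files the agent touches
  - TestFailure   # failing tests, linked to the files involved
  - ArchDecision  # design decisions worth remembering across sessions
instructions: |
  When a test fails, record a TestFailure entity and relate it to the
  File entities involved. When the user settles a design question,
  record it as an ArchDecision so future sessions can recall it.
```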

Layer 3 — MCP servers. External capabilities connected at runtime. A Jira MCP registers Ticket, Sprint, Epic as first-class entities. A GitHub MCP adds PR, Issue, Commit. The agent's capability surface grows without any framework changes.

Every subsystem follows this pattern. Memory isn't special. Tools aren't special. Entity types aren't special. The extension model is the same everywhere — which means learning it once is enough.
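Taking entity types as the example, the three layers can be sketched as one registry fed from three sources. The `EntityRegistry` name and its methods are hypothetical, but the shape matches the model above:

```rust
use std::collections::BTreeSet;

// Illustrative sketch of the three-layer model for entity types;
// the registry name and methods are hypothetical.
struct EntityRegistry {
    types: BTreeSet<String>,
}

impl EntityRegistry {
    // Layer 1: base types the core always provides.
    fn with_builtins() -> Self {
        let mut types = BTreeSet::new();
        for t in ["Agent", "User", "Episode"] {
            types.insert(t.to_string());
        }
        Self { types }
    }
    // Layer 2: a skill declares its extractable entity types.
    fn load_skill(&mut self, declared: &[&str]) {
        self.types.extend(declared.iter().map(|t| t.to_string()));
    }
    // Layer 3: an MCP server registers types at runtime —
    // same mechanism as a skill, different source.
    fn connect_mcp(&mut self, registered: &[&str]) {
        self.load_skill(registered);
    }
}

fn main() {
    let mut reg = EntityRegistry::with_builtins();
    reg.load_skill(&["File", "TestFailure", "ArchDecision"]); // coding skill
    reg.connect_mcp(&["Ticket", "Sprint", "Epic"]);           // Jira MCP
    println!("{} entity types registered", reg.types.len());
}
```

Because layers 2 and 3 feed the same registry through the same path, the core never needs to know which skills or servers exist ahead of time.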

Memory: the first test of the principle

Memory was where we first applied "less code, more skills" — and where the principle proved itself.

Our survey of existing memory systems showed every product building a comprehensive memory subsystem. Claude Code with markdown files and auto-memory. OpenClaw with SQLite + vectors and hybrid search. ChatGPT with a proprietary backend. Each is a bet on one particular memory layout being right for most users.

Instead of building a universal memory framework with config files and journal directories, we collapsed everything into a single layer: a temporal knowledge graph backed by LanceDB + lance-graph. Agent identity, user preferences, conversation episodes, extracted entities — all graph nodes. Four tools to interact with it. Skills define what to extract; the core handles how.

The memory schema grows with the agent's capability surface. Install a coding skill and File, TestFailure become extractable entity types. Connect a Jira MCP and Ticket, Sprint appear. No framework changes. No config files. The three-layer extension model does the work.
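The shape of the four tools can be sketched against a toy in-memory graph. This is a stand-in for illustration only; the real layer is backed by LanceDB + lance-graph, and real recall is vector plus graph search rather than the substring match used here:

```rust
use std::collections::HashMap;

// Toy in-memory stand-in for the graph layer, to show the shape of
// the four tools. Struct and method names are illustrative.
struct Node {
    entity_type: String,
    text: String,
}

#[derive(Default)]
struct Memory {
    nodes: HashMap<u64, Node>,
    edges: Vec<(u64, String, u64)>, // (from, relation, to)
    next_id: u64,
}

impl Memory {
    fn remember(&mut self, entity_type: &str, text: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.nodes.insert(id, Node {
            entity_type: entity_type.into(),
            text: text.into(),
        });
        id
    }
    // Real recall is hybrid vector + graph search; substring match
    // stands in here.
    fn recall(&self, query: &str) -> Vec<&Node> {
        self.nodes.values().filter(|n| n.text.contains(query)).collect()
    }
    fn relate(&mut self, from: u64, relation: &str, to: u64) {
        self.edges.push((from, relation.into(), to));
    }
    fn forget(&mut self, id: u64) {
        self.nodes.remove(&id);
        self.edges.retain(|(f, _, t)| *f != id && *t != id);
    }
}

fn main() {
    let mut m = Memory::default();
    let user = m.remember("User", "prefers tabs over spaces");
    let ep = m.remember("Episode", "discussed formatting in session 12");
    m.relate(ep, "mentions", user);
    println!("recall('tabs') -> {} node(s)", m.recall("tabs").len());
}
```

Note that `entity_type` is just a string here: that is what lets skills and MCP servers introduce new types without any schema migration in the core.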

Read the full deep-dive in Graph + vector: how OpenWalrus agents remember.

Beyond memory

"Less code, more skills" isn't just a memory strategy. It's how we think about every feature request.

When someone asks "can walrus browse the web?" — the answer isn't a built-in browser engine. It's an HTTP tool and a web browsing skill that knows how to navigate, extract, and summarize.

When someone asks "can walrus manage my infrastructure?" — the answer isn't a built-in cloud SDK. It's a shell tool and a DevOps skill that knows how to compose kubectl, terraform, and aws commands.

When someone asks "can walrus do X?" — the answer is almost always: the tools already exist, we just need a skill.

This keeps the repo compact. Every skill is a few hundred lines of behavioral description, not thousands of lines of compiled code. The core stays auditable. The binary stays small. And the ecosystem of what walrus can do grows without bound — because the community builds it, not us.

The tradeoff

This isn't free. Pushing intelligence to skills means:

  • The core tools have to be excellent. If the built-in tools are unreliable, no skill can compensate. This is where our engineering effort goes — making the foundational layer rock-solid.
  • Quality varies. Community skills won't all be good. Some will be brilliant, most will be adequate, a few will be wrong. Curation and testing matter.
  • Discovery is harder. Users need to find the right skill for their use case. This is a community infrastructure problem we haven't fully solved yet.
  • Skills need good documentation. A skill is only as useful as its instructions are clear. Bad behavioral descriptions produce bad agent behavior — garbage in, garbage out.

But the alternative — baking every capability into the framework — is worse. It makes the repo unmaintainable, the binary bloated, and the system prompt overloaded. We'd rather have a small, correct core and a messy ecosystem than a bloated, fragile framework and no ecosystem at all.

Stop injecting, start enabling

The system prompt was never meant to be a database. It was meant to be a brief set of instructions — who you are, how you behave, what tools you have. The moment we started using it as a persistence layer, we created a problem that no amount of engineering can solve.

The fix isn't more framework code. It's better tools and shareable skills.

Keep the core compact. Keep the surface open. Let agents and communities build the intelligence. Less code, more skills.

Get started with OpenWalrus →