Sandboxing AI agents: beyond Docker and WASM
How Claude Code, Cursor, Codex, and Devin gate bash access — and why Docker, WASM, and OS sandboxes are all solving the wrong layer.
An AI coding agent that can't run shell commands, edit files, or install packages is useless. An AI coding agent that can do all of those things without restriction is dangerous. Every production agent system sits somewhere on this spectrum — and most of them are struggling with it.
In 2025 alone, three runc CVEs demonstrated that standard Docker containers, which share the host kernel, can be escaped. A filesystem MCP server had symlink-based CVEs that allowed full system takeover from inside a sandboxed setup. A Cursor sandbox vulnerability leaked secrets from the host environment. The problem isn't theoretical.
We surveyed how seven major agent systems handle sandboxing and permissions, mapped the pain points, and looked at what's emerging beyond Docker — including why WASM with host functions doesn't solve what you think it solves.
How production agents handle permissions
Claude Code: OS-level sandbox, no containers
Claude Code uses the operating system's own isolation primitives.
On macOS: Apple's Seatbelt (sandbox-exec) with a runtime-generated
policy. On Linux: bubblewrap
(bwrap) for namespace isolation. No Docker daemon, no containers, no VMs.
The sandbox restricts the agent to the working directory for writes. All network traffic routes through a proxy Unix domain socket — outbound is blocklisted by default, with an allowlist for approved domains.
On top of the sandbox, Claude Code has five
permission modes:
default (ask before everything risky), acceptEdits (auto-approve
file edits, prompt for shell), plan (read-only), dontAsk and
bypassPermissions (for fully isolated CI environments). A
PreToolUse hook system lets users write shell scripts that
approve or deny tool calls programmatically — automating the
permission decision without disabling the safety layer.
Cursor: Landlock + seccomp, workspace-scoped
Cursor's sandbox uses the same primitives as Claude Code — Seatbelt on macOS,
Landlock LSM + seccomp on
Linux — but with a workspace-scoped policy generated at runtime. The agent
can read/write the open workspace and /tmp, read the broader filesystem,
but cannot write outside the workspace or make network requests without
explicit user approval.
The results are concrete: Cursor reports that sandboxed agents stop 40% less often than unsandboxed ones. Fewer false positives, fewer interruptions, more autonomy. As of early 2026, one third of all Cursor requests run sandboxed.
Enterprise admins get granular network allowlists/denylists. Regular users don't — network approval is per-request.
OpenAI Codex CLI: Landlock + seccomp in Rust
Codex CLI follows the same pattern — Seatbelt on macOS, Landlock + seccomp on
Linux — but implements the Linux sandbox in Rust using the
seccompiler crate (the same BPF
compiler used by AWS Firecracker). A codex-linux-sandbox helper process
enforces restrictions. No Docker, no VM.
Network is blocked by default in sandboxed modes. Filesystem writes are
restricted to configured writable roots. Debug commands (codex debug seatbelt,
codex debug landlock) help diagnose policy issues.
Devin: Docker with managed cloud
Devin runs each session in a Docker container hosting a full environment: terminal, browser, code editor, and planner. The container runs on AWS (multi-tenant SaaS) or in the customer's VPC via AWS PrivateLink. SOC 2 Type II certified.
The advantage: the agent gets a complete, isolated machine. The tradeoff: Docker's shared-kernel isolation model, cloud latency, and the inability to run locally. Devin is a managed service, not a local tool.
OpenHands: Docker per session, docker.sock risk
OpenHands (formerly OpenDevin) runs each task in a Docker container, torn down post-session. The agent accesses the container via SSH. Workspace files are mounted via Docker volumes.
The documented security caveat: OpenHands requires mounting
/var/run/docker.sock, which gives the container full control over the
host Docker daemon. If an agent escapes the container, it has access to
all Docker resources on the host. This is
acknowledged in their own materials
and remains an active research problem.
Aider: no sandbox
Aider runs entirely in your terminal with direct filesystem access. No container, no VM, no OS-level isolation.
The safety model is git: all changes go through git, so git diff and
git checkout are the undo mechanism. This is an explicit design choice —
simplicity over isolation. The community workaround for stronger isolation
is to run Aider inside a devcontainer.
GitHub Copilot Coding Agent: Actions + firewall
GitHub's Copilot Coding Agent runs in a GitHub Actions environment with
internet access controlled by a firewall. The agent has read-only repo
access for exploration and can only push to branches prefixed with copilot/.
Standard branch protection rules and required checks apply.
No arbitrary terminal access to the user's local machine. The agent lives in GitHub's infrastructure, not yours.
The permission model spectrum
Sandbox Capabilities Across AI Agent Systems
The radar shows a clear split. Claude Code, Cursor, and Codex CLI converge on the same architecture: OS-level primitives (Seatbelt/Landlock/seccomp) with no Docker dependency. They score high on filesystem isolation and network control but share a cross-platform weakness — maintaining three separate sandbox implementations for macOS, Linux, and Windows.
Devin and OpenHands use Docker, which gives strong shell restrictions (everything runs inside the container) but weaker network control and no cross-platform story beyond "install Docker." Aider skips sandboxing entirely, trading security for zero setup friction.
The real gate: in-system permissions for bash
All the sandbox technologies above — Docker, Firecracker, Landlock, Seatbelt — answer the same question: if the agent runs a dangerous command, how do we limit the blast radius? But there's a prior question that matters more in practice: how does the agent get permission to run bash in the first place?
Every agent system is fundamentally a loop that decides whether to grant an LLM access to a shell. The sandbox is the safety net. The permission system is the gate. In practice, the gate does more work than the net.
Permission model comparison
| System | Default bash policy | Approval mechanism | Escape hatch | Granularity |
|---|---|---|---|---|
| Claude Code | Ask before every shell command | User prompt per-command; PreToolUse hooks for automation | --dangerously-skip-permissions | Per-command pattern matching (Bash(git commit:*)) |
| Cursor | Deny outside workspace | Auto-approve safe tools (grep, ls); prompt for state-changing commands | Enterprise admin overrides | Per-tool category (safe / state-changing) |
| Codex CLI | Blocked in sandbox modes | --full-auto mode for trusted environments | --full-auto flag | Per-mode (interactive / sandbox / full-auto) |
| Windsurf | Ask before terminal commands | User approval per-command; .codeiumignore for file exclusions | Security rules with NEVER flags | Per-command with rule overrides |
| Devin | Full access inside container | Container is the boundary — no per-command approval | N/A (container is the sandbox) | All-or-nothing (container-level) |
| OpenHands | Full access inside container | Event stream API gates actions inside container | N/A | Per-action type via event API |
| Aider | Full access, no restriction | None — git is the undo mechanism | N/A | None |
| GitHub Copilot | No local shell access | Runs in Actions; can only push to copilot/ branches | N/A | Branch-scoped |
Two philosophies
The table reveals two fundamentally different approaches:
Gate the command, sandbox the process. Claude Code, Cursor, Codex CLI,
and Windsurf all run on the user's machine and mediate individual bash
commands. The agent proposes a command; the system (or user) approves or
denies it. The OS-level sandbox (Landlock/Seatbelt) is the fallback if
an approved command does something unexpected. This is high-friction but
fine-grained — you can allow git commit while blocking rm -rf /.
Gate the environment, give full access inside. Devin, OpenHands, and GitHub Copilot put the agent in an isolated environment and let it run anything. No per-command approval, no permission prompts. The boundary is the container or VM wall. This is low-friction but coarse — you can't allow some commands while blocking others, because the agent has root inside its sandbox.
The first approach treats the sandbox as defense-in-depth. The second treats the sandbox as the only defense. When the container is the only boundary, a container escape is game over. When per-command approval exists alongside OS-level restrictions, both have to fail simultaneously.
The browser problem
There's a third dimension that neither philosophy handles well: browser access with authenticated sessions.
For autonomous agents doing real-world tasks — filling forms, navigating internal tools, interacting with SaaS dashboards — the browser with the user's logged-in profiles is arguably the most valuable resource on the machine. Cookies, saved passwords, OAuth tokens, browser extensions, session state across dozens of services — all of it lives in the user's browser profile.
Docker and VM-based sandboxes cut the agent off from this entirely. The agent gets a fresh browser with no sessions, no cookies, no saved credentials. Every service requires re-authentication. Extensions don't exist. The agent is starting from zero in the environment where the user has years of accumulated state.
Anthropic's Computer Use runs agents against a visible desktop — with access to whatever the user has open, including browsers. Daytona's Computer Use sandbox takes a similar approach for Linux/macOS/Windows desktop automation. But these operate outside the sandboxing models discussed above. The agent either gets the user's full desktop (powerful but dangerous) or a fresh container desktop (safe but useless for authenticated workflows).
This is the fundamental tension for autonomous agents. The resources that make agents most useful — authenticated browser sessions, logged-in APIs, the user's actual environment — are exactly the resources that sandboxing is designed to restrict. No current system resolves this cleanly. The closest approximation is scoped OAuth tokens and per-service API keys granted to the agent explicitly, but that requires every service to support programmatic access, which most internal tools don't.
What works best
Cursor's data gives the clearest signal: agents with proper environmental sandboxing stop 40% less often than unsandboxed ones. The result is counterintuitive — more isolation produces more autonomy, not less.
The reason: without a sandbox, the system must prompt the user for every potentially dangerous action because there's no safety net. With a sandbox, the system can auto-approve most actions because the blast radius is contained. The sandbox replaces per-command friction with environmental safety.
Claude Code's hook system points at a middle path. PreToolUse hooks
let users write shell scripts that approve or deny tool calls based on
pattern matching — Bash(npm install:*) auto-approves, Bash(rm -rf:*)
always denies. This moves permission decisions from interactive prompts
to declarative policy. The agent doesn't stop to ask; the policy
pre-answers.
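The mechanics are simple enough to sketch. Assuming the documented hook contract (the tool call arrives as JSON on stdin; exit code 0 lets the normal flow continue; exit code 2 blocks the call and feeds stderr back to the model), a policy hook is a short script. The patterns below are illustrative, not a recommended policy:

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook that gates bash commands by pattern.

Assumes Claude Code's hook contract: the tool call arrives as JSON on
stdin; exit 0 continues the normal flow, exit 2 blocks the call.
The patterns are illustrative only.
"""
import json
import re
import sys

DENY = [r"^rm\s+-rf\s+/", r"curl\s+.*\|\s*(ba)?sh"]        # always block
ALLOW = [r"^git\s+(status|diff|log)\b", r"^npm\s+install\b"]  # auto-approve

def decide(command: str) -> str:
    """Return 'deny', 'allow', or 'ask' for a shell command."""
    for pat in DENY:
        if re.search(pat, command):
            return "deny"
    for pat in ALLOW:
        if re.search(pat, command):
            return "allow"
    return "ask"  # fall through to the interactive prompt

def handle(stdin_text: str) -> int:
    """Map one hook invocation to an exit code (0 = continue, 2 = block)."""
    event = json.loads(stdin_text)
    cmd = event.get("tool_input", {}).get("command", "")
    if decide(cmd) == "deny":
        print(f"blocked by policy: {cmd}", file=sys.stderr)
        return 2
    return 0

# Wire-up when installed as a hook script:
# sys.exit(handle(sys.stdin.read()))
```

The point is the shape, not the patterns: permission decisions move out of the user's head and into a script that runs on every tool call.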
The pattern that emerges across all systems: the best permission model is the one where the user configures policy once and then forgets about it. Per-command prompts don't scale. Container-level all-or-nothing doesn't give enough control. Declarative per-capability policies — what commands, what directories, what network endpoints — hit the sweet spot. No system has fully nailed this yet.
Pain points
The restrictive/permissive policy trap
ARMO's research documents a pattern seen across organizations: security teams set overly restrictive sandbox policies. The agent breaks within 48 hours. Teams loosen policies incrementally until they become permissive enough to be meaningless. The result is security theater — a sandbox that exists on paper but blocks nothing in practice.
The fix requires progressive enforcement: start permissive with monitoring, tighten based on observed behavior. But most sandbox systems offer all-or-nothing controls, not gradual ones.
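What gradual controls could look like is easy to sketch, even if no shipping sandbox offers them. A hypothetical monitor-then-enforce policy in Python (all names invented):

```python
from collections import Counter

class ProgressivePolicy:
    """Hypothetical progressive enforcement: observe first, enforce later.

    Phase 1 (monitor): every command is allowed but recorded.
    Phase 2 (enforce): only command shapes seen often enough during
    monitoring are auto-approved; everything else escalates to a
    human prompt instead of being silently blocked.
    """

    def __init__(self, min_observations: int = 3):
        self.min_observations = min_observations
        self.observed = Counter()
        self.enforcing = False

    @staticmethod
    def prefix(command: str) -> str:
        # Coarse key: first two tokens ("git commit", "npm install", ...)
        return " ".join(command.split()[:2])

    def check(self, command: str) -> str:
        key = self.prefix(command)
        if not self.enforcing:
            self.observed[key] += 1
            return "allow"      # monitor mode: allow and learn
        if self.observed[key] >= self.min_observations:
            return "allow"      # trust earned during monitoring
        return "escalate"       # unfamiliar behavior: ask, don't block

    def tighten(self):
        """Flip from monitoring to enforcement."""
        self.enforcing = True
```

During the monitoring window everything is allowed but recorded; after tighten(), unfamiliar command shapes escalate to a human instead of failing silently, which is exactly the behavior that would avoid the 48-hour breakage driving the loosening spiral.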
Docker cold starts and resource overhead
Docker containers incur 500ms–2s cold starts depending on base image. For coding agents, the situation is worse — cloning a repository and installing packages can add 20–30 seconds before any code executes. At high concurrency (~400 parallel starts), CNI plugin and virtual switch setup inflate boot times by up to 263%.
Teams maintain "warm pools" of pre-booted containers to mitigate this, creating an economic trap: idle containers waste compute, but building a predictive autoscaler is a serious engineering challenge.
Docker socket escape
OpenHands requires /var/run/docker.sock access. This is well-documented
and widely discussed. The socket gives any process with access full control
over the host Docker daemon — start containers, mount host filesystems,
access networks. Container escape equals full host access.
Cross-platform inconsistency
Claude Code, Cursor, and Codex CLI each maintain separate sandbox
implementations for macOS (Seatbelt) and Linux (Landlock/bubblewrap/seccomp).
Windows gets WSL2 at best. Every sandbox has different capabilities,
different edge cases, and different failure modes. Apple marks sandbox-exec
as deprecated — the macOS implementation is load-bearing but architecturally
uncertain.
Symlink escapes and path traversal
CVE-2025-53109 and CVE-2025-53110 in a filesystem MCP server showed that path prefix matching — the most common way sandboxes restrict filesystem access — can be bypassed through symlinks. An agent creates a symlink inside the allowed directory pointing outside it, and the sandbox's path check passes while the actual access goes elsewhere. This is a class of vulnerability, not a single bug.
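The class is easy to reproduce. A minimal Python sketch (hypothetical checker functions, not any shipping sandbox's code): a naive prefix check approves a symlinked path whose target lies outside the sandbox root, while resolving the path first catches it.

```python
import os
import tempfile

def naive_check(root: str, path: str) -> bool:
    """The vulnerable pattern: string-prefix matching on the raw path."""
    return os.path.abspath(path).startswith(os.path.abspath(root) + os.sep)

def resolved_check(root: str, path: str) -> bool:
    """Resolve symlinks *before* comparing against the sandbox root."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(path)
    return real_path.startswith(real_root + os.sep)

# Demonstration: a symlink inside the sandbox pointing outside it.
sandbox = tempfile.mkdtemp()
outside = tempfile.mkdtemp()
secret = os.path.join(outside, "secret.txt")
with open(secret, "w") as f:
    f.write("host data")

link = os.path.join(sandbox, "innocent.txt")
os.symlink(secret, link)

assert naive_check(sandbox, link)         # passes: the path *looks* inside
assert not resolved_check(sandbox, link)  # caught: the target is outside
```

Even realpath-based checks remain racy, since the link target can change between check and use; kernel-level mechanisms such as Landlock or openat2's RESOLVE_BENEATH close that gap by resolving at the point of access.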
Constant permission prompts vs autonomy
Agents that sandbox at the process level but need host access for installing packages, running tests, or accessing databases face a dilemma: prompt the user for every action (destroying flow) or run without restriction (destroying safety). Cursor's data — 40% fewer stops with proper sandboxing — shows that environmental sandboxing (the agent runs inside the sandbox, so package installs and test runs don't need approval) is better than per-action approval.
Beyond Docker: the sandbox landscape
Firecracker microVMs
Firecracker — the technology behind AWS Lambda — gives each sandbox its own kernel. Boot time is under 200ms. Memory overhead per microVM is under 5 MiB. This eliminates the shared-kernel escape vectors that plague runc containers.
E2B builds on Firecracker to offer ephemeral sandboxes purpose-built for AI agents. Pre-warmed snapshot pools bring startup under 200ms. Manus uses E2B for its agent execution. Microsandbox is an open-source alternative using libkrun (KVM-based microVMs) with sub-200ms boot, Apache-2.0 license, and MCP integration.
The tradeoff: Firecracker requires KVM (Linux only), and networking setup at scale (TAP interfaces, IP tables, CNI plugins) becomes the primary bottleneck at ~400 parallel starts.
OS-level primitives: what Claude Code and Codex actually use
The most significant finding in this survey: the two most widely used local coding agents — Claude Code and Codex CLI — don't use Docker or VMs at all. They use kernel-level primitives directly.
Landlock (Linux 5.13+) lets a process restrict its own filesystem and network access at the kernel level without requiring root. Unlike seccomp (which filters syscalls) or namespaces (which need privileges), Landlock lets a process sandbox itself from within. It filters at the point of operation in the kernel, not at the syscall interface.
seccomp-bpf attaches a BPF program to a process that filters every syscall. It's the most surgical tool: you can allow or deny specific syscalls with specific arguments. The limitation: syscall lists are architecture-specific.
bubblewrap combines Linux namespaces (PID, mount, network, user) with an unprivileged setup — the same tool Flatpak uses for desktop app sandboxing.
On macOS, Seatbelt (sandbox-exec) provides kernel-enforced sandboxing
via a Scheme-like profile language. Several community tools have been built
on this pattern for AI agent sandboxing:
agent-seatbelt-sandbox,
scode.
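For a flavor of the profile language, here is a minimal illustrative profile — not Claude Code's actual policy, and the workspace path is a placeholder:

```scheme
;; Illustrative Seatbelt profile -- not any real agent's policy.
;; Run with: sandbox-exec -f profile.sb some-command
(version 1)
(deny default)                      ; start from nothing
(allow process-fork)
(allow process-exec)                ; run binaries
(allow file-read*)                  ; read the whole filesystem
(allow file-write*                  ; but write only the workspace
    (subpath "/Users/me/project")
    (subpath "/private/tmp"))
(deny network*)                     ; no network at all
```

Agents that use this mechanism generate a profile like this at runtime with the actual workspace path substituted in, then launch the tool process under it.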
The advantage: near-zero overhead, no daemon process, no root required (on Linux). The disadvantage: not a full machine — the agent shares the host kernel and can't install system packages without host access.
gVisor: user-space kernel
gVisor interposes on syscalls in user space, implementing a subset of the Linux kernel in Go. It sits between containers and the host kernel, providing stronger isolation than runc without requiring a full VM.
Google's Agent Sandbox — announced at KubeCon NA 2025 as a CNCF project — uses gVisor as its foundation, with optional Kata Containers for workloads needing full microVM isolation. Pre-booted warm pools and a declarative CRD API target sub-second cold starts.
The tradeoff: 20–50% overhead on syscall-heavy workloads. For AI agents running compilers and test suites, that's significant.
WebContainers: browser-native
StackBlitz WebContainers run a WASM-based operating system entirely in the browser — no remote server provisioned. The browser's own security sandbox handles isolation. Bolt.new uses this to give AI agents full control over a Node.js environment including filesystem, package manager, and terminal, entirely client-side.
The hard limitation: Node.js and browser-compatible runtimes only. You can't run Python, Rust, Go, or any native binary. For web development agents, it's excellent. For general-purpose coding agents, it's a non-starter.
Emerging platforms
Daytona pivoted from dev environments to AI agent infrastructure in early 2025. It claims sub-27ms sandbox spin-up and raised a $24M Series A in early 2026. It supports process execution, filesystem, native Git, and Computer Use sandboxes for desktop automation.
Modal offers a serverless container fabric tested up to 1,000 sandbox creations per second. Used by Lovable and Quora's Poe for AI code execution.
Morph Cloud focuses on instant environment branching — snapshot-and-restore workflows where agents branch from a known state, execute, then discard or persist.
Why WASM doesn't solve it
WASM is the obvious candidate for sandboxing: memory-safe execution with microsecond overhead, capability-based I/O, and strong module isolation. Wasmtime enforces that modules cannot access system resources without explicit capability grants. In theory, perfect.
In practice, the boundary breaks as soon as you need it.
WASI capabilities punch holes in the sandbox. The moment you expose filesystem or network capabilities to a WASM module — which you must, for any agent that needs to read files, install packages, or make API calls — the module has access to those resources. The sandbox is only as strong as the capability grants, and useful agents need broad capabilities.
Resource exhaustion bypasses the capability model entirely. A 2025 security analysis showed that exposed WASI/WASIX interfaces allow malicious modules to starve shared OS resources — CPU cycles, disk I/O, bandwidth, entropy pools, kernel objects. Even with cgroups and quotas, syscall floods can degrade system performance by over 94%. WASM isolates memory. It doesn't isolate compute.
Arbitrary tools can't run in WASM. An agent that runs cargo test,
pytest, gcc, or docker compose needs native binaries. Compiling
every tool and its dependencies to WASM is a massive operational burden —
and many tools (anything using raw syscalls, threads, or mmap) don't
compile at all. Pydantic's Monty is the most credible attempt at a
WASM-based Python sandbox, and it explicitly covers only a subset of
Python.
The I/O complexity problem. Agents need external data — files, databases, APIs — to act. WASM strictly limits I/O. The result is complex JavaScript glue code or custom host functions that effectively bypass the security model you built WASM to enforce.
Microsoft's Wassette (August 2025) is the most principled attempt — a Rust-based runtime executing WASM Components via MCP with deny-by-default capabilities. But it's limited to WASM Components, not arbitrary code execution.
The bottom line: WASM is excellent for plugin sandboxing — running untrusted extensions in a controlled environment with minimal capabilities. It is not a solution for general agent execution where the agent needs to run arbitrary tools, install packages, and interact with the host system.
The isolation hierarchy
Isolation Technologies: Boot Time vs Isolation Strength
| Technology | Boot time | Memory overhead | Isolation | Root needed | Best for |
|---|---|---|---|---|---|
| Landlock/Seatbelt | ~1ms | near-zero | OS-level (shared kernel) | No | Local coding agents |
| WASM (Wasmtime) | microseconds | ~1 MiB | Language VM | No | Plugin sandboxing |
| Docker/runc | 500ms–2s | 10–50 MiB | Shared kernel + namespaces | Yes (daemon) | Legacy, dev environments |
| gVisor (runsc) | ~container | moderate | User-space kernel | Yes | Kubernetes workloads |
| Firecracker | ~125ms | under 5 MiB | Dedicated kernel | Yes (KVM) | Untrusted agent code at scale |
| Full VM (KVM/QEMU) | 1–30s | hundreds of MiB | Strongest | Yes | Maximum isolation |
The industry direction: for executing AI-generated code with untrusted inputs at scale, the minimum viable isolation is a microVM (Firecracker, libkrun, Kata Containers). For local developer tooling, OS-level primitives (Landlock + seccomp + bubblewrap/Seatbelt) are practical and require no daemon.
Docker sits in an awkward middle ground — more overhead than OS-level primitives, less isolation than microVMs. Three runc CVEs in 2025 alone have made it clear that shared-kernel container isolation is not sufficient for untrusted code.
Open questions
This survey raises more questions than it answers. A few that stood out:
Can permission models be progressive rather than binary? The restrictive/permissive policy trap suggests that all-or-nothing controls don't work in practice. A permission system that starts restrictive and widens based on observed behavior — earning trust over a session — doesn't exist yet. What would it take to build one?
Is there a cross-platform sandbox primitive waiting to be built?
Every system maintaining three implementations (macOS/Linux/Windows)
is doing redundant work. Apple marking sandbox-exec as deprecated
adds urgency. A portable capability-based sandbox — something like
pledge/unveil but cross-platform —
would change the economics for every agent project. The Rust ecosystem
has pieces (extrasafe,
seccompiler,
cap-std) but no
unified abstraction.
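To make the gap concrete, here is a hypothetical sketch of the caller-side API such a primitive might expose. Every name is invented, and the hard part — compiling one declaration down to Seatbelt, Landlock + seccomp, and a Windows equivalent — is precisely the work no one has done:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical cross-platform policy: declare needs, drop the rest.

    Mirrors OpenBSD's pledge (capability classes) plus unveil (path
    scoping). A real implementation would compile this to Seatbelt on
    macOS, Landlock + seccomp on Linux, and an equivalent on Windows.
    """
    read_paths: tuple = ()
    write_paths: tuple = ()
    allow_network: bool = False
    allow_exec: bool = False

    def permits_write(self, path: str) -> bool:
        return any(path == p or path.startswith(p.rstrip("/") + "/")
                   for p in self.write_paths)

# What an agent runtime would declare once, up front:
policy = SandboxPolicy(
    read_paths=("/",),                  # read anywhere
    write_paths=("/home/me/project",),  # write only the workspace
    allow_network=False,
    allow_exec=True,
)
```

Note how close this is to what Claude Code, Cursor, and Codex CLI each already express internally — three times over, in three platform-specific dialects.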
How do you sandbox an agent that needs your browser? The user's authenticated browser — cookies, extensions, OAuth tokens, session state across dozens of services — is the single most valuable resource for autonomous agents doing real-world tasks. Every sandboxing approach either gives the agent full desktop access (dangerous) or a fresh environment with no sessions (useless). Scoped OAuth tokens are a partial answer, but most internal tools don't support programmatic access. This might be the hardest unsolved problem in agent sandboxing.
What's the right granularity for agent permissions? Per-command approval (Claude Code's default mode) is safe but slow. Per-session blanket access (Aider) is fast but unsafe. Per-capability declarations (the pledge model) sit in between but require the agent to know its needs upfront. The approval gates pattern from planning research suggests permission decisions should happen at plan time, not execution time — but no system implements this yet.
We explore one possible design direction — a workspace-as-sandbox model with opt-in host resource linking — in a follow-up post.
Further reading
- Anthropic Engineering: Claude Code Sandboxing — Anthropic's own writeup of the Seatbelt/bubblewrap approach
- Cursor Agent Sandboxing — Cursor's Landlock/seccomp implementation and 40% fewer stops metric
- Deep Dive on Agent Sandboxes — Pierce Freeman's technical comparison
- Docker Sandboxes Aren't Enough for Agent Safety — why execution sandboxing ≠ agent safety
- OWASP Top 10 for Agentic Applications — the first industry-standard framework for AI agent security
- Porting OpenBSD pledge() to Linux — Justine Tunney's implementation of pledge/unveil using seccomp + Landlock
- Why multi-agent workflows fail in production — environment reliability is a dimension of multi-agent coordination
- Less code, more skills — the case for declarative policy over hardcoded behavior