Sandboxing AI agents: beyond Docker and WASM
How Claude Code, Cursor, Codex, and Devin gate bash access — and why Docker, WASM, and OS sandboxes are all solving the wrong layer.
An AI coding agent that can't run shell commands, edit files, or install packages is useless. An AI coding agent that can do all of those things without restriction is dangerous. Every production agent system sits somewhere on this spectrum — and most of them are struggling with it.
In 2025 alone, three runc CVEs demonstrated that standard Docker containers, which share the host kernel, can be escaped. A filesystem MCP server had symlink-based CVEs that allowed full system takeover from inside a sandboxed setup. A Cursor sandbox vulnerability leaked secrets from the host environment. The problem isn't theoretical.
We surveyed how seven major agent systems handle sandboxing and permissions, mapped the pain points, and looked at what's emerging beyond Docker — including why WASM with host functions doesn't solve what you think it solves.
How production agents handle permissions
Claude Code: OS-level sandbox, no containers
Claude Code uses the operating system's own isolation primitives.
On macOS: Apple's Seatbelt (sandbox-exec) with a runtime-generated
policy. On Linux: bubblewrap
(bwrap) for namespace isolation. No Docker daemon, no containers, no VMs.
The sandbox restricts the agent to the working directory for writes. All network traffic routes through a proxy Unix domain socket — outbound is blocklisted by default, with an allowlist for approved domains.
On top of the sandbox, Claude Code has five
permission modes:
default (ask before everything risky), acceptEdits (auto-approve
file edits, prompt for shell), plan (read-only), dontAsk and
bypassPermissions (for fully isolated CI environments). A
PreToolUse hook system lets users write shell scripts that
approve or deny tool calls programmatically — automating the
permission decision without disabling the safety layer.
Cursor: Landlock + seccomp, workspace-scoped
Cursor's sandbox uses the same primitives as Claude Code — Seatbelt on macOS,
Landlock LSM + seccomp on
Linux — but with a workspace-scoped policy generated at runtime. The agent
can read/write the open workspace and /tmp, read the broader filesystem,
but cannot write outside the workspace or make network requests without
explicit user approval.
The results are concrete: Cursor reports that sandboxed agents stop 40% less often than unsandboxed ones. Fewer false positives, fewer interruptions, more autonomy. As of early 2026, one third of all Cursor requests run sandboxed.
Enterprise admins get granular network allowlists/denylists. Regular users don't — network approval is per-request.
OpenAI Codex CLI: Landlock + seccomp in Rust
Codex CLI follows the same pattern — Seatbelt on macOS, Landlock + seccomp on
Linux — but implements the Linux sandbox in Rust using the
seccompiler crate (the same BPF
compiler used by AWS Firecracker). A codex-linux-sandbox helper process
enforces restrictions. No Docker, no VM.
Network is blocked by default in sandboxed modes. Filesystem writes are
restricted to configured writable roots. Debug commands (codex debug seatbelt,
codex debug landlock) help diagnose policy issues.
Devin: Docker with managed cloud
Devin runs each session in a Docker container hosting a full environment: terminal, browser, code editor, and planner. The container runs on AWS (multi-tenant SaaS) or in the customer's VPC via AWS PrivateLink. SOC 2 Type II certified.
The advantage: the agent gets a complete, isolated machine. The tradeoff: Docker's shared-kernel isolation model, cloud latency, and the inability to run locally. Devin is a managed service, not a local tool.
OpenHands: Docker per session, docker.sock risk
OpenHands (formerly OpenDevin) runs each task in a Docker container, torn down post-session. The agent accesses the container via SSH. Workspace files are mounted via Docker volumes.
The documented security caveat: OpenHands requires mounting
/var/run/docker.sock, which gives the container full control over the
host Docker daemon. If an agent escapes the container, it has access to
all Docker resources on the host. This is
acknowledged in their own materials
and remains an active research problem.
Aider: no sandbox
Aider runs entirely in your terminal with direct filesystem access. No container, no VM, no OS-level isolation.
The safety model is git: all changes go through git, so git diff and
git checkout are the undo mechanism. This is an explicit design choice —
simplicity over isolation. The community workaround for stronger isolation
is to run Aider inside a devcontainer.
GitHub Copilot Coding Agent: Actions + firewall
GitHub's Copilot Coding Agent runs in a GitHub Actions environment with
internet access controlled by a firewall. The agent has read-only repo
access for exploration and can only push to branches prefixed with copilot/.
Standard branch protection rules and required checks apply.
No arbitrary terminal access to the user's local machine. The agent lives in GitHub's infrastructure, not yours.
The permission model spectrum
Sandbox Capabilities Across AI Agent Systems
The radar shows a clear split. Claude Code, Cursor, and Codex CLI converge on the same architecture: OS-level primitives (Seatbelt/Landlock/seccomp) with no Docker dependency. They score high on filesystem isolation and network control but share a cross-platform weakness — maintaining three separate sandbox implementations for macOS, Linux, and Windows.
Devin and OpenHands use Docker, which gives strong shell restrictions (everything runs inside the container) but weaker network control and no cross-platform story beyond "install Docker." Aider skips sandboxing entirely, trading security for zero setup friction.
The real gate: in-system permissions for bash
All the sandbox technologies above — Docker, Firecracker, Landlock, Seatbelt — answer the same question: if the agent runs a dangerous command, how do we limit the blast radius? But there's a prior question that matters more in practice: how does the agent get permission to run bash in the first place?
Every agent system is fundamentally a loop that decides whether to grant an LLM access to a shell. The sandbox is the safety net. The permission system is the gate. In practice, the gate does more work than the net.
Permission model comparison
| System | Default bash policy | Approval mechanism | Escape hatch | Granularity |
|---|---|---|---|---|
| Claude Code | Ask before every shell command | User prompt per-command; PreToolUse hooks for automation | --dangerously-skip-permissions | Per-command pattern matching (Bash(git commit:*)) |
| Cursor | Deny outside workspace | Auto-approve safe tools (grep, ls); prompt for state-changing commands | Enterprise admin overrides | Per-tool category (safe / state-changing) |
| Codex CLI | Blocked in sandbox modes | --full-auto mode for trusted environments | --full-auto flag | Per-mode (interactive / sandbox / full-auto) |
| Windsurf | Ask before terminal commands | User approval per-command; .codeiumignore for file exclusions | Security rules with NEVER flags | Per-command with rule overrides |
| Devin | Full access inside container | Container is the boundary — no per-command approval | N/A (container is the sandbox) | All-or-nothing (container-level) |
| OpenHands | Full access inside container | Event stream API gates actions inside container | N/A | Per-action type via event API |
| Aider | Full access, no restriction | None — git is the undo mechanism | N/A | None |
| GitHub Copilot | No local shell access | Runs in Actions; can only push to copilot/ branches | N/A | Branch-scoped |
Two philosophies
The table reveals two fundamentally different approaches:
Gate the command, sandbox the process. Claude Code, Cursor, Codex CLI,
and Windsurf all run on the user's machine and mediate individual bash
commands. The agent proposes a command; the system (or user) approves or
denies it. The OS-level sandbox (Landlock/Seatbelt) is the fallback if
an approved command does something unexpected. This is high-friction but
fine-grained — you can allow git commit while blocking rm -rf /.
Gate the environment, give full access inside. Devin, OpenHands, and GitHub Copilot put the agent in an isolated environment and let it run anything. No per-command approval, no permission prompts. The boundary is the container or VM wall. This is low-friction but coarse — you can't allow some commands while blocking others, because the agent has root inside its sandbox.
The first approach treats the sandbox as defense-in-depth. The second treats the sandbox as the only defense. When the container is the only boundary, a container escape is game over. When per-command approval exists alongside OS-level restrictions, both have to fail simultaneously.
The browser problem
There's a third dimension that neither philosophy handles well: browser access with authenticated sessions.
For autonomous agents doing real-world tasks — filling forms, navigating internal tools, interacting with SaaS dashboards — the browser with the user's logged-in profiles is arguably the most valuable resource on the machine. Cookies, saved passwords, OAuth tokens, browser extensions, session state across dozens of services — all of it lives in the user's browser profile.
Docker and VM-based sandboxes cut the agent off from this entirely. The agent gets a fresh browser with no sessions, no cookies, no saved credentials. Every service requires re-authentication. Extensions don't exist. The agent is starting from zero in the environment where the user has years of accumulated state.
Anthropic's Computer Use runs agents against a visible desktop — with access to whatever the user has open, including browsers. Daytona's Computer Use sandbox takes a similar approach for Linux/macOS/Windows desktop automation. But these operate outside the sandboxing models discussed above. The agent either gets the user's full desktop (powerful but dangerous) or a fresh container desktop (safe but useless for authenticated workflows).
This is the fundamental tension for autonomous agents. The resources that make agents most useful — authenticated browser sessions, logged-in APIs, the user's actual environment — are exactly the resources that sandboxing is designed to restrict. No current system resolves this cleanly. The closest approximation is scoped OAuth tokens and per-service API keys granted to the agent explicitly, but that requires every service to support programmatic access, which most internal tools don't.
What works best
Cursor's data gives the clearest signal: agents with proper environmental sandboxing stop 40% less often than unsandboxed ones. The result is counterintuitive — more isolation produces more autonomy, not less.
The reason: without a sandbox, the system must prompt the user for every potentially dangerous action because there's no safety net. With a sandbox, the system can auto-approve most actions because the blast radius is contained. The sandbox replaces per-command friction with environmental safety.
Claude Code's hook system points at a middle path. PreToolUse hooks
let users write shell scripts that approve or deny tool calls based on
pattern matching — Bash(npm install:*) auto-approves, Bash(rm -rf:*)
always denies. This moves permission decisions from interactive prompts
to declarative policy. The agent doesn't stop to ask; the policy
pre-answers.
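The mechanics are simple enough to sketch. Assuming the documented hook contract (the tool call arrives as JSON on stdin; exit code 0 lets the normal flow continue; exit code 2 blocks the call and feeds stderr back to the model), a policy hook is a short script. The patterns below are illustrative, not a recommended policy:

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook that gates bash commands by pattern.

Assumes Claude Code's hook contract: the tool call arrives as JSON on
stdin; exit 0 continues the normal flow, exit 2 blocks the call.
The patterns are illustrative only.
"""
import json
import re
import sys

DENY = [r"^rm\s+-rf\s+/", r"curl\s+.*\|\s*(ba)?sh"]        # always block
ALLOW = [r"^git\s+(status|diff|log)\b", r"^npm\s+install\b"]  # auto-approve

def decide(command: str) -> str:
    """Return 'deny', 'allow', or 'ask' for a shell command."""
    for pat in DENY:
        if re.search(pat, command):
            return "deny"
    for pat in ALLOW:
        if re.search(pat, command):
            return "allow"
    return "ask"  # fall through to the interactive prompt

def handle(stdin_text: str) -> int:
    """Map one hook invocation to an exit code (0 = continue, 2 = block)."""
    event = json.loads(stdin_text)
    cmd = event.get("tool_input", {}).get("command", "")
    if decide(cmd) == "deny":
        print(f"blocked by policy: {cmd}", file=sys.stderr)
        return 2
    return 0

# Wire-up when installed as a hook script:
# sys.exit(handle(sys.stdin.read()))
```

The point is the shape, not the patterns: permission decisions move out of the user's head and into a script that runs on every tool call.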
The pattern that emerges across all systems: the best permission model is the one where the user configures policy once and then forgets about it. Per-command prompts don't scale. Container-level all-or-nothing doesn't give enough control. Declarative per-capability policies — what commands, what directories, what network endpoints — hit the sweet spot. No system has fully nailed this yet.
Pain points
The restrictive/permissive policy trap
ARMO's research documents a pattern seen across organizations: security teams set overly restrictive sandbox policies. The agent breaks within 48 hours. Teams loosen policies incrementally until they become permissive enough to be meaningless. The result is security theater — a sandbox that exists on paper but blocks nothing in practice.
The fix requires progressive enforcement: start permissive with monitoring, tighten based on observed behavior. But most sandbox systems offer all-or-nothing controls, not gradual ones.
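What gradual controls could look like is easy to sketch, even if no shipping sandbox offers them. A hypothetical monitor-then-enforce policy in Python (all names invented):

```python
from collections import Counter

class ProgressivePolicy:
    """Hypothetical progressive enforcement: observe first, enforce later.

    Phase 1 (monitor): every command is allowed but recorded.
    Phase 2 (enforce): only command shapes seen often enough during
    monitoring are auto-approved; everything else escalates to a
    human prompt instead of being silently blocked.
    """

    def __init__(self, min_observations: int = 3):
        self.min_observations = min_observations
        self.observed = Counter()
        self.enforcing = False

    @staticmethod
    def prefix(command: str) -> str:
        # Coarse key: first two tokens ("git commit", "npm install", ...)
        return " ".join(command.split()[:2])

    def check(self, command: str) -> str:
        key = self.prefix(command)
        if not self.enforcing:
            self.observed[key] += 1
            return "allow"      # monitor mode: allow and learn
        if self.observed[key] >= self.min_observations:
            return "allow"      # trust earned during monitoring
        return "escalate"       # unfamiliar behavior: ask, don't block

    def tighten(self):
        """Flip from monitoring to enforcement."""
        self.enforcing = True
```

During the monitoring window everything is allowed but recorded; after tighten(), unfamiliar command shapes escalate to a human instead of failing silently, which is exactly the behavior that would avoid the 48-hour breakage driving the loosening spiral.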
Docker cold starts and resource overhead
Docker containers incur 500ms–2s cold starts depending on base image. For coding agents, the situation is worse — cloning a repository and installing packages can add 20–30 seconds before any code executes. At high concurrency (~400 parallel starts), CNI plugin and virtual switch setup inflate boot times by up to 263%.
Teams maintain "warm pools" of pre-booted containers to mitigate this, creating an economic trap: idle containers waste compute, but building a predictive autoscaler is a serious engineering challenge.
Docker socket escape
OpenHands requires /var/run/docker.sock access. This is well-documented
and widely discussed. The socket gives any process with access full control
over the host Docker daemon — start containers, mount host filesystems,
access networks. Container escape equals full host access.
Cross-platform inconsistency
Claude Code, Cursor, and Codex CLI each maintain separate sandbox
implementations for macOS (Seatbelt) and Linux (Landlock/bubblewrap/seccomp).
Windows gets WSL2 at best. Every sandbox has different capabilities,
different edge cases, and different failure modes. Apple marks sandbox-exec
as deprecated — the macOS implementation is load-bearing but architecturally
uncertain.
Symlink escapes and path traversal
CVE-2025-53109 and CVE-2025-53110 in a filesystem MCP server showed that path prefix matching — the most common way sandboxes restrict filesystem access — can be bypassed through symlinks. An agent creates a symlink inside the allowed directory pointing outside it, and the sandbox's path check passes while the actual access goes elsewhere. This is a class of vulnerability, not a single bug.
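The class is easy to reproduce. A minimal Python sketch (hypothetical checker functions, not any shipping sandbox's code): a naive prefix check approves a symlinked path whose target lies outside the sandbox root, while resolving the path first catches it.

```python
import os
import tempfile

def naive_check(root: str, path: str) -> bool:
    """The vulnerable pattern: string-prefix matching on the raw path."""
    return os.path.abspath(path).startswith(os.path.abspath(root) + os.sep)

def resolved_check(root: str, path: str) -> bool:
    """Resolve symlinks *before* comparing against the sandbox root."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(path)
    return real_path.startswith(real_root + os.sep)

# Demonstration: a symlink inside the sandbox pointing outside it.
sandbox = tempfile.mkdtemp()
outside = tempfile.mkdtemp()
secret = os.path.join(outside, "secret.txt")
with open(secret, "w") as f:
    f.write("host data")

link = os.path.join(sandbox, "innocent.txt")
os.symlink(secret, link)

assert naive_check(sandbox, link)         # passes: the path *looks* inside
assert not resolved_check(sandbox, link)  # caught: the target is outside
```

Even realpath-based checks remain racy, since the link target can change between check and use; kernel-level mechanisms such as Landlock or openat2's RESOLVE_BENEATH close that gap by resolving at the point of access.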
Constant permission prompts vs autonomy
Agents that sandbox at the process level but need host access for installing packages, running tests, or accessing databases face a dilemma: prompt the user for every action (destroying flow) or run without restriction (destroying safety). Cursor's data — 40% fewer stops with proper sandboxing — shows that environmental sandboxing (the agent runs inside the sandbox, so package installs and test runs don't need approval) is better than per-action approval.
Beyond Docker: the sandbox landscape
Firecracker microVMs
Firecracker — the technology behind AWS Lambda — gives each sandbox its own kernel. Boot time is under 200ms. Memory overhead per microVM is under 5 MiB. This eliminates the shared-kernel escape vectors that plague runc containers.
E2B builds on Firecracker to offer ephemeral sandboxes purpose-built for AI agents. Pre-warmed snapshot pools bring startup under 200ms. Manus uses E2B for its agent execution. Microsandbox is an open-source alternative using libkrun (KVM-based microVMs) with sub-200ms boot, Apache-2.0 license, and MCP integration.
The tradeoff: Firecracker requires KVM (Linux only), and networking setup at scale (TAP interfaces, IP tables, CNI plugins) becomes the primary bottleneck at ~400 parallel starts.
OS-level primitives: what Claude Code and Codex actually use
The most significant finding in this survey: the two most widely used local coding agents — Claude Code and Codex CLI — don't use Docker or VMs at all. They use kernel-level primitives directly.
Landlock (Linux 5.13+) lets a process restrict its own filesystem and network access at the kernel level without requiring root. Unlike seccomp (which filters syscalls) or namespaces (which need privileges), Landlock lets a process sandbox itself from within. It filters at the point of operation in the kernel, not at the syscall interface.
seccomp-bpf attaches a BPF program to a process that filters every syscall. It's the most surgical tool: you can allow or deny specific syscalls with specific arguments. The limitation: syscall lists are architecture-specific.
bubblewrap combines Linux namespaces (PID, mount, network, user) with an unprivileged setup — the same tool Flatpak uses for desktop app sandboxing.
On macOS, Seatbelt (sandbox-exec) provides kernel-enforced sandboxing
via a Scheme-like profile language. Several community tools have been built
on this pattern for AI agent sandboxing:
agent-seatbelt-sandbox,
scode.
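For a flavor of the profile language, here is a minimal illustrative profile — not Claude Code's actual policy, and the workspace path is a placeholder:

```scheme
;; Illustrative Seatbelt profile -- not any real agent's policy.
;; Run with: sandbox-exec -f profile.sb some-command
(version 1)
(deny default)                      ; start from nothing
(allow process-fork)
(allow process-exec)                ; run binaries
(allow file-read*)                  ; read the whole filesystem
(allow file-write*                  ; but write only the workspace
    (subpath "/Users/me/project")
    (subpath "/private/tmp"))
(deny network*)                     ; no network at all
```

Agents that use this mechanism generate a profile like this at runtime with the actual workspace path substituted in, then launch the tool process under it.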
The advantage: near-zero overhead, no daemon process, no root required (on Linux). The disadvantage: not a full machine — the agent shares the host kernel and can't install system packages without host access.
gVisor: user-space kernel
gVisor interposes on syscalls in user space, implementing a subset of the Linux kernel in Go. It sits between containers and the host kernel, providing stronger isolation than runc without requiring a full VM.
Google's Agent Sandbox — announced at KubeCon NA 2025 as a CNCF project — uses gVisor as its foundation, with optional Kata Containers for workloads needing full microVM isolation. Pre-booted warm pools and a declarative CRD API target sub-second cold starts.
The tradeoff: 20–50% overhead on syscall-heavy workloads. For AI agents running compilers and test suites, that's significant.
WebContainers: browser-native
StackBlitz WebContainers run a WASM-based operating system entirely in the browser — no remote server provisioned. The browser's own security sandbox handles isolation. Bolt.new uses this to give AI agents full control over a Node.js environment including filesystem, package manager, and terminal, entirely client-side.
The hard limitation: Node.js and browser-compatible runtimes only. You can't run Python, Rust, Go, or any native binary. For web development agents, it's excellent. For general-purpose coding agents, it's a non-starter.
Emerging platforms
Daytona pivoted from dev environments to AI agent infrastructure in early 2025. It claims sub-27ms sandbox spin-up and raised a $24M Series A in early 2026. It supports process execution, filesystem, native Git, and Computer Use sandboxes for desktop automation.
Modal offers a serverless container fabric tested up to 1,000 sandbox creations per second. Used by Lovable and Quora's Poe for AI code execution.
Morph Cloud focuses on instant environment branching — snapshot-and-restore workflows where agents branch from a known state, execute, then discard or persist.
Why WASM doesn't solve it
WASM is the obvious candidate for sandboxing: memory-safe execution with microsecond overhead, capability-based I/O, and strong module isolation. Wasmtime enforces that modules cannot access system resources without explicit capability grants. In theory, perfect.
In practice, the boundary breaks as soon as you need it.
WASI capabilities punch holes in the sandbox. The moment you expose filesystem or network capabilities to a WASM module — which you must, for any agent that needs to read files, install packages, or make API calls — the module has access to those resources. The sandbox is only as strong as the capability grants, and useful agents need broad capabilities.
Resource exhaustion bypasses the capability model entirely. A 2025 security analysis showed that exposed WASI/WASIX interfaces allow malicious modules to starve shared OS resources — CPU cycles, disk I/O, bandwidth, entropy pools, kernel objects. Even with cgroups and quotas, syscall floods can degrade system performance by over 94%. WASM isolates memory. It doesn't isolate compute.
Arbitrary tools can't run in WASM. An agent that runs cargo test,
pytest, gcc, or docker compose needs native binaries. Compiling
every tool and its dependencies to WASM is a massive operational burden —
and many tools (anything using raw syscalls, threads, or mmap) don't
compile at all. Pydantic's Monty is the most credible attempt at a
WASM-based Python sandbox, and it explicitly covers only a subset of
Python.
The I/O complexity problem. Agents need external data — files, databases, APIs — to act. WASM strictly limits I/O. The result is complex JavaScript glue code or custom host functions that effectively bypass the security model you built WASM to enforce.
Microsoft's Wassette (August 2025) is the most principled attempt — a Rust-based runtime executing WASM Components via MCP with deny-by-default capabilities. But it's limited to WASM Components, not arbitrary code execution.
The bottom line: WASM is excellent for plugin sandboxing — running untrusted extensions in a controlled environment with minimal capabilities. It is not a solution for general agent execution where the agent needs to run arbitrary tools, install packages, and interact with the host system.
The isolation hierarchy
Isolation Technologies: Boot Time vs Isolation Strength
| Technology | Boot time | Memory overhead | Isolation | Root needed | Best for |
|---|---|---|---|---|---|
| Landlock/Seatbelt | ~1ms | near-zero | OS-level (shared kernel) | No | Local coding agents |
| WASM (Wasmtime) | microseconds | ~1 MiB | Language VM | No | Plugin sandboxing |
| Docker/runc | 500ms–2s | 10–50 MiB | Shared kernel + namespaces | Yes (daemon) | Legacy, dev environments |
| gVisor (runsc) | ~container | moderate | User-space kernel | Yes | Kubernetes workloads |
| Firecracker | ~125ms | under 5 MiB | Dedicated kernel | Yes (KVM) | Untrusted agent code at scale |
| Full VM (KVM/QEMU) | 1–30s | hundreds of MiB | Strongest | Yes | Maximum isolation |
The industry direction: for executing AI-generated code with untrusted inputs at scale, the minimum viable isolation is a microVM (Firecracker, libkrun, Kata Containers). For local developer tooling, OS-level primitives (Landlock + seccomp + bubblewrap/Seatbelt) are practical and require no daemon.
Docker sits in an awkward middle ground — more overhead than OS-level primitives, less isolation than microVMs. Three runc CVEs in 2025 alone have made it clear that shared-kernel container isolation is not sufficient for untrusted code.
Open questions
This survey raises more questions than it answers. A few that stood out:
Can permission models be progressive rather than binary? The restrictive/permissive policy trap suggests that all-or-nothing controls don't work in practice. A permission system that starts restrictive and widens based on observed behavior — earning trust over a session — doesn't exist yet. What would it take to build one?
Is there a cross-platform sandbox primitive waiting to be built?
Every system maintaining three implementations (macOS/Linux/Windows)
is doing redundant work. Apple marking sandbox-exec as deprecated
adds urgency. A portable capability-based sandbox — something like
pledge/unveil but cross-platform —
would change the economics for every agent project. The Rust ecosystem
has pieces (extrasafe,
seccompiler,
cap-std) but no
unified abstraction.
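To make the gap concrete, here is a hypothetical sketch of the caller-side API such a primitive might expose. Every name is invented, and the hard part — compiling one declaration down to Seatbelt, Landlock + seccomp, and a Windows equivalent — is precisely the work no one has done:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical cross-platform policy: declare needs, drop the rest.

    Mirrors OpenBSD's pledge (capability classes) plus unveil (path
    scoping). A real implementation would compile this to Seatbelt on
    macOS, Landlock + seccomp on Linux, and an equivalent on Windows.
    """
    read_paths: tuple = ()
    write_paths: tuple = ()
    allow_network: bool = False
    allow_exec: bool = False

    def permits_write(self, path: str) -> bool:
        return any(path == p or path.startswith(p.rstrip("/") + "/")
                   for p in self.write_paths)

# What an agent runtime would declare once, up front:
policy = SandboxPolicy(
    read_paths=("/",),                  # read anywhere
    write_paths=("/home/me/project",),  # write only the workspace
    allow_network=False,
    allow_exec=True,
)
```

Note how close this is to what Claude Code, Cursor, and Codex CLI each already express internally — three times over, in three platform-specific dialects.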
How do you sandbox an agent that needs your browser? The user's authenticated browser — cookies, extensions, OAuth tokens, session state across dozens of services — is the single most valuable resource for autonomous agents doing real-world tasks. Every sandboxing approach either gives the agent full desktop access (dangerous) or a fresh environment with no sessions (useless). Scoped OAuth tokens are a partial answer, but most internal tools don't support programmatic access. This might be the hardest unsolved problem in agent sandboxing.
What's the right granularity for agent permissions? Per-command approval (Claude Code's default mode) is safe but slow. Per-session blanket access (Aider) is fast but unsafe. Per-capability declarations (the pledge model) sit in between but require the agent to know its needs upfront. The approval gates pattern from planning research suggests permission decisions should happen at plan time, not execution time — but no system implements this yet.
We explore one possible design direction — a workspace-as-sandbox model with opt-in host resource linking — in a follow-up post.
Further reading
- Anthropic Engineering: Claude Code Sandboxing — Anthropic's own writeup of the Seatbelt/bubblewrap approach
- Cursor Agent Sandboxing — Cursor's Landlock/seccomp implementation and 40% fewer stops metric
- Deep Dive on Agent Sandboxes — Pierce Freeman's technical comparison
- Docker Sandboxes Aren't Enough for Agent Safety — why execution sandboxing ≠ agent safety
- OWASP Top 10 for Agentic Applications — the first industry-standard framework for AI agent security
- Porting OpenBSD pledge() to Linux — Justine Tunney's implementation of pledge/unveil using seccomp + Landlock
- Why multi-agent workflows fail in production — environment reliability is a dimension of multi-agent coordination
- Less code, more skills — the case for declarative policy over hardcoded behavior