Built-in tools: what your agent can reach
We cataloged the built-in tools in ten AI coding products. Some ship 5, some ship 20+. The differences reveal what each product thinks an agent should do.
Every coding agent ships a toolbox. But what's actually in it?
The tools an agent has access to define its ceiling. An agent without a browser can't test web apps. An agent without code intelligence can't jump to definitions. An agent without a terminal can't run tests. What products choose to include — and what they leave out — reveals their theory of what an agent should do.
We cataloged the built-in tools in ten AI coding products, all from official documentation and public repos as of March 2026. This is a companion to our survey of built-in agents — that post covered the agents, this one covers what those agents can reach.
What we surveyed
Ten products, same list as the agents survey:
- Claude Code (Anthropic) — CLI agent
- GitHub Copilot — IDE + CLI agent
- Cursor — IDE agent
- Windsurf (Codeium) — IDE agent
- Devin (Cognition) — autonomous cloud agent
- OpenHands (All Hands AI) — multi-agent framework
- Aider — terminal pair programmer
- Amazon Q Developer — AWS-integrated agent
- Gemini Code Assist (Google) — IDE agent
- Augment Code — IDE agent + orchestration
Product by product
Claude Code
The most granular tool separation we found. Eleven named tools, each with distinct parameters and individually permissioned:
| Tool | Category | Purpose |
|---|---|---|
| Read | File | Read file contents with optional line range |
| Write | File | Create or overwrite a file |
| Edit | File | Exact string replacement in a file |
| Glob | File | Fast file pattern matching |
| Bash | Terminal | Execute shell commands |
| Grep | Search | Content search built on ripgrep |
| WebSearch | Web | Search the web for information |
| WebFetch | Web | Fetch and process URL content |
| NotebookEdit | File | Edit Jupyter notebook cells |
| TodoWrite | Planning | Structured task tracking |
| Task | Agent | Launch subagent for complex work |
Each tool can be placed in an allow list (auto-approved), deny list (blocked), or default ask list (requires user approval). Subagents receive restricted tool sets — the Explore and Plan agents, for example, get read-only tools (Write and Edit denied). This per-tool, per-agent permission model is the most fine-grained we found.
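The allow/deny/ask lists live in Claude Code's settings file. A minimal sketch of the documented shape — exact rule syntax may vary by version, and the Bash command specifier here is illustrative:

```json
{
  "permissions": {
    "allow": ["Read", "Grep", "Glob", "Bash(npm test:*)"],
    "ask": ["Write", "Edit"],
    "deny": ["WebFetch"]
  }
}
```

Tools not named in any list fall back to the default ask behavior.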
GitHub Copilot
The CLI and IDE agent modes share a common tool set, though the exact tool names aren't published the way Claude Code's are:
| Category | Capabilities |
|---|---|
| File | Read files, edit files, create files |
| Terminal | Run commands, view output |
| Search | Semantic code search, file search by name |
| Web | Web search, browser preview |
| Agent | Delegate to Explore/Task/Plan/Code Review agents |
Custom agents (Markdown files with YAML frontmatter) can restrict tool access via a tools list — the same pattern as Claude Code. MCP servers extend the tool set beyond built-in capabilities.
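A custom agent file might look like the following — the frontmatter-plus-tools pattern is from the documentation described above, but the specific field values are illustrative, not a verbatim schema:

```markdown
---
name: explorer
description: Read-only codebase exploration
tools: ['read', 'search', 'fetch']
---
Investigate the codebase and report findings. Do not modify files.
```

Omitting a tool from the list removes it from the agent's reach, the same restriction mechanism Claude Code applies to its subagents.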
Copilot's agent mode automatically selects which tools to use and can run multiple tool calls in parallel. Tool selection is not user-visible in the same way as Claude Code's individual permission prompts.
Cursor
Ten documented tools available in Agent mode:
| Tool | Category | Purpose |
|---|---|---|
| Semantic search | Search | Search by meaning across indexed codebase |
| File/folder search | Search | Find by name, directory, keywords |
| Web search | Web | Search the internet |
| Fetch rules | Config | Retrieve project rules |
| Read files | File | Read text and image files |
| Edit files | File | Suggest and auto-apply edits |
| Run shell commands | Terminal | Execute terminal commands |
| Browser control | Web | Navigate, screenshot, interact with pages |
| Image generation | Vision | Generate images |
| Ask clarifying questions | User | Request information from user |
Browser control is notable — Cursor can navigate to URLs, take screenshots, click elements, and type text. Most products don't ship browser interaction at all. Image generation is also rare; Cursor can generate images as part of its workflow.
Custom Modes restrict which tools are available. Ask Mode removes write capabilities. Manual Mode limits to explicit file editing.
Windsurf
Windsurf's Cascade agent has a smaller, less granular tool set:
| Category | Capabilities |
|---|---|
| Search | Search and analyze codebase |
| Web | Web search |
| Terminal | Terminal command execution |
| Code quality | Linter integration (auto-fixes lint errors) |
| Package management | Auto-detects and installs missing packages |
A hard limit of 20 tool calls per prompt caps how much work the agent can do in a single turn. This is the only product in our survey with a documented tool-call ceiling.
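A ceiling like this is straightforward to model: a per-prompt counter the agent loop consults before each call. This is a generic sketch, not Windsurf's implementation:

```python
class ToolCallCeiling:
    """Cap tool calls per prompt, in the spirit of a documented 20-call limit."""

    def __init__(self, limit: int = 20):
        self.limit = limit
        self.count = 0

    def charge(self) -> bool:
        """Return True if another tool call is allowed this turn."""
        if self.count >= self.limit:
            return False
        self.count += 1
        return True

    def reset(self) -> None:
        """Called at the start of each new user prompt."""
        self.count = 0
```

When `charge()` returns False, the agent must stop and hand control back to the user, which is why the cap bounds how much work fits in a single turn.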
Extensibility is limited to MCP server configuration. There's no per-tool permission model — tools are either available in a mode or not.
Devin
The broadest tool surface in our survey. Devin runs in a cloud IDE environment with full system access:
| Category | Capabilities |
|---|---|
| File | Full filesystem access (editor + file explorer) |
| Terminal | Multiple terminal sessions |
| Browser | Full Chromium browser (real, not headless) |
| Search | Devin Search with "Deep Mode" for complex queries |
| Knowledge | Devin Wiki (auto-indexed repo documentation) |
| Review | Devin Review (code review with commit application) |
| Testing | Desktop Testing via computer use (Linux) |
| Git | Full git operations |
Devin's tools aren't discrete named functions — they're a full operating environment. The agent can open multiple terminals, browse the web, interact with GUIs, and run desktop applications. This is closer to giving the agent a full computer than a set of API tools.
The tradeoff: Devin runs in the cloud, not locally. Everything happens in Cognition's sandboxed VMs.
OpenHands
OpenHands takes the most radical approach to tooling: CodeActAgent unifies all actions into executable code.
| Category | Implementation |
|---|---|
| File operations | open(), os.path, shell commands |
| Terminal | Bash execution (arbitrary commands) |
| Python | Interactive Python interpreter |
| Browser | Delegated to BrowsingAgent |
| User interaction | Natural language conversation |
There are no named tools like "Read" or "Edit." The agent writes Python or bash that does what it needs. Want to read a file? `cat file.txt`. Want to search? `grep -r pattern .`. Want to install a package? `pip install package`.
This "code action space" approach means OpenHands has no tool ceiling — anything you can do in a terminal or Python REPL, the agent can do. But it also means there's no tool-level permission control. You can't say "allow file reads but deny file writes" because both happen through the same execution mechanism. We explored the security implications of this in our tool permissions survey.
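The mechanism can be sketched in a few lines: the model emits code, an executor runs it, and captured stdout becomes the observation. This is an illustrative toy, not OpenHands' actual executor — note that reads and writes flow through the same `exec` call, which is exactly why per-tool permissions can't attach here:

```python
import io
import contextlib

def run_code_action(code: str) -> str:
    """Execute a model-emitted Python snippet and return captured stdout.

    There is no tool schema: the code itself is the action, so reading,
    writing, and searching are indistinguishable at the permission layer.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # one mechanism for every operation
    return buf.getvalue()

# One action space covers file reads, searches, and computation alike:
print(run_code_action("print(sum(range(10)))"))  # → 45
```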
Aider
Aider doesn't expose tools in the agent framework sense. Instead, capabilities are built into the conversation loop:
| Category | Implementation |
|---|---|
| File editing | Built into the LLM response format (diff/whole-file/udiff edit formats) |
| Code search | Repository map via tree-sitter (symbol-level index of entire repo) |
| Code quality | Auto-lint after every LLM edit |
| Testing | /test command runs tests and auto-fixes failures |
| File management | File watching + auto-add when referenced |
| Git | Auto-commit with descriptive messages after each edit |
| Voice | Voice coding support (transcription → code changes) |
| Vision | Image input for vision-capable models |
The repository map is Aider's standout capability. It uses tree-sitter to build a symbol-level index of the entire codebase — function signatures, class definitions, method names — and sends a compressed map to the LLM as context. This gives the model a structural understanding of the codebase without reading every file. No other product in our survey uses tree-sitter this way.
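The idea is easy to sketch. Here is a single-language stand-in that uses Python's `ast` module instead of tree-sitter to extract the same kind of symbol skeleton — Aider's real map additionally ranks symbols by relevance and spans many languages:

```python
import ast

def repo_map(sources: dict[str, str]) -> str:
    """Compress a codebase to its symbol skeleton: class names and
    function signatures only, no bodies."""
    lines = []
    for path, code in sources.items():
        lines.append(f"{path}:")
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.ClassDef):
                lines.append(f"  class {node.name}")
            elif isinstance(node, ast.FunctionDef):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"  def {node.name}({args})")
    return "\n".join(lines)

source = "class Invoice:\n    def total(self, tax_rate):\n        return 0\n"
print(repo_map({"billing.py": source}))
```

The LLM receives only signatures like `def total(self, tax_rate)` — enough to know what exists and how to call it, at a fraction of the token cost of the full file.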
No terminal access is exposed to the model directly — Aider runs commands (lint, test) on the model's behalf but doesn't give the model a shell.
Amazon Q Developer
Amazon Q's agent capabilities are organized as specialized features rather than named tools:
| Category | Capabilities |
|---|---|
| Code generation | Real-time inline suggestions (25+ languages) |
| File editing | Multi-file implementation with test validation |
| Security | Vulnerability scanning (exposed credentials, injection, etc.) |
| Testing | Iterative unit test generation |
| Documentation | In-depth doc generation with data flow diagrams |
| Code review | Logical errors, anti-patterns, security issues |
| Transformation | .NET porting (Windows → Linux), Java version upgrades |
The software development agent runs build and test scripts to validate generated code before presenting results. The CLI supports MCP for external tool integration.
Unlike Claude Code or Cursor, Amazon Q doesn't publish a list of discrete, named tools. The agent's capabilities are described as features, not as an API surface.
Gemini Code Assist
The most IDE-integrated tool set. Google's agent mode documentation lists ten named tools for IntelliJ:
| Tool | Category | Purpose |
|---|---|---|
| read_file | File | Retrieve text content |
| write_file | File | Write text to files |
| find_files | File | Locate files by name or path |
| list_files | File | Enumerate directory contents |
| grep | Search | Search for text patterns |
| analyze_current_file | Code Intel | Check for errors and warnings |
| resolve_symbol | Code Intel | Trace symbol declarations |
| find_usages | Code Intel | Identify all references to a symbol |
| git | Git | Execute git CLI commands |
| list_vcs_roots | Git | Return version control repositories |
resolve_symbol and find_usages are the standouts. These are code intelligence operations — go-to-definition and find-all-references — that leverage the IDE's language server. No other product in our survey exposes these as first-class agent tools. When Gemini needs to understand how a function is used across a codebase, it can ask the language server rather than grepping for text patterns.
In VS Code, all Gemini CLI built-in tools are available instead. MCP servers extend the set further.
Augment Code
Augment's IDE agent has the broadest integration surface:
| Category | Capabilities |
|---|---|
| File | File operations (read, write, edit) |
| Terminal | Terminal execution |
| Search | Web search |
| Vision | Image understanding |
| Multi-repo | Cross-repository coordination |
| Native integrations | GitHub, Linear, Jira, Confluence, Notion, Sentry, Stripe |
| MCP | 100+ configurable tools |
| Multi-model | Multiple AI models (Claude, GPT, etc.) |
Two implementation details stand out. Parallel tool execution — Augment runs independent tool calls concurrently, claiming 2x faster turns. Most products execute tools sequentially. Native integrations — instead of generic MCP connections, Augment ships purpose-built integrations for project management (Linear, Jira), documentation (Confluence, Notion), monitoring (Sentry), and payments (Stripe). This means the agent can read Jira tickets and Sentry errors without configuring MCP servers.
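Parallel execution is simple to sketch with asyncio: independent calls are gathered concurrently, so a turn's latency approaches that of the slowest call rather than the sum. The tool names and latencies below are invented for illustration:

```python
import asyncio
import time

async def tool_call(name: str, latency: float) -> str:
    await asyncio.sleep(latency)  # stand-in for a real tool's I/O
    return f"{name}: ok"

async def run_parallel(calls: list[tuple[str, float]]) -> list[str]:
    # Independent calls run concurrently under a single event loop.
    return await asyncio.gather(*(tool_call(n, s) for n, s in calls))

start = time.perf_counter()
results = asyncio.run(run_parallel([("read_file", 0.1), ("web_search", 0.1)]))
elapsed = time.perf_counter() - start  # ≈ 0.1s, not 0.2s
```

The catch is deciding which calls are truly independent — an edit that depends on a prior read cannot be parallelized, so the agent (or the harness) must infer the dependency graph.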
The inventory at a glance
| Product | File Ops | Terminal | Search | Web/Browser | Code Intel | Git | Vision |
|---|---|---|---|---|---|---|---|
| Claude Code | Read/Write/Edit/Glob | Bash | Grep | WebSearch + WebFetch | — | via Bash | — |
| Copilot | Read/Edit | Terminal | Semantic + file | Web search + preview | — | via terminal | — |
| Cursor | Read/Edit | Shell | Semantic + file + web | Browser control | — | via shell | Image gen + read |
| Windsurf | Search/analyze | Terminal | Web search | — | Linter | via terminal | — |
| Devin | Editor + filesystem | Terminal | Devin Search | Full browser | — | Full git | Desktop use |
| OpenHands | via code | Bash + Python | via code | BrowsingAgent | — | via code | — |
| Aider | Built-in edit | — | Repo map (tree-sitter) | — | tree-sitter | Auto-commit | Image input |
| Amazon Q | Suggestions + edit | Build/test | — | — | Security scan | — | — |
| Gemini Code Assist | read/write/find/list | — | grep + find_files | — | resolve_symbol, find_usages | git CLI | — |
| Augment | File ops | Terminal | Web search | — | — | GitHub native | Image understanding |
Three design philosophies
The ten products fall into three approaches to tool design:
Granular named tools. Claude Code and Gemini Code Assist give each operation a distinct name, specific parameters, and independent permissions. Read is not Grep is not Glob. The LLM sees a menu of specific operations and picks the right one. This enables fine-grained permission control — you can allow Read but deny Write, or allow Grep but deny Bash. The cost is more tool definitions consuming context window space, and more decision points where the model can pick the wrong tool.
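A minimal sketch of the pattern: tools registered by name, with the permission check keyed on the tool rather than on what the tool does. The names and shapes here are generic, not any product's API:

```python
class ToolRegistry:
    """Granular named tools: each operation is distinct and
    independently permissioned."""

    def __init__(self, allow: set[str]):
        self.allow = allow
        self.tools = {}

    def register(self, name: str, fn):
        self.tools[name] = fn

    def call(self, name: str, *args):
        if name not in self.allow:  # per-tool gate: allow Read, deny Write
            raise PermissionError(f"tool '{name}' is not allowed")
        return self.tools[name](*args)

registry = ToolRegistry(allow={"Read"})
registry.register("Read", lambda path: f"<contents of {path}>")
registry.register("Write", lambda path, text: None)
```

Here `registry.call("Read", "a.txt")` succeeds while any Write attempt raises `PermissionError` — the per-operation control that the code-as-tools approach cannot offer.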
Code-as-tools. OpenHands and (to a lesser degree) Aider skip the named-tool abstraction. The agent writes executable code — bash or Python — that performs whatever operation it needs. The "tool set" is infinite: anything you can do in a REPL is available. This is maximally expressive but minimally controllable. As we explored in our sandbox permissions survey, the security boundary shifts from "which tools are allowed" to "what can the sandbox environment access."
IDE-integrated tools. Cursor, Gemini Code Assist, and Augment map tools to IDE capabilities. Semantic search uses the IDE's index. resolve_symbol uses the language server. Browser control uses an embedded browser. The agent inherits whatever the IDE can do. This is powerful — code intelligence operations like find-all-references are genuinely useful for refactoring — but ties the agent to a specific IDE runtime.
Built-in Tool Coverage by Product (0–10 scale)
What stands out
Code intelligence is the biggest gap. Only Gemini Code Assist ships resolve_symbol and find_usages as named tools. Every other product relies on text search (grep, ripgrep, semantic search) to understand code structure. Text search can find where a function name appears, but it can't distinguish a definition from a call from a string literal. For large-scale refactoring, this difference matters — and it's the clearest area where IDE-integrated agents have an advantage.
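The definition/call/literal distinction is concrete. A sketch using Python's `ast` module as a stand-in for a language server shows three occurrences of the same name that plain grep cannot tell apart:

```python
import ast

code = '''
def fetch():
    return "fetch"

result = fetch()
'''

kinds = []
for node in ast.walk(ast.parse(code)):
    if isinstance(node, ast.FunctionDef) and node.name == "fetch":
        kinds.append("definition")
    elif isinstance(node, ast.Call) and getattr(node.func, "id", "") == "fetch":
        kinds.append("call")
    elif isinstance(node, ast.Constant) and node.value == "fetch":
        kinds.append("string literal")

# grep sees the token "fetch" three times; the syntax tree sees
# three different things.
print(sorted(kinds))  # → ['call', 'definition', 'string literal']
```

A rename refactor should touch the definition and the call but leave the string literal alone — which is why symbol-aware tools matter for large-scale changes.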
Browser interaction is rare. Only Cursor (browser control: navigate, screenshot, click, type) and Devin (full Chromium in cloud VM) ship browser interaction. The other eight products can't test web UIs, can't follow links in documentation, and can't verify rendered output. As agent tasks get more complex, this gap will grow.
The granularity spectrum is wide. Claude Code has 11+ named tools. OpenHands has effectively 2 (bash + Python interpreter). Both ship, both work, and both have users. The tradeoff is control vs. expressiveness — and the bash bypass problem shows that granular tools don't provide real security if the agent also has a shell.
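The bash bypass is easy to demonstrate: deny the Write tool but leave a shell on the allow list, and the shell writes the file anyway. This is a toy gate invented for illustration:

```python
import pathlib
import subprocess
import tempfile

DENY = {"Write"}  # file writes blocked at the tool layer...

def call_tool(name: str, arg: str) -> str:
    if name in DENY:
        raise PermissionError(f"{name} denied")
    if name == "Bash":  # ...but an allowed shell routes around the deny list
        return subprocess.run(arg, shell=True, capture_output=True, text=True).stdout
    raise ValueError(f"unknown tool: {name}")

target = pathlib.Path(tempfile.mkdtemp()) / "notes.txt"
call_tool("Bash", f"echo bypassed > {target}")
print(target.read_text().strip())  # → bypassed
```

Tool-level deny lists control the interface, not the capability — real write restriction has to happen at the sandbox or filesystem layer.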
Vision is emerging but uneven. Cursor generates and reads images. Devin has full desktop computer use. Augment understands images. Aider accepts image input. But Claude Code, Copilot, Windsurf, OpenHands, Amazon Q, and Gemini Code Assist are primarily text-only in their tool interactions.
MCP is the escape hatch. Eight of ten products support MCP for adding tools beyond the built-in set. The built-in tools define the floor — the minimum capability surface. MCP raises the ceiling. But no two products ship the same MCP servers by default, so the "extended" tool set varies widely. We discussed MCP's role as a universal extensibility layer in our skills design post.
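Wiring up an MCP server typically looks like this — the `mcpServers` key and the filesystem reference server appear in MCP's published examples, though the config file location and exact key names vary by product:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

The host spawns the server as a subprocess and every tool it advertises joins the agent's menu alongside the built-ins.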
Tool Design Philosophy Tradeoffs (0–10 scale)
What the research says
Tool selection accuracy remains an active research area. The ToolBench benchmark (May 2023) showed that GPT-4 achieved 56.6% pass rate on complex tool-use tasks involving 16,000+ real-world APIs — demonstrating that more tools doesn't automatically mean better performance. Models make selection errors when the tool set is large and tools have overlapping functionality.
The CodeAct paper (February 2024) that inspired OpenHands' approach found that code actions outperformed JSON-based tool calls on 6 of 7 benchmarks, with up to a 20% improvement. The argument: LLMs are better at writing code than selecting from a tool menu, so "code is the tool" produces better results.
However, Gorilla (May 2023) showed that fine-tuning on API documentation significantly improves tool-use accuracy, and that constrained API calls (named tools with typed parameters) reduce hallucinated function calls compared to free-form code generation. The granular-tools camp has evidence too.
The tradeoff may not be universal. For coding tasks with well-known operations (read, write, search, run), named tools reduce errors. For novel tasks requiring creative tool composition, code-as-tools offers more flexibility.
Open questions
Will code intelligence tools become standard? Gemini Code Assist is alone in shipping resolve_symbol and find_usages. If agents become primary refactoring tools, every product will need symbol-level operations — not just text search. Will they build it, or will MCP language server integrations fill the gap?
Does tool granularity help or hurt LLM performance? Claude Code has 11+ tools; OpenHands has 2. ToolBench suggests more tools can reduce accuracy, but CodeAct suggests code beats API calls. The answer may depend on the model — larger models handle more tools better, but tool-call overhead costs tokens regardless of model size.
Will browser interaction become baseline? Cursor and Devin have it. Eight products don't. As agents take on full-stack tasks (frontend + backend + testing), can they remain effective without seeing the rendered page?
Does "code-as-tools" scale? OpenHands' approach is elegant — infinite expressiveness, zero tool ceiling. But it means every operation goes through bash or Python, making audit trails harder to parse and permissions harder to enforce. Does this matter at scale, or is it a theoretical concern?
Should the built-in tool set be standardized? MCP standardizes the protocol for adding tools. But there's no standard for what tools should ship built-in. If you write an MCP server that provides file operations, does it need to match Claude Code's Read/Write/Edit/Glob interface, or can it define its own? Tool portability across products doesn't exist yet.
What's the right tool-call limit? Windsurf caps at 20 tool calls per prompt. Most products have no documented limit. Is a limit a safety feature (prevents runaway agents) or a capability ceiling (prevents complex multi-step work)?
What this means for walrus
Walrus exposes capabilities to agents through WHS hooks — and the design questions in this survey map directly to WHS architecture.
The granularity question applies to hooks. Should a WHS memory hook expose fine-grained operations (store, query, delete, list) or a single broad operation (execute_memory_operation)? The Claude Code/Gemini approach (granular named tools) enables per-operation permissions. The OpenHands approach (code-as-tools) maximizes expressiveness. WHS currently leans toward granularity — each hook has a typed protobuf interface — and this survey suggests that's the right call for permission control.
Code intelligence is a differentiation opportunity. Nine of ten products can't do resolve_symbol or find_usages. Only Gemini Code Assist ships it, and only because it integrates with the IDE's language server. A WHS hook that provides language-server-style code intelligence (backed by tree-sitter, LSP, or a custom index) would give walrus-powered agents a capability most competitors lack.
Tool-call limits are worth considering. Windsurf's 20-call cap prevents runaway tool use. WHS hooks could implement per-hook rate limits — a memory hook might allow 50 operations per turn, while an inference hook might allow 1. This is more granular than a global tool-call cap and maps naturally to the hook lifecycle.
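A per-hook budget could be as simple as a counter map reset each turn. Everything here is hypothetical — the HookRateLimiter class, the hook names, and the limits are illustrations, not part of any shipped WHS interface:

```python
class HookRateLimiter:
    """Per-hook, per-turn call budgets (hypothetical WHS sketch)."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.used: dict[str, int] = {}

    def try_call(self, hook: str) -> bool:
        """Charge one call against the hook's budget; False when exhausted."""
        used = self.used.get(hook, 0)
        if used >= self.limits.get(hook, 0):
            return False
        self.used[hook] = used + 1
        return True

    def new_turn(self) -> None:
        self.used.clear()

limiter = HookRateLimiter({"memory": 50, "inference": 1})
```

Because budgets are keyed per hook, a chatty memory hook can't starve the turn, and an expensive inference hook is naturally capped at one call — without imposing a single global ceiling on all tool use.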
Further reading
- Claude Code tools — Anthropic
- GitHub Copilot CLI — GitHub
- Cursor agent tools — Cursor
- Windsurf Cascade — Codeium
- Devin 2.0 — Cognition
- OpenHands agents — All Hands AI
- Aider repository map — Aider
- Amazon Q Developer features — AWS
- Gemini Code Assist agent mode — Google
- Augment IDE agents — Augment Code
- CodeAct: executable code actions — arxiv (February 2024)
- ToolBench: tool-use benchmarking — arxiv (May 2023)
- Gorilla: LLM tool-use fine-tuning — arxiv (May 2023)
- Model Context Protocol — MCP