

Built-in tools: what your agent can reach

We cataloged the built-in tools in ten AI coding products. Some ship 5, some ship 20+. The differences reveal what each product thinks an agent should do.

research · OpenWalrus Team

Every coding agent ships a toolbox. But what's actually in it?

The tools an agent has access to define its ceiling. An agent without a browser can't test web apps. An agent without code intelligence can't jump to definitions. An agent without a terminal can't run tests. What products choose to include — and what they leave out — reveals their theory of what an agent should do.

We cataloged the built-in tools in ten AI coding products, all from official documentation and public repos as of March 2026. This is a companion to our survey of built-in agents — that post covered the agents, this one covers what those agents can reach.

What we surveyed

Ten products, same list as the agents survey:

  • Claude Code (Anthropic) — CLI agent
  • GitHub Copilot — IDE + CLI agent
  • Cursor — IDE agent
  • Windsurf (Codeium) — IDE agent
  • Devin (Cognition) — autonomous cloud agent
  • OpenHands (All Hands AI) — multi-agent framework
  • Aider — terminal pair programmer
  • Amazon Q Developer — AWS-integrated agent
  • Gemini Code Assist (Google) — IDE agent
  • Augment Code — IDE agent + orchestration

Product by product

Claude Code

The most granular tool separation we found. Eleven named tools, each with distinct parameters and individually permissioned:

| Tool | Category | Purpose |
| --- | --- | --- |
| Read | File | Read file contents with optional line range |
| Write | File | Create or overwrite a file |
| Edit | File | Exact string replacement in a file |
| Glob | File | Fast file pattern matching |
| Bash | Terminal | Execute shell commands |
| Grep | Search | Content search built on ripgrep |
| WebSearch | Web | Search the web for information |
| WebFetch | Web | Fetch and process URL content |
| NotebookEdit | File | Edit Jupyter notebook cells |
| TodoWrite | Planning | Structured task tracking |
| Task | Agent | Launch a subagent for complex work |

Each tool can be placed in an allow list (auto-approved), a deny list (blocked), or left on the default ask list (requires user approval). Subagents receive restricted tool sets — the Explore and Plan subagents both get read-only tools, with Write and Edit denied. This per-tool, per-agent permission model is the most fine-grained we found.
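The three-list model fits in a few lines. A minimal sketch with hypothetical names, not Anthropic's implementation:

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"   # auto-approved, no prompt
    DENY = "deny"     # blocked outright
    ASK = "ask"       # requires user approval

def resolve(tool: str, allow: set[str], deny: set[str]) -> Decision:
    """Resolve one tool call against allow/deny lists; anything else asks."""
    if tool in deny:          # deny wins if a tool appears in both lists
        return Decision.DENY
    if tool in allow:
        return Decision.ALLOW
    return Decision.ASK       # default: fall through to a user prompt

# A read-only subagent: reads auto-approved, writes denied
allow, deny = {"Read", "Grep", "Glob"}, {"Write", "Edit"}
assert resolve("Read", allow, deny) is Decision.ALLOW
assert resolve("Edit", allow, deny) is Decision.DENY
assert resolve("Bash", allow, deny) is Decision.ASK
```

Checking deny before allow means a conflicting entry fails closed, which is the conservative choice for a permission system.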

GitHub Copilot

The CLI and IDE agent modes share a common tool set, though the exact tool names aren't published the way Claude Code's are:

| Category | Capabilities |
| --- | --- |
| File | Read files, edit files, create files |
| Terminal | Run commands, view output |
| Search | Semantic code search, file search by name |
| Web | Web search, browser preview |
| Agent | Delegate to Explore/Task/Plan/Code Review agents |

Custom agents (Markdown files with YAML frontmatter) can restrict tool access via a tools list — the same pattern as Claude Code. MCP servers extend the tool set beyond built-in capabilities.
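The frontmatter pattern is simple enough to sketch. The field names and minimal parser below are illustrative assumptions, not Copilot's actual loader:

```python
def parse_frontmatter(text: str) -> dict:
    """Extract the YAML-style frontmatter block between '---' fences.
    Handles only flat 'key: value' and inline 'key: [a, b]' lines."""
    lines = text.splitlines()
    assert lines[0] == "---", "frontmatter must open with ---"
    end = lines.index("---", 1)            # closing fence
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("["):          # inline list, e.g. tools: [read, grep]
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",")]
        else:
            meta[key.strip()] = value
    return meta

agent_md = """---
name: explorer
tools: [read, search]
---
You are a read-only exploration agent."""

meta = parse_frontmatter(agent_md)
assert meta["tools"] == ["read", "search"]   # the restricted tool list
```

An agent runtime would then intersect this list with the built-in tool set before the first model turn.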

Copilot's agent mode automatically selects which tools to use and can run multiple tool calls in parallel. Tool selection is not user-visible in the same way as Claude Code's individual permission prompts.

Cursor

Ten documented tools available in Agent mode:

| Tool | Category | Purpose |
| --- | --- | --- |
| Semantic search | Search | Search by meaning across indexed codebase |
| File/folder search | Search | Find by name, directory, keywords |
| Web search | Web | Search the internet |
| Fetch rules | Config | Retrieve project rules |
| Read files | File | Read text and image files |
| Edit files | File | Suggest and auto-apply edits |
| Run shell commands | Terminal | Execute terminal commands |
| Browser control | Web | Navigate, screenshot, interact with pages |
| Image generation | Vision | Generate images |
| Ask clarifying questions | User | Request information from user |

Browser control is notable — Cursor can navigate to URLs, take screenshots, click elements, and type text. Most products don't ship browser interaction at all. Image generation is also rare; Cursor can generate images as part of its workflow.

Custom Modes restrict which tools are available. Ask Mode removes write capabilities. Manual Mode limits to explicit file editing.

Windsurf

Windsurf's Cascade agent has a smaller, less granular tool set:

| Category | Capabilities |
| --- | --- |
| Search | Search and analyze codebase |
| Web | Web search |
| Terminal | Terminal command execution |
| Code quality | Linter integration (auto-fixes lint errors) |
| Package management | Auto-detects and installs missing packages |

A hard limit of 20 tool calls per prompt caps how much work the agent can do in a single turn. This is the only product in our survey with a documented tool-call ceiling.
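A hard per-turn cap is straightforward to express. The class below is a hypothetical sketch of the idea, not Windsurf's implementation:

```python
class ToolBudgetExceeded(RuntimeError):
    """Raised when a turn tries to exceed its tool-call budget."""

class BudgetedToolRunner:
    """Caps the number of tool calls in a single turn (Windsurf documents 20)."""
    def __init__(self, limit: int = 20):
        self.limit = limit
        self.calls = 0          # reset this counter at the start of each turn

    def run(self, tool, *args):
        if self.calls >= self.limit:
            raise ToolBudgetExceeded(f"turn exceeded {self.limit} tool calls")
        self.calls += 1
        return tool(*args)

runner = BudgetedToolRunner(limit=2)
runner.run(len, "abc")
runner.run(len, "de")
try:
    runner.run(len, "f")        # third call in a two-call turn
except ToolBudgetExceeded:
    pass                        # rejected: the budget is spent
```

Whether the agent should then stop, summarize, or ask to continue is the product decision the cap forces.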

Extensibility is limited to MCP server configuration. There's no per-tool permission model — tools are either available in a mode or not.

Devin

The broadest tool surface in our survey. Devin runs in a cloud IDE environment with full system access:

| Category | Capabilities |
| --- | --- |
| File | Full filesystem access (editor + file explorer) |
| Terminal | Multiple terminal sessions |
| Browser | Full Chromium browser (real, not headless) |
| Search | Devin Search with "Deep Mode" for complex queries |
| Knowledge | Devin Wiki (auto-indexed repo documentation) |
| Review | Devin Review (code review with commit application) |
| Testing | Desktop Testing via computer use (Linux) |
| Git | Full git operations |

Devin's tools aren't discrete named functions — they're a full operating environment. The agent can open multiple terminals, browse the web, interact with GUIs, and run desktop applications. This is closer to giving the agent a full computer than a set of API tools.

The tradeoff: Devin runs in the cloud, not locally. Everything happens in Cognition's sandboxed VMs.

OpenHands

OpenHands takes the most radical approach to tooling: CodeActAgent unifies all actions into executable code.

| Category | Implementation |
| --- | --- |
| File operations | open(), os.path, shell commands |
| Terminal | Bash execution (arbitrary commands) |
| Python | Interactive Python interpreter |
| Browser | Delegated to BrowsingAgent |
| User interaction | Natural language conversation |

There are no named tools like "Read" or "Edit." The agent writes Python or bash that does what it needs. Want to read a file? cat file.txt. Want to search? grep -r pattern .. Want to install a package? pip install package.

This "code action space" approach means OpenHands has no tool ceiling — anything you can do in a terminal or Python REPL, the agent can do. But it also means there's no tool-level permission control. You can't say "allow file reads but deny file writes" because both happen through the same execution mechanism. We explored the security implications of this in our tool permissions survey.
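The collapse of all tools into one execution mechanism can be seen in a few lines. A minimal CodeAct-style sketch, illustrative rather than OpenHands' actual runtime:

```python
import subprocess

def code_act(command: str) -> str:
    """A CodeAct-style action: the agent emits shell, the runtime executes it.
    There is no tool name to filter on -- a file read and a file delete look
    identical to a permission layer, which is why control has to move to the
    sandbox boundary instead."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout

# "Read a file" and "overwrite a file" both arrive as opaque strings:
print(code_act("echo hello"))
```

The only lever left is what the subprocess environment itself can reach: its filesystem view, network access, and resource limits.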

Aider

Aider doesn't expose tools in the agent framework sense. Instead, capabilities are built into the conversation loop:

| Category | Implementation |
| --- | --- |
| File editing | Built into the LLM response format (diff/whole-file/udiff edit formats) |
| Code search | Repository map via tree-sitter (symbol-level index of entire repo) |
| Code quality | Auto-lint after every LLM edit |
| Testing | /test command runs tests and auto-fixes failures |
| File management | File watching + auto-add when referenced |
| Git | Auto-commit with descriptive messages after each edit |
| Voice | Voice coding support (transcription → code changes) |
| Vision | Image input for vision-capable models |

The repository map is Aider's standout capability. It uses tree-sitter to build a symbol-level index of the entire codebase — function signatures, class definitions, method names — and sends a compressed map to the LLM as context. This gives the model a structural understanding of the codebase without reading every file. No other product in our survey uses tree-sitter this way.

No terminal access is exposed to the model directly — Aider runs commands (lint, test) on the model's behalf but doesn't give the model a shell.
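The repo-map idea can be illustrated with Python's ast module standing in for tree-sitter. This is a single-language sketch of the concept, not Aider's implementation:

```python
import ast

def repo_map(source: str) -> list[str]:
    """Build a compressed symbol map: signatures only, no bodies.
    Aider uses tree-sitter to do this across many languages; the stdlib
    ast module plays the same role here for Python alone."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            symbols.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            symbols.append(f"class {node.name}")
    return symbols

src = """
class Cache:
    def get(self, key):
        return self.store[key]
    def put(self, key, value):
        self.store[key] = value
"""
print(repo_map(src))   # signatures only, far fewer tokens than the source
```

Sending only signatures gives the model a structural skeleton of the codebase at a fraction of the token cost of the full files.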

Amazon Q Developer

Amazon Q's agent capabilities are organized as specialized features rather than named tools:

| Category | Capabilities |
| --- | --- |
| Code generation | Real-time inline suggestions (25+ languages) |
| File editing | Multi-file implementation with test validation |
| Security | Vulnerability scanning (exposed credentials, injection, etc.) |
| Testing | Iterative unit test generation |
| Documentation | In-depth doc generation with data flow diagrams |
| Code review | Logical errors, anti-patterns, security issues |
| Transformation | .NET porting (Windows → Linux), Java version upgrades |

The software development agent runs build and test scripts to validate generated code before presenting results. The CLI supports MCP for external tool integration.

Unlike Claude Code or Cursor, Amazon Q doesn't publish a list of discrete, named tools. The agent's capabilities are described as features, not as an API surface.

Gemini Code Assist

The most IDE-integrated tool set. Google's agent mode documentation lists ten named tools for IntelliJ:

| Tool | Category | Purpose |
| --- | --- | --- |
| read_file | File | Retrieve text content |
| write_file | File | Write text to files |
| find_files | File | Locate files by name or path |
| list_files | File | Enumerate directory contents |
| grep | Search | Search for text patterns |
| analyze_current_file | Code Intel | Check for errors and warnings |
| resolve_symbol | Code Intel | Trace symbol declarations |
| find_usages | Code Intel | Identify all references to a symbol |
| git | Git | Execute git CLI commands |
| list_vcs_roots | Git | Return version control repositories |

resolve_symbol and find_usages are the standouts. These are code intelligence operations — go-to-definition and find-all-references — that leverage the IDE's language server. No other product in our survey exposes these as first-class agent tools. When Gemini needs to understand how a function is used across a codebase, it can ask the language server rather than grepping for text patterns.
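The gap between text search and symbol-aware search is easy to demonstrate. A small sketch using Python's ast module as the symbol index; this illustrates the distinction, not Gemini's language-server mechanism:

```python
import ast

SRC = '''
def fetch(url):
    return url

note = "call fetch() later"   # the name also appears in a string literal
result = fetch("https://example.com")
'''

# Text search: every occurrence of the name, wherever it appears
grep_hits = SRC.count("fetch")

# Symbol-aware search: only genuine call sites, found via the AST
call_hits = sum(
    isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "fetch"
    for node in ast.walk(ast.parse(SRC))
)

assert grep_hits == 3   # definition + string literal + call site
assert call_hits == 1   # a find_usages-style answer: just the call
```

For a rename refactor, acting on the three grep hits would corrupt the string literal; the symbol-aware result is the only safe one.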

In VS Code, all Gemini CLI built-in tools are available instead. MCP servers extend the set further.

Augment Code

Augment's IDE agent has the broadest integration surface:

| Category | Capabilities |
| --- | --- |
| File | File operations (read, write, edit) |
| Terminal | Terminal execution |
| Search | Web search |
| Vision | Image understanding |
| Multi-repo | Cross-repository coordination |
| Native integrations | GitHub, Linear, Jira, Confluence, Notion, Sentry, Stripe |
| MCP | 100+ configurable tools |
| Multi-model | Multiple AI models (Claude, GPT, etc.) |

Two implementation details stand out. Parallel tool execution — Augment runs independent tool calls concurrently, claiming 2x faster turns. Most products execute tools sequentially. Native integrations — instead of generic MCP connections, Augment ships purpose-built integrations with project management (Linear, Jira), documentation (Confluence, Notion), and monitoring (Sentry, Stripe) tools. This means the agent can read Jira tickets and Sentry errors without configuring MCP servers.
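Parallelizing independent, I/O-bound tool calls is the easy part to sketch. The tool functions and names below are stand-ins, not Augment's implementation:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_tool(name: str, seconds: float) -> str:
    time.sleep(seconds)              # stand-in for an I/O-bound tool call
    return f"{name}: done"

# Three independent calls the model requested in one turn
calls = [("read_file", 0.2), ("web_search", 0.2), ("grep", 0.2)]

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: slow_tool(*c), calls))
elapsed = time.perf_counter() - start

# The 0.2s calls overlap instead of summing to 0.6s
assert elapsed < 0.5
assert results == ["read_file: done", "web_search: done", "grep: done"]
```

The hard part the sketch skips is deciding which calls are actually independent; two edits to the same file cannot safely overlap.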

The inventory at a glance

| Product | File Ops | Terminal | Search | Web/Browser | Code Intel | Git | Vision |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Code | Read/Write/Edit/Glob | Bash | Grep + WebSearch | WebFetch | — | via Bash | — |
| Copilot | Read/Edit | Terminal | Semantic + file | Web search + preview | — | via terminal | — |
| Cursor | Read/Edit | Shell | Semantic + file + web | Browser control | — | via shell | Image gen + read |
| Windsurf | — | Terminal | Search/analyze | Web search | Linter | via terminal | — |
| Devin | Editor + filesystem | Terminal | Devin Search | Full browser | — | Full git | Desktop use |
| OpenHands | via code | Bash + Python | via code | BrowsingAgent | — | via code | — |
| Aider | Built-in edit | — | Repo map (tree-sitter) | — | tree-sitter | Auto-commit | Image input |
| Amazon Q | Suggestions + edit | Build/test | — | — | Security scan | — | — |
| Gemini Code Assist | read/write/find/list | — | grep + find_files | — | resolve_symbol, find_usages | git CLI | — |
| Augment | File ops | Terminal | Web search | Native integrations | — | GitHub native | Image understanding |

Three design philosophies

The ten products fall into three approaches to tool design:

Granular named tools. Claude Code and Gemini Code Assist give each operation a distinct name, specific parameters, and independent permissions. Read is not Grep is not Glob. The LLM sees a menu of specific operations and picks the right one. This enables fine-grained permission control — you can allow Read but deny Write, or allow Grep but deny Bash. The cost is more tool definitions consuming context window space, and more decision points where the model can pick the wrong tool.

Code-as-tools. OpenHands and (to a lesser degree) Aider skip the named-tool abstraction. The agent writes executable code — bash or Python — that performs whatever operation it needs. The "tool set" is infinite: anything you can do in a REPL is available. This is maximally expressive but minimally controllable. As we explored in our sandbox permissions survey, the security boundary shifts from "which tools are allowed" to "what can the sandbox environment access."

IDE-integrated tools. Cursor, Gemini Code Assist, and Augment map tools to IDE capabilities. Semantic search uses the IDE's index. resolve_symbol uses the language server. Browser control uses an embedded browser. The agent inherits whatever the IDE can do. This is powerful — code intelligence operations like find-all-references are genuinely useful for refactoring — but ties the agent to a specific IDE runtime.

[Chart: Built-in Tool Coverage by Product (0–10 scale)]

What stands out

Code intelligence is the biggest gap. Only Gemini Code Assist ships resolve_symbol and find_usages as named tools. Every other product relies on text search (grep, ripgrep, semantic search) to understand code structure. Text search can find where a function name appears, but it can't distinguish a definition from a call from a string literal. For large-scale refactoring, this difference matters — and it's the clearest area where IDE-integrated agents have an advantage.

Browser interaction is rare. Only Cursor (browser control: navigate, screenshot, click, type) and Devin (full Chromium in cloud VM) ship browser interaction. The other eight products can't test web UIs, can't follow links in documentation, and can't verify rendered output. As agent tasks get more complex, this gap will grow.

The granularity spectrum is wide. Claude Code has 11+ named tools. OpenHands has effectively 2 (bash + Python interpreter). Both ship, both work, and both have users. The tradeoff is control vs. expressiveness — and the bash bypass problem shows that granular tools don't provide real security if the agent also has a shell.

Vision is emerging but uneven. Cursor generates and reads images. Devin has full desktop computer use. Augment understands images. Aider accepts image input. But Claude Code, Copilot, Windsurf, OpenHands, Amazon Q, and Gemini Code Assist are primarily text-only in their tool interactions.

MCP is the escape hatch. Eight of ten products support MCP for adding tools beyond the built-in set. The built-in tools define the floor — the minimum capability surface. MCP raises the ceiling. But no two products ship the same MCP servers by default, so the "extended" tool set varies widely. We discussed MCP's role as a universal extensibility layer in our skills design post.

[Chart: Tool Design Philosophy Tradeoffs (0–10 scale)]

What the research says

Tool selection accuracy remains an active research area. The ToolBench benchmark (May 2023) showed that GPT-4 achieved a 56.6% pass rate on complex tool-use tasks involving 16,000+ real-world APIs — evidence that a larger tool surface doesn't automatically mean better performance. Models make selection errors when the tool set is large and tools overlap in functionality.

The CodeAct paper (February 2024) that inspired OpenHands' approach found that code actions outperformed JSON-based tool calls on 6 of 7 benchmarks, with an average 20% improvement. The argument: LLMs are better at writing code than selecting from a tool menu, so "code is the tool" produces better results.

However, Gorilla (May 2023) showed that fine-tuning on API documentation significantly improves tool-use accuracy, and that constrained API calls (named tools with typed parameters) reduce hallucinated function calls compared to free-form code generation. The granular-tools camp has evidence too.

The tradeoff may not be universal. For coding tasks with well-known operations (read, write, search, run), named tools reduce errors. For novel tasks requiring creative tool composition, code-as-tools offers more flexibility.

Open questions

Will code intelligence tools become standard? Gemini Code Assist is alone in shipping resolve_symbol and find_usages. If agents become primary refactoring tools, every product will need symbol-level operations — not just text search. Will they build it, or will MCP language server integrations fill the gap?

Does tool granularity help or hurt LLM performance? Claude Code has 11+ tools; OpenHands has 2. ToolBench suggests more tools can reduce accuracy, but CodeAct suggests code beats API calls. The answer may depend on the model — larger models handle more tools better, but tool-call overhead costs tokens regardless of model size.

Will browser interaction become baseline? Cursor and Devin have it. Eight products don't. As agents take on full-stack tasks (frontend + backend + testing), can they remain effective without seeing the rendered page?

Does "code-as-tools" scale? OpenHands' approach is elegant — infinite expressiveness, zero tool ceiling. But it means every operation goes through bash or Python, making audit trails harder to parse and permissions harder to enforce. Does this matter at scale, or is it a theoretical concern?

Should the built-in tool set be standardized? MCP standardizes the protocol for adding tools. But there's no standard for what tools should ship built-in. If you write an MCP server that provides file operations, does it need to match Claude Code's Read/Write/Edit/Glob interface, or can it define its own? Tool portability across products doesn't exist yet.

What's the right tool-call limit? Windsurf caps at 20 tool calls per prompt. Most products have no documented limit. Is a limit a safety feature (prevents runaway agents) or a capability ceiling (prevents complex multi-step work)?

What this means for walrus

Walrus exposes capabilities to agents through WHS hooks — and the design questions in this survey map directly to WHS architecture.

The granularity question applies to hooks. Should a WHS memory hook expose fine-grained operations (store, query, delete, list) or a single broad operation (execute_memory_operation)? The Claude Code/Gemini approach (granular named tools) enables per-operation permissions. The OpenHands approach (code-as-tools) maximizes expressiveness. WHS currently leans toward granularity — each hook has a typed protobuf interface — and this survey suggests that's the right call for permission control.
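The contrast between the two hook styles can be made concrete in code. The hook name, operations, and permission check below are hypothetical sketches, not the actual WHS protobuf interface:

```python
class DictMemoryHook:
    """Granular style: one method per operation, so a permission layer can
    allow 'query' while denying 'delete'. A broad execute_memory_operation()
    entry point would force the layer to inspect payloads instead."""
    def __init__(self):
        self._store: dict[str, str] = {}

    def store(self, key: str, value: str) -> None:
        self._store[key] = value

    def query(self, key: str):
        return self._store.get(key)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

ALLOWED = {"store", "query"}          # per-operation permission list

def call(hook, op: str, *args):
    """Dispatch one hook operation, enforcing the per-operation allow list."""
    if op not in ALLOWED:
        raise PermissionError(f"{op} denied")
    return getattr(hook, op)(*args)

hook = DictMemoryHook()
call(hook, "store", "k", "v")
assert call(hook, "query", "k") == "v"
try:
    call(hook, "delete", "k")
except PermissionError:
    pass   # only possible to deny per-operation with a granular interface
```

With a single broad operation, the same policy would require parsing every payload, which is exactly the bash-bypass problem in miniature.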

Code intelligence is a differentiation opportunity. Nine of ten products can't do resolve_symbol or find_usages. Only Gemini Code Assist ships it, and only because it integrates with the IDE's language server. A WHS hook that provides language-server-style code intelligence (backed by tree-sitter, LSP, or a custom index) would give walrus-powered agents a capability most competitors lack.

Tool-call limits are worth considering. Windsurf's 20-call cap prevents runaway tool use. WHS hooks could implement per-hook rate limits — a memory hook might allow 50 operations per turn, while an inference hook might allow 1. This is more granular than a global tool-call cap and maps naturally to the hook lifecycle.
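A per-hook budget could look like the following. The hook names and limits are illustrative, not a real WHS API:

```python
class HookRateLimiter:
    """Per-hook call budgets within one agent turn. More granular than a
    global tool-call cap: each hook spends its own budget independently."""
    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.used: dict[str, int] = {}

    def try_call(self, hook: str) -> bool:
        used = self.used.get(hook, 0)
        if used >= self.limits.get(hook, 0):   # unknown hooks default to 0
            return False                        # budget exhausted
        self.used[hook] = used + 1
        return True

    def reset(self) -> None:
        """Call at the start of each turn."""
        self.used.clear()

limiter = HookRateLimiter({"memory": 50, "inference": 1})
assert limiter.try_call("inference") is True
assert limiter.try_call("inference") is False    # second call blocked
assert all(limiter.try_call("memory") for _ in range(50))
assert limiter.try_call("memory") is False       # 51st call blocked
```

Defaulting unknown hooks to a zero budget fails closed, matching the deny-by-default posture of the rest of the permission discussion.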
