

Built-in web search: no API keys, no setup

How we gave every OpenWalrus agent web search and page fetching — with multi-engine consensus ranking, zero API keys, and zero configuration.

release · OpenWalrus Team

Agents that can't search the web are half-blind. Most frameworks solve this with API keys — SerpAPI, Tavily, Google Custom Search. You sign up, paste a key, configure a tool, hope the rate limits hold. We built it into the binary instead.

Starting today, every OpenWalrus agent has two new built-in tools: web_search and web_fetch. No API keys. No configuration. No third-party accounts.

Why not just use an API?

Search APIs work, but they come with baggage:

  • Credentials to manage. One more API key per deployment, one more secret to rotate, one more service to monitor for billing surprises.
  • Rate limits. Free tiers cap at 100–1,000 queries/day. Autonomous agents burn through that fast.
  • Cost. SerpAPI runs $50–250/mo for serious usage. Tavily charges per query. These add up alongside LLM inference costs.
  • Privacy. Every query goes to a third-party logging service. For a local-first runtime, routing agent searches through a cloud proxy defeats the point.

We wanted something simpler: search that works the moment you install walrus, with zero setup and zero ongoing cost.

The meta search approach

Instead of depending on a single search provider, we built a meta search engine — walrus-search — that queries multiple free backends in parallel, merges the results, and ranks by consensus.

The current backends are DuckDuckGo (via the Lite HTML endpoint) and Wikipedia (via the OpenSearch API). Neither requires authentication. Both are queried in parallel using tokio::task::JoinSet, so latency is bounded by the slowest engine, not the sum.
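Since the JoinSet wiring depends on the tokio runtime, here is a dependency-free sketch of the same fan-out shape using scoped OS threads. The engine list and the stub query function are placeholders, not the walrus-search internals:

```rust
use std::thread;

// Stand-in for an HTTP request to one search engine (hypothetical).
fn query(engine: &str, q: &str) -> Vec<String> {
    vec![format!("{engine}: result for '{q}'")]
}

fn search_all(q: &str) -> Vec<String> {
    let engines = ["duckduckgo", "wikipedia"];
    thread::scope(|s| {
        // Spawn one query per engine; the scope joins them all, so total
        // latency is bounded by the slowest engine, not the sum.
        let handles: Vec<_> = engines
            .iter()
            .map(|e| s.spawn(move || query(e, q)))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let results = search_all("rust async runtime");
    assert_eq!(results.len(), 2);
}
```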

Results are deduplicated by normalized URL — stripping trailing slashes, www. prefixes, and tracking parameters (utm_*, fbclid, gclid). When the same URL appears from multiple engines, the descriptions are merged (longer one wins) and the result gets a consensus score boost.
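The normalization step can be sketched in a few lines. The function name and exact rule set here are illustrative, not the actual walrus-search code:

```rust
// Illustrative sketch of URL normalization for deduplication: strip a
// "www." prefix, trailing slashes, and known tracking parameters.
fn normalize_url(url: &str) -> String {
    // Split off the query string, if any.
    let (base, query) = match url.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (url, None),
    };
    // Strip the "www." host prefix and trailing slashes.
    let base = base.replacen("://www.", "://", 1);
    let base = base.trim_end_matches('/').to_string();
    // Drop utm_*, fbclid, and gclid parameters; keep the rest.
    let kept: Vec<&str> = query
        .map(|q| {
            q.split('&')
                .filter(|p| {
                    let key = p.split('=').next().unwrap_or("");
                    !key.starts_with("utm_") && key != "fbclid" && key != "gclid"
                })
                .collect()
        })
        .unwrap_or_default();
    if kept.is_empty() {
        base
    } else {
        format!("{}?{}", base, kept.join("&"))
    }
}

fn main() {
    assert_eq!(
        normalize_url("https://www.example.com/post/?utm_source=x&fbclid=abc"),
        "https://example.com/post"
    );
}
```

With URLs canonicalized this way, the same page arriving from two engines collapses to one key, which is what makes the consensus scoring below possible.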

Consensus ranking

Ranking is simple and deterministic:

  • Position score: 1.0 / (position + 1) — earlier results from each engine score higher.
  • Consensus bonus: 0.5 * (engine_count - 1) — each additional engine that returns the same URL adds 0.5 to the score.

A result that DuckDuckGo ranks #1 and Wikipedia also returns will outscore a result that only one engine knows about. No ML model, no relevance tuning, no training data. Just arithmetic that rewards agreement across independent sources.
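As a sketch, assuming the per-URL position score takes the best rank across engines (the post does not say whether per-engine scores are summed or maxed) plus the stated consensus bonus:

```rust
// Hypothetical sketch of the consensus scoring described above.
// `positions` holds the 0-based rank a URL received from each engine
// that returned it; engines that missed it contribute no entry.
fn consensus_score(positions: &[usize]) -> f64 {
    // Position score: 1.0 / (position + 1), best rank wins (assumption).
    let position_score = positions
        .iter()
        .map(|&p| 1.0 / (p as f64 + 1.0))
        .fold(0.0, f64::max);
    // Consensus bonus: 0.5 per additional engine that agrees.
    let bonus = 0.5 * (positions.len() as f64 - 1.0);
    position_score + bonus
}

fn main() {
    // Ranked #1 by one engine and #4 by another: 1.0 + 0.5 = 1.5.
    let agreed = consensus_score(&[0, 3]);
    // Ranked #1 by a single engine: 1.0, no bonus.
    let solo = consensus_score(&[0]);
    assert!(agreed > solo);
}
```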

Page fetching

web_fetch downloads a URL and extracts clean text content. It strips <script>, <style>, <nav>, <footer>, <header>, <aside>, <noscript>, <svg>, <iframe>, and <form> subtrees entirely, then walks the remaining DOM and collects text nodes with proper spacing.

The fetcher rotates through 8 realistic browser user-agent strings (Chrome, Firefox, Safari, Edge across Windows/Mac/Linux) to avoid bot detection. No headless browser, no Playwright dependency — just HTTP requests and HTML parsing.
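Round-robin rotation can be as simple as an atomic counter over a static list. This is an illustrative sketch; the strings and the counter scheme are assumptions, not the fetcher's actual internals:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Abbreviated, hypothetical user-agent pool (the real fetcher uses 8).
const USER_AGENTS: &[&str] = &[
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36",
];

static NEXT: AtomicUsize = AtomicUsize::new(0);

// Each call returns the next user agent, wrapping around round-robin.
fn next_user_agent() -> &'static str {
    let i = NEXT.fetch_add(1, Ordering::Relaxed);
    USER_AGENTS[i % USER_AGENTS.len()]
}

fn main() {
    let first = next_user_agent();
    // Consume the rest of one full cycle...
    for _ in 0..USER_AGENTS.len() - 1 {
        next_user_agent();
    }
    // ...and the rotation wraps back to the same string.
    assert_eq!(next_user_agent(), first);
}
```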

Zero configuration for agents

Both tools are registered in BASE_TOOLS, which means every agent gets them automatically — even agents with scoped tool whitelists. The aggregator and fetch client initialize at daemon startup with sensible defaults:

Setting        Default
Engines        DuckDuckGo, Wikipedia
Timeout        10 seconds per engine
Max results    20
Cache TTL      5 minutes

An in-memory cache prevents redundant queries. Same search within the TTL window returns instantly.
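A cache with this behavior can be sketched as a HashMap keyed by query, with each entry timestamped at insert and ignored once it outlives the TTL. This is an illustrative sketch, not the actual implementation:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Minimal TTL-bounded in-memory cache, assuming the behavior described
// above (hypothetical names, not walrus-search internals).
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn insert(&mut self, query: &str, results: String) {
        self.entries.insert(query.to_string(), (Instant::now(), results));
    }

    // Returns cached results only while the entry is younger than the TTL.
    fn get(&self, query: &str) -> Option<&String> {
        self.entries
            .get(query)
            .filter(|(stored, _)| stored.elapsed() < self.ttl)
            .map(|(_, results)| results)
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(300));
    cache.insert("rust async runtime", "serialized results".to_string());
    assert!(cache.get("rust async runtime").is_some());
    assert!(cache.get("unseen query").is_none());
}
```

Expired entries here are simply skipped on read rather than evicted eagerly; a production cache would also prune them to bound memory.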

This follows the same design principle behind the rest of OpenWalrus: batteries included, nothing to configure for the common case.

The standalone CLI

The search engine also ships as a standalone binary — wsearch — for human use and debugging:

# Search
wsearch search "rust async runtime"
wsearch search "openwalrus" --engines wikipedia
wsearch search "hello world" -n 5 --format text

# Fetch a page
wsearch fetch "https://example.com"

# List available engines
wsearch engines

Configuration lives in ~/.config/wsearch/config.toml:

engines = ["duckduckgo", "wikipedia"]
timeout_secs = 10
max_results = 20
cache_ttl_secs = 300
output_format = "json"

How agents use it

The tools work like any other built-in. An agent searching for information simply calls web_search, reviews the results, then optionally calls web_fetch on the most relevant URLs to read full page content:

{
  "name": "web_search",
  "parameters": {
    "query": "rust error handling best practices",
    "max_results": 5
  }
}
{
  "name": "web_fetch",
  "parameters": {
    "url": "https://doc.rust-lang.org/book/ch09-00-error-handling.html"
  }
}

No tool registration, no MCP server, no plugin. It just works.

In sandbox mode, agents still have network access for search — the OS-level isolation restricts filesystem access, not outbound HTTP.

What's next

The meta search architecture is designed to grow. We're looking at:

  • More engines — Brave Search, SearXNG instances, and domain-specific backends for code (GitHub, docs sites).
  • Per-engine weighting — let users boost or suppress specific engines based on query type.
  • Search-to-memory — automatically caching search results in the agent's graph memory so repeated research doesn't re-fetch the same pages.

The full tool reference is in the built-in tools docs.