Chapter 00 · What it is

One API.
Every model.

Route requests to OpenAI, Anthropic, Gemini, Azure, Bedrock, or Ollama. Sub-millisecond overhead. Single binary. No runtime.

Install · one line
cargo install crabllm crabctl
00 · Install
Fig. 00 · The Router. Requests enter as one format, pass through the gateway membrane, and fan out to provider zones.
“Same OpenAI/Anthropic format, any provider.”
Chapter 01

What you get

Six capabilities the gateway gives you out of the box. Each one a plate; each plate a sentence; the sheet reads in any order you like.

Fig. 01  ·  Feature

Provider translation

Send OpenAI format. CrabLLM translates to Anthropic, Gemini, Bedrock, and Azure automatically.
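The translation step can be sketched in Python. Anthropic's Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`, while the OpenAI format carries the system prompt as a message with `role: "system"`. This is an illustrative sketch of that mapping, not CrabLLM's actual code:

```python
# Illustrative OpenAI -> Anthropic request translation (a sketch, not
# CrabLLM's implementation). Anthropic's Messages API takes the system
# prompt as a top-level field and requires max_tokens; the OpenAI format
# embeds the system prompt as a message with role "system".
def openai_to_anthropic(body: dict) -> dict:
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    return {
        "model": body["model"],
        "max_tokens": body.get("max_tokens", 1024),  # Anthropic requires this field
        "system": "\n".join(system_parts),
        "messages": [m for m in body["messages"] if m["role"] != "system"],
    }
```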

Fig. 02  ·  Feature

Routing & fallback

Weighted random selection across providers. Exponential backoff retry. Automatic failover.
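The routing policy can be sketched generically: pick a provider by weighted random selection, retry with exponentially growing delays, and fail over on each attempt. A sketch of the technique, not the gateway's Rust implementation; the provider names and weights are made up:

```python
import random
import time

# Sketch of weighted random provider selection with exponential-backoff
# retry. Illustrative only; weights and provider names are invented.
PROVIDERS = {"openai": 70, "anthropic": 30}

def pick_provider(providers: dict) -> str:
    names = list(providers)
    return random.choices(names, weights=[providers[n] for n in names], k=1)[0]

def send_with_retry(send, max_attempts: int = 4, base_delay: float = 0.25):
    """Retry a request, re-picking a provider (failover) on every attempt."""
    for attempt in range(max_attempts):
        provider = pick_provider(PROVIDERS)
        try:
            return send(provider)
        except ConnectionError:
            time.sleep(base_delay * 2 ** attempt)  # 0.25 s, 0.5 s, 1 s, ...
    raise RuntimeError("all retries exhausted")
```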

Fig. 04  ·  Feature

Virtual keys & auth

Per-key model access control. Rate limiting, usage tracking, and budget enforcement.
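What a virtual key has to check per request can be sketched as three gates: model allow-list, rolling rate window, and spend budget. An illustrative sketch of the feature, not CrabLLM's code; all names and limits here are invented:

```python
import time

# Sketch of per-virtual-key access control: model allow-list, a rolling
# 60-second rate window, and a spend budget. Illustrative only.
class VirtualKey:
    def __init__(self, allowed_models, budget_usd, rpm_limit):
        self.allowed_models = set(allowed_models)
        self.budget_usd = budget_usd
        self.rpm_limit = rpm_limit
        self.spent = 0.0
        self.window = []  # timestamps of requests in the last 60 s

    def authorize(self, model, est_cost, now=None):
        now = time.monotonic() if now is None else now
        self.window = [t for t in self.window if now - t < 60]
        if model not in self.allowed_models:
            return False, "model not allowed"
        if len(self.window) >= self.rpm_limit:
            return False, "rate limited"
        if self.spent + est_cost > self.budget_usd:
            return False, "budget exceeded"
        self.window.append(now)
        self.spent += est_cost
        return True, "ok"
```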

Chapter 02

Performance

Gateway overhead at 5,000 requests per second. All gateways run with identical resource limits (2 CPUs, 512 MB) against a mock backend that responds instantly. Full results →

Gateway    P50       P99
CrabLLM    0.26 ms   0.54 ms
Bifrost    0.61 ms   1.26 ms
LiteLLM    159 ms    227 ms
02-A · Latency
Gateway    Peak RSS
CrabLLM    34.9 MB
Bifrost    171.7 MB
LiteLLM    541.8 MB
02-B · Memory
Chapter 03

How it works

1 · Configure
listen = "0.0.0.0:8080"

[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]

[providers.anthropic]
kind = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
models = ["claude-sonnet-4-20250514"]
03-A · Config
2 · Run
crabllm --config crabllm.toml
03-B · Launch
3 · Send requests
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "user", "content": "Hello!"}]}'
03-C · Request

Same OpenAI format, any provider. CrabLLM translates automatically.
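That claim is easy to see at the payload level: switching providers changes only the `model` field of the request body. A minimal sketch, using the model names from the config above:

```python
import json

# The same OpenAI-style chat payload targets any provider behind the
# gateway; only the "model" field selects the backend.
def chat_body(model: str, prompt: str) -> dict:
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

openai_req = chat_body("gpt-4o", "Hello!")
anthropic_req = chat_body("claude-sonnet-4-20250514", "Hello!")

# Identical apart from the model name.
print(json.dumps(anthropic_req, indent=2))
```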

Frequently asked questions

How fast is it?
0.26 ms P50 at 5,000 RPS. Rust with Tokio: no GC pauses, no interpreter. The gateway is not the bottleneck.

One API. Every model.

0.26ms · single binary
© 2026 · CrabTalk