CrabTalk

Architecture

Design principles, workspace layout, and request flow through the CrabLLM gateway.

Principles

  • Simplicity over abstraction. No trait where a function suffices.
  • Single responsibility. Each crate has one focused job.
  • OpenAI as canonical format. Providers translate to/from it.
  • Streaming first-class. Never buffer a full response when streaming.
  • Configuration-driven. Provider setup and routing from config, not code.
  • Minimal gateway latency. Avoid hot-path allocations.

Workspace layout

crabllm/
  crates/
    crabllm/    — binary (serve, init, openapi subcommands)
    crabctl/    — admin CLI for managing a running gateway
    core/       — shared types, config, errors
    provider/   — provider enum + translation modules
    proxy/      — HTTP server, routing, extensions, admin API
    mlx/        — Apple Silicon local inference via MLX
    llamacpp/   — cross-platform local inference via llama.cpp
    bench/      — benchmark mock backend

Crates

crabllm

Binary entry point. Three subcommands:

  • serve (default) — loads TOML config, builds the provider registry, initializes storage + extensions, starts the Axum HTTP server. Flags: --config, --bind, -v/-vv/-vvv for verbosity.
  • init — generates a starter crabllm.toml in the current directory.
  • openapi — dumps the OpenAPI spec as JSON or a self-contained Scalar HTML page.
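The starter file produced by init might look roughly like the following. This is an illustrative sketch only: the table names, keys, and `${...}` interpolation syntax are assumptions, not the real crabllm.toml schema.

```toml
# Hypothetical crabllm.toml sketch -- all names here are illustrative.
[server]
bind = "127.0.0.1:8080"

[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"   # env var interpolation (see core's Config)

[models."gpt-4o"]
deployments = [{ provider = "openai", weight = 1 }]
```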

crabctl

Admin CLI for managing a running gateway over HTTP. Supports key management (keys list|create|get|update|delete), provider management (providers list|create|get|update|delete), usage/budget/logs queries, and cache clearing. See Management.

core

Shared types with no business logic. Contains:

  • Config — GatewayConfig with env var interpolation.
  • Types — OpenAI-compatible wire format structs (request, response, chunk).
  • Provider trait — async trait with methods for chat, streaming, embeddings, images, audio. Uses RPITIT for zero-cost dispatch.
  • Error — error enum with transient detection for retry logic.
  • Storage — async KV trait with memory, SQLite, and Redis backends.
  • Extension — hook trait for the request pipeline.
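The Provider trait can be sketched roughly as below; the trait shape, method names, and the Echo impl are illustrative assumptions, not core's real API. The point of RPITIT (return-position impl Trait in traits) is that each impl's returned future is a concrete type, so calls are monomorphized with no Box<dyn Future> allocation:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical sketch of an RPITIT-style provider trait: the method
// returns `impl Future`, so every impl gets its own concrete future type.
trait Provider {
    fn chat(&self, prompt: &str) -> impl Future<Output = String> + Send;
}

struct Echo;

impl Provider for Echo {
    // `async fn` in the impl satisfies the `impl Future + Send` signature.
    async fn chat(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

// Minimal polling executor for demo purposes only (a real server
// would run futures on an async runtime).
fn block_on<F: Future>(mut fut: F) -> F::Output {
    fn noop_raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| noop_raw(), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(noop_raw()) };
    let mut cx = Context::from_waker(&waker);
    // SAFETY: `fut` is a local that is never moved after being pinned.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    println!("{}", block_on(Echo.chat("hi")));
}
```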

provider

Provider dispatch. ProviderRegistry maps model names to weighted deployment lists. Supports alias resolution, weighted random selection, and per-model provider lookup. Generic over P: Provider so it unifies remote APIs, MLX, and llama.cpp.
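Weighted random selection over a deployment list can be sketched as a cumulative-weight walk. The function below is illustrative, not ProviderRegistry's real API; `roll` stands in for a uniform draw in [0, total_weight) that the registry would take from an RNG.

```rust
// Hypothetical sketch of weighted deployment selection.
// `deployments` pairs a deployment name with its weight; `roll` is a
// uniform value in [0, total_weight).
fn pick_weighted<'a>(deployments: &'a [(&'a str, u32)], roll: u32) -> &'a str {
    let mut acc = 0;
    for &(name, weight) in deployments {
        acc += weight;
        if roll < acc {
            return name;
        }
    }
    // Falls through only if `roll` >= total_weight; pick the last entry.
    deployments.last().expect("non-empty deployment list").0
}

fn main() {
    let d = [("openai/gpt-4o", 3), ("azure/gpt-4o", 1)];
    // rolls 0..=2 land on the first deployment, roll 3 on the second
    assert_eq!(pick_weighted(&d, 0), "openai/gpt-4o");
    assert_eq!(pick_weighted(&d, 3), "azure/gpt-4o");
    println!("ok");
}
```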

proxy

Axum HTTP server. Route handlers implement retry + fallback across deployments. Auth middleware validates virtual keys. Five built-in extensions run as in-handler hooks. Admin API routes at /v1/admin/* for dynamic key and provider management. OpenAPI/Scalar docs at /docs and /openapi.json when enabled.
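The retry-then-fallback pattern can be sketched as follows; this is a simplified illustration under assumed names, not the proxy's actual handler code. Each deployment is tried up to the retry limit before the handler moves on to the next one in the list:

```rust
// Illustrative retry + fallback sketch (not the real proxy code):
// try each deployment up to `retries` extra times, then fall back
// to the next deployment; return None if all of them fail.
fn call_with_fallback<T, E>(
    deployments: &[&str],
    retries: usize,
    mut call: impl FnMut(&str) -> Result<T, E>,
) -> Option<T> {
    for &dep in deployments {
        for _ in 0..=retries {
            if let Ok(resp) = call(dep) {
                return Some(resp);
            }
        }
    }
    None
}

fn main() {
    // first deployment always fails; the fallback succeeds
    let out = call_with_fallback(&["primary", "backup"], 1, |dep| {
        if dep == "primary" {
            Err(())
        } else {
            Ok(format!("via {dep}"))
        }
    });
    assert_eq!(out.as_deref(), Some("via backup"));
    println!("ok");
}
```

A real handler would also classify errors (core's Error marks transient failures) so that only retryable errors consume retry attempts.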

mlx

Local inference on Apple Silicon. Thin Rust wrapper around a Swift static library using the MLX framework. Multi-model cache with idle eviction. Supports chat completions (streaming + non-streaming) with tool calling. macOS and iOS only — stubs out on other platforms.

llamacpp

Cross-platform local inference. Manages the lifecycle of spawned llama-server processes — auto-downloads the binary, pulls models from the Ollama registry, spawns per-model servers on demand, and evicts idle servers.

Request flow
