# Libraries
Standalone Rust crates for local inference — MLX (Apple Silicon) and llama.cpp (cross-platform).
CrabLLM includes two standalone inference libraries that can be used independently or as part of the gateway.
## crabllm-mlx
Local inference on Apple Silicon via the MLX framework. Multi-model cache with idle eviction, streaming chat completions, and tool calling support. macOS and iOS only.
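The multi-model cache with idle eviction described above can be sketched roughly as follows. This is a minimal illustration only: `ModelCache`, `CachedModel`, and their methods are assumptions for the sketch, not the crate's actual API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// A cached model entry; in the real crate this would hold loaded weights.
struct CachedModel {
    last_used: Instant,
}

// Hypothetical multi-model cache that evicts entries idle past a TTL.
struct ModelCache {
    idle_ttl: Duration,
    models: HashMap<String, CachedModel>,
}

impl ModelCache {
    fn new(idle_ttl: Duration) -> Self {
        Self { idle_ttl, models: HashMap::new() }
    }

    // Load a model on first use; refresh its timestamp on every hit.
    fn touch(&mut self, alias: &str) {
        self.models
            .entry(alias.to_string())
            .and_modify(|m| m.last_used = Instant::now())
            .or_insert(CachedModel { last_used: Instant::now() });
    }

    // Drop models that have been idle longer than the TTL.
    fn evict_idle(&mut self, now: Instant) {
        let ttl = self.idle_ttl;
        self.models
            .retain(|_, m| now.saturating_duration_since(m.last_used) < ttl);
    }

    fn len(&self) -> usize {
        self.models.len()
    }
}
```

Keeping several models resident lets repeated requests skip reload cost, while the TTL bounds memory held by models nobody is calling.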
- Source: crates/mlx
- Platform: Apple Silicon (M1+)
- Model format: HuggingFace MLX weights (`.safetensors`)
- Model aliases: `family-param_size-quant` (e.g. `qwen3.5-2b-4bit`)
- Capabilities: chat completions, streaming, tool calling
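To illustrate the alias scheme, a parser for `family-param_size-quant` names could look like this. `parse_alias` is a hypothetical helper written for this example, not part of the crate:

```rust
/// Split an alias like `qwen3.5-2b-4bit` into (family, param size, quant).
/// Hypothetical sketch of the naming convention, not the crate's code.
fn parse_alias(alias: &str) -> Option<(String, String, String)> {
    let parts: Vec<&str> = alias.split('-').collect();
    if parts.len() < 3 {
        return None; // need at least family, param size, and quant
    }
    let quant = parts[parts.len() - 1].to_string();
    let param_size = parts[parts.len() - 2].to_string();
    // The family name itself may contain hyphens, so rejoin the lead parts.
    let family = parts[..parts.len() - 2].join("-");
    Some((family, param_size, quant))
}
```

Taking the quant and param size from the end means hyphenated family names still resolve correctly.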
## crabllm-llamacpp
Cross-platform local inference via managed llama.cpp server processes. Auto-downloads the llama-server binary, pulls GGUF models from the Ollama registry, spawns per-model servers on demand, and evicts idle servers.
- Source: crates/llamacpp
- Platform: macOS, Linux, Windows
- GPU: Metal (macOS), CUDA (Linux/Windows), CPU fallback
- Model format: GGUF via Ollama registry (`name:tag`, e.g. `llama3.2:3b`)
- Capabilities: chat completions, streaming, tool calling
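As an illustration of the `name:tag` reference format, splitting a model reference might be sketched like this. `parse_model_ref` is a hypothetical helper, and the fallback to the `latest` tag mirrors the Ollama registry's default rather than anything shown in these docs:

```rust
/// Split an Ollama-style reference like `llama3.2:3b` into (name, tag).
/// A bare name falls back to `latest`, the registry's default tag.
/// Hypothetical sketch, not the crate's actual parsing code.
fn parse_model_ref(reference: &str) -> (String, String) {
    match reference.split_once(':') {
        Some((name, tag)) => (name.to_string(), tag.to_string()),
        None => (reference.to_string(), "latest".to_string()),
    }
}
```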