CrabTalk

Libraries

Standalone Rust crates for local inference — MLX (Apple Silicon) and llama.cpp (cross-platform).

CrabLLM includes two standalone inference libraries that can be used independently or as part of the gateway.

crabllm-mlx

Local inference on Apple Silicon via the MLX framework, with a multi-model cache with idle eviction, streaming chat completions, and tool-calling support. macOS and iOS only.
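The multi-model cache with idle eviction mentioned above can be sketched roughly as follows. This is an illustrative standalone sketch, not the crabllm-mlx API: `LoadedModel`, `ModelCache`, and all method names are assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Stand-in for real model state (weights, tokenizer, ...).
struct LoadedModel {
    id: String,
}

/// Keeps several models resident, dropping any that sit unused
/// longer than `idle_timeout`.
struct ModelCache {
    idle_timeout: Duration,
    entries: HashMap<String, (LoadedModel, Instant)>,
}

impl ModelCache {
    fn new(idle_timeout: Duration) -> Self {
        Self { idle_timeout, entries: HashMap::new() }
    }

    /// Return the cached model, loading it on first use,
    /// and refresh its last-access time.
    fn get_or_load(&mut self, id: &str) -> &LoadedModel {
        let entry = self
            .entries
            .entry(id.to_string())
            .or_insert_with(|| (LoadedModel { id: id.to_string() }, Instant::now()));
        entry.1 = Instant::now();
        &entry.0
    }

    /// Drop every model whose last access is older than the idle timeout.
    fn evict_idle(&mut self) {
        let timeout = self.idle_timeout;
        self.entries.retain(|_, (_, last_used)| last_used.elapsed() < timeout);
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}

fn main() {
    let mut cache = ModelCache::new(Duration::from_millis(10));
    let model = cache.get_or_load("qwen3.5-2b-4bit");
    println!("loaded {}", model.id);
    std::thread::sleep(Duration::from_millis(20));
    cache.evict_idle();
    assert_eq!(cache.len(), 0);
}
```

A real implementation would run eviction on a background timer rather than on demand, but the retain-by-last-access core is the same.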

  • Source: crates/mlx
  • Platform: Apple Silicon (M1+)
  • Model format: HuggingFace MLX weights (.safetensors)
  • Model aliases: family-param_size-quant (e.g. qwen3.5-2b-4bit)
  • Capabilities: chat completions, streaming, tool calling
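The family-param_size-quant alias format above can be split by taking the last two `-` separators, since the quant and parameter-size segments contain no hyphens in the documented examples. This parser is a sketch under that assumption, not the crate's actual resolver:

```rust
/// Parse a model alias of the form family-param_size-quant,
/// e.g. "qwen3.5-2b-4bit" -> ("qwen3.5", "2b", "4bit").
/// Splitting on the last two '-' separators is an assumption;
/// the family segment itself may contain dots.
fn parse_alias(alias: &str) -> Option<(&str, &str, &str)> {
    let (rest, quant) = alias.rsplit_once('-')?;
    let (family, param_size) = rest.rsplit_once('-')?;
    Some((family, param_size, quant))
}

fn main() {
    assert_eq!(
        parse_alias("qwen3.5-2b-4bit"),
        Some(("qwen3.5", "2b", "4bit"))
    );
    // A string without enough separators is rejected.
    assert_eq!(parse_alias("qwen3.5"), None);
    println!("alias parsing ok");
}
```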

crabllm-llamacpp

Cross-platform local inference via managed llama.cpp server processes. It auto-downloads the llama-server binary, pulls GGUF models from the Ollama registry, spawns a server per model on demand, and evicts servers that sit idle.

  • Source: crates/llamacpp
  • Platform: macOS, Linux, Windows
  • GPU: Metal (macOS), CUDA (Linux/Windows), CPU fallback
  • Model format: GGUF via Ollama registry (name:tag, e.g. llama3.2:3b)
  • Capabilities: chat completions, streaming, tool calling
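Splitting the name:tag model reference above is a one-liner. As a hedged sketch (not the crate's API): defaulting a missing tag to "latest" follows the Ollama registry convention, but whether this crate accepts tagless references is an assumption.

```rust
/// Split an Ollama-style model reference "name:tag",
/// e.g. "llama3.2:3b" -> ("llama3.2", "3b").
/// Falling back to the "latest" tag is assumed, mirroring Ollama.
fn split_model_ref(model: &str) -> (&str, &str) {
    match model.split_once(':') {
        Some((name, tag)) => (name, tag),
        None => (model, "latest"),
    }
}

fn main() {
    assert_eq!(split_model_ref("llama3.2:3b"), ("llama3.2", "3b"));
    assert_eq!(split_model_ref("llama3.2"), ("llama3.2", "latest"));
    println!("model ref parsing ok");
}
```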
