Clank Labs Model
Wrench
Purpose-built agentic AI models. Fine-tuned for tool calling, error recovery, and system prompt following. The 35B scores 113/120 (Sonnet-tier) on 16GB VRAM. The 9B scores 105/120 on 8GB VRAM.
Benchmark Results
40-prompt agentic evaluation across 8 categories. Scored 0-3 per prompt.
Wrench 35B — Category Breakdown
Wrench 9B — Category Breakdown
vs. Frontier Models
| Model | Tier | Score |
|---|---|---|
| Claude Sonnet | Frontier | ~114/120 |
| Wrench 35B | Clank Labs | 113/120 |
| GPT-4o | Frontier | ~110/120 |
| Wrench 9B | Clank Labs | 105/120 |
| Base Qwen 3.5 35B | Base | ~60/120 |
Built Different
Purpose-Built for Agents
Fine-tuned specifically for tool calling, multi-step task chains, and error recovery. Not a general chatbot — a coding agent.
Two Sizes
35B MoE (3B active, 16GB VRAM) for maximum capability. 9B dense (~5GB GGUF, 8GB VRAM) for lighter hardware.
Safe by Design
Trained to warn before destructive actions, ask for confirmation, and never hallucinate tool calls that don't exist.
Proven Performance
35B scores 113/120 (Sonnet-tier). 9B scores 105/120 (87.5%). On hardware you own, for free.
Ollama + llama.cpp
Standard GGUF format. Works with Ollama, llama.cpp, vLLM, LM Studio, or any OpenAI-compatible server.
Built for Clank
Drop-in model for the Clank Gateway. Set it as your primary model and go — multi-channel, multi-agent, full tool suite.
Quick Start
Option A: Ollama (recommended)
```sh
# Download the GGUF + Modelfile from HuggingFace, then:
ollama create wrench -f Modelfile
ollama run wrench

# For the 9B model:
ollama create wrench-9b -f Modelfile
ollama run wrench-9b
```
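The Modelfile shipped with the HuggingFace download is authoritative, but if you want to inspect or adapt one yourself, a minimal sketch might look like this (the GGUF filename and sampling values are taken from the llama.cpp invocation in Option B):

```
# Hypothetical Modelfile sketch — use the one shipped with the download if available.
FROM ./wrench-35B-A3B-Q4_K_M.gguf
PARAMETER temperature 0.4
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER min_p 0
```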
```sh
# Or use with Clank:
npm install -g @clanklabs/clank
clank setup
# Set primary model to "ollama/wrench" or "ollama/wrench-9b" in config
```
Option B: llama.cpp
```sh
# 35B model:
./llama-server -m wrench-35B-A3B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 32768

# 9B model:
./llama-server -m wrench-9B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 8192

# Serves an OpenAI-compatible API on port 8080
# Point any app at http://localhost:8080/v1
```
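For example, here is what a chat-completions request against that endpoint looks like. The model name and prompt are illustrative; a single-model llama-server serves whichever model it was launched with.

```shell
# Build a chat-completions payload for the local llama-server endpoint.
# The "model" value is illustrative — adjust to taste.
cat > /tmp/wrench-request.json <<'EOF'
{
  "model": "wrench",
  "messages": [
    {"role": "user", "content": "List the files in the current directory."}
  ],
  "temperature": 0.4
}
EOF

# With llama-server running (see above), send it:
#   curl http://localhost:8080/v1/chat/completions \
#        -H "Content-Type: application/json" \
#        -d @/tmp/wrench-request.json
cat /tmp/wrench-request.json
```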
Model Details
Wrench 35B
| Spec | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B |
| Architecture | MoE — 35B total, 3B active |
| Fine-Tune | LoRA (rank 64, alpha 128) |
| Training Data | 1,147 examples, 15 categories |
| Quantization | Q4_K_M GGUF (~20GB) |
| Context Window | 8,192 tokens |
| Min GPU | 16GB VRAM |
| Benchmark | 113/120 (94%) |
| License | Apache 2.0 |
Wrench 9B
| Spec | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Architecture | Dense — 9B parameters |
| Fine-Tune | LoRA (rank 64, alpha 128) |
| Training Data | 1,147 examples, 15 categories |
| Quantization | Q4_K_M GGUF (~5GB) |
| Context Window | 8,192 tokens |
| Min GPU | 8GB VRAM |
| Benchmark | 105/120 (87.5%) |
| License | Apache 2.0 |