Clank Labs Model
Wrench
Frontier-grade agentic AI that runs on your hardware, for free. No API keys, no monthly bills — just models built for tool calling, error recovery, and getting real work done. The 35B scores 118/120 (matches Claude Opus) on 16GB VRAM. The 9B scores 114/120 (matches Claude Sonnet) on 8GB VRAM.
Benchmark Results
40-prompt agentic evaluation across 8 categories. Scored 0-3 per prompt.
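The headline scores follow directly from that setup: 40 prompts at up to 3 points each gives a 120-point ceiling. A quick sanity check of the arithmetic (illustrative only; this is not the benchmark harness itself):

```python
# 40 prompts, each scored 0-3, so the maximum possible score is 120.
PROMPTS = 40
MAX_PER_PROMPT = 3

MAX_SCORE = PROMPTS * MAX_PER_PROMPT  # 120

def percentage(score: int) -> float:
    """Convert a raw benchmark score out of 120 into a percentage."""
    return round(score / MAX_SCORE * 100, 1)

print(percentage(118))  # 98.3  (Wrench 35B)
print(percentage(114))  # 95.0  (Wrench 9B)
```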
Wrench 35B — Category Breakdown
Wrench 9B — Category Breakdown
vs. Frontier Models
| Model | Type | Score |
|---|---|---|
| Claude Opus | Frontier | ~118/120 |
| Wrench 35B | Clank Labs | 118/120 |
| Claude Sonnet | Frontier | ~114/120 |
| Wrench 9B | Clank Labs | 114/120 |
| GPT-4o | Frontier | ~110/120 |
| Qwen 3.5 35B | Base | ~60/120 |
Independent Validation
Wrench 35B on the Berkeley Function Calling Leaderboard (BFCL) — 1,390 test cases across 7 categories.
Non-live / AST category. An independent, standardized benchmark — not designed by us.
| Category | Score | Accuracy |
|---|---|---|
| Simple (Python) | 339/400 | 84.8% |
| Simple (Java) | 44/100 | 44.0% |
| Simple (JavaScript) | 28/50 | 56.0% |
| Multiple | 169/200 | 84.5% |
| Parallel | 170/200 | 85.0% |
| Parallel Multiple | 165/200 | 82.5% |
| Irrelevance Detection | 213/240 | 88.8% |
| Overall | 1128/1390 | 81.2% |
BFCL tests raw function-call syntax across Python, Java, and JavaScript — parallel invocations, multi-function calls, and irrelevance detection. A different axis than our agentic benchmark. Together, both benchmarks validate Wrench across structured function calling and real-world agent workflows.
Built Different
Purpose-Built for Agents
Fine-tuned specifically for tool calling, multi-step task chains, and error recovery. Not a general chatbot — a coding agent.
Two Sizes
35B MoE (3B active, 16GB VRAM) for maximum capability. 9B dense (~5GB GGUF, 8GB VRAM) for lighter hardware.
Safe by Design
Trained to warn before destructive actions, ask for confirmation, and never hallucinate tool calls that don't exist.
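The same property can also be enforced on the client side: validate every proposed tool call against a registry of tools that actually exist before executing it. A minimal sketch (tool names and schemas here are made up for illustration):

```python
# Registry of known tools and the exact argument names each accepts.
# (Hypothetical tools -- substitute your agent's real tool schemas.)
REGISTRY = {
    "read_file": {"path"},
    "run_shell": {"command"},
}

def validate_tool_call(name: str, arguments: dict) -> bool:
    """Accept a proposed tool call only if the tool is registered and
    its argument names match the registered schema exactly."""
    expected = REGISTRY.get(name)
    return expected is not None and set(arguments) == expected

assert validate_tool_call("read_file", {"path": "/etc/hosts"})
assert not validate_tool_call("delete_everything", {})        # unknown tool
assert not validate_tool_call("read_file", {"file": "x.txt"}) # wrong argument name
```

Rejected calls can be surfaced back to the model as an error message, giving it a chance to recover, which is exactly the error-recovery behavior the fine-tune targets.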
Proven Performance
35B scores 118/120 (Opus-tier) + 81.2% on BFCL. 9B scores 114/120 (95%). On hardware you own, for free.
Ollama + llama.cpp
Standard GGUF format. Works with Ollama, llama.cpp, vLLM, LM Studio, or any OpenAI-compatible server.
Built for Clank
Drop-in model for Clank. Set it as your primary model and go — multi-channel, multi-agent, full tool suite.
Quick Start
Option A: Ollama (recommended)
# Download the GGUF + Modelfile from HuggingFace, then:
ollama create wrench -f Modelfile
ollama run wrench
# For the 9B model:
ollama create wrench-9b -f Modelfile
ollama run wrench-9b
# Recommended: enable KV cache quantization for lower VRAM usage
OLLAMA_KV_CACHE_TYPE=q8_0 OLLAMA_FLASH_ATTENTION=1 ollama serve
# Or use with Clank:
npm install -g @clanklabs/clank
clank setup
# Set primary model to "ollama/wrench" or "ollama/wrench-9b" in config
Option B: llama.cpp
# 35B model:
./llama-server -m wrench-35B-A3B-Q4_K_M.gguf --jinja -ngl 100 -fa on --cache-type-k q8_0 --cache-type-v q8_0 --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 32768
# 9B model:
./llama-server -m wrench-9B-Q4_K_M.gguf --jinja -ngl 100 -fa on --cache-type-k q8_0 --cache-type-v q8_0 --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 8192
# Serves an OpenAI-compatible API on port 8080
# Point any app at http://localhost:8080/v1
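Any HTTP client can talk to that endpoint. A minimal sketch using only the Python standard library (the model name and sampling values mirror the commands above; adjust to your setup):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # llama-server default port

def build_payload(messages, model="wrench", temperature=0.4):
    # Mirrors the recommended sampling settings from the commands above.
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": 0.95,
    }

def chat(messages):
    """POST a chat completion to the local OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires a running server):
# print(chat([{"role": "user", "content": "List the files in /tmp"}]))
```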
Model Details
Wrench 35B
| Spec | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B |
| Architecture | MoE — 35B total, 3B active |
| Fine-Tune | LoRA (rank 64, alpha 128) |
| Training Data | 1,252 examples, 15 categories |
| Quantization | Q4_K_M GGUF (~20GB) |
| Context Window | 8,192 tokens |
| Min GPU | 16GB VRAM |
| Clank Benchmark | 118/120 (98.3%) |
| BFCL (non_live) | 81.2% (1128/1390) |
| License | Apache 2.0 |
Wrench 9B
| Spec | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Architecture | Dense — 9B parameters |
| Fine-Tune | LoRA (rank 64, alpha 128) |
| Training Data | 1,356 examples, 15 categories |
| Quantization | Q4_K_M GGUF (~5GB) |
| Context Window | 8,192 tokens |
| Min GPU | 8GB VRAM |
| Benchmark | 114/120 (95%) |
| License | Apache 2.0 |