Clank Labs Model

Wrench

Purpose-built agentic AI models, fine-tuned for tool calling, error recovery, and system-prompt following. The 35B scores 113/120 (Sonnet-tier) and runs in 16GB of VRAM; the 9B scores 105/120 and runs in 8GB.

Benchmark Results

A 40-prompt agentic evaluation across 8 categories, with each prompt scored 0–3 (120 points maximum).

Wrench 35B — Category Breakdown

| Category | Score |
|---|---|
| Basic Tool Use | 15/15 |
| Multi-Step Tasks | 14/15 |
| Error Recovery | 13/15 |
| Response Quality | 15/15 |
| System Prompt Following | 14/15 |
| Planning & Reasoning | 14/15 |
| Tool Format Correctness | 13/15 |
| Safety & Restraint | 15/15 |
| Total | 113/120 (94%) |

Wrench 9B — Category Breakdown

| Category | Score |
|---|---|
| Basic Tool Use | 11/15 |
| Multi-Step Tasks | 13/15 |
| Error Recovery | 14/15 |
| Response Quality | 14/15 |
| System Prompt Following | 12/15 |
| Planning & Reasoning | 12/15 |
| Tool Format Correctness | 15/15 |
| Safety & Restraint | 14/15 |
| Total | 105/120 (87.5%) |

vs. Frontier Models

| Model | Tier | Score |
|---|---|---|
| Claude Sonnet | Frontier | ~114/120 |
| Wrench 35B | Clank Labs | 113/120 |
| GPT-4o | Frontier | ~110/120 |
| Wrench 9B | Clank Labs | 105/120 |
| Base Qwen 3.5 35B | Base | ~60/120 |

Built Different

Purpose-Built for Agents

Fine-tuned specifically for tool calling, multi-step task chains, and error recovery. Not a general chatbot — a coding agent.

Two Sizes

35B MoE (3B active, 16GB VRAM) for maximum capability. 9B dense (~5GB GGUF, 8GB VRAM) for lighter hardware.

Safe by Design

Trained to warn before destructive actions, ask for confirmation, and never hallucinate tool calls that don't exist.

Proven Performance

35B scores 113/120 (Sonnet-tier). 9B scores 105/120 (87.5%). On hardware you own, for free.

Ollama + llama.cpp

Standard GGUF format. Works with Ollama, llama.cpp, vLLM, LM Studio, or any OpenAI-compatible server.

Built for Clank

Drop-in model for the Clank Gateway. Set it as your primary model and go — multi-channel, multi-agent, full tool suite.

Quick Start

Option A: Ollama (recommended)

```shell
# Download the GGUF + Modelfile from HuggingFace, then:
ollama create wrench -f Modelfile
ollama run wrench

# For the 9B model:
ollama create wrench-9b -f Modelfile
ollama run wrench-9b

# Or use with Clank:
npm install -g @clanklabs/clank
clank setup
# Set primary model to "ollama/wrench" or "ollama/wrench-9b" in config
```
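The `ollama create` commands above expect a Modelfile next to the GGUF; the one shipped on HuggingFace is the source of truth. As a rough sketch of what such a file contains (the filename is an assumption, and the sampling values simply mirror the llama.cpp settings in Option B; the real Modelfile may also set a chat template and system prompt):

```
# Sketch of a Wrench Modelfile (illustrative, not the shipped file)
FROM ./wrench-35B-A3B-Q4_K_M.gguf
PARAMETER temperature 0.4
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER num_ctx 8192
```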

Option B: llama.cpp

```shell
# 35B model:
./llama-server -m wrench-35B-A3B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 32768

# 9B model:
./llama-server -m wrench-9B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 8192

# Serves an OpenAI-compatible API on port 8080
# Point any app at http://localhost:8080/v1
```
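Once `llama-server` is up, you can smoke-test the OpenAI-compatible endpoint with curl. The model name and prompt below are placeholders, not values the server requires:

```shell
# Request body for a chat completion (model name and prompt are examples)
PAYLOAD='{
  "model": "wrench",
  "messages": [{"role": "user", "content": "Read package.json and list the npm scripts."}],
  "temperature": 0.4
}'

# POST to the chat-completions endpoint served on port 8080
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"
```

Any OpenAI SDK works the same way: point its base URL at `http://localhost:8080/v1`.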

Model Details

Wrench 35B

| Spec | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B |
| Architecture | MoE (35B total, 3B active) |
| Fine-Tune | LoRA (rank 64, alpha 128) |
| Training Data | 1,147 examples, 15 categories |
| Quantization | Q4_K_M GGUF (~20GB) |
| Context Window | 8,192 tokens |
| Min GPU | 16GB VRAM |
| Benchmark | 113/120 (94%) |
| License | Apache 2.0 |

Wrench 9B

| Spec | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Architecture | Dense (9B parameters) |
| Fine-Tune | LoRA (rank 64, alpha 128) |
| Training Data | 1,147 examples, 15 categories |
| Quantization | Q4_K_M GGUF (~5GB) |
| Context Window | 8,192 tokens |
| Min GPU | 8GB VRAM |
| Benchmark | 105/120 (87.5%) |
| License | Apache 2.0 |