I help you optimize AI for cost, quality, and complexity.

VoidSource is my independent AI systems lab. I benchmark models, APIs, self-hosted tools, rules, and workflow designs to find where simpler systems are enough, where powerful models are worth it, and where AI adds no value at all.
Price / 1M tokens (best available pricing)
Claude Opus 4.6: $5/$25 (Premium)
Claude Sonnet 4.6: $3/$15
Gemini 3.1 Pro: $2/$12
Gemini 3.5 Flash: $2/$9
Grok 4.20: $1/$3
Claude Haiku 4.5: $1/$5
Qwen 3.7: $0.78/$4
Gemini 3.0 Flash: $0.50/$3
K2.5: $0.40/$2
GPT-5.1: $0.25/$2
v3-0324: $0.20/$0.77
V4: $0.10/$0.20 (Best value)
non-thinking-2507: $0.07/$0.10
Microsoft Phi-4: $0.07/$0.14
Qwen 3 30b A3b: $0.05/$0.19
Qwen 3.5 9B: $0.04/$0.15
113 models tracked
15 benchmarks
81 with live pricing
16 providers

Pricing and benchmark numbers come from public sources and our own runs. See what we measure ourselves vs. what we aggregate.

Judgment

The right tool depends on the job.

The goal is not to use the smallest model or the biggest one by default. The goal is to spend where it changes the outcome.

Start simple

Test the lightweight route first.

Rules, cached calls, smaller models, or local tools often handle the boring majority. Upgrade only where the lighter route fails.

Measure

Optimize cost per successful outcome.

The useful metric is not token price in isolation. It is what you pay for a correct extraction, accepted answer, resolved ticket, or clean handoff.
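A minimal sketch of that metric, under stated assumptions: a price pair such as $1/$5 is read as input/output dollars per 1M tokens, failed calls are still paid for, and the token counts and success rates below are invented for illustration, not measured results. The function name `cost_per_success` is hypothetical.

```python
def cost_per_success(input_price, output_price, input_tokens, output_tokens, success_rate):
    """Expected dollars spent per successful outcome.

    input_price / output_price: dollars per 1M tokens (assumed reading of "$1/$5").
    success_rate: fraction of calls that yield an accepted result (0 < rate <= 1).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    # Failed calls cost money too, so divide by the fraction that succeed.
    return cost_per_call / success_rate


# Illustrative inputs only: 2,000 input and 500 output tokens per call,
# with made-up success rates for a cheaper and a pricier model.
cheap = cost_per_success(1.0, 5.0, 2000, 500, success_rate=0.60)
strong = cost_per_success(5.0, 25.0, 2000, 500, success_rate=0.95)
print(f"cheap:  ${cheap:.4f} per success")
print(f"strong: ${strong:.4f} per success")
```

The comparison can flip in either direction once real success rates are measured, which is the reason to compute it per workflow instead of reading it off a price list.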

Route

Escalate ambiguous cases.

Use low-cost deterministic paths for clear work, then route uncertainty to stronger models, validation, or human review where it changes the result.
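One way to sketch that routing, assuming a hypothetical invoice-number pattern as the deterministic path and two stand-in model callables that return an answer plus a confidence score. The names `route`, `rule_extract`, `small_model`, `strong_model`, and the 0.8 threshold are all illustrative assumptions, not a real API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A model here is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]

# Hypothetical deterministic rule: invoice IDs that look like "INV-123456".
INVOICE_RE = re.compile(r"\bINV-\d{6}\b")


@dataclass
class Routed:
    answer: Optional[str]
    route: str  # "rule", "small_model", "strong_model", or "human_review"


def rule_extract(text: str) -> Optional[str]:
    match = INVOICE_RE.search(text)
    return match.group(0) if match else None


def route(text: str, small_model: Model, strong_model: Model, threshold: float = 0.8) -> Routed:
    # 1. Clear work: the regex either matches or it does not.
    hit = rule_extract(text)
    if hit is not None:
        return Routed(hit, "rule")
    # 2. Try the cheaper model and keep its answer only if it is confident.
    answer, confidence = small_model(text)
    if confidence >= threshold:
        return Routed(answer, "small_model")
    # 3. Escalate the uncertain case to the stronger model.
    answer, confidence = strong_model(text)
    if confidence >= threshold:
        return Routed(answer, "strong_model")
    # 4. Still ambiguous: hand off to a person.
    return Routed(None, "human_review")
```

Logging the `route` field per request shows how much traffic each tier actually absorbs, which is what tells you whether the escalation steps are earning their cost.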

Constrain

Respect privacy and operations.

Self-hosting, cloud APIs, and hybrid pipelines each have a cost. The right answer depends on data sensitivity, team speed, and maintenance reality.

Signal notes

Get the AI Tradeoff Notes

Practical notes on model choice, API costs, self-hosting, benchmarks, and when lighter systems are good enough. Sent only when there is something useful to say. No hype, no daily news sludge.

Live data

Cost vs. quality across 15 benchmarks

Pick a benchmark below. Models on the dashed line are Pareto-optimal: no other model offers better performance for less money.
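For readers who want the definition in concrete terms, here is a small sketch of how a Pareto frontier can be computed from (cost, score) pairs. It is not the code behind the chart. It assumes lower cost and higher score are better, treats a model as dominated when another model costs no more and scores at least as well, and uses made-up models A to E.

```python
def pareto_frontier(models):
    """Return names of Pareto-optimal models.

    models: iterable of (name, cost, score) tuples.
    Sweep in order of rising cost; a model stays on the frontier only if it
    beats the best score seen among all cheaper-or-equal models.
    """
    frontier = []
    best_score = float("-inf")
    # Sort by cost ascending; at equal cost, put the higher score first.
    for name, cost, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


# Hypothetical data: (name, dollars per 1M tokens, benchmark score).
example = [
    ("A", 0.10, 60),
    ("B", 0.50, 58),  # costs more than A and scores lower: dominated
    ("C", 1.00, 75),
    ("D", 5.00, 90),
    ("E", 3.00, 70),  # costs more than C and scores lower: dominated
]
print(pareto_frontier(example))  # ['A', 'C', 'D']
```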

Head-to-head

Compare models side by side

Pick any models you're evaluating and compare benchmarks, pricing, and specs in one view.

Spec: Anthropic Claude Opus 4.6 (Thinking) | Meta Muse Spark (Thinking) | Google Gemini 3.1 Pro
Arena ELO: 1,503 | 1,489 | 1,488
Input price: $5.00/1M (Claude Opus 4.6), $2.00/1M (Gemini 3.1 Pro)
Context: 1.0M

Work with me

Bring me a messy AI system decision.

The benchmarks and pricing above are the evidence layer. The actual product is judgment: knowing when to use the powerful model, when a lighter system is enough, when to use rules, when to self-host, and when not to use AI at all. I can review it, audit it, benchmark the options, prototype the better route, or build it with you.

See how to work with me

Decision Report

An async, evidence-backed recommendation for one workflow question: cloud API or self-hosted, small model or frontier, regex or LLM.

Workflow Audit

I map your AI workflow, find where you overpay or lose quality, and recommend the reliable route across models, rules, validation, and routing.

Prototype & Build

A runnable proof of the recommended path (eval table, cost-quality comparison), plus implementation support when the work compounds into reusable machinery.

First proof vertical: document & OCR workflows — measurable, with real infrastructure behind it. See services and audit options.