Claude Opus 4.6 (Thinking) vs Gemini 3.1 Pro vs Muse Spark (Thinking)

Side-by-side benchmark scores, pricing, and specifications

Specifications

Specification	Claude Opus 4.6 (Thinking)	Gemini 3.1 Pro	Muse Spark (Thinking)
Provider	Anthropic	Google	Meta
Variant	4.6 Thinking	3.1	Thinking
Input price (USD per 1M tokens)	$5.00	$2.00	—
Output price (USD per 1M tokens)	$25.00	$12.00	—
Context window (tokens)	1.0M	—	—
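
To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a single request from the listed prices. The function name `request_cost` and the example token counts are illustrative assumptions, not part of the source. Muse Spark is left out because no price is listed for it.

```python
# Prices in USD per 1M tokens, copied from the Specifications table.
# Muse Spark (Thinking) is omitted: the table lists no price for it.
PRICES = {
    "Claude Opus 4.6 (Thinking)": {"input": 5.00, "output": 25.00},
    "Gemini 3.1 Pro": {"input": 2.00, "output": 12.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 200_000, 50_000):.2f}")
```

Under these assumed token counts, the same request costs $2.25 on Claude Opus 4.6 (Thinking) and $1.00 on Gemini 3.1 Pro. The output price dominates once responses get long.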

Category	Benchmark	Claude Opus 4.6 (Thinking)	Gemini 3.1 Pro	Muse Spark (Thinking)
Composite	Quality Score	102.9% (#7)	102.6% (#8)	97.8% (#14)
Human preference	Arena ELO	1,504 (#1)	1,487 (#5)	1,487 (#6)
General reasoning	LiveBench	76.3% (#9)	79.9% (#4)	—
Scientific knowledge	GPQA Diamond	91.3% (#10)	94.3% (#2)	89.5% (#16)
Academic reasoning	HLE	40.0% (#9)	44.4% (#4)	42.8% (#6)
Commonsense reasoning	SimpleBench	67.6% (#6)	79.6% (#1)	—
Mathematics	AIME 2025	95.6% (#2)	—	—
Graduate science	GSO	37.3% (#2)	21.6% (#7)	—
Agentic coding	SWE-Bench Verified	80.8% (#5)	80.6% (#7)	77.4% (#16)
Agentic tool use	Tau-Bench	91.9% (#1)	90.8% (#3)	—
Agentic terminal	Terminal-Bench	65.4% (#5)	68.5% (#4)	—
Visual reasoning	ARC-AGI-2	68.8% (#7)	77.1% (#3)	42.5% (#13)

Scores represent the best available variant for each model. Higher is better unless otherwise noted. Bars show relative performance within each benchmark.
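
The source does not say how the bars are scaled. One plausible reading of "relative performance within each benchmark" is each score divided by the best score in its row. The sketch below assumes that scaling. The function name `relative_scores` is hypothetical, and models with no reported score are skipped.

```python
from typing import Dict, Optional


def relative_scores(scores: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Scale each reported score by the row's best score (assumed bar scaling)."""
    # Drop models with no reported score for this benchmark.
    present = {model: s for model, s in scores.items() if s is not None}
    best = max(present.values())
    return {model: s / best for model, s in present.items()}


if __name__ == "__main__":
    # GPQA Diamond row from the benchmark table.
    gpqa = {
        "Claude Opus 4.6 (Thinking)": 91.3,
        "Gemini 3.1 Pro": 94.3,
        "Muse Spark (Thinking)": 89.5,
    }
    for model, ratio in relative_scores(gpqa).items():
        print(f"{model}: {ratio:.3f}")
```

With this scaling, the top model in each row gets a full-length bar and every other bar is a fraction of it. The ratios are therefore comparable only within a single benchmark, not across benchmarks.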