Claude Opus 4.6 (Thinking) vs Gemini 3.1 Pro vs Muse Spark (Thinking)

Side-by-side benchmark scores, pricing, and specifications


Specifications

Specification: Claude Opus 4.6 (Thinking) | Gemini 3.1 Pro | Muse Spark (Thinking)
Provider: Anthropic | Google | Meta
Variant: 4.6 Thinking | 3.1 | Thinking
Input price (per 1M tokens): $5.00 | $2.00
Output price (per 1M tokens): $25.00 | $12.00
Context window: 1.0M tokens
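Per-token prices translate into a per-request cost with simple arithmetic: cost = input tokens × input price / 1,000,000 + output tokens × output price / 1,000,000. The minimal Python sketch below assumes the two listed prices belong to Claude Opus 4.6 (Thinking) and Gemini 3.1 Pro, in column order, and that billing is a flat per-token rate (no tiered, cached, or batch pricing, none of which this page covers). The helper name `request_cost` and the workload of 20,000 input and 2,000 output tokens are hypothetical. Muse Spark (Thinking) is left out because no price is listed for it. For the Thinking variants, the output count should include any reasoning tokens the provider bills as output.

```python
# List prices from the specification rows, in US dollars per 1M tokens.
# Assumption: flat per-token pricing; Muse Spark is omitted (no price listed).
PRICES_PER_MILLION = {
    "Claude Opus 4.6 (Thinking)": {"input": 5.00, "output": 25.00},
    "Gemini 3.1 Pro": {"input": 2.00, "output": 12.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at flat list prices."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${request_cost(name, 20_000, 2_000):.4f}")
```

Under these assumptions the script prints $0.1500 for Claude Opus 4.6 (Thinking) and $0.0640 for Gemini 3.1 Pro.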

Benchmark Comparison

Benchmark: Claude Opus 4.6 (Thinking) | Gemini 3.1 Pro | Muse Spark (Thinking)
Quality Score (composite): 102.9% (#7) | 102.6% (#8) | 97.8% (#14)
Arena Elo (human preference): 1,504 (#1) | 1,487 (#5) | 1,487 (#6)
LiveBench (general reasoning): 76.3% (#9) | 79.9% (#4)
GPQA Diamond (scientific knowledge): 91.3% (#10) | 94.3% (#2) | 89.5% (#16)
HLE, Humanity's Last Exam (academic reasoning): 40.0% (#9) | 44.4% (#4) | 42.8% (#6)
SimpleBench (commonsense reasoning): 67.6% (#6) | 79.6% (#1)
AIME 2025 (mathematics): 95.6% (#2)
GSO (software optimization): 37.3% (#2) | 21.6% (#7)
SWE-Bench Verified (agentic coding): 80.8% (#5) | 80.6% (#7) | 77.4% (#16)
Tau-Bench (agentic tool use): 91.9% (#1) | 90.8% (#3)
Terminal-Bench (agentic terminal tasks): 65.4% (#5) | 68.5% (#4)
ARC-AGI-2 (visual reasoning): 68.8% (#7) | 77.1% (#3) | 42.5% (#13)

Scores represent the best available variant for each model. Higher is better unless otherwise noted.
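A gap in Arena ratings can be read as an expected head-to-head preference rate. The sketch below assumes Arena ratings follow the standard Elo logistic convention (base 10, 400-point scale); `expected_win_rate` is a hypothetical helper for illustration, not part of any Arena tooling, and it does not reproduce how Arena fits its ratings. Under that assumption, the 17-point gap between 1,504 and 1,487 corresponds to roughly a 52% to 48% preference split, and the two models tied at 1,487 would be expected to split 50/50.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the standard Elo logistic curve.

    Assumption: base-10 logistic with a 400-point scale, as in classic Elo.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Arena ratings from the table: 1,504 vs 1,487.
    print(f"{expected_win_rate(1504, 1487):.3f}")  # prints 0.524
```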