Anthropic family

Claude

Claude: Opus 4.8 (Thinking) ranks #2 of 186 on Quality Score. Compare Opus, Sonnet, Haiku, and Mythos by price, benchmarks, and workload.

Top in this family

Claude Opus 4.8 (Thinking) ranks #2 of 186 on overall quality (QS 108.6) at $5/$25 per 1M input/output tokens.

Practical pick

Claude Sonnet 4.6 (Thinking) at $3/$15 per 1M input/output tokens (rank #16 of 186).

Models: 4 (Opus, Sonnet, Haiku, Mythos)
License: Closed weights
Provider: Anthropic

★ Most teams should start here

Anthropic Claude Sonnet 4

Variant: 4.6 Thinking

The practical default for most teams. It delivers near-flagship quality for everyday API workloads (QS 96.7, against 108.6 for Opus 4.8 Thinking) at 60% of Opus per-token pricing. Move up to Opus only for workloads where visible quality wins justify the cost gap.

Quality Score: 96.7
Input: $3.00 per 1M tokens
Output: $15.00 per 1M tokens
Context: 200K tokens
License: Closed · API
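To see what the price gap means per call, here is a minimal Python sketch that applies the list prices quoted on this page to a single request. The request size (10K input tokens, 1K output tokens) is a hypothetical example rather than a measured workload, and the sketch ignores any discounts such as prompt caching or batch pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in USD per 1M tokens, as quoted on this page."""
    input_per_m: float
    output_per_m: float

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of one request at list prices (no discounts applied)."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


SONNET_4_6_THINKING = Pricing(input_per_m=3.00, output_per_m=15.00)
OPUS_4_8_THINKING = Pricing(input_per_m=5.00, output_per_m=25.00)

# Hypothetical request shape: 10K tokens in, 1K tokens out.
sonnet = SONNET_4_6_THINKING.request_cost(10_000, 1_000)
opus = OPUS_4_8_THINKING.request_cost(10_000, 1_000)
print(f"Sonnet ${sonnet:.3f} vs Opus ${opus:.3f} per request "
      f"({sonnet / opus:.0%} of the Opus cost)")
```

Because input and output prices differ by the same 3:5 ratio, Sonnet costs 60% of Opus for any mix of input and output tokens; what grows with volume is the absolute dollar gap.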

Best variant by workload

One pick per common job. Choose by what you need to ship, not by which variant tops a leaderboard you don't use.

Note: these picks assume direct API usage, where cost per million tokens is load-bearing. Inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.
Coding agents: Anthropic Claude Opus 4 · 4.8 Thinking, $5.00 / $25.00 per 1M input / output tokens. Strongest agentic coding and tool-use reliability in the family. Pick it when coding throughput and multi-step planning are the binding constraint and the cost premium is acceptable.

General API workhorse: Anthropic Claude Sonnet 4 · 4.6 Thinking, $3.00 / $15.00 per 1M input / output tokens. The family's best balance of quality and price for chat, summarization, and tool-augmented assistants. Default to it unless your evals visibly improve under Opus.

Long-context RAG: Anthropic Claude Opus 4 · 4.8 Thinking, $5.00 / $25.00 per 1M input / output tokens. Strongest long-context recall in the family. Use it when document scale and faithful retrieval over long inputs dominate. Note that this page lists Opus 4.8 at a 200K context window, while Opus 4.6 Thinking and both Sonnet 4.5 variants are listed at 1.0M.

Document AI / OCR: Anthropic Claude Sonnet 4 · 4.6 Thinking, $3.00 / $15.00 per 1M input / output tokens. Best practical fit in the family for layout-aware document workloads: strong instruction-following and structured-output reliability without Opus pricing.

High-volume chat: Claude Haiku 4.5 · Non-thinking, $1.00 / $5.00 per 1M input / output tokens. Cheapest current-generation tier in the family. Use it for high-volume chat, where per-token cost compounds. For an even cheaper option that is still served, see Claude 3.5 Haiku.
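One way to encode these picks is a lookup table with the Sonnet default as the fallback, mirroring the advice above. This is a minimal Python sketch: the workload keys and the `pick` and `monthly_cost` helpers are hypothetical names, the prices are the list prices quoted above, and the 30-day month is an assumption.

```python
# Workload -> (pick, input $/1M tokens, output $/1M tokens), copied from the picks above.
# The dictionary keys are hypothetical labels, not API identifiers.
PICKS: dict[str, tuple[str, float, float]] = {
    "coding_agents": ("Claude Opus 4.8 (Thinking)", 5.00, 25.00),
    "general_api": ("Claude Sonnet 4.6 (Thinking)", 3.00, 15.00),
    "long_context_rag": ("Claude Opus 4.8 (Thinking)", 5.00, 25.00),
    "document_ai": ("Claude Sonnet 4.6 (Thinking)", 3.00, 15.00),
    "high_volume_chat": ("Claude Haiku 4.5 (Non-thinking)", 1.00, 5.00),
}
DEFAULT = "general_api"  # Sonnet unless your evals visibly improve under Opus


def pick(workload: str) -> tuple[str, float, float]:
    """Return (variant, input price, output price); unknown workloads get the default."""
    return PICKS.get(workload, PICKS[DEFAULT])


def monthly_cost(workload: str, requests_per_day: int,
                 input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """List-price cost of a steady daily request volume over `days` days."""
    _, in_price, out_price = pick(workload)
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical chat traffic: 100K requests/day, 500 tokens in, 200 tokens out.
for workload in ("high_volume_chat", "general_api"):
    print(pick(workload)[0], f"${monthly_cost(workload, 100_000, 500, 200):,.0f}/month")
```

At that hypothetical volume the Haiku pick comes to about $4,500 a month against about $13,500 for Sonnet at list prices, which is the compounding the high-volume pick refers to.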

All variants

23 variants across 4 models, plus one cross-family model (DeepSeek V4) for context, sorted by Quality Score in descending order.

Columns: Variant | QS (overall rank) | Benchmark scores | Input / output $ per 1M tokens | Context | Released
Benchmark scores follow the column order GPQA, HLE, SWE, SWE-Pro, Terminal, Tau, MCP, AIME. Rows with fewer than eight scores list only the scores they have, in that order.
Claude Mythos · Preview | 118.9 (#1/186) | 94.6, 56.8, 93.9, 77.8, 82.0 | not listed | not listed | not listed
Claude Opus 4 · 4.8 Thinking | 108.6 (#2/186) | 93.6, 49.8, 88.6, 69.2, 82.2 | $5 / $25 | 200K | May 22, 2025
Claude Opus 4 · 4.7 Thinking | 107.8 (#3/186) | 94.2, 46.9, 87.6, 64.3, 69.4, 77.3 | $5 / $25 | 200K | May 22, 2025
Claude Opus 4 · 4.6 Thinking | 104.1 (#6/186) | 91.3, 40.0, 80.8, 53.4, 65.4, 91.9, 59.5, 95.6 | $5 / $25 | 1.0M | May 22, 2025
Claude Opus 4 · 4.5 Thinking | 98.6 (#13/186) | 87.0, 30.8, 80.9, 59.3, 88.9, 62.3, 92.8 | $5 / $25 | 200K | May 22, 2025
Claude Sonnet 4 · 4.6 Thinking | 96.7 (#16/186) | 89.9, 33.2, 79.6, 59.1, 91.7, 61.3, 86.9 | $3 / $15 | 200K | May 22, 2025
Claude Opus 4 · 4.6 Non-thinking | 93.1 (#23/186) | 19.0 | $5 / $25 | 200K | May 22, 2025
Claude Sonnet 4 · 4.5 Thinking | 86.1 (#41/186) | 83.4, 17.7, 77.2, 43.6, 42.8, 86.2, 43.8, 87.0 | $3 / $15 | 1.0M | May 22, 2025
Claude Opus 4 · 4.1 Thinking | 83.1 (#50/186) | 81.0, 11.7, 74.5, 38.0, 86.8, 40.9, 78.0 | $15 / $75 | 200K | May 22, 2025
Claude Opus 4 · 4.5 Non-thinking | 80.7 (#63/186) | 14.2, 45.9 | $5 / $25 | 200K | May 22, 2025
Claude Opus 4 · 4.0 Thinking | 80.7 (#64/186) | 79.6, 10.7, 72.5, 81.4, 75.5 | $15 / $75 | 200K | May 22, 2025
Claude Opus 4 · 4.0 Non-thinking | 79.1 (#73/186) | 74.9, 6.7, 72.5, 81.8, 33.9 | $15 / $75 | 200K | May 22, 2025
Claude Haiku 4.5 · Thinking | 77.9 (#79/186) | 73.0, 9.7, 73.3, 83.2, 40.2, 80.7 | $1 / $5 | 200K | Oct 15, 2025
Claude Sonnet 4 · 4.0 Thinking | 75.6 (#88/186) | 76.1, 7.8, 72.7, 42.7, 83.8, 70.5 | $3 / $15 | 200K | May 22, 2025
Claude Sonnet 4 · 4.0 Non-thinking | 73.5 (#99/186) | 70.0, 5.5, 72.7, 75.0, 33.1 | $3 / $15 | 200K | May 22, 2025
Claude Sonnet 4 · 4.5 Non-thinking | 73.0 (#104/186) | 7.5, 42.8 | $3 / $15 | 1.0M | May 22, 2025
Claude Opus 4 · 4.1 Non-thinking | 70.4 (#115/186) | 7.9 | $15 / $75 | 200K | May 22, 2025
Claude Haiku 4.5 · Non-thinking | 66.3 (#132/186) | 39.5, 28.3 | $1 / $5 | 200K | Oct 15, 2025
Claude Opus 4 · 4.7 Non-thinking | not scored | none | $5 / $25 | 200K | May 22, 2025
DeepSeek V4 · V4 Pro Thinking (cross-family) | 98.0 (#15/186) | 90.1, 37.7, 80.6, 55.4, 73.6 | $0.435 / $0.87 | 1.0M | Apr 24, 2026
DeepSeek V4 · V4 Flash Thinking (cross-family) | 92.0 (#27/186) | 88.1, 34.8, 79.0, 52.6, 69.0 | $0.098 / $0.197 | 1.0M | Apr 24, 2026
DeepSeek V4 · V4 Pro (cross-family) | 80.9 (#61/186) | 72.9, 7.7, 73.6, 52.1, 69.4 | $0.435 / $0.87 | 1.0M | Apr 24, 2026
DeepSeek V4 · V4 Flash (cross-family) | 78.1 (#78/186) | 71.2, 8.1, 73.7, 49.1, 64.0 | $0.098 / $0.197 | 1.0M | Apr 24, 2026

Benchmark evidence

Every benchmark we track for this family, grouped by capability. The headline Quality Score draws on a deliberately narrow, governed panel (202 of the 461 rows here feed it); the rest is tracked evidence: recorded and comparable, but not folded into a single synthetic score.
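The split between the scored panel and tracked evidence is a filter on the Scoring column. A minimal Python sketch over a handful of rows copied from the tables below (the tuple layout is an assumption made for illustration):

```python
from collections import Counter

# (model / variant, benchmark, score, rank, total ranked, scoring), copied from the tables below.
ROWS = [
    ("Anthropic Claude Opus 4 · 4.6 Thinking", "Arena Elo", 1502, 1, 158, "In Quality Score"),
    ("Anthropic Claude Mythos · Preview", "GPQA Diamond", 94.6, 1, 143, "In Quality Score"),
    ("Anthropic Claude Mythos · Preview", "SWE-bench Verified", 93.9, 1, 68, "In Quality Score"),
    ("Anthropic Claude Opus 4 · 4.8 Thinking", "Graphwalks · parents_256k", 99.3, 1, 4, "Tracked evidence"),
    ("Anthropic Claude Mythos · Preview", "BrowseComp", 86.9, 1, 51, "Tracked evidence"),
]


def quality_score_panel(rows):
    """Keep only the rows that feed the headline Quality Score."""
    return [row for row in rows if row[-1] == "In Quality Score"]


by_status = Counter(row[-1] for row in ROWS)
print(by_status["In Quality Score"], "scored,", by_status["Tracked evidence"], "tracked")
# Family-wide share of rows that feed the score, from the counts quoted above.
print(f"{202 / 461:.1%}")
```

Across the whole family, 202 of 461 rows is about 43.8%, so more than half of what is recorded here never moves the headline number.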


Reasoning

Model / Variant | Benchmark | Score | Rank | Scoring
Anthropic Claude Opus 4 · 4.6 Thinking | Arena Elo | 1502 | 1 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | AIME 2025 · aime_2025_python | 100 | 1 / 7 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | AIME 2025 · code_exec | 100 | 1 / 4 | In Quality Score
Anthropic Claude Mythos · Preview | GPQA Diamond | 94.6 | 1 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | AIME 2025 · multiple | 90 | 1 / 2 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | GPQA Diamond · multiple | 83.8 | 1 / 2 | In Quality Score
Anthropic Claude Mythos · Preview | Humanity's Last Exam · tools | 64.7 | 1 / 38 | In Quality Score
Anthropic Claude Mythos · Preview | Humanity's Last Exam · hle | 56.8 | 1 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | Humanity's Last Exam · search_code | 53.1 | 1 / 6 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | Arena Elo | 1500 | 2 / 158 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | AIME 2025 | 95.6 | 2 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | MMLU Pro | 89.5 | 2 / 86 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | AIME 2025 · multiple | 85 | 2 / 2 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | GPQA Diamond · multiple | 83.3 | 2 / 2 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | Humanity's Last Exam · tools | 57.9 | 2 / 38 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | Humanity's Last Exam · hle | 49.8 | 2 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Non-thinking | Arena Elo | 1498 | 3 / 158 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | GPQA Diamond | 94.2 | 3 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | Humanity's Last Exam · tools | 54.7 | 3 / 38 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | Humanity's Last Exam · search_code | 49 | 3 / 6 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | Humanity's Last Exam · hle | 46.9 | 3 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | Humanity's Last Exam · verified | 38.8 | 3 / 5 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Non-thinking | Arena Elo | 1494 | 4 / 158 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | GPQA Diamond | 93.6 | 4 / 143 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMLU Pro | 87.5 | 4 / 86 | In Quality Score
Claude Haiku 4.5 · Thinking | AIME 2025 · aime_2025_python | 96.3 | 5 / 7 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | LiveBench | 77.2 | 5 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | Humanity's Last Exam · tools | 53.1 | 5 / 38 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | Humanity's Last Exam · hle_text | 36.2 | 5 / 56 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | SimpleBench | 67.6 | 6 / 61 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | AIME 2025 | 92.8 | 7 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | MMLU Pro | 87.3 | 7 / 86 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | LiveBench | 76.9 | 7 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | SimpleBench | 64.8 | 7 / 61 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | Humanity's Last Exam · hle_text | 30.8 | 8 / 56 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | LiveBench | 76.3 | 9 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | Humanity's Last Exam · hle | 40 | 9 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | GPQA Diamond | 91.3 | 10 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | LiveBench | 76.0 | 10 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | SimpleBench | 62 | 10 / 61 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | AIME 2025 · no_tools | 87 | 11 / 15 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | SimpleBench | 61.7 | 11 / 61 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | MMLU Pro | 86.6 | 12 / 86 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | AIME 2025 · no_tools | 75.5 | 12 / 15 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | LiveBench | 75.5 | 12 / 110 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | AIME 2025 · no_tools | 70.5 | 14 / 15 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | SimpleBench | 60 | 14 / 61 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | GPQA Diamond | 89.9 | 15 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | SimpleBench | 58.8 | 15 / 61 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | Humanity's Last Exam · tools | 49 | 15 / 38 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | Humanity's Last Exam · hle | 33.2 | 15 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | Arena Elo | 1473 | 16 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | LiveBench | 74.8 | 16 / 110 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | Humanity's Last Exam · hle_text | 19.8 | 16 / 56 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | LiveBench | 74.6 | 17 / 110 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | SimpleBench | 54.3 | 18 / 61 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | Arena Elo | 1470 | 19 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | AIME 2025 | 87 | 19 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | Humanity's Last Exam · hle | 30.8 | 19 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Non-thinking | Humanity's Last Exam · hle_text | 19.4 | 19 / 56 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Non-thinking | Arena Elo | 1469 | 20 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | AIME 2025 | 86.9 | 20 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | MMLU Pro | 85 | 22 / 86 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | GPQA Diamond | 87 | 24 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | Humanity's Last Exam · tools | 43.4 | 25 / 38 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Non-thinking | Humanity's Last Exam · hle_text | 13.9 | 25 / 56 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | SimpleBench | 45.5 | 29 / 61 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | Humanity's Last Exam · hle_text | 11.3 | 29 / 56 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Non-thinking | Arena Elo | 1455 | 30 / 158 | In Quality Score
Claude Haiku 4.5 · Thinking | AIME 2025 | 80.7 | 30 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | Humanity's Last Exam · hle_text | 10.8 | 30 / 56 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | Arena Elo | 1455 | 31 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | MMLU Pro | 83.7 | 32 / 86 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | AIME 2025 | 78 | 32 / 88 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | Humanity's Last Exam · tools | 33.6 | 32 / 38 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | Arena Elo | 1449 | 36 / 158 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | AIME 2025 | 75.5 | 36 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Non-thinking | Arena Elo | 1447 | 39 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Non-thinking | Humanity's Last Exam · hle_text | 7.7 | 40 / 56 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | GPQA Diamond | 83.4 | 41 / 143 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | AIME 2025 | 70.5 | 41 / 88 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | LiveBench | 68.2 | 41 / 110 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | Humanity's Last Exam · hle_text | 7.6 | 41 / 56 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Non-thinking | Humanity's Last Exam · hle_text | 7.4 | 42 / 56 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | Humanity's Last Exam · hle_text | 7.1 | 43 / 56 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Non-thinking | Humanity's Last Exam · hle | 19 | 44 / 90 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | Humanity's Last Exam · hle_text | 5.8 | 46 / 56 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | Humanity's Last Exam · hle | 17.7 | 47 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | GPQA Diamond | 81 | 51 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | LiveBench | 61.8 | 54 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | GPQA Diamond | 79.6 | 56 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | AIME 2025 | 33.9 | 56 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Non-thinking | Humanity's Last Exam · hle | 14.2 | 57 / 90 | In Quality Score
Claude Haiku 4.5 · Thinking | LiveBench | 61.3 | 58 / 110 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | AIME 2025 | 33.1 | 58 / 88 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | Arena Elo | 1424 | 59 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | LiveBench | 61.3 | 59 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | Humanity's Last Exam · hle | 11.7 | 59 / 90 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | GPQA Diamond | 76.1 | 63 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | Humanity's Last Exam · hle | 10.7 | 63 / 90 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | GPQA Diamond | 74.9 | 65 / 143 | In Quality Score
Claude Haiku 4.5 · Thinking | Humanity's Last Exam · hle | 9.7 | 66 / 90 | In Quality Score
Claude Haiku 4.5 · Thinking | GPQA Diamond | 73 | 68 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Non-thinking | LiveBench | 59.1 | 70 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Non-thinking | LiveBench | 54.5 | 74 / 110 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Non-thinking | Humanity's Last Exam · hle | 7.9 | 74 / 90 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Non-thinking | LiveBench | 53.7 | 75 / 110 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | Humanity's Last Exam · hle | 7.8 | 75 / 90 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | GPQA Diamond | 70 | 76 / 143 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | Arena Elo | 1412 | 77 / 158 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Non-thinking | Humanity's Last Exam · hle | 7.5 | 78 / 90 | In Quality Score
Claude Haiku 4.5 · Non-thinking | Arena Elo | 1411 | 79 / 158 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | Humanity's Last Exam · hle | 6.7 | 82 / 90 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | Humanity's Last Exam · hle | 5.5 | 85 / 90 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | Arena Elo | 1399 | 89 / 158 | In Quality Score
Claude Haiku 4.5 · Non-thinking | LiveBench | 45.3 | 92 / 110 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | Arena Elo | 1389 | 98 / 158 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | Graphwalks · parents_256k | 99.3 | 1 / 4 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Non-thinking | MMLU | 92.9 | 1 / 33 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Thinking | MMMU · mmmu_l3 | 88.8 | 1 / 5 | Tracked evidence
Anthropic Claude Mythos · Preview | BrowseComp | 86.9 | 1 / 51 | Tracked evidence
Anthropic Claude Opus 4 · 4.8 Thinking | Graphwalks · bfs_256k | 85.9 | 1 / 4 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | Longform Writing | 79.8 | 1 / 5 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | Graphwalks · parents_1m | 72 | 1 / 3 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | FinanceAgent | 64.4 | 1 / 15 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | VendingBench2 | 3838.7 | 2 / 4 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | Graphwalks · parents_256k | 93.6 | 2 / 4 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | MRCR · v2_128k | 84.9 | 2 / 23 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | BrowseComp · context_manage | 84 | 2 / 15 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | WMT24++ | 79.7 | 2 / 6 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | Graphwalks · bfs_256k | 76.9 | 2 / 4 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | FinanceAgent | 63.3 | 2 / 15 | Tracked evidence
Anthropic Claude Opus 4 · 4.8 Thinking | FinanceAgent · v2 | 53.9 | 2 / 7 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | Graphwalks · bfs_1m | 41.2 | 2 / 3 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Thinking | MATH 500 | 98.2 | 3 / 55 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | HMMT Nov 2025 | 96.3 | 3 / 31 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | Global PIQA | 91.6 | 3 / 26 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MMMLU | 90.1 | 3 / 38 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Thinking | MMMU · mmmu_l3 | 86.5 | 3 / 5 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | AceBench | 76.2 | 3 / 7 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Thinking | BFCL v3 | 75.2 | 3 / 49 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | FinanceAgent | 60.7 | 3 / 15 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | FrontierMath · tier1_3 | 43.8 | 3 / 5 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Thinking | MRCR · v2_average | 39.1 | 3 / 6 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | MRCR · v2_512k_1m | 32.2 | 3 / 3 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | FrontierMath · tier4 | 22.9 | 3 / 5 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | MATH 500 | 98.2 | 4 / 55 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | MMLU | 91.5 | 4 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | GlobalPIQA | 90.1 | 4 / 4 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | HMMT Feb 2025 · python | 88.8 | 4 / 6 | Tracked evidence
Anthropic Claude Opus 4 · 4.8 Thinking | BrowseComp | 84.3 | 4 / 51 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | MRCR · v2_128k | 84 | 4 / 23 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Non-thinking | AceBench | 75.6 | 4 / 7 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | BFCL v3 | 74.4 | 4 / 49 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | MRCR · v2_128k_256k | 59.2 | 4 / 4 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | SciCode | 52 | 4 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | FinanceAgent · v2 | 51.5 | 4 / 7 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | HealthBench | 44.2 | 4 / 5 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MathArenaApex | 1.6 | 4 / 8 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | AIME 2026 | 95.6 | 5 / 19 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | MMLU | 91.5 | 5 / 33 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | BrowseComp | 84 | 5 / 51 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMMU · mmmu_single | 77.8 | 5 / 22 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | MMMU PRO · tools | 77.3 | 5 / 10 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | FinanceAgent · v2 | 51 | 5 / 7 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | HealthBench · hard | 14.8 | 5 / 5 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | MMLU | 91.1 | 6 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMMLU | 89.1 | 6 / 38 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | MMMU · mmmu_single | 77.1 | 6 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | MMMU PRO · tools | 75.6 | 6 / 10 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | SciCode | 49.5 | 6 / 24 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | FACTS Benchmark Suite | 48.9 | 6 / 12 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Thinking | MRCR · v2_average | 16.1 | 6 / 6 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | Global PIQA | 90.1 | 7 / 26 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | HMMT Feb 2026 | 84.3 | 7 / 16 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Thinking | MMMU · mmmu_single | 76.5 | 7 / 22 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MMLU | 90.8 | 8 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | BrowseComp · context_manage | 74.7 | 8 / 15 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MMMU PRO · tools | 73.9 | 8 / 10 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | FinanceAgent | 55.9 | 8 / 15 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | SciCode | 47 | 8 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | HMMT Feb 2025 | 92.9 | 9 / 44 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | BrowseComp | 79.3 | 9 / 51 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | BrowseComp · context_manage | 67.8 | 9 / 15 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | FinanceAgent | 54.2 | 9 / 15 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMMU PRO · tools | 68.9 | 10 / 10 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | BrowseComp_zh | 62.4 | 10 / 20 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | BrowseComp | 74.7 | 11 / 51 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Thinking | MMMU · mmmu_single | 74.4 | 11 / 22 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | FinanceAgent | 50.9 | 11 / 15 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | SciCode | 44.7 | 11 / 24 | Tracked evidence
Claude Haiku 4.5 · Thinking | FACTS Benchmark Suite | 18.6 | 11 / 12 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | AIME 2026 | 93.3 | 12 / 19 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | HMMT Nov 2025 | 91.7 | 12 / 31 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | MMLU | 89.5 | 12 / 33 | Tracked evidence
Claude Haiku 4.5 · Thinking | MMMU · mmmu_single | 73.2 | 12 / 22 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | MRCR · v2_128k | 59.3 | 12 / 23 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Thinking | FinanceAgent | 44.5 | 13 / 15 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MAXIFE | 79.2 | 14 / 21 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | BrowseComp | 67.8 | 15 / 51 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | BrowseComp · context_manage | 43.9 | 15 / 15 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | SimpleQA | 29.3 | 16 / 40 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | MMLU | 89.3 | 17 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMLU | 89.1 | 18 / 33 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | IMO AnswerBench | 78.5 | 18 / 28 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MRCR · v2_128k | 47.1 | 18 / 23 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | MMMU PRO | 75.2 | 19 / 52 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | IFBench | 58 | 19 / 28 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | BrowseComp_zh | 42.4 | 19 / 20 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | IFBench | 57.1 | 20 / 28 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | SciCode | 39.8 | 20 / 24 | Tracked evidence
Claude Haiku 4.5 · Thinking | MRCR · v2_128k | 35.3 | 20 / 23 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Thinking | AIME 2024 | 76 | 21 / 69 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | IFBench | 55.4 | 21 / 28 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Thinking | MMLU | 86.5 | 22 / 33 | Tracked evidence
Claude Haiku 4.5 · Thinking | MMMLU | 83 | 22 / 38 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | IMO AnswerBench | 75.3 | 22 / 28 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | IFBench | 53 | 22 / 28 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | AIME 2024 | 75.7 | 23 / 69 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | HMMT Feb 2025 | 74.6 | 23 / 44 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | MMMU PRO | 74.5 | 23 / 52 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Non-thinking | SimpleQA | 22.8 | 24 / 40 | Tracked evidence
Claude Haiku 4.5 · Thinking | MMLU | 83 | 25 / 33 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | MMMU PRO | 73.9 | 25 / 52 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | IMO AnswerBench | 65.9 | 25 / 28 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MMMU PRO | 70.6 | 27 / 52 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | BrowseComp | 43.9 | 30 / 51 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | SimpleQA | 15.9 | 30 / 40 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMMU PRO | 63.4 | 34 / 52 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Non-thinking | AIME 2024 | 48.2 | 37 / 69 | Tracked evidence
Claude Haiku 4.5 · Thinking | MMMU PRO | 58 | 38 / 52 | Tracked evidence
Claude Haiku 4.5 · Thinking | SimpleQA | 5.5 | 38 / 40 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | AIME 2024 | 43.4 | 41 / 69 | Tracked evidence
Anthropic Claude Opus 4 · 4.1 Thinking | BrowseComp | 18.8 | 41 / 51 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Non-thinking | HMMT Feb 2025 | 15.9 | 41 / 44 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | HMMT Feb 2025 | 15.9 | 42 / 44 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Thinking | BrowseComp | 14.7 | 42 / 51 | Tracked evidence
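The Arena Elo gaps near the top of the Reasoning table are small. Under the standard Elo model (a logistic curve on a 400-point scale, assumed here; Arena's own fitting procedure may differ in detail), a rating gap converts to an expected head-to-head preference rate. A minimal Python sketch using Arena Elo values from the rows above:

```python
def expected_preference(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A is preferred over B in a head-to-head vote."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Arena Elo values from the Reasoning table.
opus_4_6_thinking = 1502
opus_4_7_thinking = 1500
sonnet_4_0_non_thinking = 1389

print(f"{expected_preference(opus_4_6_thinking, opus_4_7_thinking):.3f}")
print(f"{expected_preference(opus_4_6_thinking, sonnet_4_0_non_thinking):.3f}")
```

The 2-point gap between ranks 1 and 2 works out to roughly a 50.3% expected preference rate, effectively a tie, while the 113-point gap to Sonnet 4.0 Non-thinking is about 66%.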

Coding

Model / Variant | Benchmark | Score | Rank | Scoring
Anthropic Claude Mythos · Preview | SWE-bench Verified | 93.9 | 1 / 68 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | SWE-bench Verified · multiple | 82 | 1 / 10 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | SWE-bench Verified · multilingual_single | 68 | 1 / 10 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | SWE-bench Verified · single_agentless | 53 | 1 / 7 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | GSO (Global Software Optimization) · opt_at_1 | 42.2 | 1 / 24 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | SWE-bench Verified | 88.6 | 2 / 68 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | SWE-bench Verified · multiple | 80.2 | 2 / 10 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | GSO (Global Software Optimization) · opt_at_1 | 37.3 | 2 / 24 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | SWE-bench Verified | 87.6 | 3 / 68 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | SWE-bench Verified · multiple | 80.2 | 3 / 10 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | SWE-bench Verified · single_agentless | 50.2 | 3 / 7 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | SWE-bench Verified | 80.9 | 4 / 68 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | SWE-bench Verified · multiple | 79.4 | 4 / 10 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | SWE-bench Verified | 80.8 | 5 / 68 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | SWE-bench Verified · multiple | 79.4 | 5 / 10 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | LiveCodeBench · pro | 70.7 | 5 / 5 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | SWE-bench Verified · multiple | 79.4 | 6 / 10 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | SWE-bench Verified · multilingual_single | 51 | 6 / 10 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Non-thinking | GSO (Global Software Optimization) · opt_at_1 | 24.5 | 6 / 24 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | LiveCodeBench · v6 | 84.8 | 7 / 40 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | Aider (Polyglot) | 72 | 7 / 45 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | LiveCodeBench · 2024_07_2025_01 | 63.6 | 8 / 8 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | Aider (Polyglot) | 70.7 | 10 / 45 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | LiveCodeBench · 2025_01_2025_05_single | 51.1 | 10 / 11 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | GSO (Global Software Optimization) · opt_at_1 | 12.7 | 10 / 24 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | SWE-bench Verified | 79.6 | 11 / 68 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | LiveCodeBench · 2025_01_2025_05_single | 48.9 | 11 / 11 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | LiveCodeBench · 2024_08_2025_05 | 56.6 | 12 / 17 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | GSO (Global Software Optimization) · opt_at_1 | 4.9 | 13 / 24 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | GSO (Global Software Optimization) · opt_at_1 | 4.9 | 14 / 24 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | Aider (Polyglot) | 61.3 | 16 / 45 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | SWE-bench Verified | 77.2 | 18 / 68 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | Aider (Polyglot) | 56.4 | 20 / 45 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | LiveCodeBench · v6 | 64 | 25 / 40 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | LiveCodeBench | 56.6 | 25 / 69 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | LiveCodeBench | 55.9 | 26 / 69 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | SWE-bench Verified | 74.5 | 27 / 68 | In Quality Score
Claude Haiku 4.5 · Thinking | LiveCodeBench | 53.2 | 31 / 69 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | LiveCodeBench · v6 | 48.5 | 31 / 40 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | LiveCodeBench · v6 | 47.4 | 32 / 40 | In Quality Score
Claude Haiku 4.5 · Thinking | SWE-bench Verified | 73.3 | 33 / 68 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | LiveCodeBench | 47.1 | 34 / 69 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | LiveCodeBench | 46.9 | 35 / 69 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | SWE-bench Verified | 72.7 | 36 / 68 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | SWE-bench Verified | 72.7 | 37 / 68 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | SWE-bench Verified | 72.5 | 38 / 68 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | SWE-bench Verified | 72.5 | 39 / 68 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | SWE-bench Multilingual | 84.4 | 1 / 18 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | SWE-bench Multilingual | 80.5 | 2 / 18 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | SecCodeBench | 68.6 | 2 / 6 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | SWE-bench Multilingual | 77.5 | 3 / 18 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | OJ-Bench · cpp | 30.4 | 5 / 6 | Tracked evidence
Anthropic Claude Opus 4 · 4.0 Non-thinking | OJ-Bench | 19.6 | 15 / 19 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | OJ-Bench | 15.3 | 18 / 19 | Tracked evidence

Agentic

Model / Variant | Benchmark | Score | Rank | Scoring
Anthropic Claude Opus 4 · 4.6 Thinking | τ²-bench · telecom | 99.3 | 1 / 28 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | τ²-bench · retail | 91.9 | 1 / 34 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | τ²-bench · average | 91.6 | 1 / 30 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | τ²-bench · retail | 91.7 | 2 / 34 | In Quality Score
Anthropic Claude Opus 4 · 4.8 Thinking | MCP Atlas | 82.2 | 2 / 33 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | MCP Atlas · public_set | 73.8 | 2 / 13 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | τ²-bench · airline | 70 | 2 / 29 | In Quality Score
Claude Haiku 4.5 · Thinking | τ²-bench · airline | 63.6 | 3 / 29 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | τ²-bench · telecom | 98.2 | 4 / 28 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | τ²-bench · retail | 88.9 | 4 / 34 | In Quality Score
Anthropic Claude Opus 4 · 4.7 Thinking | MCP Atlas | 77.3 | 4 / 33 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | τ²-bench · airline | 63 | 4 / 29 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | τ²-bench · telecom | 98 | 5 / 28 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | τ²-bench · retail | 86.8 | 5 / 34 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | τ²-bench · airline | 63 | 5 / 29 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | τ²-bench · average | 87.2 | 6 / 30 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | τ²-bench · retail | 86.2 | 6 / 34 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | τ²-bench · telecom | 97.9 | 8 / 28 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | τ²-bench · retail | 83.8 | 8 / 34 | In Quality Score
Claude Haiku 4.5 · Thinking | τ²-bench · retail | 83.2 | 9 / 34 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | MCP Atlas · public_set | 65.2 | 9 / 13 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | τ²-bench · airline | 60 | 10 / 29 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | τ²-bench · retail | 81.8 | 11 / 34 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | τ²-bench · retail | 81.4 | 12 / 34 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Thinking | τ²-bench · airline | 59.6 | 12 / 29 | In Quality Score
Claude Haiku 4.5 · Thinking | τ²-bench · telecom | 83 | 15 / 28 | In Quality Score
Anthropic Claude Opus 4 · 4.5 Thinking | MCP Atlas | 62.3 | 15 / 33 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | τ²-bench · telecom | 71.5 | 17 / 28 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | τ²-bench · retail | 75 | 18 / 34 | In Quality Score
Anthropic Claude Sonnet 4 · 4.6 Thinking | MCP Atlas | 61.3 | 18 / 33 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | τ²-bench · airline | 55.5 | 18 / 29 | In Quality Score
Anthropic Claude Opus 4 · 4.6 Thinking | MCP Atlas | 59.5 | 20 / 33 | In Quality Score
Anthropic Claude Opus 4 · 4.0 Non-thinking | τ²-bench · telecom | 57 | 22 / 28 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Thinking | τ²-bench · telecom | 49.6 | 23 / 28 | In Quality Score
Anthropic Claude Sonnet 4 · 4.0 Non-thinking | τ²-bench · telecom | 45.2 | 24 / 28 | In Quality Score
Anthropic Claude Sonnet 4 · 4.5 Thinking | MCP Atlas | 43.8 | 29 / 33 | In Quality Score
Anthropic Claude Opus 4 · 4.1 Thinking | MCP Atlas | 40.9 | 30 / 33 | In Quality Score
Claude Haiku 4.5 · Thinking | MCP Atlas | 40.2 | 31 / 33 | In Quality Score
Anthropic Claude Opus 4 · 4.8 ThinkingGDPVal-AA18901 / 17Tracked evidence
Anthropic Claude Opus 4 · 4.8 ThinkingOSWorld · verified83.41 / 27Tracked evidence
Anthropic Claude Mythos · PreviewCyberGym83.11 / 12Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinkingτ³-Bench · airline831 / 6Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingBFCL v477.51 / 18Tracked evidence
Anthropic Claude Opus 4 · 4.6 ThinkingOSWorld72.71 / 10Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinkingτ³-Bench · banking28.41 / 6Tracked evidence
Anthropic Claude Opus 4 · 4.8 ThinkingAutomation Bench15.51 / 5Tracked evidence
Anthropic Claude Mythos · PreviewOSWorld · verified79.62 / 27Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 ThinkingOSWorld72.52 / 10Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinkingτ³-Bench72.42 / 10Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinkingτ³-Bench · banking22.42 / 6Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 ThinkingVendingBench · v238393 / 7Tracked evidence
Anthropic Claude Opus 4 · 4.7 ThinkingGDPVal-AA17533 / 17Tracked evidence
Anthropic Claude Opus 4 · 4.7 ThinkingGDPVal80.33 / 6Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingWideSearch76.43 / 13Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingOSWorld66.33 / 10Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 ThinkingSeal-053.43 / 16Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 ThinkingFinSearchComp-T3443 / 5Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingDeepPlanning33.93 / 16Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinkingτ³-Bench · retail75.94 / 6Tracked evidence
Anthropic Claude Opus 4 · 4.6 ThinkingCyberGym73.84 / 12Tracked evidence
Anthropic Claude Opus 4 · 4.6 ThinkingDeepSearchQA73.74 / 7Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 ThinkingOSWorld61.44 / 10Tracked evidence
Anthropic Claude Opus 4 · 4.7 ThinkingAutomation Bench9.94 / 5Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinkingτ³-Bench · telecom84.95 / 6Tracked evidence
Anthropic Claude Opus 4 · 4.7 ThinkingOSWorld · verified785 / 27Tracked evidence
Anthropic Claude Opus 4 · 4.7 ThinkingCyberGym73.15 / 12Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinkingτ³-Bench · airline725 / 6Tracked evidence
Claude Haiku 4.5 · ThinkingOSWorld50.75 / 10Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingSeal-047.75 / 16Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingMCPMark42.35 / 8Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 ThinkingGDPVal-AA16336 / 17Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinkingτ³-Bench · retail72.46 / 6Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinkingτ³-Bench · telecom70.46 / 6Tracked evidence
Anthropic Claude Opus 4 · 4.6 ThinkingGDPVal-AA16067 / 17Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingCyberGym50.67 / 12Tracked evidence
Anthropic Claude Opus 4 · 4.1 ThinkingOSWorld44.47 / 10Tracked evidence
Anthropic Claude Sonnet 4 · 4.0 ThinkingOSWorld42.28 / 10Tracked evidence
Anthropic Claude Opus 4 · 4.6 ThinkingOSWorld · verified72.710 / 27Tracked evidence
Anthropic Claude Opus 4 · 4.6 ThinkingToolathlon47.210 / 31Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingGDPVal-AA141611 / 17Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 ThinkingOSWorld · verified72.511 / 27Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingOSWorld · verified66.313 / 27Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 ThinkingGDPVal-AA127614 / 17Tracked evidence
Anthropic Claude Opus 4 · 4.5 ThinkingToolathlon43.514 / 31Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 ThinkingOSWorld · verified61.417 / 27Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 ThinkingToolathlon38.919 / 31Tracked evidence

Multimodal

Model / Variant | Benchmark | Score | Rank | Scoring
Anthropic Claude Mythos · Preview | CharXiv Reasoning · tools | 93.2 | 1 / 3 | Tracked evidence
Anthropic Claude Opus 4 · 4.8 Thinking | ScreenSpot-Pro · tools | 87.9 | 1 / 2 | Tracked evidence
Anthropic Claude Opus 4 · 4.8 Thinking | ScreenSpot-Pro · no_tools | 82.3 | 1 / 2 | Tracked evidence
Anthropic Claude Opus 4 · 4.8 Thinking | ChartQAPro · tools | 72.3 | 1 / 2 | Tracked evidence
Anthropic Claude Opus 4 · 4.8 Thinking | ChartQAPro · no_tools | 69.4 | 1 / 2 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | CharXiv Reasoning · tools | 91 | 2 / 3 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | ScreenSpot-Pro · tools | 87.6 | 2 / 2 | Tracked evidence
Anthropic Claude Mythos · Preview | CharXiv Reasoning | 86.1 | 2 / 48 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | ScreenSpot-Pro · no_tools | 79.5 | 2 / 2 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | ChartQAPro · tools | 69.8 | 2 / 2 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | ChartQAPro · no_tools | 67.6 | 2 / 2 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | CharXiv Reasoning · tools | 84.7 | 3 / 3 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | WorldVQA | 36.8 | 3 / 5 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MMVU | 77.3 | 4 / 20 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MotionBench | 60.3 | 4 / 4 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | MedXpertQA · text | 52.1 | 4 / 5 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | ZEROBench | 11 | 4 / 27 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | LingoQA | 78.8 | 6 / 16 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | SimpleVQA | 65.7 | 6 / 29 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | VLMs Are Blind | 85.5 | 7 / 18 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | Video-MMMU | 84.4 | 7 / 28 | Tracked evidence
Anthropic Claude Opus 4 · 4.7 Thinking | CharXiv Reasoning | 82.1 | 7 / 48 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | SimpleVQA | 62.2 | 7 / 29 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | VideoMME · without_sub | 81.4 | 9 / 21 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | SLAKE | 76.4 | 9 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMVU | 70.6 | 10 / 20 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | MedXpertQA · mm | 64.8 | 10 / 31 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | ZEROBench · sub | 28.4 | 10 / 23 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MedXpertQA · mm | 63.6 | 11 / 31 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MMBench · en_dev_v1_1 | 89.2 | 12 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MathVision | 74.3 | 12 / 17 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | SLAKE | 73.6 | 12 / 22 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | AI2D · test | 87.7 | 13 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | SimpleVQA | 57.6 | 13 / 29 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | ZEROBench · sub | 26.3 | 13 / 23 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | ZEROBench | 4 | 13 / 27 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | CountBench | 90.6 | 14 / 23 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMBench · en_dev_v1_1 | 88.3 | 14 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MLVU · mavg | 81.7 | 14 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | VideoMME · with_sub | 81.1 | 14 / 22 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | DynaMath | 79.7 | 14 / 23 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | VideoMME · without_sub | 75.3 | 14 / 21 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MathVision | 71.1 | 14 / 17 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | LVBench | 57.3 | 14 / 18 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | CountBench | 90 | 15 / 23 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | AI2D · test | 87 | 15 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | DynaMath | 78.8 | 15 / 23 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | RealWorldQA | 77 | 15 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MVBench | 67.2 | 15 / 18 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | ScreenSpot-Pro | 57.7 | 15 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | ZEROBench | 3 | 15 / 27 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | VideoMME · with_sub | 77.6 | 16 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MMStar | 73.8 | 16 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MedXpertQA · mm | 54 | 16 / 31 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | ERQA | 51.6 | 16 / 27 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | LingoQA | 12.8 | 16 / 16 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MathVista · mini | 80 | 17 / 36 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | EmbSpatialBench | 75.7 | 17 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | MMStar | 73.2 | 17 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | BabyVision | 18.6 | 17 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MathVista · mini | 79.8 | 18 / 36 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | ERQA | 46.8 | 18 / 27 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | Video-MMMU | 77.8 | 19 / 28 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | MLVU · mavg | 72.8 | 19 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | EmbSpatialBench | 71.8 | 19 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | ScreenSpot-Pro | 45.7 | 19 / 24 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | RealWorldQA | 70.3 | 21 / 24 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | HallusionBench | 64.1 | 21 / 33 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | ERQA | 45 | 21 / 27 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | ScreenSpot-Pro | 36.2 | 21 / 24 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | RefSpatialBench | 2.2 | 21 / 21 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | V* | 67 | 22 / 23 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | BabyVision | 14.2 | 22 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | V* | 58.6 | 23 / 23 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.6 Thinking | CharXiv Reasoning | 72.4 | 24 / 48 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | HallusionBench | 59.9 | 26 / 33 | Tracked evidence
Anthropic Claude Opus 4 · 4.6 Thinking | CharXiv Reasoning | 69.1 | 29 / 48 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | CharXiv Reasoning | 68.5 | 30 / 48 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | CharXiv Reasoning | 68.5 | 31 / 48 | Tracked evidence
Claude Haiku 4.5 · Thinking | CharXiv Reasoning | 61.7 | 35 / 48 | Tracked evidence

Document/OCR

Model / Variant | Benchmark | Score | Rank | Scoring
Anthropic Claude Opus 4 · 4.5 Thinking | MMLongBench-Doc | 61.9 | 1 / 22 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | OmniDocBench · v1_5 | 0.1 | 2 / 6 | Tracked evidence
Anthropic Claude Opus 4 · 4.5 Thinking | OCRBench | 85.8 | 13 / 35 | Tracked evidence
Anthropic Claude Sonnet 4 · 4.5 Thinking | OCRBench | 76.6 | 27 / 35 | Tracked evidence

Where this family sits in the market

Sonnet 4 sits at the family's price-quality sweet spot. Haiku 4.5 takes the cost-efficiency frontier for high-volume workloads.

[Chart: quality vs. cost for the models we track. Providers shown: Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, nvidia, OpenAI, Qwen, xAI, Zhipu.]

Dashed line = Pareto frontier (models for which no other model is both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
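The frontier in the chart can be reproduced with a simple dominance check. The sketch below is a minimal Python illustration, not the chart's actual code. It assumes the cost axis is input list price per 1M tokens (the chart may use a different or blended price), and it uses only the prices and Quality Scores quoted elsewhere on this page.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float     # assumption: input list price, USD per 1M tokens
    quality: float  # Quality Score as quoted on this page


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no more expensive and no worse than `b`, and strictly better on one axis."""
    return (
        a.cost <= b.cost
        and a.quality >= b.quality
        and (a.cost < b.cost or a.quality > b.quality)
    )


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep models that no other model dominates, ordered from cheapest to most expensive."""
    frontier = [m for m in models if not any(dominates(other, m) for other in models)]
    return sorted(frontier, key=lambda m: m.cost)


MODELS = [
    Model("Claude Opus 4.8 Thinking", 5.00, 108.6),
    Model("Claude Sonnet 4.6 Thinking", 3.00, 96.7),
    Model("Claude Haiku 4.5 Thinking", 1.00, 77.9),
    Model("DeepSeek V4 Pro Thinking", 0.435, 98.0),
]

for m in pareto_frontier(MODELS):
    print(f"{m.name}: ${m.cost}/1M input, QS {m.quality}")
```

With these four points, only DeepSeek V4 Pro Thinking and Claude Opus 4.8 Thinking survive; Sonnet 4.6 and Haiku 4.5 are both more expensive and lower-scoring than DeepSeek on this single cost axis. The dominance test treats ties as non-dominating, which is one reasonable reading of "both cheaper and better".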

The Claude family

Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.

Closed · API only (4)

  • Anthropic Claude Opus 4 · 11 variants
  • Anthropic Claude Sonnet 4 · 5 variants
  • Claude Haiku 4.5 · 2 variants
  • Anthropic Claude Mythos · 1 variant

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Caveats

What this page does not tell you, listed honestly.

  • No tracked API pricing for: Anthropic Claude Mythos. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
  • Context window not declared for: Anthropic Claude Mythos.
  • Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.

Editor's notes

By Boris · Last verified · AI-assisted, human-reviewed

Why this family matters

Claude 4 is Anthropic's current generation: Opus (flagship), Sonnet (workhorse), Haiku (cost tier), and the Mythos preview at the top of the roadmap. The structurally interesting pattern across the family is the Opus pricing reset between 4.0/4.1 and 4.5: input dropped from $15 to $5 per million, output from $75 to $25. That is a 3x cost cut on the same brand-name SKU, and it materially changes the "should we even consider Opus?" conversation for workloads that were previously priced out.
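The 3x figure is easy to check on a concrete request. A minimal Python sketch using the Opus list prices quoted above; the 20K-input / 2K-output request size is a hypothetical workload chosen for illustration, not a measured one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


# Opus list prices as quoted on this page.
OPUS_PRE_RESET = Price(input_per_m=15.0, output_per_m=75.0)   # Opus 4.0 / 4.1
OPUS_POST_RESET = Price(input_per_m=5.0, output_per_m=25.0)   # Opus 4.5+


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list price."""
    return (input_tokens * price.input_per_m + output_tokens * price.output_per_m) / 1_000_000


# Hypothetical request: 20K input tokens, 2K output tokens.
before = request_cost(OPUS_PRE_RESET, 20_000, 2_000)
after = request_cost(OPUS_POST_RESET, 20_000, 2_000)
print(f"before: ${before:.2f}  after: ${after:.2f}  ratio: {before / after:.1f}x")
```

Because both the input and output rates fell by exactly 3x, the ratio stays at 3x for any mix of input and output tokens; only the absolute dollar figures depend on the request size.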

The Sonnet 4.5 line ships a 1M-token context window at the same headline $3 input / $15 output per million tokens as the 200K variants. As with the Opus reset, "long context" stops being a premium SKU and, at headline list price, becomes a free axis on the parts of the family where it ships.

Which variant to start with

Default to anthropic-claude-sonnet-4. At $3 input / $15 output per million it is the family's price-quality sweet spot, and the 4.6-thinking variant lands at Quality Score 96.7 (#16 of 186 models we track), inside the cluster where the closed-flagship reasoning conversation actually lives. For most teams shipping API-backed product features, this is the practical default.

Step up to anthropic-claude-opus-4 when you can name the workload that justifies the cost gap. With the 4.5+ reset, Opus lists at $5 input / $25 output per million against Sonnet's $3 / $15, a premium of about 1.7x on both rates (down from 5x pre-reset). That is far narrower than before, but still material at production volume. Reach for Opus when agentic coding (SWE-Bench Verified) or hardest-tier reasoning (GPQA Diamond, AIME) is the binding constraint; in our index, Opus 4.7-thinking ranks #2 on SWE-Bench Verified, #4 on GPQA Diamond, and #3 on AIME, which is the ceiling signature you are paying for.
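A small sketch of the Opus-over-Sonnet premium, computed separately for input and output rates. The dictionary keys are hypothetical labels for this example; the prices are the list prices quoted on this page.

```python
# List prices in USD per 1M tokens, as quoted on this page: (input, output).
# Keys are illustrative labels, not API model IDs.
PRICES = {
    "opus-4.0/4.1": (15.0, 75.0),
    "opus-4.5+": (5.0, 25.0),
    "sonnet-4": (3.0, 15.0),
}


def premium(a: str, b: str) -> tuple[float, float]:
    """Return the (input, output) price multiples of model `a` over model `b`."""
    (a_in, a_out), (b_in, b_out) = PRICES[a], PRICES[b]
    return a_in / b_in, a_out / b_out


for opus in ("opus-4.0/4.1", "opus-4.5+"):
    inp, out = premium(opus, "sonnet-4")
    print(f"{opus} vs sonnet-4: {inp:.2f}x input, {out:.2f}x output")
```

Since the input and output multiples are identical in both generations, the premium does not depend on your input/output token mix; what changes with volume is only the absolute dollar gap.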

When to deviate:

  • Coding agents: use anthropic-claude-opus-4 (4.5-thinking or newer). Opus 4.7-thinking is rank #3 of 68 on SWE-Bench Verified in our index, and the price reset makes the cost gap to Sonnet defensible for agentic-coding loops where reliability compounds.
  • High-volume chat at scale: drop to anthropic-claude-haiku-45 ($1 / $5 per million). The score gap to Sonnet is real (Haiku 4.5 thinking at QS 77.9 vs Sonnet 4.6 thinking at QS 96.7), but for repetitive low-stakes turns the 3x per-token price cut dominates the unit economics.
  • Long-context RAG: use anthropic-claude-sonnet-4 (4.5 variants). The 1M context window at Sonnet pricing is the cheap long-context play in the family. Opus 4.6-thinking also reaches 1M context but at the higher Opus price point; Sonnet 4.5-thinking at the same window is the procurement-friendly choice unless you have measured an Opus-specific quality gain on your eval.
  • You are tracking the frontier: the Mythos preview lands at Quality Score 118.9 (#1 of 186) with first-place finishes on GPQA Diamond and SWE-Bench Verified in the data we have. It has no public pricing or general availability, so it is a roadmap signal, not an option to ship today. Treat its scores as Anthropic-published results, not independently verified ones.
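The default-plus-deviations logic above can be written down as a small routing table. This is a hypothetical sketch, not a tool this site ships: the `Workload` enum and the `opus_gain_measured` flag are invented for illustration, and the returned strings are this index's family slugs plus a variant note, not Anthropic API model IDs.

```python
from enum import Enum


class Workload(Enum):
    GENERAL = "general"
    CODING_AGENT = "coding_agent"
    HIGH_VOLUME_CHAT = "high_volume_chat"
    LONG_CONTEXT_RAG = "long_context_rag"


# Starting picks from this page's recommendations (index slugs, not API model IDs).
DEFAULTS = {
    Workload.GENERAL: "anthropic-claude-sonnet-4 (4.6 thinking)",
    Workload.CODING_AGENT: "anthropic-claude-opus-4 (4.5 thinking or newer)",
    Workload.HIGH_VOLUME_CHAT: "anthropic-claude-haiku-45",
    Workload.LONG_CONTEXT_RAG: "anthropic-claude-sonnet-4 (4.5, 1M context)",
}

# Upgrades that apply only once your own eval shows a measured Opus gain.
OPUS_UPGRADES = {
    Workload.GENERAL: "anthropic-claude-opus-4",
    Workload.LONG_CONTEXT_RAG: "anthropic-claude-opus-4 (4.6 thinking, 1M context)",
}


def pick_variant(workload: Workload, opus_gain_measured: bool = False) -> str:
    """Return a starting variant. Mythos is never returned: it has no public SKU."""
    if opus_gain_measured and workload in OPUS_UPGRADES:
        return OPUS_UPGRADES[workload]
    return DEFAULTS[workload]


print(pick_variant(Workload.HIGH_VOLUME_CHAT))
print(pick_variant(Workload.LONG_CONTEXT_RAG, opus_gain_measured=True))
```

Keeping the Opus upgrade behind an explicit flag mirrors the page's advice: the cheaper default holds until an eval on your own workload says otherwise.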

Where the data is weak

We aggregate benchmark scores from multiple sources, but coverage and naming across this family deserve a careful read. Specifically:

  • Opus has many minor versions (4.0, 4.1, 4.5, 4.6, 4.7, 4.8, plus Thinking modes for each). Within-family scores vary substantially across these (e.g. Opus 4.1 non-thinking at QS 70.4 vs Opus 4.7-thinking at QS 107.8). When this page quotes a number, it is for the specific minor version named; do not collapse the line to a single Opus score.
  • The 4.0/4.1 generation and the 4.5+ generation are not drop-in replacements for each other. The 4.5+ reset changed pricing AND scores; treat pre-4.5 numbers as an older line that happens to share the brand prefix.
  • Mythos data is Anthropic's announcement set. Independent reproductions had not landed in our index at last verification. Pricing and context window are unset (preview, no public SKU).
  • Opus 4.6-thinking has 1M context; Opus 4.0/4.1/4.5/4.7 do not. This is unintuitive (newer is not strictly larger context) and is worth checking against the variant table before committing.
  • Pricing on this page is the published API list price. Volume agreements and the choice between Anthropic direct, Bedrock, and Vertex can change the unit economics; list price is a calibration anchor, not your final cost.
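Because newer Opus versions do not always have larger context windows, it is safer to look up 1M support explicitly than to pick the newest version. A minimal sketch using only the context facts stated on this page; Opus 4.8 and Sonnet 4.0 are omitted because their windows are not stated in this section, and `False` means "not 1M", not a specific smaller window.

```python
# 1M-token context support per (line, version), as stated on this page.
SUPPORTS_1M = {
    ("opus", "4.0"): False,
    ("opus", "4.1"): False,
    ("opus", "4.5"): False,
    ("opus", "4.6"): True,
    ("opus", "4.7"): False,
    ("sonnet", "4.5"): True,
    ("sonnet", "4.6"): False,  # listed at 200K on this page
}


def versions_with_1m(line: str) -> list[str]:
    """Versions of `line` explicitly marked as 1M-capable."""
    return sorted(ver for (fam, ver), ok in SUPPORTS_1M.items() if fam == line and ok)


def newest(line: str) -> str:
    """Newest tracked version of `line`: the naive pick this check guards against."""
    versions = [ver for fam, ver in SUPPORTS_1M if fam == line]
    return max(versions, key=lambda v: tuple(map(int, v.split("."))))


print(versions_with_1m("opus"))
print(newest("opus"), SUPPORTS_1M[("opus", newest("opus"))])
```

Here the only 1M-capable Opus is 4.6, while the newest tracked Opus (4.7) is not, which is exactly the trap described above.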

If you are making a procurement decision, the variant table on this page is the load-bearing artifact. Cross-check pricing against Anthropic's own docs and your cloud provider's Bedrock/Vertex pricing before you commit.

When to reach for which alternative

  • Open-weights deployment is a requirement: Claude is API-only. The conversation moves to open-weights families (Qwen3, DeepSeek, Llama). Compare on the specific benchmark that matches your workload; the cross-family comparison views in our index are designed for this.
  • You need the cheapest competent reasoning at API scale: DeepSeek V4 Pro Thinking lands at QS 98.0 with $0.435 / $0.87 pricing in our index, roughly 7x cheaper on input and 17x cheaper on output than Sonnet 4.6 thinking ($3 / $15) at a comparable Quality Score (96.7). Run a side-by-side on your eval before committing to either.
  • Previous-generation Claude is already in production: the Sonnet 3.5 / Haiku 3.5 line is on the sibling claude-3-5 surface in our index. Anthropic still serves them, and for some default chat workloads the score gain from moving to Claude 4 may not justify the migration cost. Run the comparison before assuming the upgrade.
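How much cheaper DeepSeek V4 Pro Thinking is in practice depends on your token mix: input alone is about 6.9x cheaper and output about 17.2x. A minimal Python sketch of a blended monthly bill at list price; the 500M-input / 50M-output monthly volume is a hypothetical workload, not data from our index.

```python
def monthly_cost(input_per_m: float, output_per_m: float,
                 input_tokens: float, output_tokens: float) -> float:
    """USD per month at list price; prices per 1M tokens, volumes in raw tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical monthly volume: 500M input tokens, 50M output tokens.
IN_TOK, OUT_TOK = 500e6, 50e6

sonnet = monthly_cost(3.00, 15.00, IN_TOK, OUT_TOK)    # Sonnet 4.6 thinking list price
deepseek = monthly_cost(0.435, 0.87, IN_TOK, OUT_TOK)  # DeepSeek V4 Pro Thinking, as quoted in our index

print(f"Sonnet: ${sonnet:,.0f}  DeepSeek: ${deepseek:,.0f}  ratio: {sonnet / deepseek:.1f}x")
```

For this input-heavy mix the blended ratio lands between the two per-rate multiples (about 8.6x); output-heavy workloads move it toward 17x.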

Sources worth reading

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
