This is a previous-generation family. Most teams should look at "Claude: Opus 4.8 (Thinking), Opus, Sonnet, Haiku Compared" instead.

The variants on this page still work and are still listed, but pricing, capabilities, and benchmarks below describe the older generation. Use this page for migration planning, not as a starting point.

Anthropic family

Claude 3.5

Claude 3.5 Sonnet still ships at $3/$15 per 1M tokens, the same list price as Sonnet 4. This page covers when the cost-equal Claude 4 tier wins and when 3.5 still earns its slot.

Top in this family

Claude 3.7 Sonnet (Thinking) ranks #96 of 186 on overall quality (QS 74.2) at $3/$15 per 1M tokens.

Variants: 2
License: Closed weights
Provider: Anthropic

Best variant by workload

One pick per common job. Pick by what you need to ship, not by which variant has the highest score on a leaderboard you don't use.

Note: picks are framed for direct API usage, where cost per million tokens is load-bearing. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.
| Workload | Best pick | Why |
| --- | --- | --- |
| General API workhorse | Anthropic Claude 3 Sonnet · 3.5 (October 2024) · $3.00/1M in / $15.00/1M out | The previous-generation default. List price matches Sonnet 4, so choose it when Sonnet 4's quality lift does not show up on your evals or does not justify the migration work. |
| High-volume chat | Anthropic Claude 3.5 Haiku · Non-thinking · $1.00/1M in / $5.00/1M out | Tied with Haiku 4.5 for the cheapest served Claude tier. Use for high-volume chat where per-token cost compounds and Haiku 4.5's quality lift is not measurable on your workload. |

All variants

28 rows: 6 variants across the 2 models in this family, plus 22 variants from 4 cross-family models for context. Each group is sorted by quality score (descending).

Benchmark columns, in order: GPQA, HLE, SWE, SWE-Pro, Terminal, Tau, MCP, AIME. Rows with fewer than eight scores list them in column order with empty cells omitted. The Lic. column is empty in every row.

| Variant | Model | Status | QS | Rank | Benchmark scores | In $/M | Out $/M | Context | Released |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3.7 Thinking | Claude 3 Sonnet | Previous | 74.2 | #96/186 | 78.2, 8.0, 62.3, 81.2 | $3 | $15 | 200K | Feb 24, 2025 |
| 3.7 | Claude 3 Sonnet | Previous | 69.2 | #120/186 | 68.0 | $3 | $15 | 200K | Feb 24, 2025 |
| 3.5 (October 2024) | Claude 3 Sonnet | Previous | 64.5 | #138/186 | 65.0, 4.1 | $3 | $15 | 200K | Feb 24, 2025 |
| 3.5 (June 2024) | Claude 3 Sonnet | Previous | 63.2 | #143/186 | - | $3 | $15 | 200K | Feb 24, 2025 |
| Non-thinking | Claude 3.5 Haiku | Previous | 58.6 | #166/186 | - | $1 | $5 | 200K | Nov 4, 2024 |
| 3.0 | Claude 3 Sonnet | Previous | - | - | - | $3 | $15 | 200K | Feb 24, 2025 |
| 4.8 Thinking | Anthropic Claude Opus 4 | cross-family | 108.6 | #2/186 | 93.6, 49.8, 88.6, 69.2, 82.2 | $5 | $25 | 200K | May 22, 2025 |
| 4.7 Thinking | Anthropic Claude Opus 4 | cross-family | 107.8 | #3/186 | 94.2, 46.9, 87.6, 64.3, 69.4, 77.3 | $5 | $25 | 200K | May 22, 2025 |
| 4.6 Thinking | Anthropic Claude Opus 4 | cross-family | 104.1 | #6/186 | 91.3, 40.0, 80.8, 53.4, 65.4, 91.9, 59.5, 95.6 | $5 | $25 | 1.0M | May 22, 2025 |
| 4.5 Thinking | Anthropic Claude Opus 4 | cross-family | 98.6 | #13/186 | 87.0, 30.8, 80.9, 59.3, 88.9, 62.3, 92.8 | $5 | $25 | 200K | May 22, 2025 |
| V4 Pro Thinking | DeepSeek V4 | cross-family | 98.0 | #15/186 | 90.1, 37.7, 80.6, 55.4, 73.6 | $0.435 | $0.87 | 1.0M | Apr 24, 2026 |
| 4.6 Thinking | Anthropic Claude Sonnet 4 | cross-family | 96.7 | #16/186 | 89.9, 33.2, 79.6, 59.1, 91.7, 61.3, 86.9 | $3 | $15 | 200K | May 22, 2025 |
| 4.6 Non-thinking | Anthropic Claude Opus 4 | cross-family | 93.1 | #23/186 | 19.0 | $5 | $25 | 200K | May 22, 2025 |
| V4 Flash Thinking | DeepSeek V4 | cross-family | 92.0 | #27/186 | 88.1, 34.8, 79.0, 52.6, 69.0 | $0.098 | $0.197 | 1.0M | Apr 24, 2026 |
| 4.5 Thinking | Anthropic Claude Sonnet 4 | cross-family | 86.1 | #41/186 | 83.4, 17.7, 77.2, 43.6, 42.8, 86.2, 43.8, 87.0 | $3 | $15 | 1.0M | May 22, 2025 |
| 4.1 Thinking | Anthropic Claude Opus 4 | cross-family | 83.1 | #50/186 | 81.0, 11.7, 74.5, 38.0, 86.8, 40.9, 78.0 | $15 | $75 | 200K | May 22, 2025 |
| V4 Pro | DeepSeek V4 | cross-family | 80.9 | #61/186 | 72.9, 7.7, 73.6, 52.1, 69.4 | $0.435 | $0.87 | 1.0M | Apr 24, 2026 |
| 4.5 Non-thinking | Anthropic Claude Opus 4 | cross-family | 80.7 | #63/186 | 14.2, 45.9 | $5 | $25 | 200K | May 22, 2025 |
| 4.0 Thinking | Anthropic Claude Opus 4 | cross-family | 80.7 | #64/186 | 79.6, 10.7, 72.5, 81.4, 75.5 | $15 | $75 | 200K | May 22, 2025 |
| 4.0 Non-thinking | Anthropic Claude Opus 4 | cross-family | 79.1 | #73/186 | 74.9, 6.7, 72.5, 81.8, 33.9 | $15 | $75 | 200K | May 22, 2025 |
| V4 Flash | DeepSeek V4 | cross-family | 78.1 | #78/186 | 71.2, 8.1, 73.7, 49.1, 64.0 | $0.098 | $0.197 | 1.0M | Apr 24, 2026 |
| Thinking | Claude Haiku 4.5 | cross-family | 77.9 | #79/186 | 73.0, 9.7, 73.3, 83.2, 40.2, 80.7 | $1 | $5 | 200K | Oct 15, 2025 |
| 4.0 Thinking | Anthropic Claude Sonnet 4 | cross-family | 75.6 | #88/186 | 76.1, 7.8, 72.7, 42.7, 83.8, 70.5 | $3 | $15 | 200K | May 22, 2025 |
| 4.0 Non-thinking | Anthropic Claude Sonnet 4 | cross-family | 73.5 | #99/186 | 70.0, 5.5, 72.7, 75.0, 33.1 | $3 | $15 | 200K | May 22, 2025 |
| 4.5 Non-thinking | Anthropic Claude Sonnet 4 | cross-family | 73.0 | #104/186 | 7.5, 42.8 | $3 | $15 | 1.0M | May 22, 2025 |
| 4.1 Non-thinking | Anthropic Claude Opus 4 | cross-family | 70.4 | #115/186 | 7.9 | $15 | $75 | 200K | May 22, 2025 |
| Non-thinking | Claude Haiku 4.5 | cross-family | 66.3 | #132/186 | 39.5, 28.3 | $1 | $5 | 200K | Oct 15, 2025 |
| 4.7 Non-thinking | Anthropic Claude Opus 4 | cross-family | - | - | - | $5 | $25 | 200K | May 22, 2025 |

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (33 of 44 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.


Reasoning

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| Anthropic Claude 3.5 Haiku · Non-thinking | MMLU Pro · 5_shot_cot | 65 | 3 / 4 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | GPQA Diamond · 5_shot_cot | 41.6 | 3 / 4 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | AIME 2025 · no_tools | 54.8 | 15 / 15 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | SimpleBench | 46.4 | 26 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | SimpleBench | 44.9 | 30 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | SimpleBench | 41.4 | 32 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Humanity's Last Exam · hle_text | 7.9 | 37 / 56 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (June 2024) | SimpleBench | 27.5 | 45 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Humanity's Last Exam · hle_text | 4.3 | 54 / 56 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | GPQA Diamond | 78.2 | 58 / 143 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Humanity's Last Exam · hle | 8.0 | 73 / 90 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | GPQA Diamond | 68 | 86 / 143 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Humanity's Last Exam · hle | 4.1 | 89 / 90 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | GPQA Diamond | 65 | 93 / 143 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Arena Elo | 1387 | 101 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Arena Elo | 1372 | 110 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | Arena Elo | 1371 | 111 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (June 2024) | Arena Elo | 1342 | 125 / 158 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | Arena Elo | 1323 | 134 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.0 | Arena Elo | 1280 | 147 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | MMMU · mmmu_l3 | 85.9 | 4 / 5 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | AIME 2024 · consensus64 | 26.7 | 6 / 7 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | MMMU · mmmu_single | 75 | 9 / 22 | Tracked evidence |
| Anthropic Claude 3.5 Haiku · Non-thinking | MMLU | 77.6 | 28 / 33 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.7 | MATH 500 | 82.2 | 43 / 55 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | MATH 500 | 78.3 | 45 / 55 | Tracked evidence |
| Anthropic Claude 3.5 Haiku · Non-thinking | MMMU PRO | 45.0 | 47 / 52 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | AIME 2024 | 16 | 58 / 69 | Tracked evidence |

Coding

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | GSO (Global Software Optimization) · opt_at_10 | 15.7 | 1 / 2 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | SWE-bench Verified · multiple | 70.3 | 8 / 10 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Aider (Polyglot) | 64.9 | 12 / 45 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | LiveCodeBench · 2024_08_2025_05 | 38.9 | 15 / 17 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | GSO (Global Software Optimization) · opt_at_1 | 4.6 | 15 / 24 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | Aider (Polyglot) | 60.4 | 17 / 45 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | GSO (Global Software Optimization) · opt_at_1 | 3.8 | 18 / 24 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Aider (Polyglot) | 51.6 | 26 / 45 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | Aider (Polyglot) | 28 | 37 / 45 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | LiveCodeBench | 36.4 | 38 / 69 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | SWE-bench Verified | 62.3 | 50 / 68 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Codeforces | 717 | 42 / 47 | Tracked evidence |

Agentic

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | τ²-bench · retail | 81.2 | 13 / 34 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | τ²-bench · airline | 58.4 | 13 / 29 | In Quality Score |

Multimodal

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| Anthropic Claude 3.5 Haiku · Non-thinking | ChartQA | 87.2 | 4 / 9 | Tracked evidence |

Document/OCR

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| Anthropic Claude 3.5 Haiku · Non-thinking | DocVQA | 90 | 7 / 8 | Tracked evidence |

Where this family sits in the market

Claude 3.5 Haiku sits at the lowest list price among served Claude tiers ($1/$5 per 1M), tied with Haiku 4.5, which scores higher (QS 66.3 vs 58.6 non-thinking). Sonnet 3.5 sits below Sonnet 4 on quality at the same $3/$15 list price.

[Chart: models plotted by cost and quality. Providers in legend: Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, NVIDIA, OpenAI, Qwen, xAI, Zhipu]

Dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
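The frontier rule in the chart caption can be sketched in a few lines. This is an illustration, not the site's plotting code: it assumes the cost axis is input price per 1M tokens (the page does not say which cost measure the chart uses), and it takes a handful of rows from the variants table. The names `Point` and `pareto_frontier` are hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    price: float  # input $ per 1M tokens (assumed cost axis)
    qs: float     # Quality Score from the variants table


# A subset of rows from the variants table, chosen for illustration.
POINTS = [
    Point("Claude 3.5 Haiku", 1.00, 58.6),
    Point("Claude 3.5 Sonnet (Oct 2024)", 3.00, 64.5),
    Point("Claude Haiku 4.5 Thinking", 1.00, 77.9),
    Point("Claude Sonnet 4.6 Thinking", 3.00, 96.7),
    Point("Claude Opus 4.8 Thinking", 5.00, 108.6),
    Point("DeepSeek V4 Flash Thinking", 0.098, 92.0),
    Point("DeepSeek V4 Pro Thinking", 0.435, 98.0),
]


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep a point only if no other point is both cheaper and better."""
    frontier = [
        p for p in points
        if not any(q.price < p.price and q.qs > p.qs for q in points)
    ]
    return sorted(frontier, key=lambda p: p.price)


if __name__ == "__main__":
    for p in pareto_frontier(POINTS):
        print(f"{p.name}: ${p.price}/1M in, QS {p.qs}")
```

On this subset the frontier is DeepSeek V4 Flash Thinking, DeepSeek V4 Pro Thinking, and Claude Opus 4.8 Thinking. A different cost measure (output price, or a blended rate that accounts for reasoning tokens) could move points on or off the line.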

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Caveats

What this page does not tell you, listed honestly.

  • Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.

Editor's notes

By Boris · AI-assisted, human-reviewed

If you are already on Claude 3.5

If you have a working deployment pinned to Sonnet 3.5, Sonnet 3.7, or Haiku 3.5, the question is when staying is defensible and which variant identifier you need to pin to.

Anthropic ships several checkpoints under the 3.5/3.7 banner, and mixing them up will produce contradictory numbers. The page covers all of them: 3.5-0620 and 3.5-1022 are the canonical 3.5 Sonnet checkpoints; 3.7 and 3.7-thinking are the bridge variants between the 3.5 and 4 generations. A bare "Sonnet 3.5 score" is ambiguous; pin to a specific identifier in production calls.
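One way to enforce that pinning in code is a lookup that refuses anything other than an exact checkpoint. The sketch below is a minimal illustration: the dated identifier strings are assumptions to confirm against your provider's current model list, `PINNED` and `resolve` are hypothetical names, and it assumes thinking is a request-level setting on the 3.7 checkpoint rather than a separate identifier.

```python
import re

# Page shorthand -> (dated model identifier, thinking enabled).
# The identifier strings are assumptions; verify them against your provider's model list.
PINNED = {
    "3.5-0620": ("claude-3-5-sonnet-20240620", False),
    "3.5-1022": ("claude-3-5-sonnet-20241022", False),
    "3.7": ("claude-3-7-sonnet-20250219", False),
    "3.7-thinking": ("claude-3-7-sonnet-20250219", True),
}

# A pinned identifier is expected to end in a YYYYMMDD snapshot date.
_DATED = re.compile(r"-\d{8}$")


def resolve(shorthand: str) -> tuple[str, bool]:
    """Return (model_id, thinking) for an exact checkpoint; raise on anything ambiguous."""
    if shorthand not in PINNED:
        raise ValueError(f"unpinned or ambiguous variant: {shorthand!r}")
    model_id, thinking = PINNED[shorthand]
    if not _DATED.search(model_id):
        raise ValueError(f"{model_id!r} is not a dated snapshot")
    return model_id, thinking


if __name__ == "__main__":
    print(resolve("3.5-1022"))
```

Failing loudly on a bare "Sonnet 3.5" keeps eval numbers and production traffic attached to the same checkpoint.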

To size the migration: Sonnet 4.6-thinking at QS 96.7 against Sonnet 3.5-1022 at QS 64.5 is a substantial quality lift. Sonnet 4 list pricing held flat against the 3.5 line, and Anthropic's Claude 4.5+ Opus reset (input $15 to $5, output $75 to $25 per million) signals the direction of travel: the cost gap to current Claude is materially smaller than it was in the 3.5 era. For most teams, that math favours migration.
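To put numbers on that math, a small calculator over the list prices quoted on this page may help. It is a sketch only: the monthly token volumes are hypothetical, the dictionary keys are labels rather than API identifiers, and it ignores caching, batch discounts, and the extra output tokens that thinking variants tend to produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # $ per 1M input tokens
    output_per_m: float  # $ per 1M output tokens


# List prices as quoted on this page; keys are labels, not API identifiers.
PRICES = {
    "sonnet-3.5-1022": Price(3.00, 15.00),
    "sonnet-4.6": Price(3.00, 15.00),
    "opus-before-reset": Price(15.00, 75.00),
    "opus-after-reset": Price(5.00, 25.00),
}


def monthly_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost at list price for a given token volume."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200M input and 40M output tokens per month.
    for name, price in PRICES.items():
        print(f"{name}: ${monthly_cost(price, 200_000_000, 40_000_000):,.2f}")
```

For that hypothetical workload the Sonnet rows both come to $1,200, while Opus falls from $6,000 to $2,000 after the reset. Equal list price does not guarantee equal spend if the newer model writes longer outputs, so measure token counts on your own traffic.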

Reasons to stay on Claude 3.5 that are defensible:

  • Cheapest Anthropic tier still served. Haiku 3.5 and Haiku 4.5 both list at $1 input / $5 output per million in our index. If a deployment is pinned to 3.5-Haiku and meets your eval, the upgrade is not a forced one. Upgrading to 4.5 buys quality, not unit-cost savings.
  • You are Sonnet-on-Bedrock or Sonnet-on-Vertex with pinned routing. AWS Bedrock and Vertex AI have their own model SKU mappings; Sonnet 3.5 may still be the right pick if the Sonnet 4 SKU is not yet routable on your enterprise path or has a different cost profile than the direct Anthropic API.
  • Output-behaviour pinning. If your prompts and downstream parsers were tuned against Sonnet 3.5's output style, the migration to 4 involves at least one round of prompt-and-parser re-validation. Plan that work; do not assume the model swap is invisible.
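The re-validation round in the last bullet can start as a replay of recorded replies through the existing parser. The sketch below assumes a downstream parser that expects a bare JSON object with a `label` key; the parser, the helper names, and both sets of sample replies are invented for illustration and do not describe how any particular model actually responds.

```python
import json
from typing import Callable, Iterable


def parse_reply(text: str) -> dict:
    """Stand-in for a downstream parser: expects a bare JSON object with a 'label' key."""
    data = json.loads(text)
    if not isinstance(data, dict) or "label" not in data:
        raise ValueError("missing 'label'")
    return data


def parse_rate(outputs: Iterable[str], parser: Callable[[str], dict]) -> float:
    """Fraction of recorded replies the parser accepts."""
    replies = list(outputs)
    if not replies:
        raise ValueError("no replies to validate")
    accepted = 0
    for text in replies:
        try:
            parser(text)
            accepted += 1
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass
    return accepted / len(replies)


# Invented sample replies: one set recorded from the current deployment,
# one from a candidate model answering the same prompts.
current_replies = ['{"label": "refund"}', '{"label": "billing"}']
candidate_replies = ['{"label": "refund"}', 'Here is the JSON:\n{"label": "billing"}']

if __name__ == "__main__":
    print("current:", parse_rate(current_replies, parse_reply))
    print("candidate:", parse_rate(candidate_replies, parse_reply))
```

A drop in the accepted fraction on the candidate set is the signal to adjust prompts or loosen the parser before switching traffic.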

Where the data is weak

  • Haiku 3.5 has thinner benchmark coverage than the Sonnet line in this generation. Treat its Quality Score and Arena Elo as the primary numbers; per-benchmark depth (GPQA, AIME) is light at last verification.
  • Pricing parity with Claude 4. Several Claude 4 tiers list at the same headline pricing as their 3.5 equivalents. The cost-driven reason to stay on 3.5 is therefore less about list price than about deployment friction (routing, fine-tunes, eval pinning).

When to look outside this era

  • Claude 4 family (/en/ai/llm/claude) is the natural successor. If the migration question is still open, that surface is the comparison to read.
  • Cheapest workhorse-tier API outside Anthropic: DeepSeek V4 Flash ($0.098 / $0.197 with QS 78.1) and Gemini 3 Flash are the cross-family anchors to evaluate against. Both ship at lower per-token pricing than Sonnet 3.5-1022 with quality scores in the same tier or above.

Sources worth reading

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
