Anthropic family
Claude 3.5
Claude 3.5 Sonnet still ships at $3/$15 per 1M, the same price as Sonnet 4. When the cost-equal Claude 4 tier wins, when 3.5 still earns its slot.
Top in this family
Claude 3.7 Sonnet (Thinking) ranks #96 of 186 on overall quality (QS 74.2) at $3/$15 per 1M tokens.
- Variants: 2
- License: Closed weights
- Provider: Anthropic
Best variant by workload
One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.
| Workload | Best pick | Why |
|---|---|---|
| General API workhorse | Anthropic Claude 3 Sonnet · 3.5 (October 2024), $3.00 in / $15.00 out per 1M | The previous-generation default. Sonnet 4 lists at the same $3/$15, so choose 3.5 only when Sonnet 4's quality lift on your evals does not justify the migration work. |
| High-volume chat | Anthropic Claude 3.5 Haiku · Non-thinking, $1.00 in / $5.00 out per 1M | Cheapest served Claude tier. Use for high-volume chat where per-token cost compounds and Haiku 4.5's quality lift is not measurable on your workload. |
All variants
28 variants across 2 models (+ 4 cross-family for context). Sorted by quality score (descending).
| Variant | QS | GPQA | HLE | SWE | SWE-Pro | Terminal | Tau | MCP | AIME | In $/M | Out $/M | Context | Released | Lic. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude 3 Sonnet · 3.7 Thinking (previous; newer: Anthropic Claude Sonnet 4) | 74.2 #96/186 | 78.2 | 8.0 | 62.3 | — | — | 81.2 | — | — | $3 | $15 | 200K | Feb 24, 2025 | |
| Claude 3 Sonnet · 3.7 (previous; newer: Anthropic Claude Sonnet 4) | 69.2 #120/186 | 68.0 | — | — | — | — | — | — | — | $3 | $15 | 200K | Feb 24, 2025 | |
| Claude 3 Sonnet · 3.5 (October 2024) (previous; newer: Anthropic Claude Sonnet 4) | 64.5 #138/186 | 65.0 | 4.1 | — | — | — | — | — | — | $3 | $15 | 200K | Feb 24, 2025 | |
| Claude 3 Sonnet · 3.5 (June 2024) (previous; newer: Anthropic Claude Sonnet 4) | 63.2 #143/186 | — | — | — | — | — | — | — | — | $3 | $15 | 200K | Feb 24, 2025 | |
| Claude 3.5 Haiku · Non-thinking (previous; newer: Claude Haiku 4.5) | 58.6 #166/186 | — | — | — | — | — | — | — | — | $1 | $5 | 200K | Nov 4, 2024 | |
| Claude 3 Sonnet · 3.0 (previous; newer: Anthropic Claude Sonnet 4) | — | — | — | — | — | — | — | — | — | $3 | $15 | 200K | Feb 24, 2025 | |
| Anthropic Claude Opus 4 · 4.8 Thinking (cross-family) | 108.6 #2/186 | 93.6 | 49.8 | 88.6 | 69.2 | — | — | 82.2 | — | $5 | $25 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.7 Thinking (cross-family) | 107.8 #3/186 | 94.2 | 46.9 | 87.6 | 64.3 | 69.4 | — | 77.3 | — | $5 | $25 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.6 Thinking (cross-family) | 104.1 #6/186 | 91.3 | 40.0 | 80.8 | 53.4 | 65.4 | 91.9 | 59.5 | 95.6 | $5 | $25 | 1.0M | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.5 Thinking (cross-family) | 98.6 #13/186 | 87.0 | 30.8 | 80.9 | — | 59.3 | 88.9 | 62.3 | 92.8 | $5 | $25 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Pro Thinking (cross-family) | 98.0 #15/186 | 90.1 | 37.7 | 80.6 | 55.4 | — | — | 73.6 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| Anthropic Claude Sonnet 4 · 4.6 Thinking (cross-family) | 96.7 #16/186 | 89.9 | 33.2 | 79.6 | — | 59.1 | 91.7 | 61.3 | 86.9 | $3 | $15 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.6 Non-thinking (cross-family) | 93.1 #23/186 | — | 19.0 | — | — | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Flash Thinking (cross-family) | 92.0 #27/186 | 88.1 | 34.8 | 79.0 | 52.6 | — | — | 69.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
| Anthropic Claude Sonnet 4 · 4.5 Thinking (cross-family) | 86.1 #41/186 | 83.4 | 17.7 | 77.2 | 43.6 | 42.8 | 86.2 | 43.8 | 87.0 | $3 | $15 | 1.0M | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.1 Thinking (cross-family) | 83.1 #50/186 | 81.0 | 11.7 | 74.5 | — | 38.0 | 86.8 | 40.9 | 78.0 | $15 | $75 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Pro (cross-family) | 80.9 #61/186 | 72.9 | 7.7 | 73.6 | 52.1 | — | — | 69.4 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| Anthropic Claude Opus 4 · 4.5 Non-thinking (cross-family) | 80.7 #63/186 | — | 14.2 | — | 45.9 | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.0 Thinking (cross-family) | 80.7 #64/186 | 79.6 | 10.7 | 72.5 | — | — | 81.4 | — | 75.5 | $15 | $75 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.0 Non-thinking (cross-family) | 79.1 #73/186 | 74.9 | 6.7 | 72.5 | — | — | 81.8 | — | 33.9 | $15 | $75 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Flash (cross-family) | 78.1 #78/186 | 71.2 | 8.1 | 73.7 | 49.1 | — | — | 64.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
| Claude Haiku 4.5 · Thinking (cross-family) | 77.9 #79/186 | 73.0 | 9.7 | 73.3 | — | — | 83.2 | 40.2 | 80.7 | $1 | $5 | 200K | Oct 15, 2025 | |
| Anthropic Claude Sonnet 4 · 4.0 Thinking (cross-family) | 75.6 #88/186 | 76.1 | 7.8 | 72.7 | 42.7 | — | 83.8 | — | 70.5 | $3 | $15 | 200K | May 22, 2025 | |
| Anthropic Claude Sonnet 4 · 4.0 Non-thinking (cross-family) | 73.5 #99/186 | 70.0 | 5.5 | 72.7 | — | — | 75.0 | — | 33.1 | $3 | $15 | 200K | May 22, 2025 | |
| Anthropic Claude Sonnet 4 · 4.5 Non-thinking (cross-family) | 73.0 #104/186 | — | 7.5 | — | — | 42.8 | — | — | — | $3 | $15 | 1.0M | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.1 Non-thinking (cross-family) | 70.4 #115/186 | — | 7.9 | — | — | — | — | — | — | $15 | $75 | 200K | May 22, 2025 | |
| Claude Haiku 4.5 · Non-thinking (cross-family) | 66.3 #132/186 | — | — | — | 39.5 | 28.3 | — | — | — | $1 | $5 | 200K | Oct 15, 2025 | |
| Anthropic Claude Opus 4 · 4.7 Non-thinking (cross-family) | — | — | — | — | — | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
Benchmark evidence
Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (33 of 44 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | GSO (Global Software Optimization) · opt_at_10 | 15.7 | 1 / 2 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | MMLU Pro · 5_shot_cot | 65 | 3 / 4 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | GPQA Diamond · 5_shot_cot | 41.6 | 3 / 4 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | SWE-bench Verified · multiple | 70.3 | 8 / 10 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Aider (Polyglot) | 64.9 | 12 / 45 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | τ²-bench · retail | 81.2 | 13 / 34 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | τ²-bench · airline | 58.4 | 13 / 29 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | AIME 2025 · no_tools | 54.8 | 15 / 15 | In Quality Score |
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Anthropic Claude 3.5 Haiku · Non-thinking | MMLU Pro · 5_shot_cot | 65 | 3 / 4 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | GPQA Diamond · 5_shot_cot | 41.6 | 3 / 4 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | AIME 2025 · no_tools | 54.8 | 15 / 15 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | SimpleBench | 46.4 | 26 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | SimpleBench | 44.9 | 30 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | SimpleBench | 41.4 | 32 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Humanity's Last Exam · hle_text | 7.9 | 37 / 56 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (June 2024) | SimpleBench | 27.5 | 45 / 61 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Humanity's Last Exam · hle_text | 4.3 | 54 / 56 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | GPQA Diamond | 78.2 | 58 / 143 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Humanity's Last Exam · hle | 8.0 | 73 / 90 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | GPQA Diamond | 68 | 86 / 143 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Humanity's Last Exam · hle | 4.1 | 89 / 90 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | GPQA Diamond | 65 | 93 / 143 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Arena Elo | 1387 | 101 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Arena Elo | 1372 | 110 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | Arena Elo | 1371 | 111 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (June 2024) | Arena Elo | 1342 | 125 / 158 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | Arena Elo | 1323 | 134 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.0 | Arena Elo | 1280 | 147 / 158 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | MMMU · mmmu_l3 | 85.9 | 4 / 5 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | AIME 2024 · consensus64 | 26.7 | 6 / 7 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | MMMU · mmmu_single | 75 | 9 / 22 | Tracked evidence |
| Anthropic Claude 3.5 Haiku · Non-thinking | MMLU | 77.6 | 28 / 33 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.7 | MATH 500 | 82.2 | 43 / 55 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | MATH 500 | 78.3 | 45 / 55 | Tracked evidence |
| Anthropic Claude 3.5 Haiku · Non-thinking | MMMU PRO | 45.0 | 47 / 52 | Tracked evidence |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | AIME 2024 | 16 | 58 / 69 | Tracked evidence |
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | GSO (Global Software Optimization) · opt_at_10 | 15.7 | 1 / 2 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | SWE-bench Verified · multiple | 70.3 | 8 / 10 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | Aider (Polyglot) | 64.9 | 12 / 45 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | LiveCodeBench · 2024_08_2025_05 | 38.9 | 15 / 17 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | GSO (Global Software Optimization) · opt_at_1 | 4.6 | 15 / 24 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | Aider (Polyglot) | 60.4 | 17 / 45 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 | GSO (Global Software Optimization) · opt_at_1 | 3.8 | 18 / 24 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Aider (Polyglot) | 51.6 | 26 / 45 | In Quality Score |
| Anthropic Claude 3.5 Haiku · Non-thinking | Aider (Polyglot) | 28 | 37 / 45 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | LiveCodeBench | 36.4 | 38 / 69 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | SWE-bench Verified | 62.3 | 50 / 68 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.5 (October 2024) | Codeforces | 717 | 42 / 47 | Tracked evidence |
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Anthropic Claude 3 Sonnet · 3.7 Thinking | τ²-bench · retail | 81.2 | 13 / 34 | In Quality Score |
| Anthropic Claude 3 Sonnet · 3.7 Thinking | τ²-bench · airline | 58.4 | 13 / 29 | In Quality Score |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Anthropic Claude 3.5 Haiku · Non-thinking | ChartQA | 87.2 | 4 / 9 | Tracked evidence |
Document/OCR
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Anthropic Claude 3.5 Haiku · Non-thinking | DocVQA | 90 | 7 / 8 | Tracked evidence |
Where this family sits in the market
Claude 3.5 Haiku sits at the lowest Claude list price in our index ($1/$5 per 1M), level with Haiku 4.5. Sonnet 3.5 sits below Sonnet 4 on quality at the same $3/$15 list price.
Chart (price vs. quality): dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
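For readers who want to reproduce the frontier on their own shortlist, here is a minimal sketch. It assumes the cost axis is the output list price per 1M tokens (the chart's actual axis is not stated on this page), and it treats a model as dominated when another model is no more expensive and scores higher. The `Point` type and `pareto_frontier` function are illustrative names; the rows are a handful copied from the variants table.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    out_price: float  # output list price, $ per 1M tokens (assumed cost axis)
    qs: float         # Quality Score as listed in the variants table


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep points that no cheaper-or-equally-priced point beats on quality."""
    # Sort by price ascending; at equal price, put the higher score first.
    ordered = sorted(points, key=lambda p: (p.out_price, -p.qs))
    frontier: list[Point] = []
    best_qs = float("-inf")
    for p in ordered:
        # A point survives only if it beats everything seen so far,
        # i.e. everything that costs the same or less.
        if p.qs > best_qs:
            frontier.append(p)
            best_qs = p.qs
    return frontier


# A few rows from the variants table, not the full index.
ROWS = [
    Point("Claude 3.5 Sonnet (October 2024)", 15.0, 64.5),
    Point("Claude 3.7 Sonnet Thinking", 15.0, 74.2),
    Point("Claude Sonnet 4.6 Thinking", 15.0, 96.7),
    Point("Claude Opus 4.6 Thinking", 25.0, 104.1),
    Point("DeepSeek V4 Flash", 0.197, 78.1),
]

if __name__ == "__main__":
    for p in pareto_frontier(ROWS):
        print(f"{p.name}: ${p.out_price}/1M out, QS {p.qs}")
```

Swapping in a blended input/output price changes only the `out_price` field; the sweep itself stays the same.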
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- GPT-4 Era: GPT-4o, GPT-4.1, o-series, gpt-oss Picks vs GPT-5
  OpenAI's pre-GPT-5 lineup still served: GPT-4o, GPT-4.1, o-series reasoning, and gpt-oss. When a legacy tier still beats upgrading.
- Gemini 2 Era: 2.5 Pro, 2.5 Flash, 2.0 Pricing and Picks
  Gemini 2.5 Flash ships at $0.30/$2.50 per 1M with 1M-token context. When 2.5 Pro and the 2.0 family beat upgrading to Gemini 3 on cost or workload.
Caveats
What this page does not tell you, listed honestly.
- Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.
Editor's notes
If you are already on Claude 3.5
If you have a working deployment pinned to Sonnet 3.5, Sonnet 3.7, or Haiku 3.5, the question is when staying is defensible and which variant identifier you need to pin to.
Anthropic ships several checkpoints under the 3.5/3.7 banner, and mixing them up will produce contradictory numbers. The page covers all of them: 3.5-0620 and 3.5-1022 are the canonical 3.5 Sonnet checkpoints; 3.7 and 3.7-thinking are the bridge variants between the 3.5 and 4 generations. A bare "Sonnet 3.5 score" is ambiguous; pin to a specific identifier in production calls.
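One way to enforce that pinning is a small guard that rejects floating aliases before a request is built. The sketch below assumes dated snapshot identifiers end in an eight-digit date; the mapping from this page's shorthand to API identifiers is our assumption and should be checked against the Claude model docs. `is_pinned` and `require_pinned` are hypothetical helper names.

```python
import re

# Assumption: a pinned snapshot identifier ends in "-YYYYMMDD".
SNAPSHOT_SUFFIX = re.compile(r"-\d{8}$")

# Page shorthand -> API identifier (assumed mapping; verify against the model docs).
PINNED_IDS = {
    "3.5-0620": "claude-3-5-sonnet-20240620",
    "3.5-1022": "claude-3-5-sonnet-20241022",
}


def is_pinned(model_id: str) -> bool:
    """True when the identifier carries a dated snapshot suffix."""
    return bool(SNAPSHOT_SUFFIX.search(model_id))


def require_pinned(model_id: str) -> str:
    """Return the identifier unchanged, or raise if it is a floating alias."""
    if not is_pinned(model_id):
        raise ValueError(f"model id {model_id!r} is not pinned to a dated snapshot")
    return model_id


if __name__ == "__main__":
    print(require_pinned(PINNED_IDS["3.5-1022"]))
```

Calling `require_pinned` at configuration load time, rather than per request, keeps the check out of the hot path.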
To size the migration: Sonnet 4.6-thinking at QS 96.7 against Sonnet 3.5-1022 at QS 64.5 is a substantial quality lift. Sonnet 4 list pricing held flat against the 3.5 line, and Anthropic's Claude 4.5+ Opus reset (input $15 to $5, output $75 to $25 per million) signals the direction of travel: the cost gap to current Claude is materially smaller than it was in the 3.5 era. For most teams, that math favours migration.
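The arithmetic behind that paragraph can be sketched as follows. The prices and Quality Scores come from this page; the monthly token volumes are hypothetical and should be replaced with your own traffic.

```python
def monthly_cost(in_mtok: float, out_mtok: float,
                 in_price: float, out_price: float) -> float:
    """List-price cost in dollars for a month of traffic, volumes in millions of tokens."""
    return in_mtok * in_price + out_mtok * out_price


# Hypothetical workload: 200M input and 40M output tokens per month.
IN_MTOK, OUT_MTOK = 200.0, 40.0

sonnet = monthly_cost(IN_MTOK, OUT_MTOK, 3.0, 15.0)     # Sonnet 3.5 and Sonnet 4 list price
opus_old = monthly_cost(IN_MTOK, OUT_MTOK, 15.0, 75.0)  # Opus before the reset
opus_new = monthly_cost(IN_MTOK, OUT_MTOK, 5.0, 25.0)   # Opus after the reset

# Quality lift from Sonnet 3.5-1022 (QS 64.5) to Sonnet 4.6-thinking (QS 96.7).
qs_lift = round(96.7 - 64.5, 1)
qs_lift_pct = round(100 * (96.7 - 64.5) / 64.5, 1)

if __name__ == "__main__":
    print(f"Sonnet tier: ${sonnet:,.0f}/month")
    print(f"Opus before reset: ${opus_old:,.0f}/month ({opus_old / sonnet:.2f}x Sonnet)")
    print(f"Opus after reset:  ${opus_new:,.0f}/month ({opus_new / sonnet:.2f}x Sonnet)")
    print(f"QS lift: +{qs_lift} points ({qs_lift_pct}%)")
```

Under these assumed volumes the Sonnet bill is identical before and after migration, and the Opus multiple over Sonnet drops from 5x to roughly 1.67x. Thinking variants typically emit more output tokens, so actual spend can differ from this list-price estimate.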
Reasons to stay on Claude 3.5 that are defensible:
- Cheapest Anthropic tier still served. Haiku 3.5 and Haiku 4.5 both list at $1 input / $5 output per million in our index. If a deployment is pinned to 3.5-Haiku and meets your eval, the upgrade is not a forced one. Upgrading to 4.5 buys quality, not unit-cost savings.
- You are Sonnet-on-Bedrock or Sonnet-on-Vertex with pinned routing. AWS Bedrock and Vertex AI have their own model SKU mappings; Sonnet 3.5 may still be the right pick if the Sonnet 4 SKU is not yet routable on your enterprise path or has a different cost profile than the direct Anthropic API.
- Output-behaviour pinning. If your prompts and downstream parsers were tuned against Sonnet 3.5's output style, the migration to 4 involves at least one round of prompt-and-parser re-validation. Plan that work; do not assume the model swap is invisible.
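The re-validation round in the last bullet can start as a small replay harness: run your downstream parser over recorded outputs from the current model and from the candidate, then compare pass rates. This is a sketch under stated assumptions. `parse_reply` stands in for your own parser, and the sample strings are made up for illustration, not recorded model outputs.

```python
import json
from typing import Callable, Iterable


def parse_reply(text: str) -> dict:
    """Stand-in downstream parser: expects a JSON object with an 'answer' key."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(data, dict) or "answer" not in data:
        raise ValueError("missing 'answer' field")
    return data


def pass_rate(outputs: Iterable[str], parser: Callable[[str], object]) -> float:
    """Fraction of outputs the parser accepts without raising."""
    items = list(outputs)
    if not items:
        return 0.0
    ok = 0
    for text in items:
        try:
            parser(text)
            ok += 1
        except (ValueError, TypeError):
            pass  # count as a parser failure
    return ok / len(items)


# Hypothetical recorded outputs; replace with logs from your own eval set.
CURRENT_OUTPUTS = ['{"answer": "42"}', '{"answer": "yes"}']
CANDIDATE_OUTPUTS = ['{"answer": "42"}', 'Here is the JSON:\n{"answer": "yes"}']

if __name__ == "__main__":
    print("current  :", pass_rate(CURRENT_OUTPUTS, parse_reply))
    print("candidate:", pass_rate(CANDIDATE_OUTPUTS, parse_reply))
```

A drop in the candidate's pass rate points at prompts or parsers that need retuning before the swap.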
Where the data is weak
- Haiku 3.5 has thinner benchmark coverage than the Sonnet line in this generation. Treat its Quality Score and Arena Elo as the primary numbers; per-benchmark depth (GPQA, AIME) was light at last verification.
- Pricing parity with Claude 4. Several Claude 4 tiers list at the same headline pricing as their 3.5 equivalents. The cost-driven reason to stay on 3.5 is therefore less about list price than about deployment friction (routing, fine-tunes, eval pinning).
When to look outside this era
- Claude 4 family (/en/ai/llm/claude) is the natural successor. If the migration question is still open, that surface is the comparison to read.
- Cheapest workhorse-tier API outside Anthropic: DeepSeek V4 Flash ($0.098 / $0.197 with QS 78.1) and Gemini 3 Flash are the cross-family anchors to evaluate against. Both ship at lower per-token pricing than Sonnet 3.5-1022 with quality scores in the same tier or above.
Sources worth reading
- Anthropic API pricing: vendor price list (Claude 3.5 and Claude 4 tiers listed together)
- Claude model docs: deprecation timeline and variant identifiers
- Anthropic news: release notes for transitions between generations
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
Author: Boris. Read the full methodology.