This is a previous-generation family. Most teams should look at "Claude: Opus 4.8 (Thinking), Opus, Sonnet, Haiku Compared" instead.

The variants on this page still work and are still listed, but pricing, capabilities, and benchmarks below describe the older generation. Use this page for migration planning, not as a starting point.

Anthropic family

Claude 3.5

Claude 3.5 Sonnet still ships at $3/$15 per 1M tokens, the same price as Sonnet 4. This page covers when the cost-equal Claude 4 tier wins and when 3.5 still earns its slot.

Top in this family

Claude 3.7 Sonnet (Thinking) ranks #96 of 186 on overall quality (QS 74.2) at $3/$15 per 1M tokens.

Models: 2 (6 variants)
License: Closed weights
Provider: Anthropic

Best variant by workload

One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.

Note — picks are framed for direct API usage where cost per million tokens is load-bearing. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.

Workload	Best pick	Why
General API workhorse	Anthropic Claude 3 Sonnet · 3.5 (October 2024), $3.00 / $15.00 per 1M	The previous-generation default. Choose when Sonnet 4's quality lift does not justify the migration effort on your evals; list price is the same.
High-volume chat	Anthropic Claude 3.5 Haiku · Non-thinking, $1.00 / $5.00 per 1M	Cheapest served Claude tier (tied with Haiku 4.5 on list price). Use for high-volume chat where per-token cost compounds and Haiku 4.5's quality lift is not measurable on your workload.

All variants

28 variants: 6 across the 2 models in this family, plus 22 from 4 cross-family models shown for context. Each group is sorted by quality score (descending).

Variant	QS	GPQA	HLE	SWE	SWE-Pro	Terminal	Tau	MCP	AIME	In $/M	Out $/M	Context	Released
3.7 Thinking · Previous Claude 3 Sonnet · Newer: Anthropic Claude Sonnet 4	74.2 #96/186	78.2	8.0	62.3	—	—	81.2	—	—	$3	$15	200K	Feb 24, 2025
3.7 · Previous Claude 3 Sonnet · Newer: Anthropic Claude Sonnet 4	69.2 #120/186	68.0	—	—	—	—	—	—	—	$3	$15	200K	Feb 24, 2025
3.5 (October 2024) · Previous Claude 3 Sonnet · Newer: Anthropic Claude Sonnet 4	64.5 #138/186	65.0	4.1	—	—	—	—	—	—	$3	$15	200K	Feb 24, 2025
3.5 (June 2024) · Previous Claude 3 Sonnet · Newer: Anthropic Claude Sonnet 4	63.2 #143/186	—	—	—	—	—	—	—	—	$3	$15	200K	Feb 24, 2025
Non-thinking · Previous Claude 3.5 Haiku · Newer: Claude Haiku 4.5	58.6 #166/186	—	—	—	—	—	—	—	—	$1	$5	200K	Nov 4, 2024
3.0 · Previous Claude 3 Sonnet · Newer: Anthropic Claude Sonnet 4	—	—	—	—	—	—	—	—	—	$3	$15	200K	Feb 24, 2025
4.8 Thinking · cross-family Anthropic Claude Opus 4	108.6 #2/186	93.6	49.8	88.6	69.2	—	—	82.2	—	$5	$25	200K	May 22, 2025
4.7 Thinking · cross-family Anthropic Claude Opus 4	107.8 #3/186	94.2	46.9	87.6	64.3	69.4	—	77.3	—	$5	$25	200K	May 22, 2025
4.6 Thinking · cross-family Anthropic Claude Opus 4	104.1 #6/186	91.3	40.0	80.8	53.4	65.4	91.9	59.5	95.6	$5	$25	1.0M	May 22, 2025
4.5 Thinking · cross-family Anthropic Claude Opus 4	98.6 #13/186	87.0	30.8	80.9	—	59.3	88.9	62.3	92.8	$5	$25	200K	May 22, 2025
V4 Pro Thinking · cross-family DeepSeek V4	98.0 #15/186	90.1	37.7	80.6	55.4	—	—	73.6	—	$0.435	$0.87	1.0M	Apr 24, 2026
4.6 Thinking · cross-family Anthropic Claude Sonnet 4	96.7 #16/186	89.9	33.2	79.6	—	59.1	91.7	61.3	86.9	$3	$15	200K	May 22, 2025
4.6 Non-thinking · cross-family Anthropic Claude Opus 4	93.1 #23/186	—	19.0	—	—	—	—	—	—	$5	$25	200K	May 22, 2025
V4 Flash Thinking · cross-family DeepSeek V4	92.0 #27/186	88.1	34.8	79.0	52.6	—	—	69.0	—	$0.098	$0.197	1.0M	Apr 24, 2026
4.5 Thinking · cross-family Anthropic Claude Sonnet 4	86.1 #41/186	83.4	17.7	77.2	43.6	42.8	86.2	43.8	87.0	$3	$15	1.0M	May 22, 2025
4.1 Thinking · cross-family Anthropic Claude Opus 4	83.1 #50/186	81.0	11.7	74.5	—	38.0	86.8	40.9	78.0	$15	$75	200K	May 22, 2025
V4 Pro · cross-family DeepSeek V4	80.9 #61/186	72.9	7.7	73.6	52.1	—	—	69.4	—	$0.435	$0.87	1.0M	Apr 24, 2026
4.5 Non-thinking · cross-family Anthropic Claude Opus 4	80.7 #63/186	—	14.2	—	45.9	—	—	—	—	$5	$25	200K	May 22, 2025
4.0 Thinking · cross-family Anthropic Claude Opus 4	80.7 #64/186	79.6	10.7	72.5	—	—	81.4	—	75.5	$15	$75	200K	May 22, 2025
4.0 Non-thinking · cross-family Anthropic Claude Opus 4	79.1 #73/186	74.9	6.7	72.5	—	—	81.8	—	33.9	$15	$75	200K	May 22, 2025
V4 Flash · cross-family DeepSeek V4	78.1 #78/186	71.2	8.1	73.7	49.1	—	—	64.0	—	$0.098	$0.197	1.0M	Apr 24, 2026
Thinking · cross-family Claude Haiku 4.5	77.9 #79/186	73.0	9.7	73.3	—	—	83.2	40.2	80.7	$1	$5	200K	Oct 15, 2025
4.0 Thinking · cross-family Anthropic Claude Sonnet 4	75.6 #88/186	76.1	7.8	72.7	42.7	—	83.8	—	70.5	$3	$15	200K	May 22, 2025
4.0 Non-thinking · cross-family Anthropic Claude Sonnet 4	73.5 #99/186	70.0	5.5	72.7	—	—	75.0	—	33.1	$3	$15	200K	May 22, 2025
4.5 Non-thinking · cross-family Anthropic Claude Sonnet 4	73.0 #104/186	—	7.5	—	—	42.8	—	—	—	$3	$15	1.0M	May 22, 2025
4.1 Non-thinking · cross-family Anthropic Claude Opus 4	70.4 #115/186	—	7.9	—	—	—	—	—	—	$15	$75	200K	May 22, 2025
Non-thinking · cross-family Claude Haiku 4.5	66.3 #132/186	—	—	—	39.5	28.3	—	—	—	$1	$5	200K	Oct 15, 2025
4.7 Non-thinking · cross-family Anthropic Claude Opus 4	—	—	—	—	—	—	—	—	—	$5	$25	200K	May 22, 2025

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (33 of 44 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.

Model / Variant	Benchmark	Score	Rank	Scoring
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	GSO (Global Software Optimization) · opt_at_10	15.7	1 / 2	In Quality Score
Anthropic Claude 3.5 Haiku · Non-thinking	MMLU Pro · 5_shot_cot	65	3 / 4	In Quality Score
Anthropic Claude 3.5 Haiku · Non-thinking	GPQA Diamond · 5_shot_cot	41.6	3 / 4	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	SWE-bench Verified · multiple	70.3	8 / 10	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	Aider (Polyglot)	64.9	12 / 45	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	τ²-bench · retail	81.2	13 / 34	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	τ²-bench · airline	58.4	13 / 29	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	AIME 2025 · no_tools	54.8	15 / 15	In Quality Score

All benchmark evidence (44 rows)

Reasoning

Model / Variant	Benchmark	Score	Rank	Scoring
Anthropic Claude 3.5 Haiku · Non-thinking	MMLU Pro · 5_shot_cot	65	3 / 4	In Quality Score
Anthropic Claude 3.5 Haiku · Non-thinking	GPQA Diamond · 5_shot_cot	41.6	3 / 4	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	AIME 2025 · no_tools	54.8	15 / 15	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	SimpleBench	46.4	26 / 61	In Quality Score
Anthropic Claude 3 Sonnet · 3.7	SimpleBench	44.9	30 / 61	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	SimpleBench	41.4	32 / 61	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	Humanity's Last Exam · hle_text	7.9	37 / 56	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (June 2024)	SimpleBench	27.5	45 / 61	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	Humanity's Last Exam · hle_text	4.3	54 / 56	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	GPQA Diamond	78.2	58 / 143	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	Humanity's Last Exam · hle	8.0	73 / 90	In Quality Score
Anthropic Claude 3 Sonnet · 3.7	GPQA Diamond	68	86 / 143	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	Humanity's Last Exam · hle	4.1	89 / 90	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	GPQA Diamond	65	93 / 143	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	Arena Elo	1387	101 / 158	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	Arena Elo	1372	110 / 158	In Quality Score
Anthropic Claude 3 Sonnet · 3.7	Arena Elo	1371	111 / 158	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (June 2024)	Arena Elo	1342	125 / 158	In Quality Score
Anthropic Claude 3.5 Haiku · Non-thinking	Arena Elo	1323	134 / 158	In Quality Score
Anthropic Claude 3 Sonnet · 3.0	Arena Elo	1280	147 / 158	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	MMMU · mmmu_l3	85.9	4 / 5	Tracked evidence
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	AIME 2024 · consensus64	26.7	6 / 7	Tracked evidence
Anthropic Claude 3 Sonnet · 3.7 Thinking	MMMU · mmmu_single	75	9 / 22	Tracked evidence
Anthropic Claude 3.5 Haiku · Non-thinking	MMLU	77.6	28 / 33	Tracked evidence
Anthropic Claude 3 Sonnet · 3.7	MATH 500	82.2	43 / 55	Tracked evidence
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	MATH 500	78.3	45 / 55	Tracked evidence
Anthropic Claude 3.5 Haiku · Non-thinking	MMMU PRO	45.0	47 / 52	Tracked evidence
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	AIME 2024	16	58 / 69	Tracked evidence

Coding

Model / Variant	Benchmark	Score	Rank	Scoring
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	GSO (Global Software Optimization) · opt_at_10	15.7	1 / 2	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	SWE-bench Verified · multiple	70.3	8 / 10	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	Aider (Polyglot)	64.9	12 / 45	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	LiveCodeBench · 2024_08_2025_05	38.9	15 / 17	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	GSO (Global Software Optimization) · opt_at_1	4.6	15 / 24	In Quality Score
Anthropic Claude 3 Sonnet · 3.7	Aider (Polyglot)	60.4	17 / 45	In Quality Score
Anthropic Claude 3 Sonnet · 3.7	GSO (Global Software Optimization) · opt_at_1	3.8	18 / 24	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	Aider (Polyglot)	51.6	26 / 45	In Quality Score
Anthropic Claude 3.5 Haiku · Non-thinking	Aider (Polyglot)	28	37 / 45	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	LiveCodeBench	36.4	38 / 69	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	SWE-bench Verified	62.3	50 / 68	In Quality Score
Anthropic Claude 3 Sonnet · 3.5 (October 2024)	Codeforces	717	42 / 47	Tracked evidence

Agentic

Model / Variant	Benchmark	Score	Rank	Scoring
Anthropic Claude 3 Sonnet · 3.7 Thinking	τ²-bench · retail	81.2	13 / 34	In Quality Score
Anthropic Claude 3 Sonnet · 3.7 Thinking	τ²-bench · airline	58.4	13 / 29	In Quality Score

Multimodal

Model / Variant	Benchmark	Score	Rank	Scoring
Anthropic Claude 3.5 Haiku · Non-thinking	ChartQA	87.2	4 / 9	Tracked evidence

Document/OCR

Model / Variant	Benchmark	Score	Rank	Scoring
Anthropic Claude 3.5 Haiku · Non-thinking	DocVQA	90	7 / 8	Tracked evidence

Where this family sits in the market

Claude 3.5 Haiku ties Haiku 4.5 as the lowest-priced served Claude tier at $1/$5 per 1M, though Haiku 4.5 scores higher at that price. Sonnet 3.5 sits below Sonnet 4 on quality at the same $3/$15 list price.

[Chart: cost versus quality by provider. Legend: Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, NVIDIA, OpenAI, Qwen, xAI, Zhipu.]

Dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
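
The frontier definition above can be sketched in a few lines. This is a minimal illustration, assuming cost means output price per 1M tokens (the page does not say which price the chart's axis uses) and using a handful of rows copied from the variants table. The `Variant` type and `pareto_frontier` function are hypothetical names, not part of the index.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    out_price: float  # output $ per 1M tokens (assumed cost axis)
    qs: float         # Quality Score from the variants table


def pareto_frontier(variants: list[Variant]) -> list[Variant]:
    """Keep variants for which no other variant is both cheaper and better.

    A variant is dropped when another one costs no more, scores no lower,
    and is strictly better on at least one of the two.
    """
    frontier = []
    for v in variants:
        dominated = any(
            o.out_price <= v.out_price
            and o.qs >= v.qs
            and (o.out_price < v.out_price or o.qs > v.qs)
            for o in variants
        )
        if not dominated:
            frontier.append(v)
    return sorted(frontier, key=lambda v: v.out_price)


# Rows copied from the variants table on this page.
rows = [
    Variant("Claude 3.5 Haiku (non-thinking)", 5.0, 58.6),
    Variant("Claude Haiku 4.5 (thinking)", 5.0, 77.9),
    Variant("Claude 3.5 Sonnet (October 2024)", 15.0, 64.5),
    Variant("Claude Sonnet 4.6 (thinking)", 15.0, 96.7),
    Variant("Claude Opus 4.8 (thinking)", 25.0, 108.6),
    Variant("DeepSeek V4 Flash (thinking)", 0.197, 92.0),
]

frontier = pareto_frontier(rows)
for v in frontier:
    print(f"{v.name}: ${v.out_price}/1M out, QS {v.qs}")
```

The result depends on which rows go in: restricting the input to one provider, or switching the cost axis to input price, can move variants on or off the frontier.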

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

GPT-4 Era: GPT-4o, GPT-4.1, o-series, gpt-oss Picks vs GPT-5
OpenAI's pre-GPT-5 lineup still served: GPT-4o, GPT-4.1, o-series reasoning, and gpt-oss. When a legacy tier still beats upgrading.
Gemini 2 Era: 2.5 Pro, 2.5 Flash, 2.0 Pricing and Picks
Gemini 2.5 Flash ships at $0.30/$2.50 per 1M with 1M-token context. When 2.5 Pro and the 2.0 family beat upgrading to Gemini 3 on cost or workload.

Caveats

What this page does not tell you, listed honestly.

Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.

Editor's notes

By Boris · Last verified 2026-05-09 · AI-assisted, human-reviewed

If you are already on Claude 3.5

If you have a working deployment pinned to Sonnet 3.5, Sonnet 3.7, or Haiku 3.5, the question is when staying is defensible and which variant identifier you need to pin to.

Anthropic ships several checkpoints under the 3.5/3.7 banner, and mixing them up will produce contradictory numbers. The page covers all of them: 3.5-0620 and 3.5-1022 are the canonical 3.5 Sonnet checkpoints; 3.7 and 3.7-thinking are the bridge variants between the 3.5 and 4 generations. A bare "Sonnet 3.5 score" is ambiguous; pin to a specific identifier in production calls.
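
One way to make the pinning explicit is a lookup from this page's shorthand to dated API identifiers that rejects anything undated. A minimal sketch: the identifier strings follow Anthropic's dated naming as we understand it and should be checked against the Claude model docs before use. The `PINNED` table and `resolve_model` helper are hypothetical names.

```python
import re

# Page shorthand -> dated model identifier (verify against the Claude model docs).
# Assumption: 3.7-thinking uses the same identifier as 3.7, with extended
# thinking switched on per request rather than through a separate model name.
PINNED = {
    "3.5-0620": "claude-3-5-sonnet-20240620",
    "3.5-1022": "claude-3-5-sonnet-20241022",
    "3.7": "claude-3-7-sonnet-20250219",
    "3.5-haiku": "claude-3-5-haiku-20241022",
}

# A pinned identifier ends in an 8-digit date stamp; floating aliases do not.
_DATED = re.compile(r"-\d{8}$")


def resolve_model(shorthand: str) -> str:
    """Return the dated identifier for a shorthand, or raise if it is ambiguous."""
    try:
        model_id = PINNED[shorthand]
    except KeyError:
        raise ValueError(f"ambiguous or unknown variant: {shorthand!r}") from None
    if not _DATED.search(model_id):
        raise ValueError(f"identifier is not date-pinned: {model_id!r}")
    return model_id


print(resolve_model("3.5-1022"))
```

Failing loudly on an unknown shorthand keeps a bare "Sonnet 3.5" from silently resolving to whichever checkpoint a default happens to point at.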

To size the migration: Sonnet 4.6-thinking at QS 96.7 against Sonnet 3.5-1022 at QS 64.5 is a 32.2-point quality lift. Sonnet 4 list pricing held flat against the 3.5 line, and Anthropic's Claude 4.5+ Opus reset (input $15 to $5, output $75 to $25 per million) signals the direction of travel: the cost gap to current Claude is materially smaller than it was in the 3.5 era. For most teams, that math favours migration.
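
The sizing argument reduces to simple arithmetic. A sketch, assuming a hypothetical workload of 500M input and 100M output tokens per month; the volume and the `monthly_cost` helper are illustrative and not taken from the index.

```python
def monthly_cost(in_tokens_m: float, out_tokens_m: float,
                 in_price: float, out_price: float) -> float:
    """List-price cost in dollars; token counts in millions, prices per 1M tokens."""
    return in_tokens_m * in_price + out_tokens_m * out_price


# Hypothetical workload: 500M input and 100M output tokens per month.
IN_M, OUT_M = 500, 100

sonnet_35 = monthly_cost(IN_M, OUT_M, 3, 15)   # Sonnet 3.5-1022 list price
sonnet_46 = monthly_cost(IN_M, OUT_M, 3, 15)   # Sonnet 4.6 list price (flat vs 3.5)
opus_old = monthly_cost(IN_M, OUT_M, 15, 75)   # Opus before the reset
opus_new = monthly_cost(IN_M, OUT_M, 5, 25)    # Opus 4.5+ after the reset

qs_lift = round(96.7 - 64.5, 1)                # Sonnet 4.6-thinking vs Sonnet 3.5-1022

print(f"Sonnet 3.5 -> 4.6: ${sonnet_35:,.0f} -> ${sonnet_46:,.0f}/month, QS +{qs_lift}")
print(f"Opus reset: ${opus_old:,.0f} -> ${opus_new:,.0f}/month")
```

This holds token counts fixed across models. A thinking variant will typically emit more output tokens for the same task, so equal list prices do not guarantee equal bills; measure token usage on your own traffic before relying on the comparison.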

Reasons to stay on Claude 3.5 that are defensible:

Cheapest Anthropic tier still served. Haiku 3.5 and Haiku 4.5 both list at $1 input / $5 output per million in our index. If a deployment is pinned to Haiku 3.5 and meets your eval, the upgrade is not a forced one. Upgrading to 4.5 buys quality, not unit-cost savings.
You run Sonnet on Bedrock or Vertex with pinned routing. AWS Bedrock and Vertex AI have their own model SKU mappings; Sonnet 3.5 may still be the right pick if the Sonnet 4 SKU is not yet routable on your enterprise path or has a different cost profile than the direct Anthropic API.
Output-behaviour pinning. If your prompts and downstream parsers were tuned against Sonnet 3.5's output style, the migration to 4 involves at least one round of prompt-and-parser re-validation. Plan that work; do not assume the model swap is invisible.
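
The re-validation round in the last item can be as small as replaying saved outputs through the downstream parser. A minimal sketch, assuming the parser expects a JSON object with a hypothetical `answer` key and that sample outputs from both models have already been collected. The sample strings are invented for illustration and are not real model output.

```python
import json


def parses_ok(raw: str) -> bool:
    """Stand-in for the downstream parser: accept a JSON object with an 'answer' key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "answer" in data


def pass_rate(outputs: list[str]) -> float:
    """Fraction of outputs the parser accepts."""
    return sum(parses_ok(o) for o in outputs) / len(outputs)


# Invented samples: the candidate prefixes one reply with prose, which breaks the parser.
baseline_outputs = ['{"answer": "42"}', '{"answer": "yes"}']
candidate_outputs = ['{"answer": "42"}', 'Here is the result: {"answer": "yes"}']

baseline = pass_rate(baseline_outputs)
candidate = pass_rate(candidate_outputs)
print(f"baseline {baseline:.0%}, candidate {candidate:.0%}")
```

A drop in pass rate between the two sets points at prompt or parser work to schedule before the switch, rather than a regression to discover in production.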

Where the data is weak

Haiku 3.5 has thinner benchmark coverage than the Sonnet line in this generation. Treat its Quality Score and Arena Elo as the primary numbers; per-benchmark depth (GPQA, AIME) is light at last verification.
Pricing parity with Claude 4. Several Claude 4 tiers list at the same headline pricing as their 3.5 equivalents. The cost-driven reason to stay on 3.5 is therefore less about list price than about deployment friction (routing, fine-tunes, eval pinning).

When to look outside this era

Claude 4 family (/en/ai/llm/claude) is the natural successor. If the migration question is still open, that surface is the comparison to read.
Cheapest workhorse-tier API outside Anthropic: DeepSeek V4 Flash ($0.098 / $0.197 with QS 78.1) and Gemini 3 Flash are the cross-family anchors to evaluate against. Both ship at lower per-token pricing than Sonnet 3.5-1022 with quality scores in the same tier or above.

Sources worth reading

Anthropic API pricing: vendor price list (Claude 3.5 and Claude 4 tiers listed together)
Claude model docs: deprecation timeline and variant identifiers
Anthropic news: release notes for transitions between generations

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
