OpenAI family

GPT-5

GPT-5 family: GPT-5.5 Thinking ranks #4 of 186, with a 400K-token context window and list pricing of $1.25 input / $10 output per 1M tokens. Compare GPT-5, Mini, Nano, and Codex by workload.

Top in this family

GPT-5.5 Thinking ranks #4 of 186 on overall quality (QS 106.6) at $1.25 input / $10 output per 1M tokens.

Practical pick

GPT-5 Mini Thinking (5.4) at $0.75 input / $4.5 output per 1M tokens (rank #37 of 186).

Models: 4 (22 variants)
License: Closed weights
Provider: OpenAI

★ Most teams should start here

GPT-5 Mini

Variant: Thinking (5.4)

The practical default. It delivers most of the flagship's quality for everyday API workloads (QS 87.1 vs 106.6) at 60% of the flagship's input price and 45% of its output price. Step up to full GPT-5 only when you have a workload that visibly benefits from it.

Quality Score: 87.1
Input: $0.750/1M
Output: $4.50/1M
Context: not listed
License: Closed · API

Best variant by workload

One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.

Note — picks are framed for direct API usage where cost per million tokens is load-bearing. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.
Workload | Best pick | Price (in / out per 1M) | Why
Coding agents | GPT-5 Codex · Non-thinking | $1.25 / $10.00 | Purpose-built coding variant. Pick this when agentic coding throughput and tool-use reliability are the binding constraint.
General API workhorse | GPT-5 Mini · Thinking (5.4) | $0.750 / $4.50 | Best quality-per-dollar in the family for chat, summarization, and tool-augmented assistants. Reach for full GPT-5 only when mini measurably underperforms on your evals.
Long-context RAG | GPT-5 · GPT-5.5 Thinking | $1.25 / $10.00 | Full GPT-5 has the strongest long-context recall in the family. Use when document scale and faithful retrieval over long inputs dominate.
High-volume chat | GPT-5 Nano · Thinking (5.4) | $0.200 / $1.25 | Cheapest tier with usable quality for production chat at scale. Trade some capability for the per-token cost difference.
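The workload table above can be encoded as a small lookup for scripts that estimate spend. This is a minimal Python sketch, not a tool we ship: the workload keys, the fallback to the Mini pick for unlisted workloads, and the helper names `pick_for` and `request_cost` are hypothetical, and the prices are the list prices shown above, which change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pick:
    model: str
    variant: str
    input_per_m: float   # USD per 1M input tokens (list price)
    output_per_m: float  # USD per 1M output tokens (list price)


# Hypothetical workload keys; picks and prices mirror the "Best variant by workload" table.
PICKS = {
    "coding_agents": Pick("GPT-5 Codex", "Non-thinking", 1.25, 10.00),
    "general_api": Pick("GPT-5 Mini", "Thinking (5.4)", 0.75, 4.50),
    "long_context_rag": Pick("GPT-5", "GPT-5.5 Thinking", 1.25, 10.00),
    "high_volume_chat": Pick("GPT-5 Nano", "Thinking (5.4)", 0.20, 1.25),
}


def pick_for(workload: str) -> Pick:
    """Return the table's pick, falling back to the general-purpose default for unknown workloads."""
    return PICKS.get(workload, PICKS["general_api"])


def request_cost(pick: Pick, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of a single request."""
    return (input_tokens * pick.input_per_m + output_tokens * pick.output_per_m) / 1_000_000


chat = pick_for("high_volume_chat")
print(chat.model, chat.variant, f"${request_cost(chat, 1_000, 200):.5f} per request")
```

Keeping the prices in one table makes it cheap to rerun the estimate when OpenAI reprices a tier.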

All variants

22 variants across 4 models. Sorted by quality score (descending) · Closed API.

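Benchmark values are listed in the column order GPQA, HLE, SWE, SWE-Pro, Terminal, Tau, MCP, AIME with empty cells omitted, so a row with fewer than eight values does not map one-to-one onto those columns.

Model · Variant | QS (rank) | Benchmark values | In $/M | Out $/M | Context | Released
GPT-5 · GPT-5.5 Thinking | 106.6 (#4/186) | 93.6, 41.4, 58.6, 75.3 | $1.25 | $10 | 400K | Aug 7, 2025
GPT-5 · GPT-5.4 Thinking | 102.8 (#8/186) | 92.8, 43.9, 57.7, 75.1, 67.2 | $2.5 | $15 | 1.1M | Aug 7, 2025
GPT-5 · GPT-5.3 Codex | 98.3 (#14/186) | 92.6, 56.8, 64.7 | $1.75 | $14 | 400K | Aug 7, 2025
GPT-5 · GPT-5.2 Thinking | 95.7 (#18/186) | 92.4, 35.4, 80.0, 55.6, 54.0, 82.0, 60.6, 100.0 | $1.75 | $14 | 400K | Aug 7, 2025
GPT-5 · GPT-5.1 Codex Max | 93.2 (#22/186) | 77.9 | $1.25 | $10 | 400K | Aug 7, 2025
GPT-5 Mini · Thinking (5.4) | 87.1 (#37/186) | 88.0, 28.2, 54.4, 57.7 | $0.75 | $4.5 | n/a | Aug 7, 2025
GPT-5 · GPT-5.0 | 86.7 (#38/186) | 85.7, 24.8, 72.8, 41.8, 35.2, 81.1, 94.6 | $1.25 | $10 | 400K | Aug 7, 2025
GPT-5 · GPT-5.1 Thinking | 85.6 (#43/186) | 88.1, 26.5, 76.3, 50.1 | $1.25 | $10 | 400K | Aug 7, 2025
GPT-5 Codex · Non-thinking | 81.5 (#56/186) | 74.5, 43.4 | $1.25 | $10 | 400K | Sep 15, 2025
GPT-5 · GPT-5.2 Codex | 81.5 (#57/186) | 41.0 | $1.75 | $14 | 400K | Aug 7, 2025
GPT-5 · GPT-5.2 | 79.7 (#69/186) | 27.8, 29.9, 54.0 | $1.75 | $14 | 400K | Aug 7, 2025
GPT-5 Mini · Thinking (5.0) | 79.3 (#72/186) | 82.3, 16.7, 72.0, 45.7, 24.0, 47.6, 91.1 | $0.25 | $2 | 400K | Aug 7, 2025
GPT-5 Nano · Thinking (5.4) | 78.7 (#76/186) | 82.8, 24.3, 52.4, 56.1 | $0.2 | $1.25 | n/a | Aug 7, 2025
GPT-5 · GPT-5.1 Codex | 77.8 (#80/186) | 36.9 | $1.25 | $10 | 400K | Aug 7, 2025
GPT-5 · GPT-5.1 | 76.0 (#87/186) | 6.8, 47.6 | $1.25 | $10 | 400K | Aug 7, 2025
GPT-5 · GPT-5.1 Codex Mini | 71.7 (#110/186) | n/a | $0.25 | $2 | 400K | Aug 7, 2025
GPT-5 Nano · Thinking (5.0) | 59.8 (#160/186) | 7.9 | $0.05 | $0.4 | 400K | Aug 7, 2025
GPT-5 · GPT-5.3 | n/a | n/a | $1.25 | $10 | 400K | Aug 7, 2025
GPT-5 · GPT-5.3 Instant | n/a | n/a | $1.75 | $14 | 128K | Aug 7, 2025
GPT-5 · GPT-5.4 | n/a | n/a | $2.5 | $15 | 1.1M | Aug 7, 2025
GPT-5 · GPT-5.5 | n/a | n/a | $5 | $30 | n/a | Aug 7, 2025
GPT-5 Nano · Non-Thinking (5.0) | n/a | n/a | $0.05 | $0.4 | 400K | Aug 7, 2025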

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (118 of 360 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.
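The split between the governed panel and tracked evidence is just the Scoring column of each row. A minimal Python sketch of that split, using four rows copied from the tables on this page; the `EvidenceRow` type and `split_panel` helper are hypothetical names for illustration, not part of our pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRow:
    model_variant: str
    benchmark: str
    score: float
    rank: int
    out_of: int
    in_quality_score: bool  # True for "In Quality Score", False for "Tracked evidence"


# Four rows copied from the evidence tables; the full index has 360.
ROWS = [
    EvidenceRow("GPT-5 · GPT-5.2 Thinking", "AIME 2025", 100, 1, 88, True),
    EvidenceRow("GPT-5 · GPT-5.5 Thinking", "GPQA Diamond", 93.6, 5, 143, True),
    EvidenceRow("GPT-5 · GPT-5.2 Thinking", "HMMT Feb 2025", 99.4, 1, 44, False),
    EvidenceRow("GPT-5 · GPT-5.5 Thinking", "MRCR · v2_128k", 94.8, 1, 23, False),
]


def split_panel(rows):
    """Separate rows that feed the headline score from tracked-only evidence."""
    governed = [r for r in rows if r.in_quality_score]
    tracked = [r for r in rows if not r.in_quality_score]
    return governed, tracked


governed, tracked = split_panel(ROWS)
print(f"{len(governed)} of {len(ROWS)} rows feed the Quality Score")
```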


Reasoning

Model / Variant | Benchmark | Score | Rank | Scoring
GPT-5 · GPT-5.2 Thinking | AIME 2025 | 100 | 1 / 88 | In Quality Score
GPT-5 · GPT-5.2 Thinking | AIME 2025 · no_tools | 100 | 1 / 15 | In Quality Score
GPT-5 · GPT-5.0 | AIME 2025 · aime_2025_python | 99.6 | 2 / 7 | In Quality Score
GPT-5 · GPT-5.5 Thinking | LiveBench | 80.7 | 2 / 110 | In Quality Score
GPT-5 · GPT-5.2 Thinking | Humanity's Last Exam · verified | 43.3 | 2 / 5 | In Quality Score
GPT-5 · GPT-5.4 Thinking | LiveBench | 80.3 | 3 / 110 | In Quality Score
GPT-5 · GPT-5.0 | AIME 2025 | 94.6 | 4 / 88 | In Quality Score
GPT-5 · GPT-5.1 Thinking | AIME 2025 · no_tools | 94 | 4 / 15 | In Quality Score
GPT-5 · GPT-5.4 Thinking | Humanity's Last Exam · hle_text | 36.5 | 4 / 56 | In Quality Score
GPT-5 · GPT-5.5 Thinking | GPQA Diamond | 93.6 | 5 / 143 | In Quality Score
GPT-5 · GPT-5.5 Thinking | SimpleBench | 69 | 5 / 61 | In Quality Score
GPT-5 · GPT-5.2 Thinking | Humanity's Last Exam · search_code | 45.5 | 5 / 6 | In Quality Score
GPT-5 · GPT-5.4 Thinking | Humanity's Last Exam · hle | 43.9 | 5 / 90 | In Quality Score
GPT-5 · GPT-5.4 Thinking | GPQA Diamond | 92.8 | 6 / 143 | In Quality Score
GPT-5 · GPT-5.2 Thinking | MMLU Pro | 87.4 | 6 / 86 | In Quality Score
GPT-5 · GPT-5.3 Codex | GPQA Diamond | 92.6 | 7 / 143 | In Quality Score
GPT-5 · GPT-5.5 Thinking | Humanity's Last Exam · tools | 52.2 | 7 / 38 | In Quality Score
GPT-5 · GPT-5.5 Thinking | Humanity's Last Exam · hle | 41.4 | 7 / 90 | In Quality Score
GPT-5 · GPT-5.2 Thinking | Humanity's Last Exam · hle_text | 34.5 | 7 / 56 | In Quality Score
GPT-5 · GPT-5.5 Thinking | Arena Elo | 1482 | 8 / 158 | In Quality Score
GPT-5 · GPT-5.2 Thinking | GPQA Diamond | 92.4 | 8 / 143 | In Quality Score
GPT-5 · GPT-5.4 Thinking | Humanity's Last Exam · tools | 52.1 | 8 / 38 | In Quality Score
GPT-5 · GPT-5.4 Thinking | Arena Elo | 1480 | 9 / 158 | In Quality Score
GPT-5 · GPT-5.0 | MMLU Pro | 87.1 | 9 / 86 | In Quality Score
GPT-5 · GPT-5.2 | Humanity's Last Exam · hle_text | 28.5 | 10 / 56 | In Quality Score
GPT-5 · GPT-5.5 | Arena Elo | 1476 | 11 / 158 | In Quality Score
GPT-5 · GPT-5.0 | Humanity's Last Exam · hle_text | 26.3 | 11 / 56 | In Quality Score
GPT-5 · GPT-5.2 Thinking | Humanity's Last Exam · hle | 35.4 | 12 / 90 | In Quality Score
GPT-5 Mini · Thinking (5.0) | AIME 2025 | 91.1 | 13 / 88 | In Quality Score
GPT-5 · GPT-5.1 Thinking | Humanity's Last Exam · hle_text | 24.6 | 13 / 56 | In Quality Score
GPT-5 · GPT-5.2 Thinking | LiveBench | 74.8 | 15 / 110 | In Quality Score
GPT-5 · GPT-5.0 | SimpleBench | 56.7 | 16 / 61 | In Quality Score
GPT-5 · GPT-5.2 Codex | LiveBench | 74.3 | 18 / 110 | In Quality Score
GPT-5 Mini · Thinking (5.0) | Humanity's Last Exam · hle_text | 19.7 | 18 / 56 | In Quality Score
GPT-5 · GPT-5.1 Thinking | GPQA Diamond | 88.1 | 19 / 143 | In Quality Score
GPT-5 · GPT-5.1 | SimpleBench | 53.2 | 19 / 61 | In Quality Score
GPT-5 Mini · Thinking (5.4) | GPQA Diamond | 88 | 20 / 143 | In Quality Score
GPT-5 · GPT-5.4 | Arena Elo | 1469 | 21 / 158 | In Quality Score
GPT-5 · GPT-5.1 Codex Max | LiveBench | 74.0 | 21 / 110 | In Quality Score
GPT-5 · GPT-5.2 Thinking | Humanity's Last Exam · tools | 45.5 | 22 / 38 | In Quality Score
GPT-5 · GPT-5.3 Codex | LiveBench | 72.8 | 24 / 110 | In Quality Score
GPT-5 Mini · Thinking (5.4) | Humanity's Last Exam · hle | 28.2 | 24 / 90 | In Quality Score
GPT-5 · GPT-5.2 | Humanity's Last Exam · hle | 27.8 | 26 / 90 | In Quality Score
GPT-5 · GPT-5.0 | Humanity's Last Exam · tools | 41.7 | 27 / 38 | In Quality Score
GPT-5 · GPT-5.1 Thinking | Humanity's Last Exam · hle | 26.5 | 27 / 90 | In Quality Score
GPT-5 · GPT-5.1 | LiveBench | 72.0 | 28 / 110 | In Quality Score
GPT-5 · GPT-5.2 Thinking | SimpleBench | 45.8 | 28 / 61 | In Quality Score
GPT-5 Mini · Thinking (5.4) | Humanity's Last Exam · tools | 41.5 | 28 / 38 | In Quality Score
GPT-5 Nano · Thinking (5.4) | Humanity's Last Exam · tools | 37.7 | 31 / 38 | In Quality Score
GPT-5 · GPT-5.1 | Arena Elo | 1455 | 32 / 158 | In Quality Score
GPT-5 · GPT-5.0 | GPQA Diamond | 85.7 | 32 / 143 | In Quality Score
GPT-5 · GPT-5.0 | Humanity's Last Exam · hle | 24.8 | 32 / 90 | In Quality Score
GPT-5 Mini · Thinking (5.0) | MMLU Pro | 83.7 | 33 / 86 | In Quality Score
GPT-5 · GPT-5.0 | LiveBench | 70.5 | 33 / 110 | In Quality Score
GPT-5 Mini · Thinking (5.0) | Humanity's Last Exam · tools | 31.6 | 33 / 38 | In Quality Score
GPT-5 Nano · Thinking (5.4) | Humanity's Last Exam · hle | 24.3 | 34 / 90 | In Quality Score
GPT-5 Mini · Thinking (5.4) | Arena Elo | 1451 | 35 / 158 | In Quality Score
GPT-5 Nano · Thinking (5.0) | LiveBench | 70.1 | 35 / 110 | In Quality Score
GPT-5 · GPT-5.3 | Arena Elo | 1449 | 38 / 158 | In Quality Score
GPT-5 · GPT-5.1 Codex | LiveBench | 68.6 | 40 / 110 | In Quality Score
GPT-5 Nano · Thinking (5.4) | GPQA Diamond | 82.8 | 45 / 143 | In Quality Score
GPT-5 Mini · Thinking (5.0) | LiveBench | 67.5 | 45 / 110 | In Quality Score
GPT-5 · GPT-5.1 | Humanity's Last Exam · hle_text | 6.5 | 45 / 56 | In Quality Score
GPT-5 · GPT-5.2 | Arena Elo | 1438 | 48 / 158 | In Quality Score
GPT-5 Mini · Thinking (5.0) | GPQA Diamond | 82.3 | 48 / 143 | In Quality Score
GPT-5 · GPT-5.0 | Arena Elo | 1434 | 50 / 158 | In Quality Score
GPT-5 Mini · Thinking (5.0) | Humanity's Last Exam · hle | 16.7 | 50 / 90 | In Quality Score
GPT-5 · GPT-5.1 Codex Mini | LiveBench | 60.4 | 61 / 110 | In Quality Score
GPT-5 · GPT-5.3 Instant | LiveBench | 60.0 | 64 / 110 | In Quality Score
GPT-5 · GPT-5.1 | Humanity's Last Exam · hle | 6.8 | 81 / 90 | In Quality Score
GPT-5 Nano · Thinking (5.4) | Arena Elo | 1403 | 85 / 158 | In Quality Score
GPT-5 · GPT-5.2 | LiveBench | 48.9 | 87 / 110 | In Quality Score
GPT-5 Mini · Thinking (5.0) | Arena Elo | 1390 | 96 / 158 | In Quality Score
GPT-5 Nano · Thinking (5.0) | Arena Elo | 1337 | 127 / 158 | In Quality Score
GPT-5 · GPT-5.2 Thinking | HMMT Feb 2025 | 99.4 | 1 / 44 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | AIME 2026 | 98.7 | 1 / 19 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | HMMT Nov 2025 | 97.1 | 1 / 31 | Tracked evidence
GPT-5 · GPT-5.0 | HMMT Feb 2025 · python | 96.7 | 1 / 6 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | MRCR · v2_128k | 94.8 | 1 / 23 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | IPhO 2025 (Theory) | 93.5 | 1 / 3 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | IMO AnswerBench | 91.4 | 1 / 28 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MAXIFE | 88.4 | 1 / 21 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | MRCR · v2_128k_256k | 87.5 | 1 / 4 | Tracked evidence
GPT-5 · GPT-5.0 | MMMU · mmmu_single | 84.2 | 1 / 22 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | MMMU PRO · tools | 83.2 | 1 / 10 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | BrowseComp_zh | 76.1 | 1 / 20 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | MRCR · v2_512k_1m | 74 | 1 / 3 | Tracked evidence
GPT-5 · GPT-5.0 | HealthBench | 67.2 | 1 / 5 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | FrontierMath · tier1_3 | 51.7 | 1 / 5 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | Graphwalks · bfs_1m | 45.4 | 1 / 3 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | FrontierMath · tier4 | 35.4 | 1 / 5 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | MMMU PRO · tools | 81.5 | 2 / 10 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | MRCR · v2_128k_256k | 79.3 | 2 / 4 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | Graphwalks · parents_1m | 58 | 2 / 3 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | FrontierMath · tier1_3 | 47.6 | 2 / 5 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | HealthBench · hard | 40.1 | 2 / 5 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | MRCR · v2_512k_1m | 36.6 | 2 / 3 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | Frontier Science Research | 33 | 2 / 4 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | FrontierMath · tier4 | 27.1 | 2 / 5 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | VendingBench | 21473.4 | 3 / 4 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | AIME 2026 | 96.7 | 3 / 19 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | GlobalPIQA | 90.9 | 3 / 4 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | Graphwalks · parents_256k | 90.1 | 3 / 4 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | BrowseComp | 84.4 | 3 / 51 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | BrowseComp · context_manage | 82.7 | 3 / 15 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | MMMU PRO | 81.2 | 3 / 52 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MMMU PRO · tools | 80.4 | 3 / 10 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MRCR · v2_128k_256k | 77 | 3 / 4 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | Graphwalks · bfs_256k | 73.7 | 3 / 4 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | FinanceAgent · v2 | 51.8 | 3 / 7 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | Graphwalks · parents_1m | 44 | 3 / 3 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | Frontier Science Research | 25.2 | 3 / 4 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | Graphwalks · bfs_1m | 9.4 | 3 / 3 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | HMMT Nov 2025 | 95.8 | 4 / 31 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | HMMT Feb 2026 | 91.8 | 4 / 16 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MMMLU | 89.6 | 4 / 38 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | IMO AnswerBench | 86.3 | 4 / 28 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | Graphwalks · parents_256k | 82.8 | 4 / 4 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | MMMU PRO | 81.2 | 4 / 52 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | WMT24++ | 78.8 | 4 / 6 | Tracked evidence
GPT-5 Mini · Thinking (5.4) | MMMU PRO · tools | 78 | 4 / 10 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | IFBench | 75.4 | 4 / 28 | Tracked evidence
GPT-5 · GPT-5.0 | Longform Writing | 71.4 | 4 / 5 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | Graphwalks · bfs_256k | 62.5 | 4 / 4 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | FACTS Benchmark Suite | 61.4 | 4 / 12 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | FinanceAgent | 60 | 4 / 15 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | FrontierMath · tier1_3 | 40.7 | 4 / 5 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | FrontierMath · tier4 | 18.8 | 4 / 5 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | Global PIQA | 91.2 | 5 / 26 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MRCR · v2_128k | 83.8 | 5 / 23 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | IFBench | 75 | 5 / 28 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | SciCode | 52 | 5 / 24 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | FinanceAgent | 59.5 | 6 / 15 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | MathArenaApex | 1 | 6 / 8 | Tracked evidence
GPT-5 · GPT-5.0 | HMMT Feb 2025 | 93.3 | 7 / 44 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | MMLU | 91 | 7 / 33 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MAXIFE | 85.3 | 7 / 21 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MMMU PRO · tools | 74.1 | 7 / 10 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | FinanceAgent | 56 | 7 / 15 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | BrowseComp | 82.7 | 8 / 51 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MMMU PRO | 79.5 | 8 / 52 | Tracked evidence
GPT-5 Nano · Thinking (5.4) | MMMU PRO · tools | 69.5 | 9 / 10 | Tracked evidence
GPT-5 · GPT-5.0 | BrowseComp_zh | 63 | 9 / 20 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | Global PIQA | 88.5 | 10 / 26 | Tracked evidence
GPT-5 · GPT-5.3 Codex | BrowseComp | 77.3 | 10 / 51 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | MRCR · v2_128k | 61.6 | 10 / 23 | Tracked evidence
GPT-5 · GPT-5.3 Codex | FinanceAgent | 54 | 10 / 15 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | SimpleQA | 38 | 10 / 40 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | FACTS Benchmark Suite | 33.7 | 10 / 12 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MMLU | 89.6 | 11 / 33 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | BrowseComp · context_manage | 65.8 | 12 / 15 | Tracked evidence
GPT-5 · GPT-5.0 | FinanceAgent | 46.9 | 12 / 15 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | SimpleQA | 34.9 | 12 / 40 | Tracked evidence
GPT-5 · GPT-5.0 | SciCode | 42.9 | 13 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.4) | MMMU PRO | 76.6 | 15 / 52 | Tracked evidence
GPT-5 · GPT-5.0 | MMLU | 89.4 | 16 / 33 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | MMMU PRO | 76 | 16 / 52 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | BrowseComp | 65.8 | 16 / 51 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | BrowseComp_zh | 49.5 | 16 / 20 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MRCR · v2_128k | 52.5 | 17 / 23 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | HMMT Feb 2025 | 87.8 | 18 / 44 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MMMLU | 84.9 | 18 / 38 | Tracked evidence
GPT-5 · GPT-5.0 | IMO AnswerBench | 76 | 21 / 28 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | HMMT Nov 2025 | 84.2 | 22 / 31 | Tracked evidence
GPT-5 · GPT-5.0 | BrowseComp | 54.9 | 23 / 51 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MMMU PRO | 74.1 | 24 / 52 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | BrowseComp | 48.1 | 28 / 51 | Tracked evidence
GPT-5 Nano · Thinking (5.4) | MMMU PRO | 66.1 | 33 / 52 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | SimpleQA | 9.5 | 36 / 40 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MMMU PRO | 57.2 | 39 / 52 | Tracked evidence

Coding

Model / Variant | Benchmark | Score | Rank | Scoring
GPT-5 · GPT-5.0 | Aider (Polyglot) | 88 | 1 / 45 | In Quality Score
GPT-5 · GPT-5.4 Thinking | LiveCodeBench · pro | 87.5 | 1 / 5 | In Quality Score
GPT-5 · GPT-5.0 | LiveCodeBench · 2025_01_2025_05_single | 86.8 | 1 / 11 | In Quality Score
GPT-5 · GPT-5.2 Thinking | LiveCodeBench · v6 | 87.7 | 3 / 40 | In Quality Score
GPT-5 Mini · Thinking (5.0) | LiveCodeBench | 80.4 | 3 / 69 | In Quality Score
GPT-5 · GPT-5.5 Thinking | GSO (Global Software Optimization) · opt_at_1 | 37.3 | 3 / 24 | In Quality Score
GPT-5 · GPT-5.0 | LiveCodeBench · v6 | 87 | 4 / 40 | In Quality Score
GPT-5 Mini · Thinking (5.0) | LiveCodeBench · 2025_01_2025_05_single | 77.4 | 4 / 11 | In Quality Score
GPT-5 · GPT-5.4 Thinking | GSO (Global Software Optimization) · opt_at_1 | 30.4 | 4 / 24 | In Quality Score
GPT-5 · GPT-5.0 | SWE-bench Verified · multilingual_single | 55.3 | 5 / 10 | In Quality Score
GPT-5 · GPT-5.2 Thinking | GSO (Global Software Optimization) · opt_at_1 | 26.5 | 5 / 24 | In Quality Score
GPT-5 · GPT-5.1 | GSO (Global Software Optimization) · opt_at_1 | 12.8 | 9 / 24 | In Quality Score
GPT-5 · GPT-5.2 Thinking | SWE-bench Verified | 80 | 10 / 68 | In Quality Score
GPT-5 · GPT-5.0 | GSO (Global Software Optimization) · opt_at_1 | 5.9 | 12 / 24 | In Quality Score
GPT-5 Mini · Thinking (5.0) | LiveCodeBench · v6 | 80.5 | 14 / 40 | In Quality Score
GPT-5 · GPT-5.1 Codex Max | SWE-bench Verified | 77.9 | 14 / 68 | In Quality Score
GPT-5 · GPT-5.1 Thinking | SWE-bench Verified | 76.3 | 23 / 68 | In Quality Score
GPT-5 Codex · Non-thinking | SWE-bench Verified | 74.5 | 28 / 68 | In Quality Score
GPT-5 · GPT-5.0 | SWE-bench Verified | 72.8 | 35 / 68 | In Quality Score
GPT-5 Mini · Thinking (5.0) | SWE-bench Verified | 72 | 40 / 68 | In Quality Score
GPT-5 · GPT-5.2 Thinking | SecCodeBench | 68.7 | 1 / 6 | Tracked evidence
GPT-5 · GPT-5.0 | OJ-Bench · cpp | 56.2 | 2 / 6 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | NL2Repo | 41.3 | 2 / 9 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | Codeforces | 2160 | 3 / 47 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | OJ-Bench | 40.4 | 3 / 19 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | SWE-bench Multilingual | 72 | 9 / 18 | Tracked evidence

Agentic

Model / Variant | Benchmark | Score | Rank | Scoring
GPT-5 · GPT-5.2 Thinking | τ²-bench · telecom | 98.7 | 3 / 28 | In Quality Score
GPT-5 · GPT-5.5 Thinking | MCP Atlas | 75.3 | 6 / 33 | In Quality Score
GPT-5 · GPT-5.2 Thinking | MCP Atlas · public_set | 68 | 6 / 13 | In Quality Score
GPT-5 · GPT-5.0 | τ²-bench · airline | 62.6 | 6 / 29 | In Quality Score
GPT-5 · GPT-5.5 Thinking | τ²-bench · telecom | 98 | 7 / 28 | In Quality Score
GPT-5 · GPT-5.4 Thinking | MCP Atlas · public_set | 67.2 | 7 / 13 | In Quality Score
GPT-5 · GPT-5.2 Thinking | τ²-bench · average | 85.5 | 8 / 30 | In Quality Score
GPT-5 · GPT-5.0 | τ²-bench · telecom | 96.7 | 9 / 28 | In Quality Score
GPT-5 · GPT-5.2 Thinking | τ²-bench · retail | 82 | 10 / 34 | In Quality Score
GPT-5 Mini · Thinking (5.4) | τ²-bench · telecom | 93.4 | 11 / 28 | In Quality Score
GPT-5 · GPT-5.4 Thinking | MCP Atlas | 67.2 | 11 / 33 | In Quality Score
GPT-5 Nano · Thinking (5.4) | τ²-bench · telecom | 92.5 | 12 / 28 | In Quality Score
GPT-5 · GPT-5.1 Thinking | τ²-bench · average | 80.2 | 13 / 30 | In Quality Score
GPT-5 · GPT-5.4 Thinking | τ²-bench · telecom | 91.5 | 14 / 28 | In Quality Score
GPT-5 · GPT-5.0 | τ²-bench · retail | 81.1 | 14 / 34 | In Quality Score
GPT-5 Mini · Thinking (5.0) | τ²-bench · telecom | 74.1 | 16 / 28 | In Quality Score
GPT-5 · GPT-5.2 Thinking | MCP Atlas | 60.6 | 19 / 33 | In Quality Score
GPT-5 · GPT-5.4 | τ²-bench · telecom | 64.3 | 20 / 28 | In Quality Score
GPT-5 Mini · Thinking (5.0) | τ²-bench · average | 69.8 | 21 / 30 | In Quality Score
GPT-5 · GPT-5.2 | τ²-bench · telecom | 57.2 | 21 / 28 | In Quality Score
GPT-5 Mini · Thinking (5.4) | MCP Atlas | 57.7 | 22 / 33 | In Quality Score
GPT-5 Nano · Thinking (5.4) | MCP Atlas | 56.1 | 24 / 33 | In Quality Score
GPT-5 · GPT-5.1 Thinking | MCP Atlas | 50.1 | 26 / 33 | In Quality Score
GPT-5 Mini · Thinking (5.0) | MCP Atlas | 47.6 | 27 / 33 | In Quality Score
GPT-5 · GPT-5.5 Thinking | GDPVal | 84.9 | 1 / 6 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | τ³-Bench | 72.9 | 1 / 10 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MCPMark | 57.5 | 1 / 8 | Tracked evidence
GPT-5 · GPT-5.0 | FinSearchComp-T3 | 48 | 1 / 5 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | DeepPlanning | 44.6 | 1 / 16 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | VendingBench · v2 | 3952 | 2 / 7 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | GDPVal-AA | 1769 | 2 / 17 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | GDPVal | 83 | 2 / 6 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | CyberGym | 81.8 | 2 / 12 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | WideSearch | 76.8 | 2 / 13 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | Toolathlon | 55.6 | 2 / 31 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | CyberGym | 79 | 3 / 12 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | OSWorld · verified | 78.7 | 3 / 27 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | Toolathlon | 54.6 | 3 / 31 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | Automation Bench | 12.9 | 3 / 5 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | GDPVal-AA | 1672 | 4 / 17 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | GDPVal | 70.9 | 4 / 6 | Tracked evidence
GPT-5 · GPT-5.3 Codex | Toolathlon | 51.9 | 4 / 31 | Tracked evidence
GPT-5 · GPT-5.0 | Seal-0 | 51.4 | 4 / 16 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | DeepSearchQA | 73.6 | 5 / 7 | Tracked evidence
GPT-5 · GPT-5.3 Codex | GDPVal | 70.9 | 5 / 6 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | OSWorld · verified | 75 | 7 / 27 | Tracked evidence
GPT-5 · GPT-5.3 Codex | OSWorld · verified | 74 | 8 / 27 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | GDPVal-AA | 1462 | 9 / 17 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | OSWorld | 38.2 | 9 / 10 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | BFCL v4 | 63.1 | 10 / 18 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | BFCL v4 | 55.5 | 11 / 18 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | Seal-0 | 45 | 11 / 16 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | DeepPlanning | 17.9 | 11 / 16 | Tracked evidence
GPT-5 Mini · Thinking (5.4) | OSWorld · verified | 72.1 | 12 / 27 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | WideSearch | 47.2 | 12 / 13 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | Toolathlon | 46.3 | 13 / 31 | Tracked evidence
GPT-5 Mini · Thinking (5.4) | Toolathlon | 42.9 | 15 / 31 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | Seal-0 | 34.2 | 15 / 16 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | OSWorld · verified | 47.3 | 21 / 27 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | OSWorld · verified | 42 | 22 / 27 | Tracked evidence
GPT-5 Nano · Thinking (5.4) | Toolathlon | 35.5 | 23 / 31 | Tracked evidence
GPT-5 Nano · Thinking (5.4) | OSWorld · verified | 39 | 24 / 27 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | Toolathlon | 26.9 | 26 / 31 | Tracked evidence

Multimodal

Model / Variant | Benchmark | Score | Rank | Scoring
GPT-5 · GPT-5.2 Thinking | ScreenSpot-Pro | 86.3 | 1 / 24 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MMVU | 80.8 | 1 / 20 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MVBench | 78.1 | 1 / 18 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | ZEROBench | 23 | 1 / 27 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | DynaMath | 86.8 | 2 / 23 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | VideoMME · without_sub | 85.8 | 2 / 21 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | MedXpertQA · text | 59.6 | 2 / 5 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | MedXpertQA · mm | 77.1 | 3 / 31 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MotionBench | 64.8 | 3 / 4 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | Video-MMMU | 85.9 | 4 / 28 | Tracked evidence
GPT-5 · GPT-5.5 Thinking | CharXiv Reasoning | 84.1 | 4 / 48 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | ERQA | 65.4 | 4 / 27 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | WorldVQA | 28 | 4 / 5 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | LVBench | 73.7 | 5 / 18 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MedXpertQA · mm | 73.3 | 5 / 31 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MLVU · mavg | 85.6 | 6 / 22 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | CharXiv Reasoning | 82.8 | 6 / 48 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | AI2D · test | 92.2 | 7 / 33 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | RealWorldQA | 83.3 | 8 / 24 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MathVision | 83 | 8 / 17 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | CharXiv Reasoning | 82.1 | 8 / 48 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | SLAKE | 76.9 | 8 / 22 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | ZEROBench · sub | 33.2 | 8 / 23 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | ZEROBench | 9 | 8 / 27 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | VideoMME · with_sub | 86 | 9 / 22 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | EmbSpatialBench | 81.3 | 9 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | VLMs Are Blind | 75.8 | 9 / 18 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | LingoQA | 68.8 | 9 / 16 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | SimpleVQA | 61.1 | 10 / 29 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | ERQA | 59.8 | 10 / 27 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | BabyVision | 34.4 | 10 / 22 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | CountBench | 91.9 | 11 / 23 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MLVU · mavg | 83.3 | 11 / 22 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | Video-MMMU | 82.5 | 11 / 28 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | EmbSpatialBench | 80.7 | 11 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | VideoMME · without_sub | 78.9 | 11 / 21 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MMStar | 77.1 | 11 / 33 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MMVU | 69.8 | 11 / 20 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | AI2D · test | 88.2 | 12 / 33 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | VideoMME · with_sub | 83.5 | 12 / 22 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | DynaMath | 81.4 | 12 / 23 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | LingoQA | 62.4 | 12 / 16 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | ZEROBench · sub | 27.3 | 12 / 23 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | CountBench | 91 | 13 / 23 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MathVista · mini | 83.1 | 13 / 36 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | RealWorldQA | 79 | 13 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MathVision | 71.9 | 13 / 17 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | SLAKE | 70.5 | 13 / 22 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | ERQA | 54 | 13 / 27 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | Video-MMMU | 80.4 | 14 / 28 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | LingoQA | 57 | 14 / 16 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | MMBench · en_dev_v1_1 | 88.2 | 15 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MMStar | 74.1 | 15 / 33 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | VLMs Are Blind | 66.7 | 15 / 18 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | SimpleVQA | 56.8 | 15 / 29 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | BabyVision | 20.9 | 15 / 22 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MMBench · en_dev_v1_1 | 86.8 | 16 / 24 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | DynaMath | 78 | 16 / 23 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MMVU | 63.1 | 16 / 20 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MathVision | 62.2 | 16 / 17 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | ZEROBench · sub | 22.2 | 16 / 23 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | ZEROBench | 3 | 16 / 27 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | V* | 75.9 | 18 / 23 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | EmbSpatialBench | 74.2 | 18 / 24 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | HallusionBench | 65.2 | 18 / 33 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | SimpleVQA | 55.8 | 18 / 29 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | RefSpatialBench | 12.6 | 18 / 21 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | RealWorldQA | 71.8 | 19 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | V* | 71.7 | 19 / 23 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | VideoMME · without_sub | 66.2 | 19 / 21 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | ERQA | 45.8 | 19 / 27 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | CountBench | 80 | 20 / 23 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | VideoMME · with_sub | 71.7 | 20 / 22 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MLVU · mavg | 69.2 | 20 / 22 | Tracked evidence
GPT-5 · GPT-5.4 Thinking | ScreenSpot-Pro | 39 | 20 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | RefSpatialBench | 9 | 20 / 21 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | ZEROBench | 1 | 20 / 27 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MathVista · mini | 79.1 | 21 / 36 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | CharXiv Reasoning | 75.5 | 21 / 48 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | V* | 68.1 | 21 / 23 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | SLAKE | 57 | 21 / 22 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | BabyVision | 14.4 | 21 / 22 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MMBench · en_dev_v1_1 | 80.3 | 22 / 24 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MedXpertQA · mm | 34.4 | 22 / 31 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | SimpleVQA | 46 | 23 / 29 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MMStar | 68.6 | 24 / 33 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | HallusionBench | 63.2 | 24 / 33 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | Video-MMMU | 63 | 24 / 28 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | ScreenSpot-Pro | 3.5 | 24 / 24 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | AI2D · test | 81.9 | 25 / 33 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MedXpertQA · mm | 26.7 | 25 / 31 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | HallusionBench | 58.4 | 27 / 33 | Tracked evidence
GPT-5 · GPT-5.1 Thinking | CharXiv Reasoning | 69.5 | 28 / 48 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MathVista · mini | 71.5 | 30 / 36 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | CharXiv Reasoning | 50.1 | 43 / 48 | Tracked evidence

Document/OCR

Model / Variant | Benchmark | Score | Rank | Scoring
GPT-5 · GPT-5.2 Thinking | OmniDocBench · v1_5 | 0.1 | 4 / 6 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | MMLongBench-Doc | 50.3 | 12 / 22 | Tracked evidence
GPT-5 Mini · Thinking (5.0) | OCRBench | 82.1 | 20 / 35 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | MMLongBench-Doc | 31.8 | 21 / 22 | Tracked evidence
GPT-5 · GPT-5.2 Thinking | OCRBench | 80.7 | 23 / 35 | Tracked evidence
GPT-5 Nano · Thinking (5.0) | OCRBench | 75.3 | 30 / 35 | Tracked evidence

Where this family sits in the market

GPT-5 mini and nano sit on the price-efficiency frontier within the family. Full GPT-5 trades cost for headroom on the hardest workloads.

[Chart: price vs. quality across providers (Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, nvidia, OpenAI, Qwen, xAI, Zhipu). Dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.]
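The frontier rule in the caption ("no model both cheaper and better") is a standard Pareto-dominance check. A minimal Python sketch over a subset of the scored variants in this family, assuming the cost axis is input list price per 1M tokens; the chart itself may use a different or blended price. Adding the remaining scored variants does not change the result, because each of them is matched or beaten on price by a higher-scoring variant.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    input_price: float  # USD per 1M input tokens (assumed cost axis)
    quality: float      # Quality Score


POINTS = [
    Point("GPT-5.5 Thinking", 1.25, 106.6),
    Point("GPT-5.4 Thinking", 2.50, 102.8),
    Point("GPT-5.3 Codex", 1.75, 98.3),
    Point("GPT-5.2 Thinking", 1.75, 95.7),
    Point("GPT-5 Mini Thinking (5.4)", 0.75, 87.1),
    Point("GPT-5 Mini Thinking (5.0)", 0.25, 79.3),
    Point("GPT-5 Nano Thinking (5.4)", 0.20, 78.7),
    Point("GPT-5 Nano Thinking (5.0)", 0.05, 59.8),
]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no more expensive, no worse, and strictly better on at least one axis."""
    return (a.input_price <= b.input_price and a.quality >= b.quality
            and (a.input_price < b.input_price or a.quality > b.quality))


def pareto_frontier(points):
    """Keep the points no other point dominates, sorted from cheapest to most expensive."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.input_price)


for p in pareto_frontier(POINTS):
    print(f"${p.input_price:.2f}  QS {p.quality:5.1f}  {p.name}")
```

The frontier comes out as the two Nano settings, the two Mini settings, and GPT-5.5 Thinking, which matches the claim that mini and nano sit on the family's price-efficiency frontier.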

The GPT-5 family

Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.

Closed · API only (4)

  • GPT-5: 16 variants
  • GPT-5 Mini: 2 variants
  • GPT-5 Nano: 3 variants
  • GPT-5 Codex: 1 variant

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Editor's notes

By Boris · Last verified · AI-assisted, human-reviewed

Why this family matters

GPT-5 is OpenAI's flagship line. The decision is rarely "do we use it" (most teams already have an OpenAI key); it is which of the four tiers to run. Three of them form a price ladder (nano → mini → full), and Codex sits at the full-GPT-5 price but is tuned for coding. At the 5.0-tier list prices each rung costs 5x the one below it, so picking the wrong tier is a 5x (one rung) to 25x (nano vs full) cost mistake at production scale.
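The 5x-to-25x range can be checked directly against the 5.0-tier list prices in the variant table (Nano Thinking (5.0), Mini Thinking (5.0), and GPT-5.0 standing in for "full"). A minimal Python sketch; the monthly token volumes are hypothetical.

```python
# 5.0-tier list prices (USD per 1M input, output tokens) from the variant table.
LADDER = {
    "nano": (0.05, 0.40),
    "mini": (0.25, 2.00),
    "full": (1.25, 10.00),
}


def monthly_cost(tier: str, input_m: float, output_m: float) -> float:
    """USD per month for input_m / output_m million tokens at list price."""
    in_price, out_price = LADDER[tier]
    return input_m * in_price + output_m * out_price


# Hypothetical traffic: 2,000M input and 400M output tokens per month.
costs = {tier: monthly_cost(tier, 2_000, 400) for tier in LADDER}
for tier, cost in costs.items():
    print(f"{tier}: ${cost:,.0f}/month ({cost / costs['nano']:.0f}x nano)")
```

Because input and output prices both step up by exactly 5x per rung at this tier, the multipliers hold for any input/output mix; only the absolute dollar figures depend on the assumed traffic.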

Each tier ships with multiple effort settings under the same model name, which is the part that is easy to miss: "GPT-5 mini" can mean a 5.0 or a 5.4 routing depending on which API knob is set, and moving from 5.0 to 5.4 triples the input price ($0.25 to $0.75 per million tokens) and more than doubles the output price ($2 to $4.5). The variant table on this page resolves that ambiguity by listing each setting as its own row.
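Because input and output prices rise by different factors between the two Mini settings (3x vs 2.25x), the real cost jump depends on each request's token mix. A minimal Python sketch using the list prices from the variant table; the two request shapes are hypothetical.

```python
# List prices (USD per 1M input, output tokens) for the two GPT-5 Mini settings.
MINI_5_0 = (0.25, 2.00)
MINI_5_4 = (0.75, 4.50)


def request_cost(prices: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of a single request."""
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cost_ratio(input_tokens: int, output_tokens: int) -> float:
    """How many times more a request costs on the 5.4 setting than on 5.0."""
    return (request_cost(MINI_5_4, input_tokens, output_tokens)
            / request_cost(MINI_5_0, input_tokens, output_tokens))


# Two hypothetical request shapes.
print(f"input-heavy (8,000 in / 500 out): {cost_ratio(8_000, 500):.2f}x")
print(f"output-heavy (500 in / 4,000 out): {cost_ratio(500, 4_000):.2f}x")
```

The ratio always lands between 2.25x (all output) and 3x (all input): about 2.75x for the input-heavy shape and 2.26x for the output-heavy one.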

Which variant to start with

Default to openai-gpt-5-mini at the 5.0-thinking effort tier. On our current data it sits at Quality Score 79.3 with list pricing of $0.25 input / $2 output per million tokens, which puts it on the family's price-efficiency frontier. Step up to the 5.4-thinking tier of mini (QS 87.1, $0.75 input / $4.5 output per million) before you step up to full GPT-5; the per-token cost is still well under the flagship's $1.25 / $10, and the score jump is large enough to absorb most of the "do I need full GPT-5?" workloads.

Reach for full GPT-5 only when you can name the workload that justifies the price gap: long-context recall over genuinely large documents, multi-step agentic plans where the ceiling matters, or evals where mini measurably underperforms. Without that named workload, you are paying for headroom you will not use.

When to deviate:

  • Coding agents: use openai-gpt-5-codex. Same headline price as full GPT-5 ($1.25 input / $10 output per million) but tuned for agentic-coding loops. The price premium over mini is the cost of the tool-use and multi-step reliability profile; reach for codex when agentic-coding throughput is the binding constraint, not when the occasional code question comes up in a chat workload.
  • High-volume chat at scale: drop to openai-gpt-5-nano ($0.05 input / $0.4 output at the 5.0 tier; $0.2 / $1.25 at 5.4). The score gap to mini is real (at the 5.4 tier: Quality Score 78.7 vs 87.1, GPQA Diamond 82.8 vs 88.0), but for repetitive low-stakes turns the per-token cost cut is the dominant factor in the unit economics.
  • Long-context RAG: use full openai-gpt-5. Standard pricing covers a 400K-context window (enough for most document workloads). The 1.1M-context GPT-5.4 variants are listed at $2.5 input / $15 output per million; reach for them only after measuring that recall over the upper range of your documents actually degrades answer quality.
  • You already use mini for everything: before adding full GPT-5 to the rotation, run an A/B on your specific eval rather than relying on the headline benchmark deltas. The benchmark you read about and the workload you ship are rarely the same distribution; the cheaper way to learn whether full GPT-5 earns its 5x cost on your traffic is one evening of side-by-side runs, not a procurement debate.
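The side-by-side run described in the last bullet needs little more than a shared case list and a grader. A minimal Python sketch, assuming each model is wrapped as a plain prompt-to-answer function; `ab_compare`, the exact-match grader, and the two stub models are hypothetical stand-ins so the example runs offline, and in practice you would wrap your own API client and use a grader suited to your task.

```python
from typing import Callable, Sequence

# A "model" is any function from prompt to answer.
Model = Callable[[str], str]


def accuracy(model: Model, cases: Sequence[tuple[str, str]]) -> float:
    """Fraction of cases where the answer exactly matches the expected string (a deliberately simple grader)."""
    hits = sum(model(prompt).strip() == expected for prompt, expected in cases)
    return hits / len(cases)


def ab_compare(a: Model, b: Model, cases, price_a: float, price_b: float) -> dict:
    """Run both models over the same cases and report accuracy next to the price ratio."""
    return {
        "acc_a": accuracy(a, cases),
        "acc_b": accuracy(b, cases),
        "price_ratio_b_over_a": price_b / price_a,
    }


CASES = [("2+2", "4"), ("Capital of France?", "Paris")]


def stub_cheap(prompt: str) -> str:
    # Hypothetical stand-in for the cheaper tier.
    return {"2+2": "4"}.get(prompt, "not sure")


def stub_flagship(prompt: str) -> str:
    # Hypothetical stand-in for the flagship tier.
    return {"2+2": "4", "Capital of France?": "Paris"}.get(prompt, "not sure")


report = ab_compare(stub_cheap, stub_flagship, CASES, price_a=0.25, price_b=1.25)
print(report)
```

Putting the accuracy delta next to the price ratio keeps the question framed the way the bullet frames it: does the gain on your traffic justify paying several times more per token.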

Where the data is weak

We aggregate benchmark scores from multiple sources but coverage across the family is uneven. Specifically:

  • The 5.4-thinking tier has thinner benchmark coverage than 5.0-thinking in our index. Several benchmarks (SWE-Bench Verified, LiveBench, Terminal-Bench, MMLU Pro) only have 5.0-thinking numbers; the 5.4-thinking figures will fill in as more eval houses re-run.
  • openai-gpt-5-codex shows a single price point and a single context window in our index. If OpenAI exposes a longer-context Codex variant through a dedicated SKU, treat the codex line on this page as covering only the standard tier until we backfill.
  • Pricing on this page is the published API list price. Many teams negotiate volume discounts that change the unit economics for full GPT-5 vs mini significantly. The price ladder framing on this page is structurally right; the absolute multipliers may compress at scale.
  • "Quality Score" combines several public benchmarks into a single comparable number. It is useful for ranking within the family and rough cross-family comparison; it is not a substitute for running your own eval on the workload you actually ship.
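The point about negotiated discounts compressing the multipliers is easy to quantify. A minimal Python sketch, assuming the discount applies only to the more expensive tier (real contracts may discount both); it uses the Mini Thinking (5.0) and GPT-5.0 input list prices, and the discount rates are hypothetical.

```python
def effective_multiplier(cheap_price: float, expensive_price: float, discount: float) -> float:
    """Price multiple of the expensive tier over the cheap one after a discount on the expensive tier only."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return expensive_price * (1.0 - discount) / cheap_price


# GPT-5 Mini Thinking (5.0) vs GPT-5.0 input list prices; discount rates are hypothetical.
for discount in (0.0, 0.2, 0.4):
    print(f"{discount:.0%} off full GPT-5 -> {effective_multiplier(0.25, 1.25, discount):.1f}x mini")
```

A 40% discount on the flagship alone shrinks the 5x list-price gap to 3x, which is why the ladder's shape holds while its absolute multipliers can move.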

If you are making a procurement decision, the variant table on this page is the load-bearing artifact. Cross-check pricing against the OpenAI docs before you commit. Pricing changes faster than our scrape cadence.

When to reach for which alternative

  • Your workload is dominated by data-sovereignty or self-hosting requirements: GPT-5 is API-only, full stop. The conversation moves to open-weights families (Qwen3, Llama, DeepSeek). Compare on the specific benchmark that matters for your workload; on broad general-purpose evals the flagship closed models still hold a lead, but the quality-per-dollar gap narrows once you factor in self-hosting economics.
  • Long-form reasoning is the binding workload: check Claude Opus and DeepSeek-R1 scores on the same benchmark in our index before committing. Long chain-of-thought is the workload where ranking is most likely to flip family.
  • Cost ceiling is the binding constraint: mini and nano are already the answer within OpenAI; if even nano is too expensive, the question becomes which open-weights variant fits your hardware budget, not which OpenAI tier.

Sources worth reading

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
