Google family
Gemini 2
Gemini 2.5 Flash ships at $0.30/$2.50 per 1M tokens (input/output) with a 1M-token context window. This page covers when 2.5 Pro and the 2.0 family still beat upgrading to Gemini 3 on cost or workload fit.
Top in this family
Gemini 2.5 Pro ranks #68 of 186 on overall quality (QS 79.9) at $1.25/$10 per 1M tokens.
Practical pick
Gemini 2.5 Flash (Thinking) at $0.3/$2.5 per 1M tokens (rank #112 of 186).
- Models: 6 (14 variants)
- License: Closed weights
- Provider: Google
Best variant by workload
One pick per common job. Pick by what you need to ship, not by which variant tops a leaderboard you don't use.
| Workload | Best pick | Price (in / out per 1M tokens) | Why |
|---|---|---|---|
| General API workhorse | Google Gemini 2.5 Flash (Thinking) | $0.30 / $2.50 | Previous-generation default for chat-and-tooling workloads. Choose it when the cost delta to Gemini 3 Flash is the deciding factor. |
| High-volume chat | gemini-2.5-flash-lite-preview-09-2025 (Preview 09 2025, No Thinking) | $0.10 / $0.40 | Cheapest 2.5-tier option at usable quality. Use it for high-volume chat where per-token cost compounds. |
All variants
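14 variants across 6 models, plus 6 variants from 2 cross-family models (DeepSeek V4, Gemini 3 Flash) for context. Family rows are sorted by Quality Score (descending) with unscored variants last; cross-family rows follow, sorted the same way.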
| Variant | Model | Newer generation | QS (rank / 186) | GPQA Diamond | HLE | SWE-bench Verified | SWE-bench Pro | Terminal-Bench | τ²-bench | MCP Atlas | AIME 2025 | In $/M | Out $/M | Context | Released |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | Gemini 2.5 Pro | Gemini 3 Pro | 79.9 (#68) | 86.4 | 21.6 | 59.6 | — | 32.6 | 67.0 | 8.8 | 88.0 | $1.25 | $10 | — | Jun 17, 2025 |
| Thinking | Gemini 2.5 Flash | Gemini 3 Flash | 71.1 (#112) | 82.8 | 11.0 | 60.4 | — | 16.9 | — | 3.4 | 72.0 | $0.3 | $2.5 | 1.0M | Jun 17, 2025 |
| Non-thinking | Gemini 2.0 Pro | Gemini 3 Pro | 70.1 (#116) | 64.7 | — | — | — | — | — | — | — | — | — | — | Feb 5, 2025 |
| Latest | gemini-2.5-flash-lite-preview-09-2025 | Gemini 3.1 Flash Lite | 68.5 (#122) | 66.7 | 6.9 | — | — | — | — | — | — | $0.1 | $0.4 | — | Sep 25, 2025 |
| Non-Thinking | Gemini 2.5 Flash | Gemini 3 Flash | 67.6 (#126) | 68.2 | — | — | — | — | 64.3 | — | 46.6 | $0.3 | $2.5 | 1.0M | Jun 17, 2025 |
| Non-thinking | Gemini 2.0 Flash-Lite | Gemini 3.1 Flash Lite | 63.5 (#141) | 51.5 | — | — | — | — | — | — | — | $0.075 | $0.3 | — | Feb 5, 2025 |
| 2.0 | Gemini 2.0 Flash | Gemini 3 Flash | 63.0 (#144) | 60.1 | — | — | — | — | — | — | — | $0.1 | $0.4 | — | Feb 5, 2025 |
| Reasoning | Gemini 2.0 Flash | Gemini 3 Flash | 61.4 (#150) | — | — | — | — | — | — | — | — | $0.1 | $0.4 | — | Feb 5, 2025 |
| Max Thinking | Gemini 2.5 Flash | Gemini 3 Flash | — | — | — | — | — | — | — | — | — | $0.3 | $2.5 | 1.0M | Jun 17, 2025 |
| Max Thinking 2025 06 17 | gemini-2.5-flash-lite-preview-09-2025 | Gemini 3.1 Flash Lite | — | — | — | — | — | — | — | — | — | $0.1 | $0.4 | — | Sep 25, 2025 |
| Max Thinking 2025 09 25 | gemini-2.5-flash-lite-preview-09-2025 | Gemini 3.1 Flash Lite | — | — | — | — | — | — | — | — | — | $0.1 | $0.4 | — | Sep 25, 2025 |
| Preview 06 17 Thinking | gemini-2.5-flash-lite-preview-09-2025 | Gemini 3.1 Flash Lite | — | — | — | — | — | — | — | — | — | $0.1 | $0.4 | — | Sep 25, 2025 |
| Preview 09 2025 No Thinking | gemini-2.5-flash-lite-preview-09-2025 | Gemini 3.1 Flash Lite | — | — | — | — | — | — | — | — | — | $0.1 | $0.4 | — | Sep 25, 2025 |
| Preview 01 01 | Gemini 2.0 Flash | Gemini 3 Flash | — | — | 6.6 | — | — | — | — | — | — | $0.1 | $0.4 | — | Feb 5, 2025 |
| V4 Pro Thinking | DeepSeek V4 (cross-family) | — | 98.0 (#15) | 90.1 | 37.7 | 80.6 | 55.4 | — | — | 73.6 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 |
| V4 Flash Thinking | DeepSeek V4 (cross-family) | — | 92.0 (#27) | 88.1 | 34.8 | 79.0 | 52.6 | — | — | 69.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 |
| 3.0 | Gemini 3 Flash (cross-family) | — | 88.9 (#32) | 90.4 | 33.7 | 78.0 | 49.6 | 47.6 | — | 62.0 | — | $0.5 | $3 | — | Dec 17, 2025 |
| Preview | Gemini 3 Flash (cross-family) | — | 87.3 (#36) | — | — | — | — | — | — | 62.0 | — | $0.5 | $3 | — | Dec 17, 2025 |
| V4 Pro | DeepSeek V4 (cross-family) | — | 80.9 (#61) | 72.9 | 7.7 | 73.6 | 52.1 | — | — | 69.4 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 |
| V4 Flash | DeepSeek V4 (cross-family) | — | 78.1 (#78) | 71.2 | 8.1 | 73.7 | 49.1 | — | — | 64.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 |
Benchmark evidence
Every benchmark we track for this family, grouped by capability. The headline Quality Score draws on a deliberately narrow, governed panel (72 of the 165 rows here feed it); the rest is tracked evidence: recorded and comparable, but not folded into one synthetic score.
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | LiveBench | 82.4 | 1 / 110 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | AIME 2025 · code_exec | 75.7 | 4 / 4 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | SimpleBench | 62.4 | 9 / 61 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | AIME 2025 · no_tools | 88 | 10 / 15 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | AIME 2025 · no_tools | 72 | 13 / 15 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | AIME 2025 | 88 | 16 / 88 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MMLU Pro | 86 | 17 / 86 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Humanity's Last Exam · hle_text | 18.4 | 21 / 56 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | GPQA Diamond | 86.4 | 28 / 143 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | Humanity's Last Exam · hle_text | 12.6 | 28 / 56 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | SimpleBench | 41.2 | 33 / 61 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | AIME 2025 | 72 | 39 / 88 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Humanity's Last Exam · hle | 21.6 | 39 / 90 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Arena Elo | 1446 | 41 / 158 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | LiveBench | 67.8 | 43 / 110 | In Quality Score |
| Google Gemini 2.0 Flash · Reasoning | SimpleBench | 30.7 | 43 / 61 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | GPQA Diamond | 82.8 | 44 / 143 | In Quality Score |
| Google Gemini 2.0 Flash · Reasoning | Humanity's Last Exam · hle_text | 6.5 | 44 / 56 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | Humanity's Last Exam · hle_text | 5.6 | 49 / 56 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | MMLU Pro | 79.4 | 52 / 86 | In Quality Score |
| Google Gemini 2.0 Pro · Non-thinking | MMLU Pro | 79.1 | 53 / 86 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | AIME 2025 | 46.6 | 53 / 88 | In Quality Score |
| Google Gemini 2.0 Flash · 2.0 | MMLU Pro | 77.6 | 55 / 86 | In Quality Score |
| Google Gemini 2.0 Flash · 2.0 | SimpleBench | 18.9 | 58 / 61 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | Humanity's Last Exam · hle | 11 | 60 / 90 | In Quality Score |
| Gemini 2.0 Flash-Lite · Non-thinking | MMLU Pro | 71.6 | 63 / 86 | In Quality Score |
| Google Gemini 2.5 Flash · Max Thinking | LiveBench | 53.1 | 77 / 110 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | Arena Elo | 1411 | 80 / 158 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | Humanity's Last Exam · hle | 6.9 | 80 / 90 | In Quality Score |
| Google Gemini 2.0 Flash · Preview 01 01 | Humanity's Last Exam · hle | 6.6 | 83 / 90 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | GPQA Diamond | 68.2 | 84 / 143 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | GPQA Diamond | 66.7 | 88 / 143 | In Quality Score |
| Google Gemini 2.0 Pro · Non-thinking | GPQA Diamond | 64.7 | 94 / 143 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Max Thinking 2025 06 17 | LiveBench | 42.6 | 95 / 110 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Max Thinking 2025 09 25 | LiveBench | 42.4 | 96 / 110 | In Quality Score |
| Google Gemini 2.0 Flash · 2.0 | GPQA Diamond | 60.1 | 99 / 143 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Preview 09 2025 No Thinking | Arena Elo | 1380 | 106 / 158 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Preview 06 17 Thinking | Arena Elo | 1375 | 107 / 158 | In Quality Score |
| Gemini 2.0 Flash-Lite · Non-thinking | GPQA Diamond | 51.5 | 111 / 143 | In Quality Score |
| Google Gemini 2.0 Flash · 2.0 | Arena Elo | 1360 | 117 / 158 | In Quality Score |
| Gemini 2.0 Flash-Lite · Non-thinking | Arena Elo | 1353 | 121 / 158 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Arena-Hard | 96.4 | 1 / 40 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MRCR · v1_average | 93 | 1 / 1 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MRCR · v1_pointwise | 82.9 | 1 / 1 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Multi-IF | 77.8 | 1 / 32 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MRCR · v2_average | 58 | 1 / 6 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MRCR · v2_pointwise | 16.4 | 1 / 1 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MATH 500 | 98.8 | 2 / 55 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | GlobalPIQA | 91.5 | 2 / 4 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | FACTS Benchmark Suite | 63.4 | 2 / 12 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | VendingBench2 | 573.6 | 4 / 4 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | AIME 2024 | 92 | 4 / 69 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Global PIQA | 91.5 | 4 / 26 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MMMU · mmmu_single | 79.6 | 4 / 22 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | SimpleQA | 54.5 | 4 / 40 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MMMLU | 89.5 | 5 / 38 | Tracked evidence |
| Google Gemini 2.5 Flash · Non-Thinking | AceBench | 74.5 | 5 / 7 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | FACTS Benchmark Suite | 50.4 | 5 / 12 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | Global PIQA | 90.2 | 6 / 26 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MathArenaApex | 0.5 | 7 / 8 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | MRCR · v2_1m | 21 | 9 / 14 | Tracked evidence |
| Google Gemini 2.5 Flash · Non-Thinking | MMLU | 90.1 | 10 / 33 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MRCR · v2_1m | 16.4 | 10 / 14 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | FACTS Benchmark Suite | 17.9 | 12 / 12 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MMLU | 89.5 | 13 / 33 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | MMMLU | 86.6 | 13 / 38 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MRCR · v2_128k | 58 | 13 / 23 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | AIME 2024 | 82.3 | 14 / 69 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MRCR · v2_1m | 5.4 | 14 / 14 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | SciCode | 42.8 | 15 / 24 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | FinanceAgent | 29.4 | 15 / 15 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | MRCR · v2_128k | 54.3 | 16 / 23 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | SimpleQA | 28.1 | 17 / 40 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MMMLU | 84.5 | 19 / 38 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MRCR · v2_128k | 30.6 | 21 / 23 | Tracked evidence |
| Google Gemini 2.5 Flash · Non-Thinking | SimpleQA | 23.3 | 22 / 40 | Tracked evidence |
| Google Gemini 2.0 Pro · Non-thinking | MATH 500 | 91.8 | 25 / 55 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | HMMT Feb 2025 | 64.2 | 27 / 44 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | BFCL v3 | 62.9 | 27 / 49 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MMMU PRO | 68 | 30 / 52 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | MMMU PRO | 66.7 | 31 / 52 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | SimpleQA | 11.5 | 32 / 40 | Tracked evidence |
| Google Gemini 2.5 Flash · Non-Thinking | AIME 2024 | 61.3 | 33 / 69 | Tracked evidence |
| Google Gemini 2.5 Flash · Non-Thinking | HMMT Feb 2025 | 34.7 | 37 / 44 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MMMU PRO | 51 | 42 / 52 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | BrowseComp | 7.6 | 44 / 51 | Tracked evidence |
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | LiveCodeBench · 2024_08_2025_05 | 77.1 | 1 / 17 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | LiveCodeBench · 2024_single | 75.6 | 1 / 2 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | LiveCodeBench · 2024_07_2025_01 | 80.1 | 2 / 8 | In Quality Score |
| Google Gemini 2.0 Pro · Non-thinking | LiveCodeBench · 2024_10_01_to_2025_02_01 | 36 | 2 / 9 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Aider (Polyglot) | 83.1 | 3 / 45 | In Quality Score |
| Google Gemini 2.0 Flash · 2.0 | LiveCodeBench · 2024_10_01_to_2025_02_01 | 34.5 | 3 / 9 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | SWE-bench Verified · single_agentless | 32.6 | 7 / 7 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | LiveCodeBench · 2024_08_2025_05 | 62.3 | 8 / 17 | In Quality Score |
| Gemini 2.0 Flash-Lite · Non-thinking | LiveCodeBench · 2024_10_01_to_2025_02_01 | 28.9 | 8 / 9 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | LiveCodeBench · 2025_01_2025_05_single | 69 | 9 / 11 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | SWE-bench Verified · multiple | 67.2 | 9 / 10 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | LiveCodeBench | 70.4 | 13 / 69 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | LiveCodeBench | 62.6 | 20 / 69 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | Aider (Polyglot) | 55.1 | 22 / 45 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | GSO (Global Software Optimization) · opt_at_1 | 0 | 23 / 24 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | Aider (Polyglot) | 44 | 32 / 45 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | LiveCodeBench · v6 | 44.7 | 34 / 40 | In Quality Score |
| Google Gemini 2.0 Flash · 2.0 | Aider (Polyglot) | 22.2 | 38 / 45 | In Quality Score |
| Google Gemini 2.0 Flash · Reasoning | Aider (Polyglot) | 18.2 | 40 / 45 | In Quality Score |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | LiveCodeBench | 34.3 | 41 / 69 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | SWE-bench Verified | 60.4 | 52 / 68 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | SWE-bench Verified | 59.6 | 53 / 68 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Codeforces | 2001 | 11 / 47 | Tracked evidence |
| Google Gemini 2.5 Flash · Non-Thinking | OJ-Bench | 19.5 | 16 / 19 | Tracked evidence |
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemini 2.5 Flash · Thinking | τ²-bench · average | 79.5 | 15 / 30 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | τ²-bench · average | 77.8 | 19 / 30 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | τ²-bench · airline | 50 | 21 / 29 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | τ²-bench · airline | 42.5 | 25 / 29 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | τ²-bench · retail | 67 | 26 / 34 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | τ²-bench · retail | 64.3 | 28 / 34 | In Quality Score |
| Google Gemini 2.5 Flash · Non-Thinking | τ²-bench · telecom | 16.9 | 28 / 28 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | MCP Atlas | 8.8 | 32 / 33 | In Quality Score |
| Google Gemini 2.5 Flash · Thinking | MCP Atlas | 3.4 | 33 / 33 | In Quality Score |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | VendingBench · v2 | 574 | 6 / 7 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | VendingBench · v2 | 549 | 7 / 7 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Toolathlon | 10.5 | 30 / 31 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | Toolathlon | 3.7 | 31 / 31 | Tracked evidence |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | VideoMME | 86.9 | 2 / 4 | Tracked evidence |
| Google Gemini 2.0 Flash · 2.0 | ChartQA | 88.3 | 3 / 9 | Tracked evidence |
| Gemini 2.0 Flash-Lite · Non-thinking | ChartQA | 73 | 9 / 9 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | Video-MMMU | 83.6 | 10 / 28 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | LVBench | 60.9 | 12 / 18 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | VLMs Are Blind | 68.4 | 14 / 18 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MMVU | 65.3 | 14 / 20 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | LingoQA | 17.8 | 15 / 16 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MLVU · mavg | 78.5 | 16 / 22 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | VideoMME · without_sub | 72.7 | 16 / 21 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | AI2D · test | 85.7 | 17 / 33 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | Video-MMMU | 79.2 | 17 / 28 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | SLAKE | 65 | 17 / 22 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MathVision | 52.1 | 17 / 17 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | ZEROBench · sub | 19.2 | 17 / 23 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | RealWorldQA | 72.2 | 18 / 24 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MMBench · en_dev_v1_1 | 82.7 | 19 / 24 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | VideoMME · with_sub | 74.6 | 19 / 22 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | DynaMath | 69.9 | 19 / 23 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | BabyVision | 17.5 | 19 / 22 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | RefSpatialBench | 11.2 | 19 / 21 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | ZEROBench | 1 | 19 / 27 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | V* | 69.6 | 20 / 23 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | HallusionBench | 64.5 | 20 / 33 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | SimpleVQA | 54.1 | 20 / 29 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | CountBench | 79.2 | 21 / 23 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MedXpertQA · mm | 35.3 | 21 / 31 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | EmbSpatialBench | 66.1 | 22 / 24 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | ERQA | 44.3 | 22 / 27 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | ScreenSpot-Pro | 11.4 | 22 / 24 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MMStar | 69.1 | 23 / 33 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | ScreenSpot-Pro | 3.9 | 23 / 24 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | Video-MMMU | 60.7 | 26 / 28 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | CharXiv Reasoning | 69.6 | 27 / 48 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MathVista · mini | 72.8 | 29 / 36 | Tracked evidence |
| Google Gemini 2.5 Flash · Thinking | CharXiv Reasoning | 63.7 | 33 / 48 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | CharXiv Reasoning | 55.5 | 39 / 48 | Tracked evidence |
Document/OCR
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemini 2.5 Flash · Thinking | OmniDocBench · v1_5 | 0.2 | 1 / 6 | Tracked evidence |
| Google Gemini 2.5 Pro · Gemini 2.5 Pro | OmniDocBench · v1_5 | 0.1 | 3 / 6 | Tracked evidence |
| Gemini 2.0 Flash-Lite · Non-thinking | DocVQA | 91.2 | 5 / 8 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | MMLongBench-Doc | 46.5 | 14 / 22 | Tracked evidence |
| gemini-2.5-flash-lite-preview-09-2025 · Latest | OCRBench | 82.5 | 19 / 35 | Tracked evidence |
Where this family sits in the market
Gemini 2.5 Flash Lite and 2.0 Flash-Lite hold the low-cost end of the cost-efficiency frontier across served Gemini tiers.
Chart: cost vs. Quality Score. The dashed line is the Pareto frontier (no model both cheaper and better). Thinking and non-thinking pairs of the same model are connected, so line length shows the cost of reasoning.
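The frontier rule in the chart caption ("no model both cheaper and better") can be checked directly from the variants table. A minimal sketch, assuming a single blended price per model with 75% of tokens as input; that weighting is our assumption, not necessarily how the chart computes its cost axis. `Point` and `pareto_frontier` are hypothetical names, the data covers only scored Gemini rows with list pricing, and Gemini 3 Flash is included as cross-family context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    name: str
    in_per_m: float   # list price, USD per 1M input tokens
    out_per_m: float  # list price, USD per 1M output tokens
    qs: float         # Quality Score from the variants table

    def blended(self, in_share: float) -> float:
        # Weighted price per 1M tokens, given the share of tokens that are input.
        return in_share * self.in_per_m + (1 - in_share) * self.out_per_m


def pareto_frontier(points: list[Point], in_share: float = 0.75) -> list[Point]:
    """Keep points that no other point beats on both axes
    (strictly cheaper blended price AND strictly higher QS)."""
    frontier = [
        p for p in points
        if not any(
            q.blended(in_share) < p.blended(in_share) and q.qs > p.qs
            for q in points
        )
    ]
    return sorted(frontier, key=lambda p: p.blended(in_share))


# Scored Gemini variants with list pricing, copied from the variants table.
POINTS = [
    Point("Gemini 2.0 Flash-Lite", 0.075, 0.3, 63.5),
    Point("Gemini 2.0 Flash", 0.1, 0.4, 63.0),
    Point("Gemini 2.0 Flash (Reasoning)", 0.1, 0.4, 61.4),
    Point("Gemini 2.5 Flash Lite (Latest)", 0.1, 0.4, 68.5),
    Point("Gemini 2.5 Flash (Thinking)", 0.3, 2.5, 71.1),
    Point("Gemini 2.5 Flash (Non-Thinking)", 0.3, 2.5, 67.6),
    Point("Gemini 2.5 Pro", 1.25, 10.0, 79.9),
    Point("Gemini 3 Flash", 0.5, 3.0, 88.9),
]

if __name__ == "__main__":
    for p in pareto_frontier(POINTS):
        print(f"{p.name}: ${p.blended(0.75):.3f} per 1M blended, QS {p.qs}")
```

With these inputs the sketch keeps 2.0 Flash-Lite, 2.5 Flash Lite, 2.5 Flash (Thinking), and 3 Flash. 2.5 Pro drops out because 3 Flash is cheaper on both input and output and scores higher, so no choice of `in_share` brings it back.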
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- GPT-4 Era: GPT-4o, GPT-4.1, o-series, gpt-oss Picks vs GPT-5
OpenAI's pre-GPT-5 lineup still served: GPT-4o, GPT-4.1, o-series reasoning, and gpt-oss. Covers when a legacy tier still beats upgrading.
- Claude 3.5 vs Claude 4: When the Older Sonnet and Haiku Still Fit
Claude 3.5 Sonnet still ships at $3/$15 per 1M, the same price as Sonnet 4. Covers when the cost-equal Claude 4 tier wins and when 3.5 still earns its slot.
Caveats
What this page does not tell you, listed honestly.
- No tracked API pricing for: Google Gemini 2.0 Pro. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
- Context window not declared for: Google Gemini 2.5 Pro, gemini-2.5-flash-lite-preview-09-2025, Google Gemini 2.0 Pro, Google Gemini 2.0 Flash, Gemini 2.0 Flash-Lite.
- Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.
Editor's notes
If you are already on Gemini 2
If you have a working Vertex or AI Studio deployment pinned to a Gemini 2-era SKU, two questions matter: when staying is defensible, and where our data is thin enough that you should verify it yourself before committing.
The one fact that complicates the headline migration call: Gemini 2.5 Flash sits at $0.3 input / $2.5 output per million with a verified 1M-token context window. Gemini 3 Flash is more expensive on both axes ($0.5 / $3), and our index has a coverage gap on its context window. The trade is a Quality Score lift from 71.1 (2.5 Flash Thinking; 67.6 non-thinking) to 88.9 (3 Flash), paid for with higher unit cost and an unverified context profile. For workloads that picked 2.5 Flash specifically for cheap long context, the migration is not a unit-economics win. Before paying more per call, check whether the workload actually needs the quality lift; if it tolerates 2.5 Flash's output, staying is defensible.
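To see what "more expensive on both axes" means for a single long-context call, here is a minimal sketch at list price. The token counts are a hypothetical workload, the dictionary keys are labels for this sketch rather than guaranteed API model IDs, and it ignores context caching, batch discounts, and any tiered pricing.

```python
# List prices (USD per 1M tokens: input, output) copied from the variants table.
# Keys are illustrative labels, not guaranteed API model identifiers.
LIST_PRICES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-3-flash": (0.50, 3.00),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at list price (no caching, batch, or tier discounts)."""
    in_price, out_price = LIST_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical long-context call: 200k prompt tokens, 2k generated tokens.
    old = cost_per_call("gemini-2.5-flash", 200_000, 2_000)
    new = cost_per_call("gemini-3-flash", 200_000, 2_000)
    print(f"2.5 Flash: ${old:.4f}  3 Flash: ${new:.4f}  ratio: {new / old:.2f}x")
```

For this prompt-heavy mix the input price dominates: $0.065 versus $0.106 per call, about 1.6x. Plug in your own token counts before treating the ratio as general.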
Reasons to stay on the previous generation that are defensible:
- Vertex routing or fine-tunes pinned to 2.5. If a deployment is going through Vertex with model-specific routing, fine-tunes, or enterprise SLAs tied to a 2.5 SKU, the migration cost includes the Vertex-side work. Plan it; do not assume the SDK is the only thing to update.
- 2.0 Flash is genuinely cheap and your workload is tolerant. At $0.1 input / $0.4 output per million, 2.0 Flash (and its reasoning and preview variants) sits at one of the cheapest price points in our index. Google documents a 1M-token context for it, but the field is unset in our index, so verify it on your deployment surface. A Quality Score around 63 puts it well below 3 Flash, but for repetitive, low-stakes turns with long context the per-call cost advantage compounds.
- 2.5 Pro is in production and your evals qualified it. 2.5 Pro (thinking mode), at QS 79.9 and rank 41 of 158 on Arena Elo, is not a bad model; it just sits below the Gemini 3 Pro / 3.1 Pro tier. If your workload is qualified on 2.5 Pro's output behaviour, treat the migration to 3 Pro as a re-qualification exercise, not a drop-in.
Where the data is weak
- Context window declarations are partial. Only 2.5 Flash lists a 1M context window in our index; 2.5 Pro, 2.5 Flash Lite, 2.0 Pro, 2.0 Flash, and 2.0 Flash-Lite show the field as unset. That is a coverage gap, not evidence of a smaller window; verify on the deployment surface you actually use before committing to a long-document workload.
- The 2.5 Flash Lite preview SKUs have multiple variants (2025-06-17, 2025-09-25) with the same list pricing but distinct behaviour. Pin the variant identifier you are calling explicitly rather than relying on latest-style aliases in production.
- Pricing on this page is the published list price. Vertex AI routing, batch pricing, and enterprise agreements change the unit economics; treat list price as a calibration anchor.
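The pinning advice can be enforced at startup so a floating alias never reaches production. A minimal sketch, assuming that any identifier ending in "latest" is an alias the vendor may repoint; `require_pinned` and the `GEMINI_MODEL` environment variable are hypothetical names, not part of any Google SDK.

```python
import os


def require_pinned(model_id: str) -> str:
    """Return model_id unchanged, or raise if it looks like a floating alias.

    Assumption for this sketch: identifiers ending in "latest" can be
    repointed to a different model without notice, so refuse them.
    """
    if model_id.strip().lower().endswith("latest"):
        raise ValueError(f"{model_id!r} is a floating alias; pin a dated variant ID")
    return model_id


# GEMINI_MODEL is a hypothetical environment variable used only in this sketch.
MODEL_ID = require_pinned(
    os.environ.get("GEMINI_MODEL", "gemini-2.5-flash-lite-preview-09-2025")
)
```

Failing at startup, rather than at the first request, keeps an unpinned configuration from silently shipping.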
When to look outside this era
- Gemini 3 family (/en/ai/llm/gemini-3) is the natural successor for every tier on this page. If the migration question is still open, that page is the comparison to read.
- Cheapest competent long-context API outside Google: DeepSeek V4 Flash ($0.098 / $0.197, QS 78.1, 1M context) is the price anchor to beat at the workhorse-with-long-context tier. Gemini 2.5 Flash (QS 71.1 with thinking) already loses to it on both quality and price.
Sources worth reading
- Google AI Studio pricing: vendor price list (Gemini 2 and Gemini 3 tiers listed together)
- Gemini API model docs: variant identifiers and deprecation timelines
- Vertex AI Gemini docs: Vertex-specific routing and pricing
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
Author: Boris. Read the full methodology.