Google family
Gemini 3
Gemini 3: Gemini 3.1 Pro ranks #5 of 186 with $2/$12 per 1M tokens. Compare Gemini 3 Pro, Flash, and Lite by workload.
Top in this family
Gemini 3.1 Pro ranks #5 of 186 on overall quality (QS 104.3) at $2/$12 per 1M tokens.
Practical pick
Gemini 3 Flash (Preview) at $0.5/$3 per 1M tokens (rank #36 of 186).
- Variants: 3
- License: Closed weights
- Provider: Google
★ Most teams should start here
Gemini 3 Flash
Variant: Preview
The practical default. Delivers most of Gemini 3's quality for everyday API workloads at a quarter of Pro pricing ($0.5/$3 vs $2/$12 per 1M tokens). Step up to 3 Pro only when the workload visibly benefits.
- Quality Score: 87.3
- Input: $0.500/1M
- Output: $3.00/1M
- Context: —
- License: Closed · API
Best variant by workload
One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.
| Workload | Best pick | Why |
|---|---|---|
| General API workhorse | Gemini 3 Flash Preview $0.500/1M / $3.00/1M | Best quality-per-dollar in the family for chat, summarization, and tool-augmented assistants. |
| Long-context RAG | Gemini 3 Pro 3.1 $2.00/1M / $12.00/1M | Strongest long-context recall in the family. Pick when document scale and faithful retrieval over long inputs dominate. |
| High-volume chat | Gemini 3.1 Flash Lite Latest $0.250/1M / $1.50/1M | Cheapest production-grade tier in the current generation. Use for high-volume chat where per-token cost compounds. |
All variants
20 variants across 3 models (+ 2 cross-family for context). Sorted by quality score (descending).
| Variant | QS | GPQA | HLE | SWE | SWE-Pro | Terminal | Tau | MCP | AIME | In $/M | Out $/M | Context | Released | Lic. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3 Pro · 3.1 | 104.3 #5/186 | 94.3 | 44.4 | 80.6 | 54.2 | 68.5 | 90.8 | 73.9 | — | $2 | $12 | — | Nov 18, 2025 | |
| Gemini 3 Pro · 3.0 | 95.0 #20/186 | 91.9 | 37.5 | 76.2 | 43.3 | 54.2 | 85.3 | 54.1 | 95.0 | $2 | $12 | — | Nov 18, 2025 | |
| Gemini 3 Flash · 3.0 | 88.9 #32/186 | 90.4 | 33.7 | 78.0 | 49.6 | 47.6 | — | 62.0 | — | $0.5 | $3 | — | Dec 17, 2025 | |
| Gemini 3 Flash · Preview | 87.3 #36/186 | — | — | — | — | — | — | 62.0 | — | $0.5 | $3 | — | Dec 17, 2025 | |
| Gemini 3.1 Flash Lite · Latest | 81.1 #59/186 | 86.9 | 16.0 | — | — | — | — | 57.1 | — | $0.25 | $1.5 | — | — | |
| Anthropic Claude Opus 4 · 4.8 Thinking (cross-family) | 108.6 #2/186 | 93.6 | 49.8 | 88.6 | 69.2 | — | — | 82.2 | — | $5 | $25 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.7 Thinking (cross-family) | 107.8 #3/186 | 94.2 | 46.9 | 87.6 | 64.3 | 69.4 | — | 77.3 | — | $5 | $25 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.6 Thinking (cross-family) | 104.1 #6/186 | 91.3 | 40.0 | 80.8 | 53.4 | 65.4 | 91.9 | 59.5 | 95.6 | $5 | $25 | 1.0M | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.5 Thinking (cross-family) | 98.6 #13/186 | 87.0 | 30.8 | 80.9 | — | 59.3 | 88.9 | 62.3 | 92.8 | $5 | $25 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Pro Thinking (cross-family) | 98.0 #15/186 | 90.1 | 37.7 | 80.6 | 55.4 | — | — | 73.6 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| Anthropic Claude Opus 4 · 4.6 Non-thinking (cross-family) | 93.1 #23/186 | — | 19.0 | — | — | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Flash Thinking (cross-family) | 92.0 #27/186 | 88.1 | 34.8 | 79.0 | 52.6 | — | — | 69.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
| Anthropic Claude Opus 4 · 4.1 Thinking (cross-family) | 83.1 #50/186 | 81.0 | 11.7 | 74.5 | — | 38.0 | 86.8 | 40.9 | 78.0 | $15 | $75 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Pro (cross-family) | 80.9 #61/186 | 72.9 | 7.7 | 73.6 | 52.1 | — | — | 69.4 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| Anthropic Claude Opus 4 · 4.5 Non-thinking (cross-family) | 80.7 #63/186 | — | 14.2 | — | 45.9 | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.0 Thinking (cross-family) | 80.7 #64/186 | 79.6 | 10.7 | 72.5 | — | — | 81.4 | — | 75.5 | $15 | $75 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.0 Non-thinking (cross-family) | 79.1 #73/186 | 74.9 | 6.7 | 72.5 | — | — | 81.8 | — | 33.9 | $15 | $75 | 200K | May 22, 2025 | |
| DeepSeek V4 · V4 Flash (cross-family) | 78.1 #78/186 | 71.2 | 8.1 | 73.7 | 49.1 | — | — | 64.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
| Anthropic Claude Opus 4 · 4.1 Non-thinking (cross-family) | 70.4 #115/186 | — | 7.9 | — | — | — | — | — | — | $15 | $75 | 200K | May 22, 2025 | |
| Anthropic Claude Opus 4 · 4.7 Non-thinking (cross-family) | — | — | — | — | — | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
Benchmark evidence
Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (58 of 186 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Gemini 3 Pro · 3.0 | LiveCodeBench · v6 | 90.7 | 1 / 40 | In Quality Score |
| Gemini 3 Pro · 3.0 | MMLU Pro | 90.1 | 1 / 86 | In Quality Score |
| Gemini 3 Pro · 3.1 | SimpleBench | 79.6 | 1 / 61 | In Quality Score |
| Gemini 3 Pro · 3.0 | τ²-bench · airline | 73 | 1 / 29 | In Quality Score |
| Gemini 3 Pro · 3.0 | Humanity's Last Exam · verified | 48 | 1 / 5 | In Quality Score |
| Gemini 3 Pro · 3.1 | Humanity's Last Exam · hle_text | 47.3 | 1 / 56 | In Quality Score |
| Gemini 3 Pro · 3.0 | AIME 2025 · code_exec | 100 | 2 / 4 | In Quality Score |
| Gemini 3 Pro · 3.1 | τ²-bench · telecom | 99.3 | 2 / 28 | In Quality Score |
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Gemini 3 Pro · 3.0 | MMLU Pro | 90.1 | 1 / 86 | In Quality Score |
| Gemini 3 Pro · 3.1 | SimpleBench | 79.6 | 1 / 61 | In Quality Score |
| Gemini 3 Pro · 3.0 | Humanity's Last Exam · verified | 48 | 1 / 5 | In Quality Score |
| Gemini 3 Pro · 3.1 | Humanity's Last Exam · hle_text | 47.3 | 1 / 56 | In Quality Score |
| Gemini 3 Pro · 3.0 | AIME 2025 · code_exec | 100 | 2 / 4 | In Quality Score |
| Gemini 3 Flash · 3.0 | AIME 2025 · no_tools | 95.2 | 2 / 15 | In Quality Score |
| Gemini 3 Pro · 3.1 | GPQA Diamond | 94.3 | 2 / 143 | In Quality Score |
| Gemini 3 Pro · 3.1 | Humanity's Last Exam · search_code | 51.4 | 2 / 6 | In Quality Score |
| Gemini 3 Flash · 3.0 | AIME 2025 · code_exec | 99.7 | 3 / 4 | In Quality Score |
| Gemini 3 Pro · 3.0 | AIME 2025 | 95 | 3 / 88 | In Quality Score |
| Gemini 3 Pro · 3.0 | AIME 2025 · no_tools | 95 | 3 / 15 | In Quality Score |
| Gemini 3 Pro · 3.0 | SimpleBench | 76.4 | 3 / 61 | In Quality Score |
| Gemini 3 Pro · 3.0 | Humanity's Last Exam · hle_text | 37.5 | 3 / 56 | In Quality Score |
| Gemini 3 Pro · 3.1 | LiveBench | 79.9 | 4 / 110 | In Quality Score |
| Gemini 3 Pro · 3.0 | Humanity's Last Exam · search_code | 45.8 | 4 / 6 | In Quality Score |
| Gemini 3 Pro · 3.1 | Humanity's Last Exam · hle | 44.4 | 4 / 90 | In Quality Score |
| Gemini 3 Pro · 3.1 | Arena Elo | 1487 | 6 / 158 | In Quality Score |
| Gemini 3 Flash · 3.0 | Humanity's Last Exam · search_code | 43.5 | 6 / 6 | In Quality Score |
| Gemini 3 Pro · 3.0 | Arena Elo | 1486 | 7 / 158 | In Quality Score |
| Gemini 3 Pro · 3.0 | GPQA Diamond | 91.9 | 9 / 143 | In Quality Score |
| Gemini 3 Pro · 3.1 | Humanity's Last Exam · tools | 51.4 | 10 / 38 | In Quality Score |
| Gemini 3 Pro · 3.0 | Humanity's Last Exam · hle | 37.5 | 11 / 90 | In Quality Score |
| Gemini 3 Flash · 3.0 | GPQA Diamond | 90.4 | 12 / 143 | In Quality Score |
| Gemini 3 Flash · 3.0 | SimpleBench | 61.1 | 12 / 61 | In Quality Score |
| Gemini 3 Flash · 3.0 | Humanity's Last Exam · hle | 33.7 | 14 / 90 | In Quality Score |
| Gemini 3 Flash · 3.0 | Arena Elo | 1473 | 17 / 158 | In Quality Score |
| Gemini 3 Pro · 3.0 | Humanity's Last Exam · tools | 45.8 | 21 / 38 | In Quality Score |
| Gemini 3 Pro · 3.0 | LiveBench | 73.4 | 23 / 110 | In Quality Score |
| Gemini 3 Flash · Preview | Arena Elo | 1461 | 25 / 158 | In Quality Score |
| Gemini 3.1 Flash Lite · Latest | GPQA Diamond | 86.9 | 26 / 143 | In Quality Score |
| Gemini 3 Flash · Preview | LiveBench | 72.4 | 26 / 110 | In Quality Score |
| Gemini 3.1 Flash Lite · Latest | Humanity's Last Exam · hle_text | 8.0 | 35 / 56 | In Quality Score |
| Gemini 3.1 Flash Lite · Latest | Humanity's Last Exam · hle | 16 | 51 / 90 | In Quality Score |
| Gemini 3.1 Flash Lite · Latest | Arena Elo | 1433 | 52 / 158 | In Quality Score |
| Gemini 3.1 Flash Lite · Latest | LiveBench | 61.7 | 55 / 110 | In Quality Score |
| Gemini 3 Pro · 3.0 | VendingBench2 | 5478.2 | 1 / 4 | Tracked evidence |
| Gemini 3 Pro · 3.0 | Global PIQA | 93.4 | 1 / 26 | Tracked evidence |
| Gemini 3 Pro · 3.0 | GlobalPIQA | 93.4 | 1 / 4 | Tracked evidence |
| Gemini 3 Flash · 3.0 | MMMLU | 91.8 | 1 / 38 | Tracked evidence |
| Gemini 3 Pro · 3.1 | BrowseComp · context_manage | 85.9 | 1 / 15 | Tracked evidence |
| Gemini 3 Pro · 3.0 | WMT24++ | 80.7 | 1 / 6 | Tracked evidence |
| Gemini 3 Pro · 3.0 | SimpleQA | 72.1 | 1 / 40 | Tracked evidence |
| Gemini 3 Pro · 3.0 | FACTS Benchmark Suite | 70.5 | 1 / 12 | Tracked evidence |
| Gemini 3 Pro · 3.1 | SciCode | 59 | 1 / 24 | Tracked evidence |
| Gemini 3 Pro · 3.1 | AIME 2026 | 98.2 | 2 / 19 | Tracked evidence |
| Gemini 3 Flash · 3.0 | Global PIQA | 92.8 | 2 / 26 | Tracked evidence |
| Gemini 3 Pro · 3.1 | MMLU | 92.6 | 2 / 33 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MMMLU | 91.8 | 2 / 38 | Tracked evidence |
| Gemini 3 Pro · 3.1 | IPhO 2025 (Theory) | 87.7 | 2 / 3 | Tracked evidence |
| Gemini 3 Pro · 3.1 | BrowseComp | 85.9 | 2 / 51 | Tracked evidence |
| Gemini 3 Flash · 3.0 | MMMU PRO | 81.2 | 2 / 52 | Tracked evidence |
| Gemini 3 Flash · 3.0 | SimpleQA | 68.7 | 2 / 40 | Tracked evidence |
| Gemini 3 Pro · 3.0 | SciCode | 56 | 2 / 24 | Tracked evidence |
| Gemini 3 Pro · 3.0 | HMMT Feb 2025 | 97.3 | 3 / 44 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MMLU | 91.8 | 3 / 33 | Tracked evidence |
| Gemini 3 Pro · 3.1 | MRCR · v2_128k | 84.9 | 3 / 23 | Tracked evidence |
| Gemini 3 Flash · 3.0 | FACTS Benchmark Suite | 61.9 | 3 / 12 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MathArenaApex | 23.4 | 3 / 8 | Tracked evidence |
| Gemini 3 Pro · 3.1 | HealthBench · hard | 20.6 | 3 / 5 | Tracked evidence |
| Gemini 3 Pro · 3.1 | Frontier Science Research | 23.3 | 4 / 4 | Tracked evidence |
| Gemini 3 Pro · 3.1 | HMMT Nov 2025 | 94.8 | 5 / 31 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MAXIFE | 87.5 | 5 / 21 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MMMU PRO | 81 | 5 / 52 | Tracked evidence |
| Gemini 3 Pro · 3.1 | FinanceAgent | 59.7 | 5 / 15 | Tracked evidence |
| Gemini 3 Pro · 3.1 | FrontierMath · tier1_3 | 36.9 | 5 / 5 | Tracked evidence |
| Gemini 3 Pro · 3.1 | FrontierMath · tier4 | 16.7 | 5 / 5 | Tracked evidence |
| Gemini 3 Pro · 3.1 | HMMT Feb 2026 | 87.3 | 6 / 16 | Tracked evidence |
| Gemini 3 Pro · 3.1 | MMMU PRO | 80.5 | 6 / 52 | Tracked evidence |
| Gemini 3 Pro · 3.0 | BrowseComp_zh | 66.8 | 6 / 20 | Tracked evidence |
| Gemini 3 Pro · 3.1 | FinanceAgent · v2 | 43 | 6 / 7 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MRCR · v2_1m | 26.3 | 6 / 14 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | MMMLU | 88.9 | 7 / 38 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MRCR · v2_128k | 77 | 7 / 23 | Tracked evidence |
| Gemini 3 Flash · 3.0 | FinanceAgent · v2 | 42.6 | 7 / 7 | Tracked evidence |
| Gemini 3 Pro · 3.1 | MRCR · v2_1m | 26.3 | 7 / 14 | Tracked evidence |
| Gemini 3 Flash · 3.0 | MRCR · v2_128k | 67.2 | 8 / 23 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | SimpleQA | 43.3 | 8 / 40 | Tracked evidence |
| Gemini 3 Flash · 3.0 | MRCR · v2_1m | 22.1 | 8 / 14 | Tracked evidence |
| Gemini 3 Pro · 3.0 | IMO AnswerBench | 83.3 | 9 / 28 | Tracked evidence |
| Gemini 3 Pro · 3.0 | IFBench | 70 | 9 / 28 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | FACTS Benchmark Suite | 40.6 | 9 / 12 | Tracked evidence |
| Gemini 3 Pro · 3.0 | HMMT Nov 2025 | 93 | 10 / 31 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | MRCR · v2_128k | 60.1 | 11 / 23 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | MRCR · v2_1m | 12.3 | 11 / 14 | Tracked evidence |
| Gemini 3 Pro · 3.1 | IMO AnswerBench | 81 | 13 / 28 | Tracked evidence |
| Gemini 3 Pro · 3.0 | BrowseComp · context_manage | 59.2 | 13 / 15 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | MMMU PRO | 76.8 | 14 / 52 | Tracked evidence |
| Gemini 3 Pro · 3.0 | FinanceAgent | 44.1 | 14 / 15 | Tracked evidence |
| Gemini 3 Pro · 3.0 | AIME 2026 | 90.6 | 18 / 19 | Tracked evidence |
| Gemini 3 Pro · 3.0 | BrowseComp | 37.8 | 34 / 51 | Tracked evidence |
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Gemini 3 Pro · 3.0 | LiveCodeBench · v6 | 90.7 | 1 / 40 | In Quality Score |
| Gemini 3 Pro · 3.1 | LiveCodeBench · pro | 82.9 | 2 / 5 | In Quality Score |
| Gemini 3 Pro · 3.1 | SWE-bench Verified | 80.6 | 7 / 68 | In Quality Score |
| Gemini 3 Pro · 3.1 | GSO (Global Software Optimization) · opt_at_1 | 21.6 | 7 / 24 | In Quality Score |
| Gemini 3 Pro · 3.0 | GSO (Global Software Optimization) · opt_at_1 | 17.6 | 8 / 24 | In Quality Score |
| Gemini 3.1 Flash Lite · Latest | LiveCodeBench | 72 | 10 / 69 | In Quality Score |
| Gemini 3 Flash · 3.0 | GSO (Global Software Optimization) · opt_at_1 | 7.8 | 11 / 24 | In Quality Score |
| Gemini 3 Flash · 3.0 | SWE-bench Verified | 78 | 13 / 68 | In Quality Score |
| Gemini 3 Pro · 3.0 | SWE-bench Verified | 76.2 | 24 / 68 | In Quality Score |
| Gemini 3 Pro · 3.0 | SecCodeBench | 62.4 | 4 / 6 | Tracked evidence |
| Gemini 3 Pro · 3.1 | NL2Repo | 33.4 | 7 / 9 | Tracked evidence |
| Gemini 3 Pro · 3.0 | SWE-bench Multilingual | 65 | 18 / 18 | Tracked evidence |
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Gemini 3 Pro · 3.0 | τ²-bench · airline | 73 | 1 / 29 | In Quality Score |
| Gemini 3 Pro · 3.1 | τ²-bench · telecom | 99.3 | 2 / 28 | In Quality Score |
| Gemini 3 Pro · 3.0 | τ²-bench · average | 90.7 | 2 / 30 | In Quality Score |
| Gemini 3 Pro · 3.1 | τ²-bench · retail | 90.8 | 3 / 34 | In Quality Score |
| Gemini 3 Flash · 3.0 | τ²-bench · average | 90.2 | 3 / 30 | In Quality Score |
| Gemini 3 Pro · 3.1 | MCP Atlas · public_set | 69.2 | 4 / 13 | In Quality Score |
| Gemini 3 Pro · 3.0 | τ²-bench · telecom | 98 | 6 / 28 | In Quality Score |
| Gemini 3 Pro · 3.0 | τ²-bench · retail | 85.3 | 7 / 34 | In Quality Score |
| Gemini 3 Pro · 3.1 | MCP Atlas | 73.9 | 7 / 33 | In Quality Score |
| Gemini 3 Pro · 3.0 | MCP Atlas · public_set | 66.6 | 8 / 13 | In Quality Score |
| Gemini 3 Flash · Preview | MCP Atlas | 62 | 16 / 33 | In Quality Score |
| Gemini 3 Flash · 3.0 | MCP Atlas | 62 | 17 / 33 | In Quality Score |
| Gemini 3.1 Flash Lite · Latest | MCP Atlas | 57.1 | 23 / 33 | In Quality Score |
| Gemini 3 Pro · 3.0 | MCP Atlas | 54.1 | 25 / 33 | In Quality Score |
| Gemini 3 Pro · 3.0 | VendingBench · v2 | 5478 | 1 / 7 | Tracked evidence |
| Gemini 3 Pro · 3.0 | FinSearchComp · t2_t3 | 49.9 | 2 / 2 | Tracked evidence |
| Gemini 3 Pro · 3.0 | BFCL v4 | 72.5 | 3 / 18 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MCPMark | 53.9 | 3 / 8 | Tracked evidence |
| Gemini 3 Flash · 3.0 | VendingBench · v2 | 3635 | 4 / 7 | Tracked evidence |
| Gemini 3 Pro · 3.1 | Automation Bench | 9.6 | 5 / 5 | Tracked evidence |
| Gemini 3 Pro · 3.1 | OSWorld · verified | 76.2 | 6 / 27 | Tracked evidence |
| Gemini 3 Pro · 3.1 | DeepSearchQA | 69.7 | 6 / 7 | Tracked evidence |
| Gemini 3 Pro · 3.1 | GDPVal | 67.3 | 6 / 6 | Tracked evidence |
| Gemini 3 Flash · 3.0 | Toolathlon | 49.4 | 7 / 31 | Tracked evidence |
| Gemini 3 Pro · 3.0 | DeepPlanning | 23.3 | 7 / 16 | Tracked evidence |
| Gemini 3 Pro · 3.1 | Toolathlon | 48.8 | 8 / 31 | Tracked evidence |
| Gemini 3 Pro · 3.1 | τ³-Bench | 67.1 | 9 / 10 | Tracked evidence |
| Gemini 3 Pro · 3.0 | Seal-0 | 45.5 | 9 / 16 | Tracked evidence |
| Gemini 3 Pro · 3.0 | CyberGym | 39.9 | 10 / 12 | Tracked evidence |
| Gemini 3 Pro · 3.0 | WideSearch | 57 | 11 / 13 | Tracked evidence |
| Gemini 3 Pro · 3.1 | GDPVal-AA | 1314 | 13 / 17 | Tracked evidence |
| Gemini 3 Flash · 3.0 | OSWorld · verified | 65.1 | 14 / 27 | Tracked evidence |
| Gemini 3 Flash · 3.0 | GDPVal-AA | 1204 | 15 / 17 | Tracked evidence |
| Gemini 3 Pro · 3.0 | GDPVal-AA | 1201 | 16 / 17 | Tracked evidence |
| Gemini 3 Pro · 3.0 | Toolathlon | 36.4 | 22 / 31 | Tracked evidence |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Gemini 3 Pro · 3.0 | AI2D · test | 94.1 | 1 / 33 | Tracked evidence |
| Gemini 3 Pro · 3.0 | VideoMME · with_sub | 88.4 | 1 / 22 | Tracked evidence |
| Gemini 3 Pro · 3.0 | VideoMME · without_sub | 87.7 | 1 / 21 | Tracked evidence |
| Gemini 3 Pro · 3.0 | Video-MMMU | 87.6 | 1 / 28 | Tracked evidence |
| Gemini 3 Pro · 3.1 | MedXpertQA · mm | 81.3 | 1 / 31 | Tracked evidence |
| Gemini 3 Pro · 3.0 | LVBench | 76.2 | 1 / 18 | Tracked evidence |
| Gemini 3 Pro · 3.0 | SimpleVQA | 73.2 | 1 / 29 | Tracked evidence |
| Gemini 3 Pro · 3.1 | MedXpertQA · text | 71.5 | 1 / 5 | Tracked evidence |
| Gemini 3 Pro · 3.0 | ERQA | 70.5 | 1 / 27 | Tracked evidence |
| Gemini 3 Pro · 3.0 | WorldVQA | 47.4 | 1 / 5 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MMBench · en_dev_v1_1 | 93.7 | 2 / 24 | Tracked evidence |
| Gemini 3 Flash · 3.0 | Video-MMMU | 86.9 | 2 / 28 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MMStar | 83.1 | 2 / 33 | Tracked evidence |
| Gemini 3 Pro · 3.0 | ScreenSpot-Pro | 72.7 | 2 / 24 | Tracked evidence |
| Gemini 3 Pro · 3.1 | SimpleVQA | 72.4 | 2 / 29 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MotionBench | 70.3 | 2 / 4 | Tracked evidence |
| Gemini 3 Pro · 3.1 | ERQA | 69.4 | 2 / 27 | Tracked evidence |
| Gemini 3 Pro · 3.0 | BabyVision | 49.7 | 2 / 22 | Tracked evidence |
| Gemini 3 Pro · 3.0 | ZEROBench · sub | 39 | 2 / 23 | Tracked evidence |
| Gemini 3 Pro · 3.1 | ZEROBench | 19 | 2 / 27 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MathVista · mini | 87.9 | 3 / 36 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MathVision | 86.6 | 3 / 17 | Tracked evidence |
| Gemini 3 Pro · 3.0 | SLAKE | 81.3 | 3 / 22 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MMVU | 77.5 | 3 / 20 | Tracked evidence |
| Gemini 3 Pro · 3.0 | ODinW · 13 | 46.3 | 3 / 13 | Tracked evidence |
| Gemini 3 Pro · 3.0 | CountBench | 97.3 | 4 / 23 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MedXpertQA · mm | 76 | 4 / 31 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | Video-MMMU | 84.8 | 5 / 28 | Tracked evidence |
| Gemini 3 Pro · 3.1 | CharXiv Reasoning | 83.3 | 5 / 48 | Tracked evidence |
| Gemini 3 Pro · 3.0 | ZEROBench | 10 | 5 / 27 | Tracked evidence |
| Gemini 3 Pro · 3.0 | DynaMath | 85.1 | 6 / 23 | Tracked evidence |
| Gemini 3 Flash · 3.0 | ScreenSpot-Pro | 69.1 | 6 / 24 | Tracked evidence |
| Gemini 3 Pro · 3.0 | RefSpatialBench | 65.5 | 6 / 21 | Tracked evidence |
| Gemini 3 Pro · 3.0 | RealWorldQA | 83.3 | 7 / 24 | Tracked evidence |
| Gemini 3 Pro · 3.0 | LingoQA | 72.8 | 8 / 16 | Tracked evidence |
| Gemini 3 Pro · 3.0 | CharXiv Reasoning | 81.4 | 9 / 48 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MVBench | 74.1 | 10 / 18 | Tracked evidence |
| Gemini 3 Pro · 3.0 | HallusionBench | 68.6 | 11 / 33 | Tracked evidence |
| Gemini 3 Pro · 3.1 | ScreenSpot-Pro | 61 | 11 / 24 | Tracked evidence |
| Gemini 3 Pro · 3.0 | V* | 88 | 12 / 23 | Tracked evidence |
| Gemini 3 Pro · 3.0 | MLVU · mavg | 83 | 12 / 22 | Tracked evidence |
| Gemini 3 Flash · 3.0 | CharXiv Reasoning | 80.3 | 12 / 48 | Tracked evidence |
| Gemini 3 Pro · 3.0 | RefCOCO · avg | 84.1 | 16 / 18 | Tracked evidence |
| Gemini 3.1 Flash Lite · Latest | CharXiv Reasoning | 73.2 | 22 / 48 | Tracked evidence |
| Gemini 3 Pro · 3.0 | EmbSpatialBench | 61.2 | 23 / 24 | Tracked evidence |
Document/OCR
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Gemini 3 Pro · 3.0 | MMLongBench-Doc | 60.5 | 3 / 22 | Tracked evidence |
| Gemini 3 Pro · 3.0 | OCRBench | 90.4 | 5 / 35 | Tracked evidence |
| Gemini 3 Flash · 3.0 | OmniDocBench · v1_5 | 0.1 | 5 / 6 | Tracked evidence |
| Gemini 3 Pro · 3.0 | OmniDocBench · v1_5 | 0.1 | 6 / 6 | Tracked evidence |
Where this family sits in the market
Gemini 3 Flash and 3.1 Flash Lite take the price-efficiency frontier within the family. 3 Pro trades cost for headroom on the hardest workloads.
Chart legend: the dashed line is the Pareto frontier (no other model is both cheaper and better). Thinking and non-thinking pairs of the same model are connected, and the line length shows the cost of reasoning.
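The frontier rule can be sketched in a few lines. This is a minimal illustration, not the chart's actual implementation: it assumes the price axis is the input list price per 1M tokens and uses the Quality Scores from the variant table on this page. It also treats a variant as dominated when another is at least as cheap and at least as good, with one of the two strictly better, which is slightly stricter than the legend's wording. The `Variant` type and `pareto_frontier` function are hypothetical names.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    input_price: float  # USD per 1M input tokens (list price from this page)
    quality: float      # Quality Score from the variant table


# Gemini 3 rows only; cross-family rows are left out of this sketch.
VARIANTS = [
    Variant("Gemini 3 Pro · 3.1", 2.00, 104.3),
    Variant("Gemini 3 Pro · 3.0", 2.00, 95.0),
    Variant("Gemini 3 Flash · 3.0", 0.50, 88.9),
    Variant("Gemini 3 Flash · Preview", 0.50, 87.3),
    Variant("Gemini 3.1 Flash Lite · Latest", 0.25, 81.1),
]


def pareto_frontier(variants):
    """Return the variants that no other variant dominates, cheapest first."""
    frontier = []
    for v in variants:
        dominated = any(
            o.input_price <= v.input_price
            and o.quality >= v.quality
            and (o.input_price < v.input_price or o.quality > v.quality)
            for o in variants
        )
        if not dominated:
            frontier.append(v)
    return sorted(frontier, key=lambda v: v.input_price)


for v in pareto_frontier(VARIANTS):
    print(f"{v.name}: ${v.input_price}/1M input, QS {v.quality}")
```

Under these assumptions the in-family frontier is Flash Lite, Flash 3.0, and Pro 3.1. Pro 3.0 and Flash Preview drop out because a same-priced sibling scores higher.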
The Gemini 3 family
Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.
Closed · API only (3)
- Gemini 3 Pro: 2 variants
- Gemini 3 Flash: 2 variants
- Gemini 3.1 Flash Lite: 1 variant
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- GPT-5: GPT-5.5 Thinking, Mini, Nano, Codex Compared
GPT-5: GPT-5.5 Thinking ranks #4 of 186 with 400K-token context and $1.25/$10 per 1M tokens. Compare GPT-5, Mini, Nano, and Codex by workload.
- Claude: Opus 4.8 (Thinking), Opus, Sonnet, Haiku Compared
Claude: Opus 4.8 (Thinking) ranks #2 of 186 on Quality Score. Compare Opus, Sonnet, Haiku, and Mythos by price, benchmarks, and workload.
Caveats
What this page does not tell you, listed honestly.
- Context window not declared for: Gemini 3 Pro, Gemini 3 Flash, Gemini 3.1 Flash Lite.
- Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.
Editor's notes
Why this family matters
Gemini 3 is Google's current generation, structured as a three-tier ladder (Pro, Flash, Flash Lite). Two facts pull most teams onto this family. First, the per-token pricing is aggressive against the closed-flagship field: Gemini 3 Flash lists at $0.5 input / $3 output per million tokens and Gemini 3.1 Flash Lite at $0.25 / $1.5, which sits below the equivalent tiers on most competitor pricing tables we track. Second, Gemini 3 Pro 3.1 lands at Quality Score 104.3 (#5 of 186 models in our index), which puts the flagship inside the same cluster as the closed-frontier tier, not below it.
The combination is unusual. Most families are either cheap-and-mid or expensive-and-frontier; Gemini 3 ships both ends of the ladder simultaneously, which makes "which tier" the entire decision.
Which variant to start with
Default to google-gemini-3-flash. At Quality Score 88.9 (#32 of 186, 3.0 variant) and $0.5 input / $3 output, it is the family's price-quality sweet spot for chat, summarization, and tool-augmented assistants. For most teams shipping API-backed product features, this is the practical default.
Step up to google-gemini-3-pro when the workload visibly benefits from the additional headroom. The 3.1 variant lands at QS 104.3 against Flash's 88.9, which is a meaningful jump on the hardest reasoning, coding, and multi-step planning evals (GPQA Diamond 94.3, LiveBench 79.9). The price gap is 4x on both input ($2 vs $0.5) and output ($12 vs $3), so the workload needs to be one where the score delta translates to a measurable product win.
Drop to google-gemini-3-1-flash-lite for high-volume chat at scale. Quality Score 81.1 vs Flash's 88.9 is a real gap, but at $0.25 / $1.5 the per-token saving compounds quickly on workloads dominated by repetitive low-stakes turns.
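To see how the per-token gap compounds, here is a rough cost sketch using the list prices quoted on this page. The dictionary keys are illustrative labels, not official API model identifiers, and the monthly volume (500M input tokens, 100M output tokens) is an assumed workload, not a measurement.

```python
# USD per 1M tokens as (input, output), taken from the list prices on this page.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "gemini-3-pro-3.1": (2.00, 12.00),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3.1-flash-lite": (0.25, 1.50),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD for the given token volumes."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price


# Assumed workload: 500M input tokens and 100M output tokens per month.
for model in PRICES:
    cost = monthly_cost(model, 500_000_000, 100_000_000)
    print(f"{model}: ${cost:,.2f}/month")
```

At that assumed volume the list-price bill comes to $2,200 on Pro 3.1, $550 on Flash, and $275 on Flash Lite. Batch pricing or enterprise agreements would change these figures.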
When to deviate:
- Long-context RAG: use google-gemini-3-pro. The 3.0 variant in our index ships with a 1M-token context window at the same headline pricing. Reach for it when document scale and faithful retrieval over long inputs dominate the workload.
- Hardest-tier reasoning workloads: Gemini 3 Pro 3.1 is competitive at the very top of our reasoning leaderboards (GPQA Diamond rank 2 of 143, LiveBench rank 4 of 110). Run a side-by-side against your closed-flagship alternative on the specific reasoning benchmark that matters; on the numbers we have, the cost-quality trade-off is genuinely close.
- You already use Flash for everything: before adding Pro to the rotation, run an A/B on your specific eval. The score gap on benchmarks is real; whether it shows up on your traffic distribution is the question worth answering with data, not vibes.
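One way to answer the A/B question with data is a paired bootstrap over your own eval set. The sketch below assumes you already have a pass/fail result (1 or 0) per prompt for each model on the same prompts. The function name, the sample results, and the 95% percentile interval are illustrative choices, not part of our methodology.

```python
import random


def paired_bootstrap_diff(flash_pass, pro_pass, n_resamples=2000, seed=0):
    """Estimate the Pro-minus-Flash pass-rate gap with a 95% percentile interval.

    Both inputs are lists of 0/1 outcomes on the same prompts, in the same order.
    Returns (observed_gap, lower_bound, upper_bound).
    """
    if len(flash_pass) != len(pro_pass) or not flash_pass:
        raise ValueError("need two non-empty result lists of equal length")
    rng = random.Random(seed)
    n = len(flash_pass)
    diffs = [p - f for f, p in zip(flash_pass, pro_pass)]
    observed = sum(diffs) / n
    resampled = []
    for _ in range(n_resamples):
        # Resample prompts with replacement, keeping each Flash/Pro pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()
    lower = resampled[int(0.025 * n_resamples)]
    upper = resampled[int(0.975 * n_resamples) - 1]
    return observed, lower, upper


# Hypothetical results on 50 prompts: Pro fixes two failures in every ten.
flash_results = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1] * 5
pro_results = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1] * 5
gap, lower, upper = paired_bootstrap_diff(flash_results, pro_results)
print(f"gap={gap:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

If the interval sits clearly above zero on your traffic sample, the Pro premium has a measurable basis. If it straddles zero, the benchmark gap is not showing up on your workload at this sample size.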
Where the data is weak
We aggregate benchmark scores from multiple sources but coverage is uneven across this family. Specifically:
- Gemini 3 Pro has two minor versions in our index (3.0 and 3.1) with substantially different scores. Pro 3.0 sits at QS 95.0; Pro 3.1 at QS 104.3. When the article quotes a number, it is for the specific minor version named; do not collapse the line to a single Pro score.
- Context windows are partially declared. Our index lists the 1M-token window only on Gemini 3 Pro 3.0; Pro 3.1, Flash, and Flash Lite show the field as unset. That is a coverage gap, not evidence of a smaller window. Verify the limit on the deployment surface you actually use before committing on a long-document workload.
- Flash Lite has thinner benchmark coverage than the other tiers. Several benchmarks (SWE-bench Verified, AIME 2025) carry numbers for Flash and Pro but not Flash Lite at last verification. Treat its scores as directional outside the headline benchmarks (Quality Score, Arena Elo, LiveBench, GPQA Diamond).
- Pricing on this page is the published API list price. Vertex AI routing, batch pricing, and enterprise agreements can change the unit economics. List price is a calibration anchor, not the cost ceiling.
If you are making a procurement decision, the variant table on this page is the load-bearing artifact. Cross-check pricing against Google's own docs (and Vertex AI routing if you go through that path) before you commit.
When to reach for which alternative
- Open-weights deployment is a requirement: Gemini is API-only. The conversation moves to open-weights families (Qwen3, DeepSeek). On the cost-per-quality axis at the chat workhorse tier, DeepSeek V4 Flash ($0.098 / $0.197 with QS 78.1) is the price anchor to beat.
- Closed-flagship reasoning at the absolute top end: Claude Opus 4.8 Thinking lands at QS 108.6 and Opus 4.7 Thinking at QS 107.8 in our index, both slightly above Gemini 3 Pro 3.1 (QS 104.3). On any specific benchmark the ranking can flip; compare on the workload that matters before treating the headline QS difference as decisive.
- Previous-generation Gemini is already in production: the Gemini 2 line is on the sibling gemini-2 surface in our index. For some workloads the migration cost to 3 may not be earned by the score delta, particularly if Vertex routing or fine-tunes are tied to the older line.
Sources worth reading
- Google AI Studio pricing: authoritative price list for the Gemini API
- Gemini API model docs: variant identifiers, context windows, modality coverage
- Google AI for Developers blog: release notes for new generations and pricing changes
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
Author: Boris. Read the full methodology.