Google family
Gemma
Gemma: Gemma 4 31B IT (Thinking) ranks #34 of 186 with 262K-token context and $0.12/$0.37 per 1M tokens. Compare Gemma 4 and Gemma 3 by workload.
Top in this family
Gemma 4 31B IT (Thinking) ranks #34 of 186 on overall quality (QS 88.6) at $0.12/$0.37 per 1M tokens.
Practical pick
Gemma 4 26B A4B IT (Thinking) at $0.06/$0.33 per 1M tokens (rank #48 of 186).
- Variants: 10
- License: Open weights
- Provider: Google
★ Most teams should start here
Google Gemma 4 26B A4B IT
Variant: Thinking
The practical default in the current generation. Mixture-of-experts variant: it costs and serves like a small model on capable hardware while carrying near-flagship quality. Pick the dense 31B only when you have a specific deployment reason to avoid MoE.
- Quality Score: 83.9
- Input: $0.060/1M
- Output: $0.330/1M
- Context: 262K
- License: Open weights
Best variant by workload
One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.
| Workload | Best pick | Why |
|---|---|---|
| Self-host on 1 GPU | Google Gemma 4 26B A4B IT Thinking $0.060/1M / $0.330/1M | Mixture-of-experts active-param footprint is closer to a 4B model than a 26B one. Plan for the full parameter count when sizing GPU memory, not the active subset. |
| Edge / on-device | Google Gemma 4 E2B IT Thinking | Smallest efficient Gemma 4 variant. Fits CPU and edge inference for local or on-device deployment. |
| General API workhorse | Google Gemma 4 31B IT Thinking $0.120/1M / $0.370/1M | Dense Gemma 4 flagship. Use when MoE serving complexity is a problem and you want a predictable parameter-count profile. |
All variants
14 variants across 10 models (+ 1 cross-family for context). Sorted by quality score (descending).
| Variant | QS | GPQA | HLE | SWE | SWE-Pro | MCP | AIME | In $/M | Out $/M | Context | Released | Lic. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemma 4 31B IT · Thinking | 88.6 (#34/186) | 84.3 | 19.5 | — | — | — | — | $0.12 | $0.37 | 262K | Apr 2, 2026 | |
| Gemma 4 26B A4B IT · Thinking | 83.9 (#48/186) | 82.3 | 8.7 | — | — | — | — | $0.06 | $0.33 | 262K | Apr 2, 2026 | |
| Gemma 4 E4B IT · Thinking | 69.2 (#119/186) | 58.6 | — | — | — | — | — | — | — | — | Apr 2, 2026 | |
| Gemma 4 E2B IT · Thinking | 62.5 (#147/186) | 43.4 | — | — | — | — | — | — | — | — | Apr 2, 2026 | |
| Gemma 3 27B IT · Non-thinking · Previous · Newer: Google Gemma 4 26B A4B IT | 58.9 (#164/186) | 42.4 | — | — | 11.4 | — | 24.0 | $0.08 | $0.16 | 131K | Mar 12, 2025 | |
| Gemma 3 12B IT · Non-thinking · Previous | 58.7 (#165/186) | 40.9 | — | — | — | — | 18.8 | $0.04 | $0.13 | 131K | Mar 12, 2025 | |
| Gemma 3n 4B · Non-thinking · Previous · Newer: Google Gemma 4 E4B IT | 50.4 (#175/186) | 23.7 | — | — | — | — | 11.6 | $0.06 | $0.12 | 33K | Jun 26, 2025 | |
| Gemma 3 4B IT · Non-thinking · Previous · Newer: Google Gemma 4 E4B IT | 49.9 (#178/186) | 30.8 | — | — | — | — | — | $0.04 | $0.08 | 131K | Mar 12, 2025 | |
| Gemma 3n 2B IT · Non-thinking · Previous · Newer: Google Gemma 4 E2B IT | 46.9 (#181/186) | 24.8 | — | — | — | — | 6.7 | — | — | — | Jun 26, 2025 | |
| Gemma 3 1B IT · Non-thinking · Previous | 32.1 (#186/186) | 19.2 | — | — | — | — | 0.8 | — | — | — | Mar 12, 2025 | |
| DeepSeek V4 · V4 Pro Thinking · cross-family | 98.0 (#15/186) | 90.1 | 37.7 | 80.6 | 55.4 | 73.6 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| DeepSeek V4 · V4 Flash Thinking · cross-family | 92.0 (#27/186) | 88.1 | 34.8 | 79.0 | 52.6 | 69.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
| DeepSeek V4 · V4 Pro · cross-family | 80.9 (#61/186) | 72.9 | 7.7 | 73.6 | 52.1 | 69.4 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| DeepSeek V4 · V4 Flash · cross-family | 78.1 (#78/186) | 71.2 | 8.1 | 73.7 | 49.1 | 64.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
Benchmark evidence
Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (59 of 123 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemma 3 27B IT · Non-thinking | MMLU Pro · 5_shot_cot | 67.5 | 1 / 4 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | Humanity's Last Exam · search | 26.5 | 1 / 2 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | GPQA Diamond · 5_shot_cot | 42.4 | 2 / 4 | In Quality Score |
| Google Gemma 4 26B A4B IT · Thinking | Humanity's Last Exam · search | 17.2 | 2 / 2 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | MMLU Pro | 85.2 | 20 / 86 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | Arena Elo | 1452 | 34 / 158 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | GPQA Diamond | 84.3 | 37 / 143 | In Quality Score |
| Google Gemma 4 26B A4B IT · Thinking | MMLU Pro | 82.6 | 38 / 86 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | Humanity's Last Exam · hle | 19.5 | 42 / 90 | In Quality Score |
| Google Gemma 4 26B A4B IT · Thinking | Arena Elo | 1439 | 47 / 158 | In Quality Score |
| Google Gemma 4 26B A4B IT · Thinking | GPQA Diamond | 82.3 | 47 / 143 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | LiveBench | 61.6 | 56 / 110 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | AIME 2025 | 24 | 62 / 88 | In Quality Score |
| Google Gemma 4 E4B IT · Thinking | MMLU Pro | 69.4 | 65 / 86 | In Quality Score |
| Google Gemma 4 26B A4B IT · Thinking | Humanity's Last Exam · hle | 8.7 | 67 / 90 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | MMLU Pro | 67.5 | 68 / 86 | In Quality Score |
| Google Gemma 3 12B IT · Non-thinking | AIME 2025 | 18.8 | 70 / 88 | In Quality Score |
| Google Gemma 3 12B IT · Non-thinking | MMLU Pro | 60.6 | 74 / 86 | In Quality Score |
| Google Gemma 4 E2B IT · Thinking | MMLU Pro | 60 | 75 / 86 | In Quality Score |
| Google Gemma 3n 4B · Non-thinking | AIME 2025 | 11.6 | 78 / 88 | In Quality Score |
| Google Gemma 3n 4B · Non-thinking | MMLU Pro | 50.6 | 80 / 86 | In Quality Score |
| Google Gemma 3 4B IT · Non-thinking | MMLU Pro | 43.6 | 81 / 86 | In Quality Score |
| Google Gemma 3n 2B IT · Non-thinking | MMLU Pro | 40.5 | 83 / 86 | In Quality Score |
| Google Gemma 3n 2B IT · Non-thinking | AIME 2025 | 6.7 | 83 / 88 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | LiveBench | 49.2 | 86 / 110 | In Quality Score |
| Google Gemma 3 1B IT · Non-thinking | MMLU Pro | 14.7 | 86 / 86 | In Quality Score |
| Google Gemma 3 1B IT · Non-thinking | AIME 2025 | 0.8 | 88 / 88 | In Quality Score |
| Google Gemma 3 12B IT · Non-thinking | LiveBench | 43.7 | 94 / 110 | In Quality Score |
| Google Gemma 4 E4B IT · Thinking | GPQA Diamond | 58.6 | 103 / 143 | In Quality Score |
| Google Gemma 3 1B IT · Non-thinking | LiveBench | 14.4 | 110 / 110 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | Arena Elo | 1366 | 113 / 158 | In Quality Score |
| Google Gemma 4 E2B IT · Thinking | GPQA Diamond | 43.4 | 124 / 143 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | GPQA Diamond | 42.4 | 125 / 143 | In Quality Score |
| Google Gemma 3 12B IT · Non-thinking | Arena Elo | 1342 | 126 / 158 | In Quality Score |
| Google Gemma 3 12B IT · Non-thinking | GPQA Diamond | 40.9 | 127 / 143 | In Quality Score |
| Google Gemma 3 4B IT · Non-thinking | GPQA Diamond | 30.8 | 135 / 143 | In Quality Score |
| Google Gemma 3n 4B · Non-thinking | Arena Elo | 1318 | 137 / 158 | In Quality Score |
| Google Gemma 3n 2B IT · Non-thinking | GPQA Diamond | 24.8 | 140 / 143 | In Quality Score |
| Google Gemma 3n 4B · Non-thinking | GPQA Diamond | 23.7 | 141 / 143 | In Quality Score |
| Google Gemma 3 1B IT · Non-thinking | GPQA Diamond | 19.2 | 143 / 143 | In Quality Score |
| Google Gemma 3 4B IT · Non-thinking | Arena Elo | 1303 | 144 / 158 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | GSM8K | 95.9 | 1 / 10 | Tracked evidence |
| Google Gemma 4 31B IT · Thinking | AIME 2026 · no_tools | 89.2 | 1 / 4 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | GSM8K | 94.4 | 2 / 10 | Tracked evidence |
| Google Gemma 4 26B A4B IT · Thinking | AIME 2026 · no_tools | 88.3 | 2 / 4 | Tracked evidence |
| Google Gemma 4 E4B IT · Thinking | AIME 2026 · no_tools | 42.5 | 3 / 4 | Tracked evidence |
| Google Gemma 4 E2B IT · Thinking | AIME 2026 · no_tools | 37.5 | 4 / 4 | Tracked evidence |
| Google Gemma 3 4B IT · Non-thinking | GSM8K | 89.2 | 7 / 10 | Tracked evidence |
| Google Gemma 4 31B IT · Thinking | MMMLU | 88.4 | 9 / 38 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | Multi-IF | 69.8 | 9 / 32 | Tracked evidence |
| Google Gemma 4 31B IT · Thinking | MRCR · v2_128k | 66.4 | 9 / 23 | Tracked evidence |
| Google Gemma 3 1B IT · Non-thinking | GSM8K | 62.8 | 10 / 10 | Tracked evidence |
| Google Gemma 4 31B IT · Thinking | MMMU PRO | 76.9 | 12 / 52 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | Arena-Hard | 86.8 | 13 / 40 | Tracked evidence |
| Google Gemma 4 26B A4B IT · Thinking | MMMLU | 86.3 | 14 / 38 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | Multi-IF | 65.6 | 14 / 32 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | Arena-Hard | 82.6 | 19 / 40 | Tracked evidence |
| Google Gemma 4 26B A4B IT · Thinking | MRCR · v2_128k | 44.1 | 19 / 23 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | MMMU · mmmu_single | 50 | 21 / 22 | Tracked evidence |
| Google Gemma 4 E4B IT · Thinking | MRCR · v2_128k | 25.4 | 22 / 23 | Tracked evidence |
| Google Gemma 4 E2B IT · Thinking | MRCR · v2_128k | 19.1 | 23 / 23 | Tracked evidence |
| Google Gemma 4 26B A4B IT · Thinking | MMMU PRO | 73.8 | 26 / 52 | Tracked evidence |
| Google Gemma 4 E4B IT · Thinking | MMMLU | 76.6 | 27 / 38 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | MMLU | 76.9 | 29 / 33 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | MATH 500 | 90 | 30 / 55 | Tracked evidence |
| Google Gemma 4 E2B IT · Thinking | MMMLU | 67.4 | 31 / 38 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | BFCL v3 | 59.1 | 31 / 49 | Tracked evidence |
| Google Gemma 3 1B IT · Non-thinking | Multi-IF | 32.8 | 31 / 32 | Tracked evidence |
| Google Gemma 3n 4B · Non-thinking | MMLU | 64.9 | 32 / 33 | Tracked evidence |
| Google Gemma 3n 2B IT · Non-thinking | MMLU | 60.1 | 33 / 33 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | SimpleQA | 10 | 35 / 40 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | MATH 500 | 85.6 | 36 / 55 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | SimpleQA | 6.3 | 37 / 40 | Tracked evidence |
| Google Gemma 3 1B IT · Non-thinking | Arena-Hard | 17.8 | 38 / 40 | Tracked evidence |
| Google Gemma 3 4B IT · Non-thinking | SimpleQA | 4 | 39 / 40 | Tracked evidence |
| Google Gemma 3 1B IT · Non-thinking | SimpleQA | 2.2 | 40 / 40 | Tracked evidence |
| Google Gemma 4 E4B IT · Thinking | MMMU PRO | 52.6 | 41 / 52 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | BFCL v3 | 50.6 | 41 / 49 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | MMMU PRO | 48.4 | 45 / 52 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | AIME 2024 | 32.6 | 46 / 69 | Tracked evidence |
| Google Gemma 4 E2B IT · Thinking | MMMU PRO | 44.2 | 48 / 52 | Tracked evidence |
| Google Gemma 3 1B IT · Non-thinking | BFCL v3 | 16.3 | 49 / 49 | Tracked evidence |
| Google Gemma 3 1B IT · Non-thinking | MATH 500 | 46.4 | 55 / 55 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | AIME 2024 | 22.4 | 55 / 69 | Tracked evidence |
| Google Gemma 3 1B IT · Non-thinking | AIME 2024 | 0.9 | 69 / 69 | Tracked evidence |
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemma 3n 4B · Non-thinking | LiveCodeBench · v5 | 25.7 | 4 / 5 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | LiveCodeBench | 80 | 5 / 69 | In Quality Score |
| Google Gemma 3n 2B IT · Non-thinking | LiveCodeBench · v5 | 18.6 | 5 / 5 | In Quality Score |
| Google Gemma 4 26B A4B IT · Thinking | LiveCodeBench | 77.1 | 6 / 69 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | LiveCodeBench · 2024_10_01_to_2025_02_01 | 29.7 | 7 / 9 | In Quality Score |
| Google Gemma 4 E4B IT · Thinking | LiveCodeBench | 52 | 33 / 69 | In Quality Score |
| Google Gemma 4 E2B IT · Thinking | LiveCodeBench | 44 | 36 / 69 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | Aider (Polyglot) | 4.9 | 44 / 45 | In Quality Score |
| Google Gemma 3 27B IT · Non-thinking | LiveCodeBench | 29.7 | 48 / 69 | In Quality Score |
| Google Gemma 3 12B IT · Non-thinking | LiveCodeBench | 24.6 | 55 / 69 | In Quality Score |
| Google Gemma 3n 2B IT · Non-thinking | LiveCodeBench | 13.2 | 60 / 69 | In Quality Score |
| Google Gemma 3n 4B · Non-thinking | LiveCodeBench | 13.2 | 61 / 69 | In Quality Score |
| Google Gemma 3 4B IT · Non-thinking | LiveCodeBench | 12.6 | 62 / 69 | In Quality Score |
| Google Gemma 3 1B IT · Non-thinking | LiveCodeBench | 1.9 | 69 / 69 | In Quality Score |
| Google Gemma 4 31B IT · Thinking | Codeforces | 2150 | 5 / 47 | Tracked evidence |
| Google Gemma 4 26B A4B IT · Thinking | Codeforces | 1718 | 20 / 47 | Tracked evidence |
| Google Gemma 3 27B IT · Non-thinking | Codeforces | 1063 | 33 / 47 | Tracked evidence |
| Google Gemma 4 E4B IT · Thinking | Codeforces | 940 | 36 / 47 | Tracked evidence |
| Google Gemma 4 E2B IT · Thinking | Codeforces | 633 | 44 / 47 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | Codeforces | 462 | 46 / 47 | Tracked evidence |
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemma 4 31B IT · Thinking | τ²-bench · average | 76.9 | 20 / 30 | In Quality Score |
| Google Gemma 4 26B A4B IT · Thinking | τ²-bench · average | 68.2 | 22 / 30 | In Quality Score |
| Google Gemma 4 E4B IT · Thinking | τ²-bench · average | 42.2 | 27 / 30 | In Quality Score |
| Google Gemma 4 E2B IT · Thinking | τ²-bench · average | 24.5 | 29 / 30 | In Quality Score |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemma 3 27B IT · Non-thinking | ChartQA | 76.3 | 8 / 9 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | MathVision · mini | 31.9 | 8 / 10 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | ChartQA · test | 39 | 9 / 10 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | MathVerse · mini | 29.8 | 10 / 10 | Tracked evidence |
| Google Gemma 4 31B IT · Thinking | MedXpertQA · mm | 61.3 | 14 / 31 | Tracked evidence |
| Google Gemma 4 26B A4B IT · Thinking | MedXpertQA · mm | 58.1 | 15 / 31 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | HallusionBench | 65.3 | 16 / 33 | Tracked evidence |
| Google Gemma 4 E4B IT · Thinking | MedXpertQA · mm | 28.7 | 23 / 31 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | AI2D · test | 80.4 | 28 / 33 | Tracked evidence |
| Google Gemma 4 E2B IT · Thinking | MedXpertQA · mm | 23.5 | 28 / 31 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | MMStar | 59.4 | 30 / 33 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | MathVista · mini | 57.4 | 35 / 36 | Tracked evidence |
Document/OCR
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Google Gemma 3 27B IT · Non-thinking | DocVQA | 90.4 | 6 / 8 | Tracked evidence |
| Google Gemma 3 12B IT · Non-thinking | OCRBench | 75.3 | 29 / 35 | Tracked evidence |
Where this family sits in the market
Gemma's e2b and e4b efficiency variants take the cost-efficiency frontier for small-footprint self-hosting. The 26B-A4B MoE sits at the family's quality-per-active-param sweet spot.
Dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
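To make the frontier definition concrete, here is a minimal sketch that computes a Pareto frontier from a handful of rows in the variants table. It assumes output price per 1M tokens as the cost axis and Quality Score as the quality axis; the chart's actual axes and model set are not stated on this page, so treat `Point`, `pareto_frontier`, and the chosen subset as illustrative only.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float     # output $/1M tokens (assumed cost axis)
    quality: float  # Quality Score from the variants table


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points that no other point dominates, sorted by cost.

    A point is dominated when another point is at least as cheap and at
    least as good, and strictly better on one of the two axes.
    """
    frontier = []
    for p in points:
        dominated = any(
            q.cost <= p.cost
            and q.quality >= p.quality
            and (q.cost < p.cost or q.quality > p.quality)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p.cost)


# Subset of priced rows from the variants table (output price, Quality Score).
POINTS = [
    Point("Gemma 4 31B IT (Thinking)", 0.37, 88.6),
    Point("Gemma 4 26B A4B IT (Thinking)", 0.33, 83.9),
    Point("Gemma 3 27B IT", 0.16, 58.9),
    Point("Gemma 3 12B IT", 0.13, 58.7),
    Point("Gemma 3n 4B", 0.12, 50.4),
    Point("Gemma 3 4B IT", 0.08, 49.9),
    Point("DeepSeek V4 Flash (Thinking)", 0.197, 92.0),
]

for point in pareto_frontier(POINTS):
    print(f"{point.name}: ${point.cost}/1M out, QS {point.quality}")
```

The result depends heavily on the cost axis. On output price alone, the cross-family DeepSeek row dominates both Gemma 4 rows in this subset; on input price, the 26B A4B ($0.06) is cheaper than that row ($0.098) and would not be dominated by it.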
Self-hosting
These variants ship with open weights, so you can run them on your own hardware or via a hosting provider you control. Pick a variant that fits your GPU memory budget; mixture-of-experts variants are cheaper to serve than their total parameter count suggests, but the full weights still need to fit in memory.
- Google Gemma 4 31B IT · Thinking · open weights
- Google Gemma 4 26B A4B IT · Thinking · open weights
- Google Gemma 4 E4B IT · Thinking · open weights
- Google Gemma 4 E2B IT · Thinking · open weights
- Google Gemma 3 27B IT · Non-thinking · open weights
- Google Gemma 3 12B IT · Non-thinking · open weights
- Google Gemma 3 4B IT · Non-thinking · open weights
- Google Gemma 3 1B IT · Non-thinking · open weights
- Google Gemma 3n 4B · Non-thinking · open weights
- Google Gemma 3n 2B IT · Non-thinking · open weights
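The memory point above can be sketched as back-of-envelope arithmetic: weight memory is roughly total parameters times bytes per parameter. The sketch below is an estimate under stated assumptions, not a sizing tool. It uses the nominal parameter counts named on this page (31B dense, 26B total for the MoE), assumes typical storage sizes per precision, and ignores KV cache, activations, and runtime overhead, all of which add to the real footprint. The names `BYTES_PER_PARAM` and `weight_memory_gb` are hypothetical.

```python
# Assumed storage size per parameter for common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    # (billions of params * 1e9) * bytes per param / 1e9 bytes per GB
    return total_params_billions * BYTES_PER_PARAM[precision]


# Nominal total parameter counts from this page. The MoE build is sized by
# its 26B total, not by its ~4B active subset.
MODELS = {
    "Gemma 4 31B (dense)": 31.0,
    "Gemma 4 26B-A4B (MoE)": 26.0,
}

for name, params in MODELS.items():
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, precision)
        print(f"{name:22s} {precision:5s} ~{gb:.1f} GB weights")
```

Under these assumptions the MoE build needs about 52 GB for bf16 weights, which is why the active-parameter count is the wrong number to size GPU memory against.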
The Gemma family
Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.
Open weights (10)
- Google Gemma 4 31B IT · 1 variant
- Google Gemma 4 26B A4B IT · 1 variant
- Google Gemma 4 E4B IT · 1 variant
- Google Gemma 4 E2B IT · 1 variant
- Google Gemma 3 27B IT · 1 variant
- Google Gemma 3 12B IT · 1 variant
- Google Gemma 3 4B IT · 1 variant
- Google Gemma 3 1B IT · 1 variant
- Google Gemma 3n 4B · 1 variant
- Google Gemma 3n 2B IT · 1 variant
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- Llama: Muse Spark (Thinking), Llama 4 and 3 Compared
Llama: Muse Spark (Thinking) ranks #12 of 186 on Quality Score. Compare Llama 4, Llama 3, and Muse Spark by self-hosting and workload.
- Qwen3: Qwen 3.7 Max Preview, Qwen3.5, Qwen3.6 Compared
Qwen3: Qwen 3.7 Max Preview ranks #9/186 with 262K context at $0.78/$3.9 per 1M. Compare Qwen3, 3.5, 3.6 by workload.
Caveats
What this page does not tell you, listed honestly.
- No tracked API pricing for: Google Gemma 4 E4B IT, Google Gemma 4 E2B IT, Google Gemma 3 1B IT, Google Gemma 3n 2B IT. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
- Context window not declared for: Google Gemma 4 E4B IT, Google Gemma 4 E2B IT, Google Gemma 3 1B IT, Google Gemma 3n 2B IT.
- Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.
Editor's notes
Why this family matters
Gemma is Google's open-weights line, distinct from the closed Gemini API family. The current generation (Gemma 4) brings a meaningful architectural change: alongside the dense 31B variant, the family now ships a 26B-A4B mixture-of-experts build (26B total parameters, ~4B active per token), plus efficiency-tuned e2b and e4b variants sized for edge and on-device deployment.
Gemma's main draw is self-host fit. The 26B-A4B MoE costs and serves like a small model on capable hardware while landing at Quality Score 83.9 (#48 of 186 models we track), which is competitive with dense models 3 to 5 times its active-parameter footprint. The dense 31B at QS 88.6 is the family's quality ceiling; the e2b and e4b variants are the cost-efficiency frontier for constrained hardware.
Which variant to start with
Default to google-gemma-4-26b-a4b-it. At $0.06 input / $0.33 output
per million on hosted routes (or proportional self-host cost) and Quality
Score 83.9 with a 262K context window, it is the family's
quality-per-active-param sweet spot. Pick this unless you have a
specific reason not to.
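To turn the per-million prices into a budget figure, a rough estimate multiplies token volume by list price. The sketch below is illustrative: the monthly volumes are a made-up workload, `hosted_cost_usd` is a hypothetical helper, and it assumes reasoning tokens from Thinking variants are billed as output tokens, which you should confirm with your provider.

```python
def hosted_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate hosted spend from token counts and $/1M-token list prices."""
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )


# Hypothetical monthly workload: 500M input tokens, 100M output tokens.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

# (input $/1M, output $/1M) list prices from the variants table on this page.
PRICES = {
    "google-gemma-4-26b-a4b-it": (0.06, 0.33),
    "google-gemma-4-31b-it": (0.12, 0.37),
}

for model, (in_price, out_price) in PRICES.items():
    cost = hosted_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    print(f"{model}: ${cost:.2f} per month")
```

For this workload the estimate comes to about $63 per month on the 26B-A4B and about $97 on the dense 31B. Most of the difference comes from the input price, so input-heavy workloads widen the gap.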
When to deviate:
- Maximum quality within the family: use google-gemma-4-31b-it (QS 88.6, dense 31B). The score gap to 26B-A4B is modest (4.7 points of Quality Score), but for deployments that benefit from dense inference characteristics or where MoE serving is awkward, the dense variant is the cleaner option.
- Self-host on a single GPU: the choice depends on what you are optimising for. The 26B-A4B MoE has the active-param footprint of a 4B model but the full weights still need to fit in memory; plan for the full 26B parameter count when sizing GPU memory, not the active subset. The dense 31B is heavier on memory but simpler to serve. Below those, the e4b variant is the realistic single-GPU consumer-hardware target in the family.
- Edge or on-device: drop to google-gemma-4-e2b-it (QS 62.5, efficiency-tuned). Smallest Gemma 4 variant; fits CPU and small-footprint inference where round-trip latency rules out hosted APIs. Treat its score as directional, not comparative against the larger variants.
- You are migrating an existing Gemma 3 deployment: the Gemma 3 line is still in our index, but the 27B at QS 58.9 (and the smaller variants below it) lag the Gemma 4 generation by enough that the migration cost is almost certainly earned by the score uplift, particularly if the deployment is recent enough that fine-tunes or pinned weights are not the constraint.
Where the data is weak
We aggregate benchmark scores from multiple sources but coverage and naming across this family deserve a careful read. Specifically:
- Gemma 3 vs Gemma 4 is a generation gap, not a minor version bump. Gemma 4 31B at QS 88.6 vs Gemma 3 27B at QS 58.9 is a difference of nearly 30 Quality Score points on otherwise comparable parameter counts. Do not collapse "Gemma 27B" and "Gemma 31B" into one mental category; they are different generations.
- e2b and e4b efficiency variants have thinner benchmark coverage than the dense and MoE variants. Several benchmarks (Arena Elo, Humanity's Last Exam, LiveBench) carry numbers for the 31B or 26B-A4B but not for the efficiency variants at last verification. Treat their listed scores as directional, particularly outside Quality Score.
- Context windows on the e-variants are not declared in our index. Gemma 4 31B and 26B-A4B both ship at 262K; the e4b and e2b context fields are unset. That is a coverage gap, not evidence of a smaller window.
- Hosted-vs-self-host pricing. The pricing on this page is what hosted inference providers charge for Gemma, not the cost of self-hosting (which depends on your own hardware and utilisation). For an open-weights family, list price is the calibration anchor for cross-family comparison; the deployment cost question is on you.
- The Gemma 3n line. The 3n variants (e2b, e4b) appear alongside Gemma 3 in our index but are an efficiency-tuned derivative, not a drop-in for Gemma 3. Their scores are weaker than the larger Gemma 3 variants and substantially weaker than Gemma 4 e-variants; they exist for niche on-device use cases, not general workhorse deployment.
If you are making a procurement decision (or, more often for this family, a deployment-architecture decision), the variant table on this page is the load-bearing artifact. Cross-check the open-weights license terms against your use case before you commit; Gemma's licence is permissive but specific.
When to reach for which alternative
- Open-weights breadth across model sizes: the Qwen3 family ships dense models from 0.6B to 32B plus MoE variants, which gives a wider spread than Gemma's current lineup, particularly at the smaller end. Pick by which family's smallest deployable variant fits your hardware budget and licence requirements.
- Cheapest competent open-weights API: DeepSeek V4 Flash ($0.098 / $0.197 with QS 78.1 and 1M context) is the price anchor to beat at the chat workhorse tier. Gemma 4 26B-A4B competes on the open-weights story; DeepSeek wins on hosted-API quality-per-dollar at the chat tier.
- You need a closed Google option for the same workload: the Gemini 3 surface in our index covers Google's current closed line. Gemma is not a drop-in for Gemini in either direction; the licence and deployment story differ enough that the surfaces are best evaluated separately, not as a tier-ladder.
Sources worth reading
- Gemma model docs: authoritative variant identifiers, license terms, and intended use
- Gemma on Hugging Face: model cards, weights, and per-variant license information
- Google Gemma announcements: release notes for new generations
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
Author: Boris. Read the full methodology.