Google family

Gemma

Gemma 4 31B IT (Thinking) ranks #34 of 186 with a 262K-token context and $0.12/$0.37 per 1M tokens. Compare Gemma 4 and Gemma 3 by workload.

Top in this family

Gemma 4 31B IT (Thinking) ranks #34 of 186 on overall quality (QS 88.6) at $0.12/$0.37 per 1M tokens.

Practical pick

Gemma 4 26B A4B IT (Thinking) at $0.06/$0.33 per 1M tokens (rank #48 of 186).

Variants: 10
License: Open weights
Provider: Google

★ Most teams should start here

Google Gemma 4 26B A4B IT

Variant: Thinking

The practical default in the current generation. Mixture-of-experts variant: it costs and serves like a small model on capable hardware while carrying near-flagship quality. Pick the dense 31B only when you have a specific deployment reason to avoid MoE.

Quality Score: 83.9
Input: $0.060/1M
Output: $0.330/1M
Context: 262K
License: Open weights

Best variant by workload

One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.

Note — picks are framed for direct API usage where cost per million tokens is load-bearing. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.

Workload	Best pick	Why
Self-host on 1 GPU	Google Gemma 4 26B A4B IT Thinking $0.060/1M / $0.330/1M	Mixture-of-experts active-param footprint is closer to a 4B model than a 26B one. Plan for the full parameter count when sizing GPU memory, not the active subset.
Edge / on-device	Google Gemma 4 E2B IT Thinking	Smallest efficient Gemma 4 variant. Fits CPU and edge inference for local or on-device deployment.
General API workhorse	Google Gemma 4 31B IT Thinking $0.120/1M / $0.370/1M	Dense Gemma 4 flagship. Use when MoE serving complexity is a problem and you want a predictable parameter-count profile.
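
Because the picks above are framed around cost per million tokens, it can help to turn list prices into a per-request figure. The following is a minimal sketch, not a billing tool: the prices are copied from the table above, while the token counts per request (2,000 in, 1,000 out) are a hypothetical workload chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hosted list price in USD per 1M tokens, as listed on this page."""
    input_per_m: float
    output_per_m: float


# Prices copied from the workload table above.
PRICES = {
    "google-gemma-4-26b-a4b-it": Price(0.06, 0.33),
    "google-gemma-4-31b-it": Price(0.12, 0.37),
}


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at list price."""
    return (
        input_tokens * price.input_per_m + output_tokens * price.output_per_m
    ) / 1_000_000


# Hypothetical workload (an assumption): 2,000 input and 1,000 output tokens.
for name, price in PRICES.items():
    per_request = request_cost(price, 2_000, 1_000)
    print(f"{name}: ${per_request * 1_000_000:,.0f} per 1M requests")
```

Under that assumed mix the 26B A4B comes to about $450 per million requests and the dense 31B to about $610. Output token counts for thinking variants can be much larger than the visible answer, so measure them on your own traffic before relying on a figure like this.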

All variants

14 variants across 10 models (+ 1 cross-family for context). Sorted by quality score (descending).

Variant	QS (rank)	GPQA	HLE	SWE	SWE-Pro	MCP	AIME	In $/M	Out $/M	Context	Released
Gemma 4 31B IT (Thinking)	88.6 (#34/186)	84.3	19.5	—	—	—	—	$0.12	$0.37	262K	Apr 2, 2026
Gemma 4 26B A4B IT (Thinking)	83.9 (#48/186)	82.3	8.7	—	—	—	—	$0.06	$0.33	262K	Apr 2, 2026
Gemma 4 E4B IT (Thinking)	69.2 (#119/186)	58.6	—	—	—	—	—	—	—	—	Apr 2, 2026
Gemma 4 E2B IT (Thinking)	62.5 (#147/186)	43.4	—	—	—	—	—	—	—	—	Apr 2, 2026
Gemma 3 27B IT (Non-thinking; previous generation, newer: Google Gemma 4 26B A4B IT)	58.9 (#164/186)	42.4	—	—	11.4	—	24.0	$0.08	$0.16	131K	Mar 12, 2025
Gemma 3 12B IT (Non-thinking; previous generation)	58.7 (#165/186)	40.9	—	—	—	—	18.8	$0.04	$0.13	131K	Mar 12, 2025
Gemma 3n 4B (Non-thinking; previous generation, newer: Google Gemma 4 E4B IT)	50.4 (#175/186)	23.7	—	—	—	—	11.6	$0.06	$0.12	33K	Jun 26, 2025
Gemma 3 4B IT (Non-thinking; previous generation, newer: Google Gemma 4 E4B IT)	49.9 (#178/186)	30.8	—	—	—	—	—	$0.04	$0.08	131K	Mar 12, 2025
Gemma 3n 2B IT (Non-thinking; previous generation, newer: Google Gemma 4 E2B IT)	46.9 (#181/186)	24.8	—	—	—	—	6.7	—	—	—	Jun 26, 2025
Gemma 3 1B IT (Non-thinking; previous generation)	32.1 (#186/186)	19.2	—	—	—	—	0.8	—	—	—	Mar 12, 2025
DeepSeek V4 Pro (Thinking; cross-family)	98.0 (#15/186)	90.1	37.7	80.6	55.4	73.6	—	$0.435	$0.87	1.0M	Apr 24, 2026
DeepSeek V4 Flash (Thinking; cross-family)	92.0 (#27/186)	88.1	34.8	79.0	52.6	69.0	—	$0.098	$0.197	1.0M	Apr 24, 2026
DeepSeek V4 Pro (cross-family)	80.9 (#61/186)	72.9	7.7	73.6	52.1	69.4	—	$0.435	$0.87	1.0M	Apr 24, 2026
DeepSeek V4 Flash (cross-family)	78.1 (#78/186)	71.2	8.1	73.7	49.1	64.0	—	$0.098	$0.197	1.0M	Apr 24, 2026

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (59 of 123 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.

Reasoning

Model / Variant	Benchmark	Score	Rank	Scoring
Google Gemma 3 27B IT · Non-thinking	MMLU Pro · 5_shot_cot	67.5	1 / 4	In Quality Score
Google Gemma 4 31B IT · Thinking	Humanity's Last Exam · search	26.5	1 / 2	In Quality Score
Google Gemma 3 27B IT · Non-thinking	GPQA Diamond · 5_shot_cot	42.4	2 / 4	In Quality Score
Google Gemma 4 26B A4B IT · Thinking	Humanity's Last Exam · search	17.2	2 / 2	In Quality Score
Google Gemma 4 31B IT · Thinking	MMLU Pro	85.2	20 / 86	In Quality Score
Google Gemma 4 31B IT · Thinking	Arena Elo	1452	34 / 158	In Quality Score
Google Gemma 4 31B IT · Thinking	GPQA Diamond	84.3	37 / 143	In Quality Score
Google Gemma 4 26B A4B IT · Thinking	MMLU Pro	82.6	38 / 86	In Quality Score
Google Gemma 4 31B IT · Thinking	Humanity's Last Exam · hle	19.5	42 / 90	In Quality Score
Google Gemma 4 26B A4B IT · Thinking	Arena Elo	1439	47 / 158	In Quality Score
Google Gemma 4 26B A4B IT · Thinking	GPQA Diamond	82.3	47 / 143	In Quality Score
Google Gemma 4 31B IT · Thinking	LiveBench	61.6	56 / 110	In Quality Score
Google Gemma 3 27B IT · Non-thinking	AIME 2025	24	62 / 88	In Quality Score
Google Gemma 4 E4B IT · Thinking	MMLU Pro	69.4	65 / 86	In Quality Score
Google Gemma 4 26B A4B IT · Thinking	Humanity's Last Exam · hle	8.7	67 / 90	In Quality Score
Google Gemma 3 27B IT · Non-thinking	MMLU Pro	67.5	68 / 86	In Quality Score
Google Gemma 3 12B IT · Non-thinking	AIME 2025	18.8	70 / 88	In Quality Score
Google Gemma 3 12B IT · Non-thinking	MMLU Pro	60.6	74 / 86	In Quality Score
Google Gemma 4 E2B IT · Thinking	MMLU Pro	60	75 / 86	In Quality Score
Google Gemma 3n 4B · Non-thinking	AIME 2025	11.6	78 / 88	In Quality Score
Google Gemma 3n 4B · Non-thinking	MMLU Pro	50.6	80 / 86	In Quality Score
Google Gemma 3 4B IT · Non-thinking	MMLU Pro	43.6	81 / 86	In Quality Score
Google Gemma 3n 2B IT · Non-thinking	MMLU Pro	40.5	83 / 86	In Quality Score
Google Gemma 3n 2B IT · Non-thinking	AIME 2025	6.7	83 / 88	In Quality Score
Google Gemma 3 27B IT · Non-thinking	LiveBench	49.2	86 / 110	In Quality Score
Google Gemma 3 1B IT · Non-thinking	MMLU Pro	14.7	86 / 86	In Quality Score
Google Gemma 3 1B IT · Non-thinking	AIME 2025	0.8	88 / 88	In Quality Score
Google Gemma 3 12B IT · Non-thinking	LiveBench	43.7	94 / 110	In Quality Score
Google Gemma 4 E4B IT · Thinking	GPQA Diamond	58.6	103 / 143	In Quality Score
Google Gemma 3 1B IT · Non-thinking	LiveBench	14.4	110 / 110	In Quality Score
Google Gemma 3 27B IT · Non-thinking	Arena Elo	1366	113 / 158	In Quality Score
Google Gemma 4 E2B IT · Thinking	GPQA Diamond	43.4	124 / 143	In Quality Score
Google Gemma 3 27B IT · Non-thinking	GPQA Diamond	42.4	125 / 143	In Quality Score
Google Gemma 3 12B IT · Non-thinking	Arena Elo	1342	126 / 158	In Quality Score
Google Gemma 3 12B IT · Non-thinking	GPQA Diamond	40.9	127 / 143	In Quality Score
Google Gemma 3 4B IT · Non-thinking	GPQA Diamond	30.8	135 / 143	In Quality Score
Google Gemma 3n 4B · Non-thinking	Arena Elo	1318	137 / 158	In Quality Score
Google Gemma 3n 2B IT · Non-thinking	GPQA Diamond	24.8	140 / 143	In Quality Score
Google Gemma 3n 4B · Non-thinking	GPQA Diamond	23.7	141 / 143	In Quality Score
Google Gemma 3 1B IT · Non-thinking	GPQA Diamond	19.2	143 / 143	In Quality Score
Google Gemma 3 4B IT · Non-thinking	Arena Elo	1303	144 / 158	In Quality Score
Google Gemma 3 27B IT · Non-thinking	GSM8K	95.9	1 / 10	Tracked evidence
Google Gemma 4 31B IT · Thinking	AIME 2026 · no_tools	89.2	1 / 4	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	GSM8K	94.4	2 / 10	Tracked evidence
Google Gemma 4 26B A4B IT · Thinking	AIME 2026 · no_tools	88.3	2 / 4	Tracked evidence
Google Gemma 4 E4B IT · Thinking	AIME 2026 · no_tools	42.5	3 / 4	Tracked evidence
Google Gemma 4 E2B IT · Thinking	AIME 2026 · no_tools	37.5	4 / 4	Tracked evidence
Google Gemma 3 4B IT · Non-thinking	GSM8K	89.2	7 / 10	Tracked evidence
Google Gemma 4 31B IT · Thinking	MMMLU	88.4	9 / 38	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	Multi-IF	69.8	9 / 32	Tracked evidence
Google Gemma 4 31B IT · Thinking	MRCR · v2_128k	66.4	9 / 23	Tracked evidence
Google Gemma 3 1B IT · Non-thinking	GSM8K	62.8	10 / 10	Tracked evidence
Google Gemma 4 31B IT · Thinking	MMMU PRO	76.9	12 / 52	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	Arena-Hard	86.8	13 / 40	Tracked evidence
Google Gemma 4 26B A4B IT · Thinking	MMMLU	86.3	14 / 38	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	Multi-IF	65.6	14 / 32	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	Arena-Hard	82.6	19 / 40	Tracked evidence
Google Gemma 4 26B A4B IT · Thinking	MRCR · v2_128k	44.1	19 / 23	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	MMMU · mmmu_single	50	21 / 22	Tracked evidence
Google Gemma 4 E4B IT · Thinking	MRCR · v2_128k	25.4	22 / 23	Tracked evidence
Google Gemma 4 E2B IT · Thinking	MRCR · v2_128k	19.1	23 / 23	Tracked evidence
Google Gemma 4 26B A4B IT · Thinking	MMMU PRO	73.8	26 / 52	Tracked evidence
Google Gemma 4 E4B IT · Thinking	MMMLU	76.6	27 / 38	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	MMLU	76.9	29 / 33	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	MATH 500	90	30 / 55	Tracked evidence
Google Gemma 4 E2B IT · Thinking	MMMLU	67.4	31 / 38	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	BFCL v3	59.1	31 / 49	Tracked evidence
Google Gemma 3 1B IT · Non-thinking	Multi-IF	32.8	31 / 32	Tracked evidence
Google Gemma 3n 4B · Non-thinking	MMLU	64.9	32 / 33	Tracked evidence
Google Gemma 3n 2B IT · Non-thinking	MMLU	60.1	33 / 33	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	SimpleQA	10	35 / 40	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	MATH 500	85.6	36 / 55	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	SimpleQA	6.3	37 / 40	Tracked evidence
Google Gemma 3 1B IT · Non-thinking	Arena-Hard	17.8	38 / 40	Tracked evidence
Google Gemma 3 4B IT · Non-thinking	SimpleQA	4	39 / 40	Tracked evidence
Google Gemma 3 1B IT · Non-thinking	SimpleQA	2.2	40 / 40	Tracked evidence
Google Gemma 4 E4B IT · Thinking	MMMU PRO	52.6	41 / 52	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	BFCL v3	50.6	41 / 49	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	MMMU PRO	48.4	45 / 52	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	AIME 2024	32.6	46 / 69	Tracked evidence
Google Gemma 4 E2B IT · Thinking	MMMU PRO	44.2	48 / 52	Tracked evidence
Google Gemma 3 1B IT · Non-thinking	BFCL v3	16.3	49 / 49	Tracked evidence
Google Gemma 3 1B IT · Non-thinking	MATH 500	46.4	55 / 55	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	AIME 2024	22.4	55 / 69	Tracked evidence
Google Gemma 3 1B IT · Non-thinking	AIME 2024	0.9	69 / 69	Tracked evidence

Coding

Model / Variant	Benchmark	Score	Rank	Scoring
Google Gemma 3n 4B · Non-thinking	LiveCodeBench · v5	25.7	4 / 5	In Quality Score
Google Gemma 4 31B IT · Thinking	LiveCodeBench	80	5 / 69	In Quality Score
Google Gemma 3n 2B IT · Non-thinking	LiveCodeBench · v5	18.6	5 / 5	In Quality Score
Google Gemma 4 26B A4B IT · Thinking	LiveCodeBench	77.1	6 / 69	In Quality Score
Google Gemma 3 27B IT · Non-thinking	LiveCodeBench · 2024_10_01_to_2025_02_01	29.7	7 / 9	In Quality Score
Google Gemma 4 E4B IT · Thinking	LiveCodeBench	52	33 / 69	In Quality Score
Google Gemma 4 E2B IT · Thinking	LiveCodeBench	44	36 / 69	In Quality Score
Google Gemma 3 27B IT · Non-thinking	Aider (Polyglot)	4.9	44 / 45	In Quality Score
Google Gemma 3 27B IT · Non-thinking	LiveCodeBench	29.7	48 / 69	In Quality Score
Google Gemma 3 12B IT · Non-thinking	LiveCodeBench	24.6	55 / 69	In Quality Score
Google Gemma 3n 2B IT · Non-thinking	LiveCodeBench	13.2	60 / 69	In Quality Score
Google Gemma 3n 4B · Non-thinking	LiveCodeBench	13.2	61 / 69	In Quality Score
Google Gemma 3 4B IT · Non-thinking	LiveCodeBench	12.6	62 / 69	In Quality Score
Google Gemma 3 1B IT · Non-thinking	LiveCodeBench	1.9	69 / 69	In Quality Score
Google Gemma 4 31B IT · Thinking	Codeforces	2150	5 / 47	Tracked evidence
Google Gemma 4 26B A4B IT · Thinking	Codeforces	1718	20 / 47	Tracked evidence
Google Gemma 3 27B IT · Non-thinking	Codeforces	1063	33 / 47	Tracked evidence
Google Gemma 4 E4B IT · Thinking	Codeforces	940	36 / 47	Tracked evidence
Google Gemma 4 E2B IT · Thinking	Codeforces	633	44 / 47	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	Codeforces	462	46 / 47	Tracked evidence

Agentic

Model / Variant	Benchmark	Score	Rank	Scoring
Google Gemma 4 31B IT · Thinking	τ²-bench · average	76.9	20 / 30	In Quality Score
Google Gemma 4 26B A4B IT · Thinking	τ²-bench · average	68.2	22 / 30	In Quality Score
Google Gemma 4 E4B IT · Thinking	τ²-bench · average	42.2	27 / 30	In Quality Score
Google Gemma 4 E2B IT · Thinking	τ²-bench · average	24.5	29 / 30	In Quality Score

Multimodal

Model / Variant	Benchmark	Score	Rank	Scoring
Google Gemma 3 27B IT · Non-thinking	ChartQA	76.3	8 / 9	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	MathVision · mini	31.9	8 / 10	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	ChartQA · test	39	9 / 10	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	MathVerse · mini	29.8	10 / 10	Tracked evidence
Google Gemma 4 31B IT · Thinking	MedXpertQA · mm	61.3	14 / 31	Tracked evidence
Google Gemma 4 26B A4B IT · Thinking	MedXpertQA · mm	58.1	15 / 31	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	HallusionBench	65.3	16 / 33	Tracked evidence
Google Gemma 4 E4B IT · Thinking	MedXpertQA · mm	28.7	23 / 31	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	AI2D · test	80.4	28 / 33	Tracked evidence
Google Gemma 4 E2B IT · Thinking	MedXpertQA · mm	23.5	28 / 31	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	MMStar	59.4	30 / 33	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	MathVista · mini	57.4	35 / 36	Tracked evidence

Document/OCR

Model / Variant	Benchmark	Score	Rank	Scoring
Google Gemma 3 27B IT · Non-thinking	DocVQA	90.4	6 / 8	Tracked evidence
Google Gemma 3 12B IT · Non-thinking	OCRBench	75.3	29 / 35	Tracked evidence

Where this family sits in the market

Gemma's e2b and e4b efficiency variants take the cost-efficiency frontier for small-footprint self-hosting. The 26B-A4B MoE sits at the family's quality-per-active-param sweet spot.

[Chart: cost versus quality by provider family (Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, NVIDIA, OpenAI, Qwen, xAI, Zhipu). The dashed line is the Pareto frontier: the models for which no other model is both cheaper and better. Thinking and non-thinking pairs of the same model are connected, and the length of the connecting line is the cost of reasoning.]
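
The Pareto frontier described for the chart (no model both cheaper and better) can be computed in a few lines. This is a minimal sketch under stated assumptions: the page does not say which cost measure the chart uses, so input list price is an arbitrary stand-in, and only the Gemma rows with listed prices are included. The names `Point` and `pareto_frontier` are hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float     # USD per 1M input tokens (assumed cost axis)
    quality: float  # Quality Score from the variants table


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points that no other point beats on both cost and quality.

    Points are scanned from cheapest to most expensive (best quality first
    among equal costs); a point is kept only if it improves on the best
    quality seen so far.
    """
    frontier: list[Point] = []
    best_quality = float("-inf")
    for p in sorted(points, key=lambda p: (p.cost, -p.quality)):
        if p.quality > best_quality:
            frontier.append(p)
            best_quality = p.quality
    return frontier


# Gemma rows with a listed input price, copied from the variants table.
POINTS = [
    Point("Gemma 4 31B IT", 0.12, 88.6),
    Point("Gemma 4 26B A4B IT", 0.06, 83.9),
    Point("Gemma 3 27B IT", 0.08, 58.9),
    Point("Gemma 3 12B IT", 0.04, 58.7),
    Point("Gemma 3n 4B", 0.06, 50.4),
    Point("Gemma 3 4B IT", 0.04, 49.9),
]

for p in pareto_frontier(POINTS):
    print(f"{p.name}: ${p.cost}/1M input, QS {p.quality}")
```

On these six rows the frontier is Gemma 3 12B IT, Gemma 4 26B A4B IT, and Gemma 4 31B IT. A different cost axis (output price, or a blend) can change which rows survive, so treat the result as illustrative rather than a reproduction of the chart.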

Self-hosting

These variants ship with open weights, so you can run them on your own hardware or via a hosting provider you control. Pick a variant that fits your GPU memory budget; mixture-of-experts variants are cheaper to serve than their total parameter count suggests, but the full weights still need to fit in memory.

Google Gemma 4 31B IT · Thinking · open weights
Google Gemma 4 26B A4B IT · Thinking · open weights
Google Gemma 4 E4B IT · Thinking · open weights
Google Gemma 4 E2B IT · Thinking · open weights
Google Gemma 3 27B IT · Non-thinking · open weights
Google Gemma 3 12B IT · Non-thinking · open weights
Google Gemma 3 4B IT · Non-thinking · open weights
Google Gemma 3 1B IT · Non-thinking · open weights
Google Gemma 3n 4B · Non-thinking · open weights
Google Gemma 3n 2B IT · Non-thinking · open weights
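
The point that a mixture-of-experts model's full weights must fit in memory can be made concrete with back-of-the-envelope arithmetic. The sketch below is an estimate for the weights only, assuming nominal parameter counts taken from the variant names and a uniform precision per parameter. It ignores KV cache, activations, and runtime overhead, and it does not imply that any particular quantised build exists.

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return total_params_billions * bytes_per_param


# Total (not active) parameter counts, read off the variant names (nominal).
MODELS = {
    "Gemma 4 31B IT (dense)": 31.0,
    "Gemma 4 26B A4B IT (MoE)": 26.0,
}

for name, params_b in MODELS.items():
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params_b, bits)
        print(f"{name} at {bits}-bit: ~{gb:.1f} GB of weights")
```

At 16-bit precision the 26B A4B needs roughly 52 GB for weights, against about 8 GB if you mistakenly sized for its ~4B active parameters. That gap is why the active-parameter figure describes compute per token, not the memory budget.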

The Gemma family

Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.

Open weights (10)

Google Gemma 4 31B IT · 1 variant
Google Gemma 4 26B A4B IT · 1 variant
Google Gemma 4 E4B IT · 1 variant
Google Gemma 4 E2B IT · 1 variant
Google Gemma 3 27B IT · 1 variant
Google Gemma 3 12B IT · 1 variant
Google Gemma 3 4B IT · 1 variant
Google Gemma 3 1B IT · 1 variant
Google Gemma 3n 4B · 1 variant
Google Gemma 3n 2B IT · 1 variant

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Llama: Muse Spark (Thinking), Llama 4 and 3 Compared
Llama: Muse Spark (Thinking) ranks #12 of 186 on Quality Score. Compare Llama 4, Llama 3, and Muse Spark by self-hosting and workload.
Qwen3: Qwen 3.7 Max Preview, Qwen3.5, Qwen3.6 Compared
Qwen3: Qwen 3.7 Max Preview ranks #9/186 with 262K context at $0.78/$3.9 per 1M. Compare Qwen3, 3.5, 3.6 by workload.

Caveats

What this page does not tell you, listed honestly.

No tracked API pricing for: Google Gemma 4 E4B IT, Google Gemma 4 E2B IT, Google Gemma 3 1B IT, Google Gemma 3n 2B IT. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
Context window not declared for: Google Gemma 4 E4B IT, Google Gemma 4 E2B IT, Google Gemma 3 1B IT, Google Gemma 3n 2B IT.
Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.

Editor's notes

By Boris · Last verified 2026-05-09 · AI-assisted, human-reviewed

Why this family matters

Gemma is Google's open-weights line, distinct from the closed Gemini API family. The current generation (Gemma 4) brings a meaningful architectural change: alongside the dense 31B variant, the family now ships a 26B-A4B mixture-of-experts build (26B total parameters, ~4B active per token), plus efficiency-tuned e2b and e4b variants sized for edge and on-device deployment.

Gemma's main draw is self-host fit. The 26B-A4B MoE costs and serves like a small model on capable hardware while landing at Quality Score 83.9 (#48 of 186 models we track), which is competitive with dense models 3 to 5 times its active-parameter footprint. The dense 31B at QS 88.6 is the family's quality ceiling; the e2b and e4b variants are the cost-efficiency frontier for constrained hardware.

Which variant to start with

Default to google-gemma-4-26b-a4b-it. At $0.06 input / $0.33 output per million on hosted routes (or proportional self-host cost) and Quality Score 83.9 with a 262K context window, it is the family's quality-per-active-param sweet spot. Pick this unless you have a specific reason not to.

When to deviate:

Maximum quality within the family: use google-gemma-4-31b-it (QS 88.6, dense 31B). The score gap to 26B-A4B is modest (4.7 points of Quality Score, 88.6 vs 83.9), but for deployments that benefit from dense inference characteristics or where MoE serving is awkward, the dense variant is the cleaner option.
Self-host on a single GPU: the choice depends on what you are optimising for. The 26B-A4B MoE has the active-param footprint of a 4B model but the full weights still need to fit in memory; plan for the full 26B parameter count when sizing GPU memory, not the active subset. The dense 31B is heavier on memory but simpler to serve. Below those, the e4b variant is the realistic single-GPU consumer-hardware target in the family.
Edge or on-device: drop to google-gemma-4-e2b-it (QS 62.5, efficiency-tuned). Smallest Gemma 4 variant; fits CPU and small-footprint inference where round-trip latency rules out hosted APIs. Treat its score as directional, not comparative against the larger variants.
Migrating an existing Gemma 3 deployment: the Gemma 3 line is still in our index, but the 27B at QS 58.9 (and the smaller variants below it) lags the Gemma 4 generation by enough that the score uplift almost certainly justifies the migration cost, particularly when fine-tunes or pinned weights are not a constraint.

Where the data is weak

We aggregate benchmark scores from multiple sources but coverage and naming across this family deserve a careful read. Specifically:

Gemma 3 vs Gemma 4 is a generation gap, not a minor version bump. Gemma 4 31B at QS 88.6 vs Gemma 3 27B at QS 58.9 is a difference of nearly 30 Quality Score points on otherwise comparable parameter counts. Do not collapse "Gemma 27B" and "Gemma 31B" into one mental category; they are different generations.
e2b and e4b efficiency variants have thinner benchmark coverage than the dense and MoE variants. Several benchmarks in the tables above (Humanity's Last Exam, Arena Elo) carry numbers for 31B and 26B-A4B but not for the efficiency variants at last verification, and LiveBench covers only the 31B. Treat the efficiency variants' listed scores as directional, particularly outside Quality Score.
Context windows on the e-variants are not declared in our index. Gemma 4 31B and 26B-A4B both ship at 262K; the e4b and e2b context fields are unset. That is a coverage gap, not evidence of a smaller window.
Hosted-vs-self-host pricing. The pricing on this page is what hosted inference providers charge for Gemma, not the cost of self-hosting (which depends on your own hardware and utilisation). For an open-weights family, list price is the calibration anchor for cross-family comparison; the deployment cost question is on you.
The Gemma 3n line. The 3n variants (e2b, e4b) appear alongside Gemma 3 in our index but are an efficiency-tuned derivative, not a drop-in for Gemma 3. Their scores are weaker than the larger Gemma 3 variants and substantially weaker than Gemma 4 e-variants; they exist for niche on-device use cases, not general workhorse deployment.

If you are making a procurement decision (or, more often for this family, a deployment-architecture decision), the variant table on this page is the load-bearing artifact. Cross-check the open-weights license terms against your use case before you commit; Gemma's licence is permissive but specific.

When to reach for which alternative

Open-weights breadth across model sizes: the Qwen3 family ships dense models from 0.6B to 32B plus MoE variants, which gives a wider spread than Gemma's current lineup, particularly at the smaller end. Pick by which family's smallest deployable variant fits your hardware budget and licence requirements.
Cheapest competent open-weights API: DeepSeek V4 Flash ($0.098 / $0.197 with QS 78.1 and 1M context) is the price anchor to beat at the chat workhorse tier. Gemma 4 26B-A4B competes on the open-weights story; DeepSeek wins on hosted-API quality-per-dollar at the chat tier.
You need a closed Google option for the same workload: the Gemini 3 surface in our index covers Google's current closed line. Gemma is not a drop-in for Gemini in either direction; the licence and deployment story differ enough that the surfaces are best evaluated separately, not as a tier-ladder.
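
Whether DeepSeek V4 Flash or Gemma 4 26B-A4B is cheaper per token depends on the mix of input and output tokens, because their input and output prices are ordered differently. The following sketch blends the list prices from the variants table by an assumed input share. The function name `blended_price` and the input shares tried are illustrative assumptions, not part of this page's methodology.

```python
def blended_price(input_per_m: float, output_per_m: float, input_share: float) -> float:
    """USD per 1M tokens when `input_share` of all tokens are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_per_m + (1.0 - input_share) * output_per_m


# (input $/1M, output $/1M, Quality Score) copied from the variants table.
ROWS = {
    "Gemma 4 26B A4B IT (Thinking)": (0.06, 0.33, 83.9),
    "DeepSeek V4 Flash": (0.098, 0.197, 78.1),
    "DeepSeek V4 Flash (Thinking)": (0.098, 0.197, 92.0),
}

# Hypothetical traffic mixes: 50%, 75%, and 90% of tokens are input.
for share in (0.5, 0.75, 0.9):
    print(f"input share {share:.0%}")
    for name, (inp, out, qs) in ROWS.items():
        price = blended_price(inp, out, share)
        print(f"  {name}: ${price:.3f}/1M blended, QS {qs}")
```

At these list prices the two blended prices cross at roughly 78% input share: output-heavy traffic favours DeepSeek V4 Flash on price, while input-heavy traffic favours Gemma 4 26B-A4B. Thinking variants typically shift the mix toward output, so measure your own ratio before comparing.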

Sources worth reading

Gemma model docs: authoritative variant identifiers, license terms, and intended use
Gemma on Hugging Face: model cards, weights, and per-variant license information
Google Gemma announcements: release notes for new generations

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
