xAI family

Grok

Grok: 4.20 Beta1 (Reasoning) ranks #10 of 186 with 1.0M-token context and $1.25/$2.5 per 1M tokens. Compare Grok 4.3, 4.20, and legacy Fast tiers.

Top in this family

Grok 4.20 Beta1 (Reasoning) ranks #10 of 186 on overall quality (QS 100.7) at $1.25 input / $2.50 output per 1M tokens.

Variants: 5
License: Closed weights
Provider: xAI

★ Most teams should start here

xAI Grok 4

Variant: Grok 4.3

The current default. xAI's May 2026 retirement guide points general and coding migrations to Grok 4.3. Use 4.20 multi-agent when you need the 2M-token context window, and 4.20 non-reasoning only when that migration path is the real constraint.

Quality Score: 85.4
Input: $1.25/1M
Output: $2.50/1M
Context: 1.0M
License: Closed · API

Best variant by workload

One pick per common job. Pick by what you need to ship, not by which variant has the highest score on a leaderboard you don't use.

Note: picks are framed for direct API usage, where cost per million tokens is the deciding factor. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.
Workload | Best pick | Why
Coding agents | xAI Grok 4 · Grok 4.3 · $1.25/1M / $2.50/1M | Use Grok 4.3. xAI's retirement guide names it as the replacement for Grok Code Fast, and it is the active route for agentic coding and web development work.
General API workhorse | xAI Grok 4 · Grok 4.3 · $1.25/1M / $2.50/1M | Start with Grok 4.3 for general chat, summarization, and tool use. It is the current API default; older Fast rows are migration context, not the recommendation.
Long-context RAG | xAI Grok 4 · Grok 4.3 · $1.25/1M / $2.50/1M | Use 4.20 multi-agent when the 2M-token window is the binding requirement. Use 4.3 when current-generation quality matters more than maximum context, and use 4.20 non-reasoning only for that explicit migration path.

All variants

31 variants across 5 models (+ 3 cross-family for context). Sorted by quality score (descending).

Variant | Model | Status | QS | Rank | Benchmarks (GPQA, HLE, SWE, SWE-Pro, Terminal, Tau, MCP, AIME; empty cells omitted) | In $/M | Out $/M | Context | Released | Lic.
4.20 Beta1 (Reasoning) | Grok 4 | - | 100.7 | #10/186 | - | $1.25 | $2.5 | 1.0M | Jul 9, 2025 | -
4.20 Beta1 | Grok 4 | - | 93.8 | #21/186 | - | $1.25 | $2.5 | 1.0M | Jul 9, 2025 | -
Grok 4.2 | Grok 4 | - | 90.6 | #28/186 | 88.5, 31.6, 76.7, 51.8 | $1.25 | $2.5 | 256K | Jul 9, 2025 | -
Grok 4 | Grok 4 | - | 85.9 | #42/186 | 87.5, 25.4, 23.1, 76.5, 91.7 | $1.25 | $2.5 | 256K | Jul 9, 2025 | -
Grok 4.3 | Grok 4 | - | 85.4 | #44/186 | - | $1.25 | $2.5 | 1.0M | Jul 9, 2025 | -
4.1 | Grok 4 | - | - | - | - | $1.25 | $2.5 | 256K | Jul 9, 2025 | -
4.1 Thinking | Grok 4 | - | - | - | - | $1.25 | $2.5 | 256K | Jul 9, 2025 | -
4.20 Beta1 (Non-Thinking) | Grok 4 | - | - | - | - | $1.25 | $2.5 | 1.0M | Jul 9, 2025 | -
4.20 Multi-Agent | Grok 4 | - | - | - | - | $1.25 | $2.5 | 2.0M | Jul 9, 2025 | -
Thinking | Grok 4 Fast | Previous (newer: xAI Grok 4) | 84.3 | #47/186 | 84.3, 17.6 | - | - | - | Sep 19, 2025 | -
Non-Thinking | Grok 4 Fast | Previous (newer: xAI Grok 4) | 77.0 | #82/186 | 84.3, 17.6, 50.6, 92.0 | - | - | - | Sep 19, 2025 | -
Thinking | Grok 3 | Previous (newer: xAI Grok 4) | 76.3 | #85/186 | 80.2, 77.3 | - | - | - | Feb 17, 2025 | -
Grok 3 Mini | Grok 3 Mini | Previous (newer: xAI Grok 4) | 74.1 | #97/186 | 79.0, 11.0, 83.0 | - | - | - | Feb 17, 2025 | -
Non-thinking | Grok Code Fast | Previous (newer: xAI Grok 4) | 53.0 | #173/186 | 14.2 | - | - | - | Aug 29, 2025 | -
4.8 Thinking | Anthropic Claude Opus 4 | cross-family | 108.6 | #2/186 | 93.6, 49.8, 88.6, 69.2, 82.2 | $5 | $25 | 200K | May 22, 2025 | -
4.7 Thinking | Anthropic Claude Opus 4 | cross-family | 107.8 | #3/186 | 94.2, 46.9, 87.6, 64.3, 69.4, 77.3 | $5 | $25 | 200K | May 22, 2025 | -
3.1 | Gemini 3 Pro | cross-family | 104.3 | #5/186 | 94.3, 44.4, 80.6, 54.2, 68.5, 90.8, 73.9 | $2 | $12 | - | Nov 18, 2025 | -
4.6 Thinking | Anthropic Claude Opus 4 | cross-family | 104.1 | #6/186 | 91.3, 40.0, 80.8, 53.4, 65.4, 91.9, 59.5, 95.6 | $5 | $25 | 1.0M | May 22, 2025 | -
4.5 Thinking | Anthropic Claude Opus 4 | cross-family | 98.6 | #13/186 | 87.0, 30.8, 80.9, 59.3, 88.9, 62.3, 92.8 | $5 | $25 | 200K | May 22, 2025 | -
V4 Pro Thinking | DeepSeek V4 | cross-family | 98.0 | #15/186 | 90.1, 37.7, 80.6, 55.4, 73.6 | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | -
3.0 | Gemini 3 Pro | cross-family | 95.0 | #20/186 | 91.9, 37.5, 76.2, 43.3, 54.2, 85.3, 54.1, 95.0 | $2 | $12 | - | Nov 18, 2025 | -
4.6 Non-thinking | Anthropic Claude Opus 4 | cross-family | 93.1 | #23/186 | 19.0 | $5 | $25 | 200K | May 22, 2025 | -
V4 Flash Thinking | DeepSeek V4 | cross-family | 92.0 | #27/186 | 88.1, 34.8, 79.0, 52.6, 69.0 | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | -
4.1 Thinking | Anthropic Claude Opus 4 | cross-family | 83.1 | #50/186 | 81.0, 11.7, 74.5, 38.0, 86.8, 40.9, 78.0 | $15 | $75 | 200K | May 22, 2025 | -
V4 Pro | DeepSeek V4 | cross-family | 80.9 | #61/186 | 72.9, 7.7, 73.6, 52.1, 69.4 | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | -
4.5 Non-thinking | Anthropic Claude Opus 4 | cross-family | 80.7 | #63/186 | 14.2, 45.9 | $5 | $25 | 200K | May 22, 2025 | -
4.0 Thinking | Anthropic Claude Opus 4 | cross-family | 80.7 | #64/186 | 79.6, 10.7, 72.5, 81.4, 75.5 | $15 | $75 | 200K | May 22, 2025 | -
4.0 Non-thinking | Anthropic Claude Opus 4 | cross-family | 79.1 | #73/186 | 74.9, 6.7, 72.5, 81.8, 33.9 | $15 | $75 | 200K | May 22, 2025 | -
V4 Flash | DeepSeek V4 | cross-family | 78.1 | #78/186 | 71.2, 8.1, 73.7, 49.1, 64.0 | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | -
4.1 Non-thinking | Anthropic Claude Opus 4 | cross-family | 70.4 | #115/186 | 7.9 | $15 | $75 | 200K | May 22, 2025 | -
4.7 Non-thinking | Anthropic Claude Opus 4 | cross-family | - | - | - | $5 | $25 | 200K | May 22, 2025 | -

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (55 of 97 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.

Model / Variant | Benchmark | Score | Rank | Scoring
xAI Grok 4 · Grok 4 | LiveCodeBench · 2024_07_2025_01 | 81.9 | 1 / 8 | In Quality Score
Grok 4 Fast · Non-Thinking | LiveCodeBench · 2025_01_2025_05_single | 80 | 2 / 11 | In Quality Score
xAI Grok 3 · Thinking | LiveCodeBench · 2024_single | 70.6 | 2 / 2 | In Quality Score
xAI Grok 4 · Grok 4 | LiveCodeBench · 2025_01_2025_05_single | 79 | 3 / 11 | In Quality Score
xAI Grok 4 · Grok 4 | AIME 2025 · aime_2025_python | 98.8 | 4 / 7 | In Quality Score
xAI Grok 4 · Grok 4.2 | LiveCodeBench · pro | 74.2 | 4 / 5 | In Quality Score
xAI Grok 4 · Grok 4 | Aider (Polyglot) | 79.6 | 5 / 45 | In Quality Score
Grok 4 Fast · Non-Thinking | AIME 2025 · no_tools | 91.9 | 6 / 15 | In Quality Score

Reasoning

Model / Variant | Benchmark | Score | Rank | Scoring
xAI Grok 4 · Grok 4 | AIME 2025 · aime_2025_python | 98.8 | 4 / 7 | In Quality Score
Grok 4 Fast · Non-Thinking | AIME 2025 · no_tools | 91.9 | 6 / 15 | In Quality Score
Grok 4 Fast · Non-Thinking | AIME 2025 | 92 | 9 / 88 | In Quality Score
xAI Grok 4 · Grok 4 | AIME 2025 | 91.7 | 11 / 88 | In Quality Score
xAI Grok 4 · 4.20 Beta1 | Arena Elo | 1476 | 12 / 158 | In Quality Score
xAI Grok 4 · Grok 4 | Humanity's Last Exam · hle_text | 25.4 | 12 / 56 | In Quality Score
xAI Grok 4 · Grok 4 | MMLU Pro | 86.6 | 13 / 86 | In Quality Score
xAI Grok 4 · Grok 4 | SimpleBench | 60.5 | 13 / 61 | In Quality Score
xAI Grok 4 · 4.20 Beta1 (Reasoning) | Arena Elo | 1475 | 14 / 158 | In Quality Score
xAI Grok 4 · Grok 4.2 | Humanity's Last Exam · hle | 31.6 | 16 / 90 | In Quality Score
xAI Grok 4 · Grok 4.2 | GPQA Diamond | 88.5 | 17 / 143 | In Quality Score
Grok 4 Fast · Non-Thinking | SimpleBench | 56 | 17 / 61 | In Quality Score
xAI Grok 4 · 4.20 Multi-Agent | Arena Elo | 1472 | 18 / 158 | In Quality Score
xAI Grok 4 · 4.1 Thinking | Arena Elo | 1466 | 23 / 158 | In Quality Score
xAI Grok 4 · Grok 4 | GPQA Diamond | 87.5 | 23 / 143 | In Quality Score
xAI Grok 4 · 4.1 | Arena Elo | 1460 | 26 / 158 | In Quality Score
Grok 3 Mini · Grok 3 Mini | AIME 2025 | 83 | 28 / 88 | In Quality Score
xAI Grok 4 · Grok 4 | Humanity's Last Exam · hle | 25.4 | 28 / 90 | In Quality Score
xAI Grok 4 · Grok 4 | Humanity's Last Exam · tools | 41 | 29 / 38 | In Quality Score
xAI Grok 3 · Thinking | AIME 2025 | 77.3 | 34 / 88 | In Quality Score
Grok 4 Fast · Non-Thinking | GPQA Diamond | 84.3 | 38 / 143 | In Quality Score
Grok 4 Fast · Thinking | GPQA Diamond | 84.3 | 39 / 143 | In Quality Score
xAI Grok 3 · Thinking | SimpleBench | 36.1 | 39 / 61 | In Quality Score
xAI Grok 4 · Grok 4.3 | Arena Elo | 1447 | 40 / 158 | In Quality Score
xAI Grok 4 · 4.20 Beta1 | LiveBench | 68.0 | 42 / 110 | In Quality Score
xAI Grok 4 · Grok 4.3 | LiveBench | 66.7 | 48 / 110 | In Quality Score
Grok 4 Fast · Non-Thinking | Humanity's Last Exam · hle | 17.6 | 48 / 90 | In Quality Score
Grok 4 Fast · Thinking | Humanity's Last Exam · hle | 17.6 | 49 / 90 | In Quality Score
xAI Grok 3 · Thinking | GPQA Diamond | 80.2 | 53 / 143 | In Quality Score
xAI Grok 4 · Grok 4 | LiveBench | 62.0 | 53 / 110 | In Quality Score
Grok 4 Fast · Thinking | Arena Elo | 1431 | 55 / 158 | In Quality Score
Grok 3 Mini · Grok 3 Mini | GPQA Diamond | 79 | 57 / 143 | In Quality Score
Grok 3 Mini · Grok 3 Mini | Humanity's Last Exam · hle | 11 | 61 / 90 | In Quality Score
Grok 4 Fast · Non-Thinking | Arena Elo | 1421 | 65 / 158 | In Quality Score
Grok 4 Fast · Thinking | LiveBench | 60.0 | 65 / 110 | In Quality Score
xAI Grok 3 · Thinking | Arena Elo | 1412 | 78 / 158 | In Quality Score
xAI Grok 4 · Grok 4 | Arena Elo | 1410 | 83 / 158 | In Quality Score
Grok Code Fast · Non-thinking | LiveBench | 45.1 | 93 / 110 | In Quality Score
xAI Grok 4 · 4.20 Beta1 (Non-Thinking) | LiveBench | 39.7 | 100 / 110 | In Quality Score
Grok 4 Fast · Non-Thinking | LiveBench | 33.5 | 103 / 110 | In Quality Score
xAI Grok 4 · Grok 4 | MATH 500 | 99 | 1 / 55 | Tracked evidence
xAI Grok 4 · Grok 4 | AIME 2024 | 94.3 | 1 / 69 | Tracked evidence
xAI Grok 4 · Grok 4 | HMMT Feb 2025 · python | 93.9 | 3 / 6 | Tracked evidence
xAI Grok 4 · Grok 4.2 | HealthBench · hard | 20.3 | 4 / 5 | Tracked evidence
xAI Grok 3 · Thinking | MRCR · v2_average | 34 | 5 / 6 | Tracked evidence
xAI Grok 3 · Thinking | SimpleQA | 43.6 | 7 / 40 | Tracked evidence
Grok 4 Fast · Non-Thinking | FACTS Benchmark Suite | 42.1 | 7 / 12 | Tracked evidence
Grok 4 Fast · Non-Thinking | HMMT Feb 2025 | 93.3 | 8 / 44 | Tracked evidence
xAI Grok 3 · Thinking | MMMU · mmmu_single | 76 | 8 / 22 | Tracked evidence
Grok 4 Fast · Thinking | FACTS Benchmark Suite | 42.1 | 8 / 12 | Tracked evidence
xAI Grok 4 · Grok 4 | SciCode | 45.7 | 9 / 24 | Tracked evidence
Grok 4 Fast · Non-Thinking | MMMLU | 86.8 | 10 / 38 | Tracked evidence
Grok 4 Fast · Thinking | MMMLU | 86.8 | 11 / 38 | Tracked evidence
xAI Grok 3 · Thinking | AIME 2024 | 83.9 | 12 / 69 | Tracked evidence
Grok 4 Fast · Non-Thinking | MRCR · v2_1m | 6.1 | 12 / 14 | Tracked evidence
Grok 4 Fast · Thinking | MRCR · v2_1m | 6.1 | 13 / 14 | Tracked evidence
Grok 4 Fast · Non-Thinking | MRCR · v2_128k | 54.6 | 14 / 23 | Tracked evidence
xAI Grok 4 · Grok 4 | HMMT Feb 2025 | 90 | 15 / 44 | Tracked evidence
Grok 4 Fast · Thinking | MRCR · v2_128k | 54.6 | 15 / 23 | Tracked evidence
Grok 4 Fast · Non-Thinking | BrowseComp_zh | 51.2 | 15 / 20 | Tracked evidence
Grok 4 Fast · Non-Thinking | Global PIQA | 85.6 | 16 / 26 | Tracked evidence
xAI Grok 4 · Grok 4 | BFCL v3 | 66.2 | 19 / 49 | Tracked evidence
xAI Grok 4 · Grok 4.2 | MMMU PRO | 75.2 | 20 / 52 | Tracked evidence
xAI Grok 4 · Grok 4 | IMO AnswerBench | 73.1 | 23 / 28 | Tracked evidence
Grok 3 Mini · Grok 3 Mini | HMMT Feb 2025 | 74 | 25 / 44 | Tracked evidence
Grok 4 Fast · Non-Thinking | SimpleQA | 19.5 | 25 / 40 | Tracked evidence
Grok 4 Fast · Thinking | SimpleQA | 19.5 | 26 / 40 | Tracked evidence
Grok 4 Fast · Non-Thinking | BrowseComp | 44.9 | 29 / 51 | Tracked evidence
xAI Grok 4 · Grok 4 | BrowseComp | 32.6 | 35 / 51 | Tracked evidence
Grok 4 Fast · Non-Thinking | MMMU PRO | 63 | 36 / 52 | Tracked evidence
Grok 4 Fast · Thinking | MMMU PRO | 63 | 37 / 52 | Tracked evidence

Coding

Model / Variant | Benchmark | Score | Rank | Scoring
xAI Grok 4 · Grok 4 | LiveCodeBench · 2024_07_2025_01 | 81.9 | 1 / 8 | In Quality Score
Grok 4 Fast · Non-Thinking | LiveCodeBench · 2025_01_2025_05_single | 80 | 2 / 11 | In Quality Score
xAI Grok 3 · Thinking | LiveCodeBench · 2024_single | 70.6 | 2 / 2 | In Quality Score
xAI Grok 4 · Grok 4 | LiveCodeBench · 2025_01_2025_05_single | 79 | 3 / 11 | In Quality Score
xAI Grok 4 · Grok 4.2 | LiveCodeBench · pro | 74.2 | 4 / 5 | In Quality Score
xAI Grok 4 · Grok 4 | Aider (Polyglot) | 79.6 | 5 / 45 | In Quality Score
Grok 4 Fast · Thinking | LiveCodeBench | 76.5 | 7 / 69 | In Quality Score
Grok 3 Mini · Grok 3 Mini | LiveCodeBench · 2025_01_2025_05_single | 70 | 8 / 11 | In Quality Score
xAI Grok 3 · Thinking | LiveCodeBench | 70.6 | 12 / 69 | In Quality Score
xAI Grok 4 · Grok 4.2 | SWE-bench Verified | 76.7 | 21 / 68 | In Quality Score
xAI Grok 3 · Thinking | Aider (Polyglot) | 53.3 | 24 / 45 | In Quality Score
Grok 4 Fast · Non-Thinking | SWE-bench Verified | 50.6 | 60 / 68 | In Quality Score

Agentic

Model / Variant | Benchmark | Score | Rank | Scoring
xAI Grok 4 · Grok 4.2 | τ²-bench · telecom | 96.5 | 10 / 28 | In Quality Score
xAI Grok 4 · Grok 4 | τ²-bench · airline | 58.4 | 14 / 29 | In Quality Score
xAI Grok 4 · Grok 4 | τ²-bench · retail | 76.5 | 17 / 34 | In Quality Score
Grok 4 Fast · Non-Thinking | VendingBench · v2 | 1107 | 5 / 7 | Tracked evidence
xAI Grok 4 · Grok 4.2 | DeepSearchQA | 62.8 | 7 / 7 | Tracked evidence
xAI Grok 4 · Grok 4.2 | GDPVal-AA | 1055 | 17 / 17 | Tracked evidence

Multimodal

Model / Variant | Benchmark | Score | Rank | Scoring
xAI Grok 4 · Grok 4.2 | MedXpertQA · text | 50.2 | 5 / 5 | Tracked evidence
xAI Grok 4 · Grok 4.2 | MedXpertQA · mm | 65.8 | 8 / 31 | Tracked evidence
xAI Grok 4 · Grok 4.2 | ZEROBench | 9 | 10 / 27 | Tracked evidence
xAI Grok 4 · Grok 4.2 | ERQA | 54.1 | 12 / 27 | Tracked evidence
xAI Grok 4 · Grok 4.2 | SimpleVQA | 57.4 | 14 / 29 | Tracked evidence
Grok 4 Fast · Thinking | Video-MMMU | 74.6 | 21 / 28 | Tracked evidence
xAI Grok 4 · Grok 4.2 | CharXiv Reasoning | 60.9 | 36 / 48 | Tracked evidence
Grok 4 Fast · Thinking | CharXiv Reasoning | 31.6 | 48 / 48 | Tracked evidence

Where this family sits in the market

Grok 4.20 reasoning is the family's benchmark outlier and 4.20 multi-agent is the context outlier, while Grok 4.3 is the current general and coding default. Fast, Code Fast, and Grok 3 are legacy comparison rows.

Chart legend (providers): Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, NVIDIA, OpenAI, Qwen, xAI, Zhipu

Dashed line = Pareto frontier (no model is both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
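
The frontier rule in the caption (no model both cheaper and better) can be sketched in a few lines. This is a minimal illustration, not the code behind the chart: it assumes "cost" means the input list price per 1M tokens and "quality" means the Quality Score, both copied from the variants table on this page. The function name `pareto_frontier` and the tuple layout are hypothetical.

```python
# (name, input price in USD per 1M tokens, Quality Score), copied from the
# variants table on this page. Using the input price as "cost" is an
# assumption; the chart may use a blended or output price instead.
MODELS = [
    ("Grok 4.20 Beta1 (Reasoning)", 1.25, 100.7),
    ("Grok 4.3", 1.25, 85.4),
    ("Claude Opus 4.8 Thinking", 5.0, 108.6),
    ("Claude Opus 4.1 Thinking", 15.0, 83.1),
    ("Gemini 3 Pro 3.1", 2.0, 104.3),
    ("DeepSeek V4 Pro Thinking", 0.435, 98.0),
    ("DeepSeek V4 Flash Thinking", 0.098, 92.0),
]


def pareto_frontier(models):
    """Return the names of models that are not dominated on cost and quality."""
    # Cheapest first; at equal cost, the higher-quality model comes first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    frontier = []
    best_quality = float("-inf")
    for name, _cost, quality in ordered:
        # Keep a model only if it beats every model seen so far, all of
        # which cost the same or less.
        if quality > best_quality:
            frontier.append(name)
            best_quality = quality
    return frontier


print(pareto_frontier(MODELS))
```

With these rows the frontier runs from DeepSeek V4 Flash Thinking through DeepSeek V4 Pro Thinking, Grok 4.20 Beta1 (Reasoning), and Gemini 3 Pro 3.1 to Claude Opus 4.8 Thinking. Grok 4.3 drops off under this assumption because 4.20 Beta1 (Reasoning) has the same input price and a higher Quality Score.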

The Grok family

Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.

Closed · API only (5)

  • xAI Grok 4: 9 variants
  • Grok 4 Fast: 2 variants
  • xAI Grok 3: 1 variant
  • Grok 3 Mini: 1 variant
  • Grok Code Fast: 1 variant

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Caveats

What this page does not tell you, listed honestly.

  • No tracked API pricing for: Grok 4 Fast, xAI Grok 3, Grok 3 Mini, Grok Code Fast. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
  • Context window not declared for: Grok 4 Fast, xAI Grok 3, Grok 3 Mini, Grok Code Fast.
  • Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.

Editor's notes

By Boris · Last verified · AI-assisted, human-reviewed

Why this family changed

xAI's May 2026 retirement moved the Grok decision surface. Grok 4 Fast, Grok Code Fast, and Grok 3 are no longer the active recommendations; they remain in our table because historical benchmarks and pinned deployments still need a place to land.

The current decision is mostly Grok 4.3 vs Grok 4.20. Grok 4.3 is the default route for general and coding workloads at $1.25 input / $2.50 output per million tokens with a 1M window. Grok 4.20 splits into two jobs: reasoning/non-reasoning routes with a 1M window, and the multi-agent route with a 2M window at $1.25 / $2.50 per million tokens.
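
To make the per-million rates concrete, here is a small sketch of the arithmetic for a single request. It uses only the list prices quoted on this page; the route labels in the dictionary and the function name `request_cost_usd` are illustrative names for this example, not confirmed xAI API model IDs, and volume or higher-context pricing is ignored.

```python
# List prices in USD per 1M tokens (input, output), as quoted on this page.
# The dictionary keys are labels for this sketch, not confirmed API model IDs.
PRICES_PER_MILLION = {
    "grok-4.3": (1.25, 2.50),
    "grok-4.20-multi-agent": (1.25, 2.50),
}


def request_cost_usd(route, input_tokens, output_tokens):
    """Estimate the list-price cost of one request in US dollars."""
    input_price, output_price = PRICES_PER_MILLION[route]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: a 200,000-token prompt with a 2,000-token answer on Grok 4.3.
print(f"${request_cost_usd('grok-4.3', 200_000, 2_000):.3f}")  # prints $0.255
```

The same arithmetic shows why the 2M window changes the budget conversation: a 1.9M-token prompt on the multi-agent route comes to about $2.38 of input per call at list price, before any output tokens.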

Which route to start with

Default to x-ai-grok-4 / 4.3 for chat, summarization, coding agents, and tool-augmented assistants. xAI names Grok 4.3 as the replacement for Grok Code Fast and the old reasoning Fast route, and its pricing is now the same headline rate as 4.20 in our index.

Use x-ai-grok-4 / 4.20-multi-agent when the 2M-token context window is the binding requirement. Use 4.20-beta1-non-thinking when you explicitly need the non-reasoning migration path. These are the routes to test for large documents, long agent traces, or RAG systems where context size changes the architecture.

When to deviate:

  • You are still pinned to Grok 4 Fast or Code Fast: treat those rows as migration context, not fresh recommendations. xAI's retirement guide points reasoning and coding workloads to 4.3, and non-reasoning Fast workloads to 4.20 non-reasoning.
  • You are on Grok 3 or Grok 3 Mini: the old rows stay visible for historical measurements, but the practical migration target is Grok 4.3 unless your contract or eval suite blocks the move.
  • Hardest-tier reasoning workloads: 4.20 reasoning is currently the strongest Grok row by composite Quality Score in our payload (100.7, #10 of 186 models we track). Run it against Claude Opus, Gemini Pro, and GPT-5 on your specific workload before treating the family rank as the answer.
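
The routing guidance above can be condensed into a small decision function. This is a sketch under stated assumptions rather than an official rule: the order in which the checks run is our reading of the text, the 1M and 2M thresholds are taken as round token counts, the returned strings are this page's route labels rather than confirmed API model IDs, and `4.20-beta1-reasoning` is a hypothetical label for the 4.20 reasoning row.

```python
def choose_grok_route(context_tokens=0, non_reasoning_migration=False,
                      hardest_reasoning=False):
    """Map the constraints described above to a route label from this page."""
    if context_tokens > 2_000_000:
        raise ValueError("no Grok route on this page lists more than a 2M window")
    if context_tokens > 1_000_000:
        # Only the multi-agent route lists a 2M-token window.
        return "4.20-multi-agent"
    if non_reasoning_migration:
        # Explicit migration path for old non-reasoning Fast workloads.
        return "4.20-beta1-non-thinking"
    if hardest_reasoning:
        # Strongest Grok row by Quality Score; still benchmark it on your workload.
        return "4.20-beta1-reasoning"  # hypothetical label
    # Default for chat, summarization, coding agents, and tool use.
    return "4.3"


print(choose_grok_route())                          # 4.3
print(choose_grok_route(context_tokens=1_500_000))  # 4.20-multi-agent
```

Treat the output as a starting point for evaluation, not a final answer: the text above still recommends running the chosen route against your own workload before committing.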

Where the data is weak

We aggregate benchmark scores from multiple sources but coverage is uneven across this family. Specifically:

  • Grok 4 has many minor versions in our index. Scores for grok-4, 4.2, 4.3, and the 4.20 variants are not interchangeable. When this article quotes a number it is for the specific variant named.
  • 4.3 has thinner benchmark coverage than older Grok rows. It is the current API route, but some benchmark tables still make 4.20 or older rows look stronger because they have different coverage.
  • Fast and Code Fast are historical rows now. They remain useful for migration and benchmark provenance, but they should not be read as active buying advice after the May 2026 retirement.
  • Pricing on this page is the published xAI list price. Volume agreements and higher-context pricing can change the unit economics; cross-check xAI's docs before procurement.

If you are making a procurement decision, the variant table on this page is the load-bearing artifact.

When to reach for which alternative

  • Open-weights deployment is a requirement: Grok is API-only. The conversation moves to open-weights families (Qwen3, DeepSeek). On long-context cost as the binding axis, DeepSeek V4 Flash (1M context at $0.098 / $0.197 with QS 78.1) is the closest open-weights price anchor; Grok 4.20 multi-agent still wins on context size.
  • Closed-flagship reasoning at the absolute top: Claude Opus 4.7-thinking (QS 107.8), Gemini 3 Pro 3.1 (QS 104.3), and full GPT-5 are the anchors to compare against on the specific benchmark that matters. On any given benchmark the ranking can flip; the price gap between Grok 4 and these is small enough that the choice often comes down to ecosystem.
  • You are already paying for an OpenAI or Anthropic key: the case for adding Grok is workload-specific, not blanket. The strongest single reason is the 4.20 multi-agent context window or a measured win from 4.3 on your agentic workflow.

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
