xAI family
Grok
Grok: 4.20 Beta1 (Reasoning) ranks #10 of 186 with a 1.0M-token context and $1.25 input / $2.50 output per 1M tokens. Compare Grok 4.3, 4.20, and the legacy Fast tiers.
Top in this family
Grok 4.20 Beta1 (Reasoning) ranks #10 of 186 on overall quality (QS 100.7) at $1.25 input / $2.50 output per 1M tokens.
- Models: 5 (14 variants)
- License: Closed weights
- Provider: xAI
★ Most teams should start here
xAI Grok 4
Variant: Grok 4.3
The current default. xAI's May 2026 retirement guide points general and coding migrations to Grok 4.3. Use 4.20 multi-agent when you need the 2M-token context, and 4.20 non-reasoning only when its migration path is the real constraint.
- Quality Score: 85.4
- Input: $1.25/1M
- Output: $2.50/1M
- Context: 1.0M
- License: Closed · API
Best variant by workload
One pick per common job. Choose by what you need to ship, not by which variant tops a leaderboard you don't use.
| Workload | Best pick | Why |
|---|---|---|
| Coding agents | Grok 4.3 (xAI Grok 4) · $1.25 / $2.50 per 1M | Use Grok 4.3. xAI's retirement guide names it as the replacement for Grok Code Fast, and it is the active route for agentic coding and web development work. |
| General API workhorse | Grok 4.3 (xAI Grok 4) · $1.25 / $2.50 per 1M | Start with Grok 4.3 for general chat, summarization, and tool use. It is the current API default; the older Fast rows are migration context, not recommendations. |
| Long-context RAG | Grok 4.3 (1M) or Grok 4.20 Multi-Agent (2M) · $1.25 / $2.50 per 1M | Use 4.20 multi-agent when the 2M-token window is the binding requirement. Use 4.3 when current-generation quality matters more than maximum context, and use 4.20 non-reasoning only for its explicit migration path. |
All variants
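31 variants: 14 across the 5 Grok models, plus 17 from 3 cross-family models (Claude Opus 4, Gemini 3 Pro, DeepSeek V4) shown for context. Rows are grouped as current Grok 4 variants, previous-generation Grok rows, then cross-family rows, and sorted by Quality Score (descending) within each group; unscored variants follow the scored ones. GPQA = GPQA Diamond, HLE = Humanity's Last Exam, SWE = SWE-bench Verified, AIME = AIME 2025. Prices are USD per 1M tokens. Released is the launch date of the parent model, not of the individual variant.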
| Variant | Model | QS (rank) | GPQA | HLE | SWE | SWE-Pro | Terminal | Tau | MCP | AIME | In $/M | Out $/M | Context | Released | Lic. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.20 Beta1 (Reasoning) | xAI Grok 4 | 100.7 (#10/186) | — | — | — | — | — | — | — | — | $1.25 | $2.50 | 1.0M | Jul 9, 2025 | |
| 4.20 Beta1 | xAI Grok 4 | 93.8 (#21/186) | — | — | — | — | — | — | — | — | $1.25 | $2.50 | 1.0M | Jul 9, 2025 | |
| Grok 4.2 | xAI Grok 4 | 90.6 (#28/186) | 88.5 | 31.6 | 76.7 | 51.8 | — | — | — | — | $1.25 | $2.50 | 256K | Jul 9, 2025 | |
| Grok 4 | xAI Grok 4 | 85.9 (#42/186) | 87.5 | 25.4 | — | — | 23.1 | 76.5 | — | 91.7 | $1.25 | $2.50 | 256K | Jul 9, 2025 | |
| Grok 4.3 | xAI Grok 4 | 85.4 (#44/186) | — | — | — | — | — | — | — | — | $1.25 | $2.50 | 1.0M | Jul 9, 2025 | |
| 4.1 | xAI Grok 4 | — | — | — | — | — | — | — | — | — | $1.25 | $2.50 | 256K | Jul 9, 2025 | |
| 4.1 Thinking | xAI Grok 4 | — | — | — | — | — | — | — | — | — | $1.25 | $2.50 | 256K | Jul 9, 2025 | |
| 4.20 Beta1 (Non-Thinking) | xAI Grok 4 | — | — | — | — | — | — | — | — | — | $1.25 | $2.50 | 1.0M | Jul 9, 2025 | |
| 4.20 Multi-Agent | xAI Grok 4 | — | — | — | — | — | — | — | — | — | $1.25 | $2.50 | 2.0M | Jul 9, 2025 | |
| Thinking | Grok 4 Fast (previous gen; newer: xAI Grok 4) | 84.3 (#47/186) | 84.3 | 17.6 | — | — | — | — | — | — | — | — | — | Sep 19, 2025 | |
| Non-Thinking | Grok 4 Fast (previous gen; newer: xAI Grok 4) | 77.0 (#82/186) | 84.3 | 17.6 | 50.6 | — | — | — | — | 92.0 | — | — | — | Sep 19, 2025 | |
| Thinking | xAI Grok 3 (previous gen; newer: xAI Grok 4) | 76.3 (#85/186) | 80.2 | — | — | — | — | — | — | 77.3 | — | — | — | Feb 17, 2025 | |
| Grok 3 Mini | Grok 3 Mini (previous gen; newer: xAI Grok 4) | 74.1 (#97/186) | 79.0 | 11.0 | — | — | — | — | — | 83.0 | — | — | — | Feb 17, 2025 | |
| Non-thinking | Grok Code Fast (previous gen; newer: xAI Grok 4) | 53.0 (#173/186) | — | — | — | — | 14.2 | — | — | — | — | — | — | Aug 29, 2025 | |
| 4.8 Thinking | Anthropic Claude Opus 4 (cross-family) | 108.6 (#2/186) | 93.6 | 49.8 | 88.6 | 69.2 | — | — | 82.2 | — | $5 | $25 | 200K | May 22, 2025 | |
| 4.7 Thinking | Anthropic Claude Opus 4 (cross-family) | 107.8 (#3/186) | 94.2 | 46.9 | 87.6 | 64.3 | 69.4 | — | 77.3 | — | $5 | $25 | 200K | May 22, 2025 | |
| 3.1 | Gemini 3 Pro (cross-family) | 104.3 (#5/186) | 94.3 | 44.4 | 80.6 | 54.2 | 68.5 | 90.8 | 73.9 | — | $2 | $12 | — | Nov 18, 2025 | |
| 4.6 Thinking | Anthropic Claude Opus 4 (cross-family) | 104.1 (#6/186) | 91.3 | 40.0 | 80.8 | 53.4 | 65.4 | 91.9 | 59.5 | 95.6 | $5 | $25 | 1.0M | May 22, 2025 | |
| 4.5 Thinking | Anthropic Claude Opus 4 (cross-family) | 98.6 (#13/186) | 87.0 | 30.8 | 80.9 | — | 59.3 | 88.9 | 62.3 | 92.8 | $5 | $25 | 200K | May 22, 2025 | |
| V4 Pro Thinking | DeepSeek V4 (cross-family) | 98.0 (#15/186) | 90.1 | 37.7 | 80.6 | 55.4 | — | — | 73.6 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| 3.0 | Gemini 3 Pro (cross-family) | 95.0 (#20/186) | 91.9 | 37.5 | 76.2 | 43.3 | 54.2 | 85.3 | 54.1 | 95.0 | $2 | $12 | — | Nov 18, 2025 | |
| 4.6 Non-thinking | Anthropic Claude Opus 4 (cross-family) | 93.1 (#23/186) | — | 19.0 | — | — | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
| V4 Flash Thinking | DeepSeek V4 (cross-family) | 92.0 (#27/186) | 88.1 | 34.8 | 79.0 | 52.6 | — | — | 69.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
| 4.1 Thinking | Anthropic Claude Opus 4 (cross-family) | 83.1 (#50/186) | 81.0 | 11.7 | 74.5 | — | 38.0 | 86.8 | 40.9 | 78.0 | $15 | $75 | 200K | May 22, 2025 | |
| V4 Pro | DeepSeek V4 (cross-family) | 80.9 (#61/186) | 72.9 | 7.7 | 73.6 | 52.1 | — | — | 69.4 | — | $0.435 | $0.87 | 1.0M | Apr 24, 2026 | |
| 4.5 Non-thinking | Anthropic Claude Opus 4 (cross-family) | 80.7 (#63/186) | — | 14.2 | — | 45.9 | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
| 4.0 Thinking | Anthropic Claude Opus 4 (cross-family) | 80.7 (#64/186) | 79.6 | 10.7 | 72.5 | — | — | 81.4 | — | 75.5 | $15 | $75 | 200K | May 22, 2025 | |
| 4.0 Non-thinking | Anthropic Claude Opus 4 (cross-family) | 79.1 (#73/186) | 74.9 | 6.7 | 72.5 | — | — | 81.8 | — | 33.9 | $15 | $75 | 200K | May 22, 2025 | |
| V4 Flash | DeepSeek V4 (cross-family) | 78.1 (#78/186) | 71.2 | 8.1 | 73.7 | 49.1 | — | — | 64.0 | — | $0.098 | $0.197 | 1.0M | Apr 24, 2026 | |
| 4.1 Non-thinking | Anthropic Claude Opus 4 (cross-family) | 70.4 (#115/186) | — | 7.9 | — | — | — | — | — | — | $15 | $75 | 200K | May 22, 2025 | |
| 4.7 Non-thinking | Anthropic Claude Opus 4 (cross-family) | — | — | — | — | — | — | — | — | — | $5 | $25 | 200K | May 22, 2025 | |
Benchmark evidence
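Every benchmark we track for this family, across capabilities. The headline Quality Score draws on a deliberately narrow, governed panel (55 of the 97 rows here feed it); the rest is tracked evidence, recorded and comparable but not folded into the single composite score. Rank reads as position out of the number of tracked variants with a score on that benchmark. The first table shows the top rows by rank position; the full set follows, grouped by capability.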
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| xAI Grok 4 · Grok 4 | LiveCodeBench · 2024_07_2025_01 | 81.9 | 1 / 8 | In Quality Score |
| Grok 4 Fast · Non-Thinking | LiveCodeBench · 2025_01_2025_05_single | 80 | 2 / 11 | In Quality Score |
| xAI Grok 3 · Thinking | LiveCodeBench · 2024_single | 70.6 | 2 / 2 | In Quality Score |
| xAI Grok 4 · Grok 4 | LiveCodeBench · 2025_01_2025_05_single | 79 | 3 / 11 | In Quality Score |
| xAI Grok 4 · Grok 4 | AIME 2025 · aime_2025_python | 98.8 | 4 / 7 | In Quality Score |
| xAI Grok 4 · Grok 4.2 | LiveCodeBench · pro | 74.2 | 4 / 5 | In Quality Score |
| xAI Grok 4 · Grok 4 | Aider (Polyglot) | 79.6 | 5 / 45 | In Quality Score |
| Grok 4 Fast · Non-Thinking | AIME 2025 · no_tools | 91.9 | 6 / 15 | In Quality Score |
All benchmark evidence (97 rows)
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| xAI Grok 4 · Grok 4 | AIME 2025 · aime_2025_python | 98.8 | 4 / 7 | In Quality Score |
| Grok 4 Fast · Non-Thinking | AIME 2025 · no_tools | 91.9 | 6 / 15 | In Quality Score |
| Grok 4 Fast · Non-Thinking | AIME 2025 | 92 | 9 / 88 | In Quality Score |
| xAI Grok 4 · Grok 4 | AIME 2025 | 91.7 | 11 / 88 | In Quality Score |
| xAI Grok 4 · 4.20 Beta1 | Arena Elo | 1476 | 12 / 158 | In Quality Score |
| xAI Grok 4 · Grok 4 | Humanity's Last Exam · hle_text | 25.4 | 12 / 56 | In Quality Score |
| xAI Grok 4 · Grok 4 | MMLU Pro | 86.6 | 13 / 86 | In Quality Score |
| xAI Grok 4 · Grok 4 | SimpleBench | 60.5 | 13 / 61 | In Quality Score |
| xAI Grok 4 · 4.20 Beta1 (Reasoning) | Arena Elo | 1475 | 14 / 158 | In Quality Score |
| xAI Grok 4 · Grok 4.2 | Humanity's Last Exam · hle | 31.6 | 16 / 90 | In Quality Score |
| xAI Grok 4 · Grok 4.2 | GPQA Diamond | 88.5 | 17 / 143 | In Quality Score |
| Grok 4 Fast · Non-Thinking | SimpleBench | 56 | 17 / 61 | In Quality Score |
| xAI Grok 4 · 4.20 Multi-Agent | Arena Elo | 1472 | 18 / 158 | In Quality Score |
| xAI Grok 4 · 4.1 Thinking | Arena Elo | 1466 | 23 / 158 | In Quality Score |
| xAI Grok 4 · Grok 4 | GPQA Diamond | 87.5 | 23 / 143 | In Quality Score |
| xAI Grok 4 · 4.1 | Arena Elo | 1460 | 26 / 158 | In Quality Score |
| Grok 3 Mini · Grok 3 Mini | AIME 2025 | 83 | 28 / 88 | In Quality Score |
| xAI Grok 4 · Grok 4 | Humanity's Last Exam · hle | 25.4 | 28 / 90 | In Quality Score |
| xAI Grok 4 · Grok 4 | Humanity's Last Exam · tools | 41 | 29 / 38 | In Quality Score |
| xAI Grok 3 · Thinking | AIME 2025 | 77.3 | 34 / 88 | In Quality Score |
| Grok 4 Fast · Non-Thinking | GPQA Diamond | 84.3 | 38 / 143 | In Quality Score |
| Grok 4 Fast · Thinking | GPQA Diamond | 84.3 | 39 / 143 | In Quality Score |
| xAI Grok 3 · Thinking | SimpleBench | 36.1 | 39 / 61 | In Quality Score |
| xAI Grok 4 · Grok 4.3 | Arena Elo | 1447 | 40 / 158 | In Quality Score |
| xAI Grok 4 · 4.20 Beta1 | LiveBench | 68.0 | 42 / 110 | In Quality Score |
| xAI Grok 4 · Grok 4.3 | LiveBench | 66.7 | 48 / 110 | In Quality Score |
| Grok 4 Fast · Non-Thinking | Humanity's Last Exam · hle | 17.6 | 48 / 90 | In Quality Score |
| Grok 4 Fast · Thinking | Humanity's Last Exam · hle | 17.6 | 49 / 90 | In Quality Score |
| xAI Grok 3 · Thinking | GPQA Diamond | 80.2 | 53 / 143 | In Quality Score |
| xAI Grok 4 · Grok 4 | LiveBench | 62.0 | 53 / 110 | In Quality Score |
| Grok 4 Fast · Thinking | Arena Elo | 1431 | 55 / 158 | In Quality Score |
| Grok 3 Mini · Grok 3 Mini | GPQA Diamond | 79 | 57 / 143 | In Quality Score |
| Grok 3 Mini · Grok 3 Mini | Humanity's Last Exam · hle | 11 | 61 / 90 | In Quality Score |
| Grok 4 Fast · Non-Thinking | Arena Elo | 1421 | 65 / 158 | In Quality Score |
| Grok 4 Fast · Thinking | LiveBench | 60.0 | 65 / 110 | In Quality Score |
| xAI Grok 3 · Thinking | Arena Elo | 1412 | 78 / 158 | In Quality Score |
| xAI Grok 4 · Grok 4 | Arena Elo | 1410 | 83 / 158 | In Quality Score |
| Grok Code Fast · Non-thinking | LiveBench | 45.1 | 93 / 110 | In Quality Score |
| xAI Grok 4 · 4.20 Beta1 (Non-Thinking) | LiveBench | 39.7 | 100 / 110 | In Quality Score |
| Grok 4 Fast · Non-Thinking | LiveBench | 33.5 | 103 / 110 | In Quality Score |
| xAI Grok 4 · Grok 4 | MATH 500 | 99 | 1 / 55 | Tracked evidence |
| xAI Grok 4 · Grok 4 | AIME 2024 | 94.3 | 1 / 69 | Tracked evidence |
| xAI Grok 4 · Grok 4 | HMMT Feb 2025 · python | 93.9 | 3 / 6 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | HealthBench · hard | 20.3 | 4 / 5 | Tracked evidence |
| xAI Grok 3 · Thinking | MRCR · v2_average | 34 | 5 / 6 | Tracked evidence |
| xAI Grok 3 · Thinking | SimpleQA | 43.6 | 7 / 40 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | FACTS Benchmark Suite | 42.1 | 7 / 12 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | HMMT Feb 2025 | 93.3 | 8 / 44 | Tracked evidence |
| xAI Grok 3 · Thinking | MMMU · mmmu_single | 76 | 8 / 22 | Tracked evidence |
| Grok 4 Fast · Thinking | FACTS Benchmark Suite | 42.1 | 8 / 12 | Tracked evidence |
| xAI Grok 4 · Grok 4 | SciCode | 45.7 | 9 / 24 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | MMMLU | 86.8 | 10 / 38 | Tracked evidence |
| Grok 4 Fast · Thinking | MMMLU | 86.8 | 11 / 38 | Tracked evidence |
| xAI Grok 3 · Thinking | AIME 2024 | 83.9 | 12 / 69 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | MRCR · v2_1m | 6.1 | 12 / 14 | Tracked evidence |
| Grok 4 Fast · Thinking | MRCR · v2_1m | 6.1 | 13 / 14 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | MRCR · v2_128k | 54.6 | 14 / 23 | Tracked evidence |
| xAI Grok 4 · Grok 4 | HMMT Feb 2025 | 90 | 15 / 44 | Tracked evidence |
| Grok 4 Fast · Thinking | MRCR · v2_128k | 54.6 | 15 / 23 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | BrowseComp_zh | 51.2 | 15 / 20 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | Global PIQA | 85.6 | 16 / 26 | Tracked evidence |
| xAI Grok 4 · Grok 4 | BFCL v3 | 66.2 | 19 / 49 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | MMMU PRO | 75.2 | 20 / 52 | Tracked evidence |
| xAI Grok 4 · Grok 4 | IMO AnswerBench | 73.1 | 23 / 28 | Tracked evidence |
| Grok 3 Mini · Grok 3 Mini | HMMT Feb 2025 | 74 | 25 / 44 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | SimpleQA | 19.5 | 25 / 40 | Tracked evidence |
| Grok 4 Fast · Thinking | SimpleQA | 19.5 | 26 / 40 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | BrowseComp | 44.9 | 29 / 51 | Tracked evidence |
| xAI Grok 4 · Grok 4 | BrowseComp | 32.6 | 35 / 51 | Tracked evidence |
| Grok 4 Fast · Non-Thinking | MMMU PRO | 63 | 36 / 52 | Tracked evidence |
| Grok 4 Fast · Thinking | MMMU PRO | 63 | 37 / 52 | Tracked evidence |
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| xAI Grok 4 · Grok 4 | LiveCodeBench · 2024_07_2025_01 | 81.9 | 1 / 8 | In Quality Score |
| Grok 4 Fast · Non-Thinking | LiveCodeBench · 2025_01_2025_05_single | 80 | 2 / 11 | In Quality Score |
| xAI Grok 3 · Thinking | LiveCodeBench · 2024_single | 70.6 | 2 / 2 | In Quality Score |
| xAI Grok 4 · Grok 4 | LiveCodeBench · 2025_01_2025_05_single | 79 | 3 / 11 | In Quality Score |
| xAI Grok 4 · Grok 4.2 | LiveCodeBench · pro | 74.2 | 4 / 5 | In Quality Score |
| xAI Grok 4 · Grok 4 | Aider (Polyglot) | 79.6 | 5 / 45 | In Quality Score |
| Grok 4 Fast · Thinking | LiveCodeBench | 76.5 | 7 / 69 | In Quality Score |
| Grok 3 Mini · Grok 3 Mini | LiveCodeBench · 2025_01_2025_05_single | 70 | 8 / 11 | In Quality Score |
| xAI Grok 3 · Thinking | LiveCodeBench | 70.6 | 12 / 69 | In Quality Score |
| xAI Grok 4 · Grok 4.2 | SWE-bench Verified | 76.7 | 21 / 68 | In Quality Score |
| xAI Grok 3 · Thinking | Aider (Polyglot) | 53.3 | 24 / 45 | In Quality Score |
| Grok 4 Fast · Non-Thinking | SWE-bench Verified | 50.6 | 60 / 68 | In Quality Score |
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| xAI Grok 4 · Grok 4.2 | τ²-bench · telecom | 96.5 | 10 / 28 | In Quality Score |
| xAI Grok 4 · Grok 4 | τ²-bench · airline | 58.4 | 14 / 29 | In Quality Score |
| xAI Grok 4 · Grok 4 | τ²-bench · retail | 76.5 | 17 / 34 | In Quality Score |
| Grok 4 Fast · Non-Thinking | VendingBench · v2 | 1107 | 5 / 7 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | DeepSearchQA | 62.8 | 7 / 7 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | GDPVal-AA | 1055 | 17 / 17 | Tracked evidence |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| xAI Grok 4 · Grok 4.2 | MedXpertQA · text | 50.2 | 5 / 5 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | MedXpertQA · mm | 65.8 | 8 / 31 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | ZEROBench | 9 | 10 / 27 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | ERQA | 54.1 | 12 / 27 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | SimpleVQA | 57.4 | 14 / 29 | Tracked evidence |
| Grok 4 Fast · Thinking | Video-MMMU | 74.6 | 21 / 28 | Tracked evidence |
| xAI Grok 4 · Grok 4.2 | CharXiv Reasoning | 60.9 | 36 / 48 | Tracked evidence |
| Grok 4 Fast · Thinking | CharXiv Reasoning | 31.6 | 48 / 48 | Tracked evidence |
Where this family sits in the market
Grok 4.20 reasoning has the family's highest Quality Score and 4.20 multi-agent the largest context window (2M tokens), while Grok 4.3 is the current default for general and coding work. Fast, Code Fast, and Grok 3 are legacy comparison rows.
[Chart: list price vs. Quality Score. The dashed line is the Pareto frontier: models for which no other model is both cheaper and higher-scoring. Thinking/non-thinking pairs of the same model are connected; the line length shows the cost of reasoning.]
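The frontier rule described for the chart ("no model both cheaper and better") can be sketched in a few lines of Python. This is a minimal illustration, not the chart's actual code: the page does not say which price the chart plots, so the sketch assumes a hypothetical blended price of 75% input and 25% output tokens. The QS values and list prices are copied from the variants table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    name: str
    qs: float         # Quality Score from the variants table
    in_price: float   # USD per 1M input tokens
    out_price: float  # USD per 1M output tokens

    def blended_price(self, input_share: float = 0.75) -> float:
        # Hypothetical blend: 75% input tokens, 25% output tokens.
        return input_share * self.in_price + (1 - input_share) * self.out_price


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep points that no other point beats on both price and quality."""
    # Cheapest first; on a price tie, the higher score comes first.
    ordered = sorted(points, key=lambda p: (p.blended_price(), -p.qs))
    frontier: list[Point] = []
    best_qs = float("-inf")
    for p in ordered:
        # Keep p only if it outscores every cheaper-or-equal-price point kept so far.
        # (Exact price-and-score duplicates keep only the first.)
        if p.qs > best_qs:
            frontier.append(p)
            best_qs = p.qs
    return frontier


points = [
    Point("Grok 4.20 Beta1 (Reasoning)", 100.7, 1.25, 2.50),
    Point("Grok 4.3", 85.4, 1.25, 2.50),
    Point("Claude Opus 4.8 Thinking", 108.6, 5.00, 25.00),
    Point("Gemini 3.1 Pro", 104.3, 2.00, 12.00),
    Point("DeepSeek V4 Pro Thinking", 98.0, 0.435, 0.87),
    Point("DeepSeek V4 Flash Thinking", 92.0, 0.098, 0.197),
]
print([p.name for p in pareto_frontier(points)])
```

Under this blend, Grok 4.3 falls off the frontier because 4.20 reasoning has the same list price and a higher score; a different input/output mix can move other models on or off.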
The Grok family
Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.
Closed · API only (5 models)
- xAI Grok 4: 9 variants
- Grok 4 Fast: 2 variants
- xAI Grok 3: 1 variant
- Grok 3 Mini: 1 variant
- Grok Code Fast: 1 variant
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- GPT-5 (GPT-5.5 Thinking, Mini, Nano, Codex compared): GPT-5.5 Thinking ranks #4 of 186 with a 400K-token context and $1.25/$10 per 1M tokens. Compare GPT-5, Mini, Nano, and Codex by workload.
- Claude (Opus 4.8 Thinking, Opus, Sonnet, Haiku compared): Opus 4.8 (Thinking) ranks #2 of 186 on Quality Score. Compare Opus, Sonnet, Haiku, and Mythos by price, benchmarks, and workload.
- Gemini 3 (Gemini 3.1 Pro, Flash, Lite compared): Gemini 3.1 Pro ranks #5 of 186 with $2/$12 per 1M tokens. Compare Gemini 3 Pro, Flash, and Lite by workload.
Caveats
What this page does not tell you, listed honestly.
- No tracked API pricing for: Grok 4 Fast, xAI Grok 3, Grok 3 Mini, Grok Code Fast. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
- Context window not declared for: Grok 4 Fast, xAI Grok 3, Grok 3 Mini, Grok Code Fast.
- Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.
Editor's notes
Why this family changed
xAI's May 2026 retirement moved the Grok decision surface. Grok 4 Fast, Grok Code Fast, and Grok 3 are no longer the active recommendations; they remain in our table because historical benchmarks and pinned deployments still need a place to land.
The current decision is mostly Grok 4.3 vs. Grok 4.20. Grok 4.3 is the default route for general and coding workloads at $1.25 input / $2.50 output per million tokens with a 1M window. Grok 4.20 splits into two jobs: the reasoning and non-reasoning routes with a 1M window, and the multi-agent route with a 2M window, also at $1.25 / $2.50 per million.
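Because 4.3 and both 4.20 routes share one list price in our index, per-request cost scales linearly with tokens whichever window you use; the 2M route buys room, not a different rate. A minimal Python sketch, assuming flat list pricing with no long-context surcharge, caching discount, or volume agreement (xAI's docs may differ); `request_cost` is a hypothetical helper.

```python
# List prices from this page (USD per 1M tokens) for Grok 4.3 and the 4.20 routes.
GROK_IN_PER_M = 1.25
GROK_OUT_PER_M = 2.50


def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float = GROK_IN_PER_M,
                 out_per_m: float = GROK_OUT_PER_M) -> float:
    """USD cost of one request at flat per-token list prices (no surcharges assumed)."""
    return (input_tokens / 1_000_000) * in_per_m + (output_tokens / 1_000_000) * out_per_m


# A 1.5M-token prompt only fits the 2M multi-agent route; assume 20K output tokens.
print(f"${request_cost(1_500_000, 20_000):.4f}")
```

At these rates the input side dominates long-context requests, so trimming retrieved context usually saves more than capping output.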
Which route to start with
Default to x-ai-grok-4 / 4.3 for chat, summarization, coding agents, and tool-augmented assistants. xAI names Grok 4.3 as the replacement for Grok Code Fast and the old reasoning Fast route, and its pricing is now the same headline rate as 4.20 in our index.
Use x-ai-grok-4 / 4.20-multi-agent when the 2M-token context window is the binding requirement; that is the route to test for large documents, long agent traces, or RAG systems where context size changes the architecture. Use 4.20-beta1-non-thinking when you explicitly need the non-reasoning migration path.
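The two routing paragraphs reduce to a short decision rule. This is a minimal Python sketch, not an xAI API: the route strings are this page's index slugs rather than confirmed API model IDs, `pick_grok_route` is a hypothetical helper, and it assumes "1.0M" and "2.0M" mean 1,000,000 and 2,000,000 tokens with the budget covering prompt plus expected output.

```python
# Index slugs from this page; not confirmed xAI API model IDs.
DEFAULT_ROUTE = "x-ai-grok-4 / 4.3"
MULTI_AGENT_ROUTE = "x-ai-grok-4 / 4.20-multi-agent"
NON_THINKING_ROUTE = "x-ai-grok-4 / 4.20-beta1-non-thinking"

ONE_M = 1_000_000  # assumed size of the "1.0M" windows
TWO_M = 2_000_000  # assumed size of the "2.0M" multi-agent window


def pick_grok_route(total_tokens: int, non_reasoning_migration: bool = False) -> str:
    """total_tokens: prompt plus expected output for the largest request you plan to send."""
    if total_tokens > TWO_M:
        raise ValueError("request exceeds the largest Grok window in this index (2M)")
    if total_tokens > ONE_M:
        return MULTI_AGENT_ROUTE   # the only route with a 2M window
    if non_reasoning_migration:
        return NON_THINKING_ROUTE  # explicit non-reasoning migration path
    return DEFAULT_ROUTE           # chat, summarization, coding agents, tool use


print(pick_grok_route(200_000))
print(pick_grok_route(1_400_000))
print(pick_grok_route(50_000, non_reasoning_migration=True))
```

Context size is checked first because it is a hard limit, while the migration flag only matters once the request fits in a 1M window.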
When to deviate:
- You are still pinned to Grok 4 Fast or Code Fast: treat those rows as migration context, not fresh recommendations. xAI's retirement guide points reasoning and coding workloads to 4.3, and non-reasoning Fast workloads to 4.20 non-reasoning.
- You are on Grok 3 or Grok 3 Mini: the old rows stay visible for historical measurements, but the practical migration target is Grok 4.3 unless your contract or eval suite blocks the move.
- Hardest-tier reasoning workloads: 4.20 reasoning is currently the strongest Grok row by composite Quality Score in our index (100.7, #10 of the 186 models we track). Run it against Claude Opus, Gemini Pro, and GPT-5 on your specific workload before treating the family rank as the answer.
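The migration rules in the list above fit in a small lookup table. A minimal Python sketch: the keys are hypothetical descriptive labels for pinned deployments, not xAI API model IDs, and the Grok 3 entries assume no contract or eval suite blocks the move.

```python
# Hypothetical labels for pinned deployments -> recommended migration target.
MIGRATION_TARGETS = {
    "grok-4-fast (reasoning)": "Grok 4.3",
    "grok-4-fast (non-reasoning)": "Grok 4.20 non-reasoning",
    "grok-code-fast": "Grok 4.3",
    "grok-3": "Grok 4.3",       # unless a contract or eval suite blocks the move
    "grok-3-mini": "Grok 4.3",  # same caveat as grok-3
}


def migration_target(pinned: str) -> str:
    """Return the recommended replacement, failing loudly for unknown labels."""
    try:
        return MIGRATION_TARGETS[pinned]
    except KeyError:
        raise KeyError(f"no migration rule recorded for {pinned!r}") from None


print(migration_target("grok-code-fast"))
```

Raising on unknown labels, rather than defaulting to 4.3, keeps an unplanned deployment from being migrated silently.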
Where the data is weak
We aggregate benchmark scores from multiple sources but coverage is uneven across this family. Specifically:
- Grok 4 has many minor versions in our index. Scores for grok-4, 4.2, 4.3, and the 4.20 variants are not interchangeable; when this article quotes a number, it is for the specific variant named.
- 4.3 has thinner benchmark coverage than older Grok rows. It is the current API route, but some benchmark tables still make 4.20 or older rows look stronger because those rows are scored on a different, often larger, set of benchmarks.
- Fast and Code Fast are historical rows now. They remain useful for migration and benchmark provenance, but they should not be read as active buying advice after the May 2026 retirement.
- Pricing on this page is the published xAI list price. Volume agreements and higher-context pricing can change the unit economics.
If you are making a procurement decision, the variant table on this page is the load-bearing artifact; cross-check its prices against xAI's own docs before you commit.
When to reach for which alternative
- Open-weights deployment is a requirement: Grok is API-only, so the conversation moves to open-weights families (Qwen3, DeepSeek). If long-context cost is the binding axis, DeepSeek V4 Flash (1M context at $0.098 / $0.197 per million tokens, QS 78.1) is the closest open-weights price anchor; Grok 4.20 multi-agent still wins on context size.
- Closed-flagship reasoning at the absolute top: Claude Opus 4.7 Thinking (QS 107.8), Gemini 3.1 Pro (QS 104.3), and full GPT-5 are the anchors to compare against on the benchmark that matters to you. On any given benchmark the ranking can flip. Grok's list price ($1.25 / $2.50 per million) sits well below Opus 4.7 ($5 / $25) and Gemini 3.1 Pro ($2 / $12), so weigh that gap against any measured quality difference; when both gaps are small, the choice often comes down to ecosystem.
- You are already paying for an OpenAI or Anthropic key: the case for adding Grok is workload-specific, not blanket. The strongest single reason is the 4.20 multi-agent context window or a measured win from 4.3 on your agentic workflow.
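To put the price gap in concrete terms, the sketch below prices one hypothetical workload (50K input and 5K output tokens per request) at the list prices from the variants table. It is a minimal Python illustration, assuming flat list pricing; it ignores reasoning-token overhead, caching, and volume discounts, all of which can shift real ratios, especially for thinking variants.

```python
# List prices (USD per 1M input / output tokens) copied from the variants table.
PRICES = {
    "Grok 4.3": (1.25, 2.50),
    "Claude Opus 4.7 Thinking": (5.00, 25.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "DeepSeek V4 Flash": (0.098, 0.197),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at flat list prices."""
    in_per_m, out_per_m = PRICES[model]
    return (input_tokens * in_per_m + output_tokens * out_per_m) / 1_000_000


# Hypothetical workload: 50K input tokens and 5K output tokens per request.
baseline = workload_cost("Grok 4.3", 50_000, 5_000)
for model in PRICES:
    cost = workload_cost(model, 50_000, 5_000)
    print(f"{model:26s} ${cost:.4f}  ({cost / baseline:.1f}x Grok 4.3)")
```

Output-heavy workloads widen the gap further, because the output-price ratios (10x for Opus 4.7, 4.8x for Gemini 3.1 Pro) are larger than the input-price ratios.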
Sources worth reading
- xAI Grok 4.3 docs: current default route, pricing, context, and aliases
- xAI May 15 model retirement guide: retired Grok slugs and recommended replacements
- xAI Grok 4.20 docs: 4.20 context, pricing, and aliases
- xAI announcements: release notes for new generations and pricing changes
- Grok on the OpenRouter index: cross-checked pricing and provider availability
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
Author: Boris.