Qwen3
Qwen 3.7 Max Preview leads the family at #9 of 186, with a 262K-token context window at $0.78/$3.9 per 1M input/output tokens. Compare the Qwen3, Qwen 3.5, and Qwen 3.6 generations by workload.
Top in this family
Qwen 3.7 Max Preview ranks #9 of 186 on overall quality (QS 100.9) at $0.78/$3.9 per 1M input/output tokens.
Practical pick
Qwen 3.6 35B-A3B (Thinking): QS 82.0, rank #54 of 186, at $0.14/$1 per 1M input/output tokens.
- Models: 30 (56 variants)
- License: Open weights
- Provider: Qwen
★ Most teams should start here
Qwen 3.6 35B-A3B
Variant: Thinking
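The family's value champion. It is a mixture-of-experts (MoE) model with 35B total parameters but only ~3B active per token, so its per-token compute, and typically its hosted price, is close to that of a small dense model. The catch is memory: all 35B parameters must still be loaded, so self-hosting needs hardware sized for the full model. Pick this unless you have a specific reason not to.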
- Quality Score: 82.0
- Input: $0.140/1M tokens
- Output: $1.00/1M tokens
- Context: 262K tokens
- License: Open weights
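To turn per-1M-token prices into a budget, multiply each token count by its rate. The sketch below assumes simple linear billing at the list prices from the card above; real providers may add caching discounts, minimums, or different rates. For Thinking variants it also assumes reasoning tokens are billed as output tokens, which is common but worth confirming with your provider.

```python
# Minimal cost sketch, assuming linear per-token billing at list prices.
# Prices are USD per 1M tokens, from the Qwen 3.6 35B-A3B (Thinking) card.
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 1.00


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float = INPUT_PRICE_PER_M,
                      output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the estimated USD cost for the given input and output token counts.

    Assumption: for Thinking variants, reasoning tokens are included in
    output_tokens, since many providers bill them as output.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)


# Example workload: 10M prompt tokens and 2M generated tokens per month.
monthly = estimate_cost_usd(10_000_000, 2_000_000)
print(f"Estimated monthly cost: ${monthly:.2f}")  # Estimated monthly cost: $3.40
```

Passing another row's prices (for example $0.325/$1.95 for Qwen 3.6 Plus) gives a like-for-like comparison for the same traffic.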
Best variant by workload
One pick per common job. Choose by what you need to ship, not by which variant tops a leaderboard you don't use.
| Workload | Best pick | Price (input / output per 1M) | Why |
|---|---|---|---|
| Coding agents | Qwen 3 Coder 480B A35B Instruct (Non-thinking) | $0.220 / $1.80 | Purpose-built coder variant with a 1.0M-token context, and the family's dedicated coding option. Its tracked scores (SWE-Pro 38.7, Terminal 23.9) trail newer general variants such as Qwen 3.6 35B-A3B (SWE-Pro 49.5), so use it when you want a dedicated coder and agentic coding throughput matters more than the price gap to the value pick. |
| General API workhorse | Qwen 3.6 35B-A3B (Thinking) | $0.140 / $1.00 | Best quality per dollar among the newest scored variants for chat and tool use at API scale. Because only ~3B parameters are active per token, most providers price it closer to a 3B model than a 35B one. |
| Long-context RAG | Qwen 3.6 Plus (Thinking) | $0.325 / $1.95 | Shares the family's largest context window (1.0M tokens) and scores highest among the scored 1.0M-context variants (QS 92.6). Prefer it when document scale dominates the workload and recall over long inputs is the binding constraint. |
| Self-host on 1 GPU | Qwen 3.6 35B-A3B (Thinking) | $0.140 / $1.00 (hosted) | Mixture-of-experts keeps per-token compute closer to a 3B model than a 35B one. The trade-off is memory: all weights must still fit, so size GPU memory for the full 35B parameter count, not the ~3B active subset. |
| Edge / on-device | Qwen3-1.7B (Non-thinking) | Not listed | Smallest open-weights variant with usable quality. Runs on CPU or a small GPU for local or on-device deployment when round-trip latency rules out hosted APIs. |
| Document AI / OCR | Qwen3 VL 32B (Thinking) | $0.104 / $0.416 | Mid-size vision-language (VL) variant with coverage suited to layout-aware OCR and table extraction. No OCR-specific benchmark is tracked here (MMMU mmmu_single: 72.2), so check accuracy on your own documents. |
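The "Self-host on 1 GPU" recommendation rests on a split that is easy to put numbers on: per-token compute tracks active parameters, while weight memory tracks total parameters. The sketch below is a back-of-envelope estimate, not a benchmark, and every figure in it is an assumption: roughly 2 FLOPs per active parameter per token for a forward pass (ignoring attention over long contexts and expert-routing overhead); 2, 1, or 0.5 bytes per parameter for BF16, 8-bit, and 4-bit weights; and a hypothetical 20% headroom for KV cache and runtime overhead, which in practice grows with context length and batch size. The 35B and 3B counts are the nominal figures from this page.

```python
# Back-of-envelope MoE sizing sketch; all constants are stated assumptions.

TOTAL_PARAMS = 35e9   # Qwen 3.6 35B-A3B: nominal total parameters (all experts)
ACTIVE_PARAMS = 3e9   # approximate parameters active per token

# Assumed storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def forward_gflops_per_token(params: float) -> float:
    """Approximate forward-pass GFLOPs per token (rule of thumb: 2 FLOPs per parameter)."""
    return 2.0 * params / 1e9


def weight_memory_gb(total_params: float, fmt: str) -> float:
    """Decimal GB needed to hold the weights alone (no KV cache or activations)."""
    return total_params * BYTES_PER_PARAM[fmt] / 1e9


def fits_on_gpu(gpu_gb: float, total_params: float, fmt: str,
                headroom: float = 0.20) -> bool:
    """Hypothetical check: weights plus a fixed headroom fraction must fit in GPU memory."""
    return weight_memory_gb(total_params, fmt) * (1.0 + headroom) <= gpu_gb


# Compute follows the ~3B active parameters...
print(f"MoE compute: {forward_gflops_per_token(ACTIVE_PARAMS):.0f} GFLOPs/token")
print(f"Dense 35B:   {forward_gflops_per_token(TOTAL_PARAMS):.0f} GFLOPs/token")

# ...but memory follows all 35B parameters.
for fmt in BYTES_PER_PARAM:
    gb = weight_memory_gb(TOTAL_PARAMS, fmt)
    print(f"{fmt}: {gb:.1f} GB of weights; "
          f"fits 24 GB GPU: {fits_on_gpu(24, TOTAL_PARAMS, fmt)}; "
          f"fits 80 GB GPU: {fits_on_gpu(80, TOTAL_PARAMS, fmt)}")
```

Under these assumptions, 4-bit weights (17.5 GB) fit a 24 GB card, 8-bit weights (35 GB) need a larger card such as an 80 GB one, and BF16 weights (70 GB) exceed an 80 GB card once headroom is included.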
All variants
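56 variants across 30 models. Variants not marked Previous come first, followed by previous-generation variants; each group is sorted by quality score (descending), with unscored variants last.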
| Model | Variant | Status | QS | Rank | GPQA | HLE | SWE | SWE-Pro | Terminal | Tau | MCP | AIME | In $/1M | Out $/1M | Context | Released |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3 Max | Qwen 3.7 Max Preview | — | 100.9 | #9/186 | — | — | — | — | — | — | — | — | $0.78 | $3.9 | 262K | — |
| Qwen3 Max | Qwen 3.6 Max Preview | — | 96.6 | #17/186 | — | — | — | — | — | — | — | — | $1.04 | $6.24 | 262K | — |
| Qwen 3.6 Plus | Thinking | — | 92.6 | #26/186 | 90.4 | 28.8 | — | 56.6 | 61.6 | — | — | — | $0.325 | $1.95 | 1.0M | — |
| Qwen3 Max | Qwen3 Max | — | 89.4 | #30/186 | — | 30.2 | 75.3 | — | — | — | — | — | $0.78 | $3.9 | 262K | — |
| Qwen 3.6 27B | Thinking | — | 87.5 | #35/186 | 87.8 | 24.0 | 77.2 | 53.5 | — | — | — | — | $0.29 | $3.2 | 262K | Apr 29, 2026 |
| Qwen 3.6 35B-A3B | Thinking | — | 82.0 | #54/186 | 86.0 | 21.4 | 73.4 | 49.5 | — | — | 62.8 | — | $0.14 | $1 | 262K | Apr 29, 2026 |
| Qwen 3 Coder 480B A35B Instruct | Non-thinking | — | 64.9 | #137/186 | — | — | — | 38.7 | 23.9 | — | — | — | $0.22 | $1.8 | 1.0M | Jul 22, 2025 |
| Qwen 3.6 Flash | Thinking | — | — | — | — | — | — | — | — | — | — | — | $0.188 | $1.125 | 1.0M | — |
| Qwen3 Max | Qwen 3.5 Max Preview | — | — | — | — | — | — | — | — | — | — | — | $0.78 | $3.9 | 262K | — |
| Qwen3 VL 2B | Thinking | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen3 VL 4B | Thinking | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen3 VL 8B | Thinking | — | — | — | — | — | — | — | — | — | — | — | $0.117 | $1.365 | 256K | — |
| Qwen3 VL 8B | Non-Thinking | — | — | — | — | — | — | — | — | — | — | — | $0.08 | $0.5 | 256K | — |
| Qwen3 VL 30B A3B | Thinking | — | — | — | — | — | — | — | — | — | — | — | $0.13 | $1.56 | 131K | — |
| Qwen3 VL 32B | Thinking | — | — | — | — | — | — | — | — | — | — | — | $0.104 | $0.416 | 262K | — |
| Qwen3 VL 32B | Non-Thinking | — | — | — | — | — | — | — | — | — | — | — | $0.104 | $0.416 | 262K | — |
| Qwen3 VL 235B A22B | Thinking | — | — | — | — | — | — | — | — | — | — | — | $0.26 | $2.6 | 131K | — |
| Qwen 3.5 122b A10b | Non-Thinking | Previous | 92.7 | #24/186 | — | 25.3 | — | — | — | — | — | — | $0.26 | $2.08 | 262K | — |
| Qwen 3.5 397b A17b | Thinking | Previous | 92.7 | #25/186 | — | 28.7 | 76.4 | — | — | — | — | 83.1 | $0.39 | $2.34 | 262K | — |
| Qwen 3.5 122b A10b | Thinking | Previous | 86.4 | #40/186 | 86.6 | 25.3 | 72.0 | — | — | — | — | — | $0.26 | $2.08 | 262K | — |
| Qwen 3.5 27b | Thinking | Previous (newer: Qwen 3.6 27B) | 84.7 | #46/186 | 85.5 | 24.3 | 75.0 | — | — | — | — | — | $0.195 | $1.56 | 262K | — |
| Qwen 3 30b A3b | Thinking | Previous (newer: Qwen 3.5 35b A3b) | 83.1 | #51/186 | 65.8 | — | — | — | — | — | — | 70.9 | $0.08 | $0.4 | 131K | Apr 29, 2025 |
| Qwen 3.5 9B | Thinking | Previous | 81.9 | #55/186 | 81.7 | — | — | — | — | — | — | — | $0.04 | $0.15 | 262K | — |
| Qwen 3 14b | Thinking | Previous | 81.0 | #60/186 | 64.0 | — | — | — | — | — | — | 70.4 | $0.1 | $0.24 | 132K | Apr 29, 2025 |
| Qwen 3.5 35b A3b | Thinking | Previous (newer: Qwen 3.6 35B-A3B) | 80.8 | #62/186 | 84.2 | 22.4 | 69.2 | — | — | — | — | — | $0.14 | $1 | 262K | — |
| Qwen3 Next 80B A3B | Thinking | Previous | 80.4 | #66/186 | 77.2 | — | — | — | — | — | — | — | $0.098 | $0.78 | 262K | Sep 11, 2025 |
| Qwen 3 8b | Thinking | Previous | 77.8 | #81/186 | 62.0 | — | — | — | — | — | — | 67.3 | $0.05 | $0.4 | 131K | Apr 29, 2025 |
| Qwen 3.5 4B | Thinking | Previous | 75.4 | #91/186 | 76.2 | — | — | — | — | — | — | — | — | — | — | — |
| Qwen 3 235b A22b | thinking-2507 | Previous (newer: Qwen 3.5 397b A17b) | 75.4 | #92/186 | — | 15.8 | — | — | — | 71.9 | — | — | $0.1 | $0.1 | 262K | Apr 29, 2025 |
| Qwen 3 235b A22b | Thinking | Previous (newer: Qwen 3.5 397b A17b) | 72.5 | #105/186 | 71.1 | 7.6 | 34.4 | 21.4 | — | 58.6 | — | 81.5 | $0.455 | $1.82 | 131K | Apr 29, 2025 |
| Qwen 3 32b | Thinking | Previous | 72.5 | #106/186 | 68.4 | — | — | — | — | — | — | 72.9 | $0.08 | $0.28 | 131K | Apr 29, 2025 |
| Qwen3-4B (Thinking) | Thinking | Previous | 72.5 | #107/186 | 55.9 | — | — | — | — | — | — | 65.6 | — | — | — | Apr 29, 2025 |
| Qwen 3 235b A22b | non-thinking-2507 | Previous (newer: Qwen 3.5 397b A17b) | 70.8 | #113/186 | — | — | — | — | — | — | — | — | $0.071 | $0.1 | 262K | Apr 29, 2025 |
| Qwen 3 30b A3b | Thinking (2507) | Previous (newer: Qwen 3.5 35b A3b) | 68.4 | #124/186 | 73.4 | 9.8 | 22.0 | — | — | — | — | 85.0 | $0.09 | $0.45 | 131K | Apr 29, 2025 |
| Qwen 3 14b | Non-thinking | Previous | 67.2 | #127/186 | 54.8 | — | — | — | — | — | — | 23.3 | $0.1 | $0.24 | 132K | Apr 29, 2025 |
| Qwen 3 32b | Non-thinking | Previous | 66.3 | #133/186 | 54.6 | — | — | — | — | — | — | 20.2 | $0.08 | $0.28 | 131K | Apr 29, 2025 |
| Qwen 3 30b A3b | Non-thinking | Previous (newer: Qwen 3.5 35b A3b) | 65.4 | #135/186 | 54.8 | — | — | — | — | — | — | 21.6 | $0.09 | $0.45 | 131K | Apr 29, 2025 |
| Qwen 3 235b A22b | Non-Thinking | Previous (newer: Qwen 3.5 397b A17b) | 64.0 | #140/186 | 62.9 | — | 34.4 | — | — | 57.0 | — | 24.7 | $0.455 | $1.82 | 131K | Apr 29, 2025 |
| Qwen3-1.7B (Thinking) | Thinking | Previous | 62.8 | #146/186 | 40.1 | — | — | — | — | — | — | 36.8 | — | — | — | Apr 29, 2025 |
| Qwen 3 8b | Non-thinking | Previous | 60.9 | #152/186 | 39.3 | — | — | — | — | — | — | 20.9 | $0.05 | $0.4 | 131K | Apr 29, 2025 |
| Qwen3-4B (Thinking) | Non-thinking | Previous | 60.0 | #158/186 | 41.7 | — | — | — | — | — | — | 19.1 | — | — | — | Apr 29, 2025 |
| Qwen3-0.6B (Thinking) | Thinking | Previous | 50.0 | #177/186 | 27.9 | — | — | — | — | — | — | 15.1 | — | — | — | Apr 29, 2025 |
| Qwen3-1.7B (Thinking) | Non-thinking | Previous | 48.0 | #180/186 | 28.6 | — | — | — | — | — | — | 9.8 | — | — | — | Apr 29, 2025 |
| Qwen3-0.6B (Thinking) | Non-thinking | Previous | 37.2 | #185/186 | 22.9 | — | — | — | — | — | — | 2.6 | — | — | — | Apr 29, 2025 |
| Qwen 3 30b A3b | Non-Thinking (2507) | Previous (newer: Qwen 3.5 35b A3b) | — | — | — | — | — | — | — | — | — | — | $0.043 | $0.172 | 131K | Apr 29, 2025 |
| Qwen 3 235b A22b | vl-235b-a22b-instruct | Previous (newer: Qwen 3.5 397b A17b) | — | — | — | — | — | — | — | — | — | — | $0.455 | $1.82 | 131K | Apr 29, 2025 |
| Qwen 3 235b A22b | vl-235b-a22b-thinking | Previous (newer: Qwen 3.5 397b A17b) | — | — | — | — | — | — | — | — | — | — | $0.455 | $1.82 | 131K | Apr 29, 2025 |
| Qwen 3.5 0.8B | Thinking | Previous | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen 3.5 0.8B | Non-Thinking | Previous | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen 3.5 2B | Thinking | Previous | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen 3.5 2B | Non-Thinking | Previous | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen 3.5 4B | Non-Thinking | Previous | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen 3.5 9B | Non-Thinking | Previous | — | — | — | — | — | — | — | — | — | — | $0.04 | $0.15 | 262K | — |
| Qwen 3.5 27b | Non-Thinking | Previous (newer: Qwen 3.6 27B) | — | — | — | — | — | — | — | — | — | — | $0.195 | $1.56 | 262K | — |
| Qwen 3.5 35b A3b | Non-Thinking | Previous (newer: Qwen 3.6 35B-A3B) | — | — | — | — | — | — | — | — | — | — | $0.14 | $1 | 262K | — |
| Qwen 3.5 Flash | Thinking | Previous (newer: Qwen 3.6 Flash) | — | — | — | — | — | — | — | — | — | — | $0.065 | $0.26 | 1.0M | — |
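Because this table mixes quality scores, prices, and context windows, a shortlist often reduces to "the cheapest variant that clears a quality bar and a context requirement." The sketch below encodes a handful of scored rows from the table and ranks candidates by a blended price. The 3:1 input-to-output token weighting is an assumption, not something this page defines, so adjust it to your own traffic; context sizes are rounded from the listed 262K and 1.0M.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    qs: float          # Quality Score from the table
    in_price: float    # USD per 1M input tokens
    out_price: float   # USD per 1M output tokens
    context: int       # context window in tokens (rounded from the table)


# A subset of scored rows from the "All variants" table.
VARIANTS = [
    Variant("Qwen 3.7 Max Preview", 100.9, 0.78, 3.9, 262_000),
    Variant("Qwen 3.6 Max Preview", 96.6, 1.04, 6.24, 262_000),
    Variant("Qwen 3.6 Plus (Thinking)", 92.6, 0.325, 1.95, 1_000_000),
    Variant("Qwen3 Max", 89.4, 0.78, 3.9, 262_000),
    Variant("Qwen 3.6 27B (Thinking)", 87.5, 0.29, 3.2, 262_000),
    Variant("Qwen 3.6 35B-A3B (Thinking)", 82.0, 0.14, 1.0, 262_000),
    Variant("Qwen 3 Coder 480B A35B Instruct", 64.9, 0.22, 1.8, 1_000_000),
]


def blended_price(v: Variant, input_share: float = 0.75) -> float:
    """Per-1M-token price weighted by an assumed share of input tokens (3:1 by default)."""
    return input_share * v.in_price + (1.0 - input_share) * v.out_price


def cheapest(min_qs: float, min_context: int = 0) -> Optional[Variant]:
    """Return the lowest blended-price variant meeting both thresholds, or None."""
    candidates = [v for v in VARIANTS if v.qs >= min_qs and v.context >= min_context]
    return min(candidates, key=blended_price, default=None)


for min_qs, min_ctx in [(80, 0), (85, 0), (95, 0), (60, 1_000_000)]:
    pick = cheapest(min_qs, min_ctx)
    label = pick.name if pick else "no match"
    print(f"QS >= {min_qs}, context >= {min_ctx:,}: {label}")
```

Under this weighting, a QS bar of 85 favors Qwen 3.6 Plus over Qwen 3.6 27B: the 27B has the lower input price, but its much higher output price dominates the blend.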
Benchmark evidence
Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (203 of the 1031 rows here feed it); the remaining rows are tracked evidence: recorded and comparable, but not folded into a single synthetic score.
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Qwen 3.6 Plus · Thinking | MCP Atlas · public_set | 74.1 | 1 / 13 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | MMLU Pro | 87.8 | 3 / 86 | In Quality Score |
| Qwen 3.5 122b A10b · Non-Thinking | LiveCodeBench · v5 | 78.9 | 3 / 5 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | LiveCodeBench · 2024_07_2025_01 | 78.2 | 4 / 8 | In Quality Score |
| Qwen3 Max · Qwen 3.7 Max Preview | SimpleBench | 70.4 | 4 / 61 | In Quality Score |
| Qwen 3 235b A22b · Thinking | LiveCodeBench · 2024_08_2025_05 | 66.5 | 4 / 17 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | Humanity's Last Exam · verified | 37.6 | 4 / 5 | In Quality Score |
| Qwen3 Max · Qwen3 Max | LiveCodeBench · v6 | 85.9 | 5 / 40 | In Quality Score |
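The Rank column counts position among the models that report a given benchmark, so field sizes differ widely: 3 / 5 on LiveCodeBench v5 and 3 / 86 on MMLU Pro are not equally strong results. One way to compare them, offered here as a hypothetical normalization and not as how the Quality Score is computed, is to map each rank to a position between 0 and 1 within its field, where 1.0 is first place and 0.0 is last.

```python
def rank_position(rank_field: str) -> float:
    """Map a 'rank / field' string such as '3 / 86' to [0, 1]; 1.0 is first, 0.0 is last."""
    rank_text, field_text = rank_field.split("/")
    rank, field = int(rank_text), int(field_text)  # int() tolerates surrounding spaces
    if not 1 <= rank <= field:
        raise ValueError(f"invalid rank {rank_field!r}")
    if field == 1:
        return 1.0  # a field of one has no spread; treat the only entry as first
    return (field - rank) / (field - 1)


# Rows from the summary table above.
for label, rank_field in [
    ("Qwen 3.6 Plus · MCP Atlas", "1 / 13"),
    ("Qwen 3.5 397b A17b · MMLU Pro", "3 / 86"),
    ("Qwen 3.5 122b A10b · LiveCodeBench v5", "3 / 5"),
    ("Qwen3 Max · LiveCodeBench v6", "5 / 40"),
]:
    print(f"{label}: {rank_position(rank_field):.2f}")
```

This shows position only. In a five-model field each step moves the value by 0.25, so small fields still deserve less weight than the number alone suggests.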
All benchmark evidence (1031 rows)
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Qwen 3.5 397b A17b · Thinking | MMLU Pro | 87.8 | 3 / 86 | In Quality Score |
| Qwen3 Max · Qwen 3.7 Max Preview | SimpleBench | 70.4 | 4 / 61 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | Humanity's Last Exam · verified | 37.6 | 4 / 5 | In Quality Score |
| Qwen3 Max · Qwen3 Max | Humanity's Last Exam · verified | 37.6 | 5 / 5 | In Quality Score |
| Qwen 3 235b A22b · Thinking | LiveBench | 77.1 | 6 / 110 | In Quality Score |
| Qwen 3.5 122b A10b · Non-Thinking | AIME 2025 · no_tools | 90.4 | 7 / 15 | In Quality Score |
| Qwen3 Max · Qwen 3.6 Max Preview | SimpleBench | 63 | 8 / 61 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | MMLU Pro | 86.7 | 10 / 86 | In Quality Score |
| Qwen 3.5 122b A10b · Non-Thinking | MMLU Pro | 86.7 | 11 / 86 | In Quality Score |
| Qwen 3.6 Plus · Thinking | Humanity's Last Exam · tools | 50.6 | 11 / 38 | In Quality Score |
| Qwen3 Max · Qwen 3.7 Max Preview | Arena Elo | 1475 | 13 / 158 | In Quality Score |
| Qwen 3.6 Plus · Thinking | GPQA Diamond | 90.4 | 13 / 143 | In Quality Score |
| Qwen 3 32b · Thinking | LiveBench | 74.9 | 14 / 110 | In Quality Score |
| Qwen3 Max · Qwen3 Max | Humanity's Last Exam · tools | 49.8 | 14 / 38 | In Quality Score |
| Qwen 3.6 27B · Thinking | MMLU Pro | 86.2 | 15 / 86 | In Quality Score |
| Qwen 3.5 27b · Thinking | MMLU Pro | 86.1 | 16 / 86 | In Quality Score |
| Qwen 3.5 27b · Thinking | Humanity's Last Exam · tools | 48.5 | 16 / 38 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | Humanity's Last Exam · tools | 48.3 | 17 / 38 | In Quality Score |
| Qwen3 Max · Qwen3 Max | MMLU Pro | 85.7 | 18 / 86 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | MMLU Pro | 85.3 | 19 / 86 | In Quality Score |
| Qwen 3 30b A3b · Thinking | LiveBench | 74.3 | 19 / 110 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | Humanity's Last Exam · tools | 47.5 | 19 / 38 | In Quality Score |
| Qwen3 Max · Qwen 3.7 Max Preview | LiveBench | 74.3 | 20 / 110 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | Humanity's Last Exam · tools | 47.4 | 20 / 38 | In Quality Score |
| Qwen 3.6 27B · Thinking | GPQA Diamond | 87.8 | 21 / 143 | In Quality Score |
| Qwen 3.6 35B-A3B · Thinking | MMLU Pro | 85.2 | 21 / 86 | In Quality Score |
| Qwen3 Max · Qwen3 Max | Humanity's Last Exam · hle | 30.2 | 21 / 90 | In Quality Score |
| Qwen3 Max · Qwen 3.5 Max Preview | Arena Elo | 1466 | 22 / 158 | In Quality Score |
| Qwen 3.6 Plus · Thinking | Humanity's Last Exam · hle | 28.8 | 22 / 90 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | Humanity's Last Exam · hle | 28.7 | 23 / 90 | In Quality Score |
| Qwen 3 30b A3b · Thinking (2507) | AIME 2025 | 85 | 24 / 88 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | Humanity's Last Exam · hle_text | 15.4 | 24 / 56 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | AIME 2025 | 83.1 | 26 / 88 | In Quality Score |
| Qwen3 Max · Qwen 3.6 Max Preview | Arena Elo | 1459 | 27 / 158 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | GPQA Diamond | 86.6 | 27 / 143 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | MMLU Pro | 84.5 | 29 / 86 | In Quality Score |
| Qwen 3 235b A22b · Thinking | AIME 2025 | 81.5 | 29 / 88 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | Humanity's Last Exam · hle | 25.3 | 29 / 90 | In Quality Score |
| Qwen 3.6 35B-A3B · Thinking | GPQA Diamond | 86 | 30 / 143 | In Quality Score |
| Qwen 3.5 122b A10b · Non-Thinking | Humanity's Last Exam · hle | 25.3 | 30 / 90 | In Quality Score |
| Qwen 3 14b · Thinking | LiveBench | 71.3 | 31 / 110 | In Quality Score |
| Qwen 3.6 Plus · Thinking | LiveBench | 70.8 | 32 / 110 | In Quality Score |
| Qwen 3.5 27b · Thinking | GPQA Diamond | 85.5 | 34 / 143 | In Quality Score |
| Qwen 3 235b A22b · Thinking | MMLU Pro | 83 | 35 / 86 | In Quality Score |
| Qwen 3.5 27b · Thinking | Humanity's Last Exam · hle | 24.3 | 35 / 90 | In Quality Score |
| Qwen 3.6 27B · Thinking | Humanity's Last Exam · hle | 24 | 36 / 90 | In Quality Score |
| Qwen3 Next 80B A3B · Thinking | MMLU Pro | 82.7 | 37 / 86 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | Humanity's Last Exam · hle | 22.4 | 37 / 90 | In Quality Score |
| Qwen 3 32b · Thinking | AIME 2025 | 72.9 | 38 / 88 | In Quality Score |
| Qwen 3.5 9B · Thinking | MMLU Pro | 82.5 | 39 / 86 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | GPQA Diamond | 84.2 | 40 / 143 | In Quality Score |
| Qwen 3 30b A3b · Thinking | AIME 2025 | 70.9 | 40 / 88 | In Quality Score |
| Qwen 3.6 35B-A3B · Thinking | Humanity's Last Exam · hle | 21.4 | 40 / 90 | In Quality Score |
| Qwen 3 14b · Thinking | AIME 2025 | 70.4 | 42 / 88 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | Arena Elo | 1445 | 43 / 158 | In Quality Score |
| Qwen 3.6 Plus · Thinking | Arena Elo | 1444 | 44 / 158 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | LiveBench | 67.6 | 44 / 110 | In Quality Score |
| Qwen 3 8b · Thinking | AIME 2025 | 67.3 | 45 / 88 | In Quality Score |
| Qwen3-4B (Thinking) · Thinking | AIME 2025 | 65.6 | 46 / 88 | In Quality Score |
| Qwen 3 8b · Thinking | LiveBench | 67.1 | 47 / 110 | In Quality Score |
| Qwen 3 30b A3b · Thinking (2507) | MMLU Pro | 80.9 | 48 / 86 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | Humanity's Last Exam · hle_text | 5.7 | 48 / 56 | In Quality Score |
| Qwen 3.5 9B · Thinking | GPQA Diamond | 81.7 | 49 / 143 | In Quality Score |
| Qwen 3.6 27B · Thinking | LiveBench | 65.6 | 49 / 110 | In Quality Score |
| Qwen3-4B (Thinking) · Thinking | LiveBench | 63.6 | 50 / 110 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | Humanity's Last Exam · hle | 15.8 | 52 / 90 | In Quality Score |
| Qwen 3.5 4B · Thinking | MMLU Pro | 79.1 | 54 / 86 | In Quality Score |
| Qwen3-1.7B (Thinking) · Thinking | AIME 2025 | 36.8 | 55 / 88 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | MMLU Pro | 77.3 | 56 / 86 | In Quality Score |
| Qwen3 Next 80B A3B · Thinking | GPQA Diamond | 77.2 | 60 / 143 | In Quality Score |
| Qwen3-4B (Thinking) · Thinking | MMLU Pro | 74 | 61 / 86 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | AIME 2025 | 24.7 | 61 / 88 | In Quality Score |
| Qwen 3 235b A22b · non-thinking-2507 | Arena Elo | 1423 | 62 / 158 | In Quality Score |
| Qwen 3.5 4B · Thinking | GPQA Diamond | 76.2 | 62 / 143 | In Quality Score |
| Qwen 3.6 Flash · Thinking | LiveBench | 60.4 | 62 / 110 | In Quality Score |
| Qwen 3 14b · Non-thinking | AIME 2025 | 23.3 | 63 / 88 | In Quality Score |
| Qwen3-4B (Thinking) · Non-thinking | MMLU Pro | 69.6 | 64 / 86 | In Quality Score |
| Qwen 3 30b A3b · Non-thinking | AIME 2025 | 21.6 | 64 / 88 | In Quality Score |
| Qwen 3 30b A3b · Thinking (2507) | Humanity's Last Exam · hle | 9.8 | 65 / 90 | In Quality Score |
| Qwen 3 32b · Non-thinking | LiveBench | 59.8 | 66 / 110 | In Quality Score |
| Qwen 3 8b · Non-thinking | AIME 2025 | 20.9 | 66 / 88 | In Quality Score |
| Qwen 3 30b A3b · Thinking (2507) | GPQA Diamond | 73.4 | 67 / 143 | In Quality Score |
| Qwen 3 14b · Non-thinking | LiveBench | 59.6 | 67 / 110 | In Quality Score |
| Qwen 3 32b · Non-thinking | AIME 2025 | 20.2 | 68 / 88 | In Quality Score |
| Qwen 3 30b A3b · Non-thinking | LiveBench | 59.4 | 69 / 110 | In Quality Score |
| Qwen3-4B (Thinking) · Non-thinking | AIME 2025 | 19.1 | 69 / 88 | In Quality Score |
| Qwen 3.5 2B · Thinking | MMLU Pro | 66.5 | 70 / 86 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | Arena Elo | 1417 | 71 / 158 | In Quality Score |
| Qwen3-0.6B (Thinking) · Thinking | AIME 2025 | 15.1 | 73 / 88 | In Quality Score |
| Qwen 3 235b A22b · vl-235b-a22b-instruct | Arena Elo | 1415 | 74 / 158 | In Quality Score |
| Qwen 3 235b A22b · Thinking | GPQA Diamond | 71.1 | 74 / 143 | In Quality Score |
| Qwen3-1.7B (Thinking) · Thinking | MMLU Pro | 56.5 | 76 / 86 | In Quality Score |
| Qwen 3 8b · Non-thinking | LiveBench | 53.5 | 76 / 110 | In Quality Score |
| Qwen 3.5 2B · Non-Thinking | MMLU Pro | 55.3 | 77 / 86 | In Quality Score |
| Qwen 3 235b A22b · Thinking | Humanity's Last Exam · hle | 7.6 | 77 / 90 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | LiveBench | 53.0 | 78 / 110 | In Quality Score |
| Qwen3-1.7B (Thinking) · Non-thinking | AIME 2025 | 9.8 | 80 / 88 | In Quality Score |
| Qwen3-1.7B (Thinking) · Thinking | LiveBench | 51.1 | 82 / 110 | In Quality Score |
| Qwen 3.5 0.8B · Thinking | MMLU Pro | 42.3 | 82 / 86 | In Quality Score |
| Qwen 3 32b · Thinking | GPQA Diamond | 68.4 | 83 / 143 | In Quality Score |
| Qwen 3.5 27b · Thinking | Arena Elo | 1408 | 84 / 158 | In Quality Score |
| Qwen3-1.7B (Thinking) · Non-thinking | MMLU Pro | 40.2 | 84 / 86 | In Quality Score |
| Qwen 3.5 0.8B · Non-Thinking | MMLU Pro | 29.7 | 85 / 86 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | Arena Elo | 1403 | 86 / 158 | In Quality Score |
| Qwen3-0.6B (Thinking) · Non-thinking | AIME 2025 | 2.6 | 87 / 88 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | Arena Elo | 1400 | 88 / 158 | In Quality Score |
| Qwen 3 235b A22b · non-thinking-2507 | LiveBench | 48.8 | 88 / 110 | In Quality Score |
| Qwen3-4B (Thinking) · Non-thinking | LiveBench | 48.4 | 89 / 110 | In Quality Score |
| Qwen 3 30b A3b · Thinking | GPQA Diamond | 65.8 | 90 / 143 | In Quality Score |
| Qwen 3 235b A22b · vl-235b-a22b-thinking | Arena Elo | 1396 | 91 / 158 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | Arena Elo | 1396 | 92 / 158 | In Quality Score |
| Qwen 3.5 Flash · Thinking | Arena Elo | 1396 | 93 / 158 | In Quality Score |
| Qwen 3 14b · Thinking | GPQA Diamond | 64 | 95 / 143 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | GPQA Diamond | 62.9 | 96 / 143 | In Quality Score |
| Qwen 3 8b · Thinking | GPQA Diamond | 62 | 98 / 143 | In Quality Score |
| Qwen 3 Coder 480B A35B Instruct · Non-thinking | Arena Elo | 1388 | 100 / 158 | In Quality Score |
| Qwen3-1.7B (Thinking) · Non-thinking | LiveBench | 35.6 | 101 / 110 | In Quality Score |
| Qwen 3 30b A3b · Non-Thinking (2507) | Arena Elo | 1384 | 104 / 158 | In Quality Score |
| Qwen3-0.6B (Thinking) · Thinking | LiveBench | 30.3 | 105 / 110 | In Quality Score |
| Qwen3-4B (Thinking) · Thinking | GPQA Diamond | 55.9 | 106 / 143 | In Quality Score |
| Qwen 3 14b · Non-thinking | GPQA Diamond | 54.8 | 107 / 143 | In Quality Score |
| Qwen 3 235b A22b · Thinking | Arena Elo | 1375 | 108 / 158 | In Quality Score |
| Qwen 3 30b A3b · Non-thinking | GPQA Diamond | 54.8 | 108 / 143 | In Quality Score |
| Qwen 3 32b · Non-thinking | GPQA Diamond | 54.6 | 109 / 143 | In Quality Score |
| Qwen3-0.6B (Thinking) · Non-thinking | LiveBench | 21.8 | 109 / 110 | In Quality Score |
| Qwen 3 32b · Non-thinking | Arena Elo | 1347 | 123 / 158 | In Quality Score |
| Qwen3-4B (Thinking) · Non-thinking | GPQA Diamond | 41.7 | 126 / 143 | In Quality Score |
| Qwen3-1.7B (Thinking) · Thinking | GPQA Diamond | 40.1 | 130 / 143 | In Quality Score |
| Qwen 3 8b · Non-thinking | GPQA Diamond | 39.3 | 131 / 143 | In Quality Score |
| Qwen 3 30b A3b · Non-thinking | Arena Elo | 1327 | 133 / 158 | In Quality Score |
| Qwen3-1.7B (Thinking) · Non-thinking | GPQA Diamond | 28.6 | 137 / 143 | In Quality Score |
| Qwen3-0.6B (Thinking) · Thinking | GPQA Diamond | 27.9 | 138 / 143 | In Quality Score |
| Qwen3-0.6B (Thinking) · Non-thinking | GPQA Diamond | 22.9 | 142 / 143 | In Quality Score |
| Qwen 3.5 27b · Thinking | IFBench | 76.5 | 1 / 28 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | HMMT Feb 2025 | 98 | 2 / 44 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | Arena-Hard | 96.1 | 2 / 40 | Tracked evidence |
| Qwen 3 235b A22b · thinking-2507 | AIME 2024 | 94.1 | 2 / 69 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MAXIFE | 88.2 | 2 / 21 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | IFBench | 76.5 | 2 / 28 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | Arena-Hard | 95.6 | 3 / 40 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MAXIFE | 88 | 3 / 21 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | WMT24++ | 78.9 | 3 / 6 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | IFBench | 76.1 | 3 / 28 | Tracked evidence |
| Qwen 3 14b · Thinking | Multi-IF | 74.8 | 3 / 32 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | BrowseComp_zh | 70.3 | 3 / 20 | Tracked evidence |
| Qwen 3 32b · Thinking | Arena-Hard | 93.8 | 4 / 40 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MAXIFE | 87.9 | 4 / 21 | Tracked evidence |
| Qwen 3 32b · Thinking | Multi-IF | 73 | 4 / 32 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | BrowseComp_zh | 69.9 | 4 / 20 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | HMMT Feb 2025 | 94.8 | 5 / 44 | Tracked evidence |
| Qwen 3 32b · Non-thinking | Arena-Hard | 92.8 | 5 / 40 | Tracked evidence |
| Qwen 3.6 Plus · Thinking | HMMT Feb 2026 | 87.8 | 5 / 16 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | BrowseComp · context_manage | 78.6 | 5 / 15 | Tracked evidence |
| Qwen 3 30b A3b · Thinking | Multi-IF | 72.2 | 5 / 32 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | BrowseComp_zh | 69.5 | 5 / 20 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | HMMT Nov 2025 | 94.7 | 6 / 31 | Tracked evidence |
| Qwen 3.6 27B · Thinking | HMMT Feb 2025 | 93.8 | 6 / 44 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MAXIFE | 86.6 | 6 / 21 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | IMO AnswerBench | 83.9 | 6 / 28 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | WMT24++ | 77.6 | 6 / 6 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | Multi-IF | 71.9 | 6 / 32 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | IFBench | 70.9 | 6 / 28 | Tracked evidence |
| Qwen 3.6 Plus · Thinking | HMMT Nov 2025 | 94.6 | 7 / 31 | Tracked evidence |
| Qwen 3.6 Plus · Thinking | IMO AnswerBench | 83.8 | 7 / 28 | Tracked evidence |
| Qwen 3 235b A22b · thinking-2507 | BFCL v3 | 71.9 | 7 / 49 | Tracked evidence |
| Qwen 3 32b · Non-thinking | Multi-IF | 70.7 | 7 / 32 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | AceBench | 70.5 | 7 / 7 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | IFBench | 70.2 | 7 / 28 | Tracked evidence |
| Qwen 3 14b · Thinking | Arena-Hard | 91.7 | 8 / 40 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | Global PIQA | 89.8 | 8 / 26 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MMMLU | 88.5 | 8 / 38 | Tracked evidence |
| Qwen 3.6 27B · Thinking | HMMT Feb 2026 | 84.3 | 8 / 16 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | MAXIFE | 84 | 8 / 21 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | Multi-IF | 70.2 | 8 / 32 | Tracked evidence |
| Qwen 3.6 Plus · Thinking | AIME 2026 | 95.1 | 9 / 19 | Tracked evidence |
| Qwen 3 30b A3b · Thinking | Arena-Hard | 91 | 9 / 40 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | HMMT Feb 2026 | 83.6 | 9 / 16 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | BFCL v3 | 70.8 | 9 / 49 | Tracked evidence |
| Qwen 3 235b A22b · thinking-2507 | MATH 500 | 98 | 10 / 55 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MAXIFE | 83.4 | 10 / 21 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MMMU PRO | 79 | 10 / 52 | Tracked evidence |
| Qwen 3 14b · Thinking | BFCL v3 | 70.4 | 10 / 49 | Tracked evidence |
| Qwen 3 8b · Non-thinking | Multi-IF | 69.2 | 10 / 32 | Tracked evidence |
| Qwen 3 30b A3b · Thinking | MATH 500 | 98 | 11 / 55 | Tracked evidence |
| Qwen 3.6 27B · Thinking | AIME 2026 | 94.1 | 11 / 19 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | HMMT Nov 2025 | 92.7 | 11 / 31 | Tracked evidence |
| Qwen 3.5 27b · Thinking | HMMT Feb 2025 | 92 | 11 / 44 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | Global PIQA | 88.4 | 11 / 26 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | AIME 2024 | 85.7 | 11 / 69 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | MAXIFE | 83.2 | 11 / 21 | Tracked evidence |
| Qwen 3 32b · Thinking | BFCL v3 | 70.3 | 11 / 49 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | HMMT Feb 2025 | 91.4 | 12 / 44 | Tracked evidence |
| Qwen 3 30b A3b · Non-thinking | Arena-Hard | 88 | 12 / 40 | Tracked evidence |
| Qwen 3.5 27b · Thinking | Global PIQA | 87.5 | 12 / 26 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MMMLU | 86.7 | 12 / 38 | Tracked evidence |
| Qwen 3 30b A3b · Thinking | BFCL v3 | 69.1 | 12 / 49 | Tracked evidence |
| Qwen 3 8b · Thinking | MATH 500 | 97.4 | 13 / 55 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | AIME 2026 | 93.3 | 13 / 19 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | HMMT Feb 2025 | 90.7 | 13 / 44 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | Global PIQA | 86.6 | 13 / 26 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | MAXIFE | 79.9 | 13 / 21 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MMMU PRO | 76.9 | 13 / 52 | Tracked evidence |
| Qwen3 VL 32B · Thinking | MMMU · mmmu_single | 72.2 | 13 / 22 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | BrowseComp | 69 | 13 / 51 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | Multi-IF | 66.3 | 13 / 32 | Tracked evidence |
| Qwen 3.5 27b · Thinking | BrowseComp_zh | 62.1 | 13 / 20 | Tracked evidence |
| Qwen 3.6 27B · Thinking | HMMT Nov 2025 | 90.7 | 14 / 31 | Tracked evidence |
| Qwen 3 14b · Non-thinking | Arena-Hard | 86.3 | 14 / 40 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | Global PIQA | 86 | 14 / 26 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | IMO AnswerBench | 80.9 | 14 / 28 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | MMMU · mmmu_single | 70.6 | 14 / 22 | Tracked evidence |
| Qwen 3 8b · Thinking | BFCL v3 | 68.1 | 14 / 49 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | BrowseComp_zh | 60.9 | 14 / 20 | Tracked evidence |
| Qwen 3 235b A22b · thinking-2507 | SciCode | 42.9 | 14 / 24 | Tracked evidence |
| Qwen 3 32b · Thinking | MATH 500 | 97.2 | 15 / 55 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | AIME 2026 | 92.7 | 15 / 19 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | HMMT Nov 2025 | 90.3 | 15 / 31 | Tracked evidence |
| Qwen 3 8b · Thinking | Arena-Hard | 85.8 | 15 / 40 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | Global PIQA | 85.7 | 15 / 26 | Tracked evidence |
| Qwen 3 32b · Thinking | AIME 2024 | 81.4 | 15 / 69 | Tracked evidence |
| Qwen 3.6 27B · Thinking | IMO AnswerBench | 80.8 | 15 / 28 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MAXIFE | 78 | 15 / 21 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | BFCL v3 | 68 | 15 / 49 | Tracked evidence |
| Qwen3 VL 8B · Thinking | MMMU · mmmu_single | 65.3 | 15 / 22 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | MATH 500 | 97 | 16 / 55 | Tracked evidence |
| Qwen 3.5 27b · Thinking | AIME 2026 | 92.6 | 16 / 19 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MMMLU | 85.9 | 16 / 38 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | IMO AnswerBench | 78.9 | 16 / 28 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | MAXIFE | 77.4 | 16 / 21 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | MMMU · mmmu_single | 64.6 | 16 / 22 | Tracked evidence |
| Qwen 3.5 9B · Thinking | IFBench | 64.5 | 16 / 28 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | AIME 2026 | 91.3 | 17 / 19 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | HMMT Feb 2025 | 89 | 17 / 44 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MMMLU | 85.2 | 17 / 38 | Tracked evidence |
| Qwen 3 30b A3b · Thinking | AIME 2024 | 80.4 | 17 / 69 | Tracked evidence |
| Qwen 3.6 27B · Thinking | MMMU PRO | 75.8 | 17 / 52 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | BrowseComp | 63.8 | 17 / 51 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | IFBench | 61.5 | 17 / 28 | Tracked evidence |
| Qwen 3 14b · Thinking | MATH 500 | 96.8 | 18 / 55 | Tracked evidence |
| Qwen 3.5 27b · Thinking | HMMT Nov 2025 | 89.8 | 18 / 31 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | Global PIQA | 83.5 | 18 / 26 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | MMMU PRO | 75.3 | 18 / 52 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | MAXIFE | 72.1 | 18 / 21 | Tracked evidence |
| Qwen 3.5 4B · Thinking | IFBench | 59.2 | 18 / 28 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | HMMT Nov 2025 | 89.5 | 19 / 31 | Tracked evidence |
| Qwen 3.5 9B · Thinking | Global PIQA | 83.2 | 19 / 26 | Tracked evidence |
| Qwen3-4B (Thinking) · Non-thinking | Multi-IF | 61.3 | 19 / 32 | Tracked evidence |
| Qwen 3.5 27b · Thinking | BrowseComp | 61 | 19 / 51 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MAXIFE | 60.6 | 19 / 21 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | MATH 500 | 96.2 | 20 / 55 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | HMMT Nov 2025 | 89.2 | 20 / 31 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | MMMLU | 84.4 | 20 / 38 | Tracked evidence |
| Qwen 3.5 9B · Thinking | HMMT Feb 2025 | 83.2 | 20 / 44 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | Global PIQA | 80.2 | 20 / 26 | Tracked evidence |
| Qwen 3 14b · Thinking | AIME 2024 | 79.3 | 20 / 69 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | BFCL v3 | 65.9 | 20 / 49 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | BrowseComp | 61 | 20 / 51 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | MAXIFE | 50.7 | 20 / 21 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | HMMT Nov 2025 | 89.1 | 21 / 31 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | MMLU | 87 | 21 / 33 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | MMMLU | 83.4 | 21 / 38 | Tracked evidence |
| Qwen 3 8b · Non-thinking | Arena-Hard | 79.6 | 21 / 40 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MMMU PRO | 75.1 | 21 / 52 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MAXIFE | 39.2 | 21 / 21 | Tracked evidence |
| Qwen 3.5 4B · Thinking | Global PIQA | 78.9 | 22 / 26 | Tracked evidence |
| Qwen 3 8b · Thinking | AIME 2024 | 76 | 22 / 69 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MMMU PRO | 75 | 22 / 52 | Tracked evidence |
| Qwen 3.5 9B · Thinking | HMMT Nov 2025 | 82.9 | 23 / 31 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | MMMLU | 81.3 | 23 / 38 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | Arena-Hard | 76.6 | 23 / 40 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | Global PIQA | 73.5 | 23 / 26 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | IFBench | 51.7 | 23 / 28 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | Multi-IF | 51.2 | 23 / 32 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | MATH 500 | 93.4 | 24 / 55 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MMMLU | 81.2 | 24 / 38 | Tracked evidence |
| Qwen 3.5 4B · Thinking | HMMT Feb 2025 | 74 | 24 / 44 | Tracked evidence |
| Qwen 3.5 2B · Thinking | Global PIQA | 69.3 | 24 / 26 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | BrowseComp | 53.9 | 24 / 51 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | IFBench | 51.5 | 24 / 28 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | HMMT Nov 2025 | 81.2 | 25 / 31 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | MMMLU | 78.4 | 25 / 38 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | Global PIQA | 63.1 | 25 / 26 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | IFBench | 50.4 | 25 / 28 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | MATH 500 | 91.2 | 26 / 55 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | AIME 2024 | 73.8 | 26 / 69 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | HMMT Feb 2025 | 73.7 | 26 / 44 | Tracked evidence |
| Qwen 3 32b · Non-thinking | BFCL v3 | 63 | 26 / 49 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | Global PIQA | 59.4 | 26 / 26 | Tracked evidence |
| Qwen 3.5 2B · Thinking | IFBench | 41.3 | 26 / 28 | Tracked evidence |
| Qwen 3.5 4B · Thinking | HMMT Nov 2025 | 76.8 | 27 / 31 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Non-thinking | Multi-IF | 44.7 | 27 / 32 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | IFBench | 26.7 | 27 / 28 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MMMLU | 76.1 | 28 / 38 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | HMMT Nov 2025 | 73.8 | 28 / 31 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MMMU PRO | 70.1 | 28 / 52 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | HMMT Feb 2025 | 63.1 | 28 / 44 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | IFBench | 21 | 28 / 28 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | MMMLU | 70.8 | 29 / 38 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | HMMT Nov 2025 | 69.6 | 29 / 31 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MMMU PRO | 69.3 | 29 / 52 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | HMMT Feb 2025 | 62.5 | 29 / 44 | Tracked evidence |
| Qwen 3 14b · Non-thinking | BFCL v3 | 61.5 | 29 / 49 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Thinking | Multi-IF | 36.1 | 29 / 32 | Tracked evidence |
| Qwen3-4B (Thinking) · Non-thinking | Arena-Hard | 66.2 | 30 / 40 | Tracked evidence |
| Qwen 3 8b · Non-thinking | BFCL v3 | 60.2 | 30 / 49 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | HMMT Feb 2025 | 57.5 | 30 / 44 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Non-thinking | Multi-IF | 33.3 | 30 / 32 | Tracked evidence |
| Qwen 3.5 2B · Thinking | HMMT Nov 2025 | 19.6 | 30 / 31 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | SimpleQA | 13.2 | 31 / 40 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | HMMT Nov 2025 | 8.9 | 31 / 31 | Tracked evidence |
| Qwen 3 14b · Non-thinking | MATH 500 | 90 | 32 / 55 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MMMU PRO | 66.3 | 32 / 52 | Tracked evidence |
| Qwen3-4B (Thinking) · Non-thinking | MMMLU | 64.9 | 32 / 38 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | Arena-Hard | 43.1 | 32 / 40 | Tracked evidence |
| Qwen 3 30b A3b · Non-thinking | MATH 500 | 89.8 | 33 / 55 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MMMLU | 63.1 | 33 / 38 | Tracked evidence |
| Qwen 3 30b A3b · Non-thinking | BFCL v3 | 58.6 | 33 / 49 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Non-thinking | Arena-Hard | 36.9 | 33 / 40 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | SimpleQA | 11 | 33 / 40 | Tracked evidence |
| Qwen 3 32b · Non-thinking | MATH 500 | 88.6 | 34 / 55 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | MMMLU | 57 | 34 / 38 | Tracked evidence |
| Qwen 3 8b · Non-thinking | MATH 500 | 87.4 | 35 / 55 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MMMU PRO | 63 | 35 / 52 | Tracked evidence |
| Qwen3-4B (Thinking) · Non-thinking | BFCL v3 | 57.6 | 35 / 49 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | MMMLU | 56.9 | 35 / 38 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | AIME 2024 | 48.3 | 36 / 69 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Non-thinking | MMMLU | 46.7 | 36 / 38 | Tracked evidence |
| Qwen3-4B (Thinking) · Non-thinking | MATH 500 | 84.8 | 37 / 55 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | BFCL v3 | 56.6 | 37 / 49 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MMMLU | 44.3 | 37 / 38 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | MMMLU | 34.1 | 38 / 38 | Tracked evidence |
| Qwen 3.5 2B · Thinking | HMMT Feb 2025 | 22.9 | 39 / 44 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Thinking | Arena-Hard | 8.5 | 39 / 40 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MMMU PRO | 57 | 40 / 52 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Non-thinking | BFCL v3 | 52.2 | 40 / 49 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Non-thinking | Arena-Hard | 6.5 | 40 / 40 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | AIME 2024 | 40.1 | 42 / 69 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MMMU PRO | 50.3 | 43 / 52 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | HMMT Feb 2025 | 11.9 | 43 / 44 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Thinking | HMMT Feb 2025 | 10.2 | 44 / 44 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Thinking | BFCL v3 | 46.4 | 45 / 49 | Tracked evidence |
| Qwen 3 30b A3b · Non-thinking | AIME 2024 | 32.8 | 45 / 69 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | MMMU PRO | 47.7 | 46 / 52 | Tracked evidence |
| Qwen 3 235b A22b · thinking-2507 | BrowseComp | 4.6 | 46 / 51 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Non-thinking | BFCL v3 | 44.1 | 47 / 49 | Tracked evidence |
| Qwen 3 14b · Non-thinking | AIME 2024 | 31.7 | 47 / 69 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Thinking | MATH 500 | 77.6 | 48 / 55 | Tracked evidence |
| Qwen 3 32b · Non-thinking | AIME 2024 | 31 | 48 / 69 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MMMU PRO | 42.5 | 49 / 52 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | BrowseComp | 2.3 | 49 / 51 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Non-thinking | MATH 500 | 73 | 50 / 55 | Tracked evidence |
| Qwen 3 8b · Non-thinking | AIME 2024 | 29.1 | 50 / 69 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | MMMU PRO | 31.4 | 51 / 52 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MMMU PRO | 31.2 | 52 / 52 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Non-thinking | MATH 500 | 55.2 | 53 / 55 | Tracked evidence |
| Qwen3-4B (Thinking) · Non-thinking | AIME 2024 | 25 | 53 / 69 | Tracked evidence |
| Qwen3-1.7B (Thinking) · Non-thinking | AIME 2024 | 13.4 | 60 / 69 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Thinking | AIME 2024 | 10.7 | 61 / 69 | Tracked evidence |
| Qwen3-0.6B (Thinking) · Non-thinking | AIME 2024 | 3.4 | 68 / 69 | Tracked evidence |
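Ranks in these tables come from fields of different sizes. For example, 17 / 28 on IFBench and 18 / 55 on MATH 500 are not directly comparable, so the rank position alone does not tell you how strong a placement is. The sketch below assumes the Rank column reads as "position / number of models with a reported score on that benchmark". It converts that into a 0 to 1 relative standing, where 1.0 is first place and 0.0 is last. The helper names `relative_standing` and `parse_row` are hypothetical and are not part of any published tooling.

```python
def relative_standing(rank_cell: str) -> float:
    """Convert a 'position / total' Rank cell into a 0..1 standing.

    1.0 means first place and 0.0 means last place. This assumes the
    cell format used in the tables above, e.g. '18 / 28'.
    """
    position_text, total_text = rank_cell.split("/")
    position, total = int(position_text), int(total_text)  # int() ignores surrounding spaces
    if not 1 <= position <= total:
        raise ValueError(f"rank out of range: {rank_cell!r}")
    if total == 1:
        return 1.0  # a single-entry field has no spread to normalize
    return 1 - (position - 1) / (total - 1)


def parse_row(row: str) -> dict:
    """Split one markdown table row into named fields plus its standing."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    model_variant, benchmark, score, rank, scoring = cells
    return {
        "model_variant": model_variant,
        "benchmark": benchmark,
        "score": float(score),
        "rank": rank,
        "scoring": scoring,
        "standing": relative_standing(rank),
    }


rows = [
    "| Qwen3 Next 80B A3B · Thinking | IFBench | 61.5 | 17 / 28 | Tracked evidence |",
    "| Qwen 3 14b · Thinking | MATH 500 | 96.8 | 18 / 55 | Tracked evidence |",
]
for parsed in map(parse_row, rows):
    print(parsed["benchmark"], round(parsed["standing"], 3))
```

With these two rows, the IFBench entry (17 / 28) comes out at about 0.407 and the MATH 500 entry (18 / 55) at about 0.685. In other words, the numerically lower rank is not the stronger relative placement. This is a simple linear normalization. It says nothing about how close the scores are to one another.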
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Qwen 3.5 122b A10b · Non-Thinking | LiveCodeBench · v5 | 78.9 | 3 / 5 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | LiveCodeBench · 2024_07_2025_01 | 78.2 | 4 / 8 | In Quality Score |
| Qwen 3 235b A22b · Thinking | LiveCodeBench · 2024_08_2025_05 | 66.5 | 4 / 17 | In Quality Score |
| Qwen3 Max · Qwen3 Max | LiveCodeBench · v6 | 85.9 | 5 / 40 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | SWE-bench Verified · single_agentless | 39.4 | 5 / 7 | In Quality Score |
| Qwen 3.6 27B · Thinking | LiveCodeBench · v6 | 83.9 | 8 / 40 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | LiveCodeBench · v6 | 83.6 | 9 / 40 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | SWE-bench Verified · multilingual_single | 20.9 | 10 / 10 | In Quality Score |
| Qwen 3 235b A22b · Thinking | LiveCodeBench | 70.7 | 11 / 69 | In Quality Score |
| Qwen 3.5 27b · Thinking | LiveCodeBench · v6 | 80.7 | 13 / 40 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | Aider (Polyglot) | 61.8 | 13 / 45 | In Quality Score |
| Qwen 3 235b A22b · Thinking | Aider (Polyglot) | 61.8 | 14 / 45 | In Quality Score |
| Qwen 3.6 35B-A3B · Thinking | LiveCodeBench · v6 | 80.4 | 15 / 40 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | LiveCodeBench · v6 | 78.9 | 16 / 40 | In Quality Score |
| Qwen 3 32b · Thinking | LiveCodeBench | 65.7 | 16 / 69 | In Quality Score |
| Qwen 3 Coder 480B A35B Instruct · Non-thinking | GSO (Global Software Optimization) · opt_at_1 | 3.9 | 17 / 24 | In Quality Score |
| Qwen 3 235b A22b · Thinking | LiveCodeBench · v6 | 75.1 | 18 / 40 | In Quality Score |
| Qwen 3 14b · Thinking | LiveCodeBench | 63.5 | 18 / 69 | In Quality Score |
| Qwen 3.6 27B · Thinking | SWE-bench Verified | 77.2 | 19 / 68 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | LiveCodeBench · v6 | 74.6 | 20 / 40 | In Quality Score |
| Qwen 3 30b A3b · Thinking | LiveCodeBench | 62.6 | 21 / 69 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | SWE-bench Verified | 76.4 | 22 / 68 | In Quality Score |
| Qwen3 Next 80B A3B · Thinking | LiveCodeBench · v6 | 68.7 | 22 / 40 | In Quality Score |
| Qwen 3 30b A3b · Thinking (2507) | LiveCodeBench · v6 | 66 | 23 / 40 | In Quality Score |
| Qwen 3 8b · Thinking | LiveCodeBench | 57.5 | 23 / 69 | In Quality Score |
| Qwen 3.5 9B · Thinking | LiveCodeBench · v6 | 65.6 | 24 / 40 | In Quality Score |
| Qwen3 Max · Qwen3 Max | SWE-bench Verified | 75.3 | 25 / 68 | In Quality Score |
| Qwen 3.5 27b · Thinking | SWE-bench Verified | 75 | 26 / 68 | In Quality Score |
| Qwen 3 32b · Thinking | Aider (Polyglot) | 50.2 | 27 / 45 | In Quality Score |
| Qwen 3.5 4B · Thinking | LiveCodeBench · v6 | 55.8 | 29 / 40 | In Quality Score |
| Qwen3-4B (Thinking) · Thinking | LiveCodeBench | 54.2 | 29 / 69 | In Quality Score |
| Qwen 3.6 35B-A3B · Thinking | SWE-bench Verified | 73.4 | 32 / 68 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | LiveCodeBench · v6 | 37 | 36 / 40 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | LiveCodeBench | 35.3 | 40 / 69 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | SWE-bench Verified | 72 | 41 / 68 | In Quality Score |
| Qwen3-1.7B (Thinking) · Thinking | LiveCodeBench | 33.2 | 42 / 69 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | SWE-bench Verified | 69.2 | 44 / 68 | In Quality Score |
| Qwen 3 32b · Non-thinking | LiveCodeBench | 31.3 | 44 / 69 | In Quality Score |
| Qwen 3 30b A3b · Non-thinking | LiveCodeBench | 29.8 | 47 / 69 | In Quality Score |
| Qwen 3 14b · Non-thinking | LiveCodeBench | 29 | 50 / 69 | In Quality Score |
| Qwen 3 8b · Non-thinking | LiveCodeBench | 22.8 | 56 / 69 | In Quality Score |
| Qwen3-4B (Thinking) · Non-thinking | LiveCodeBench | 21.3 | 58 / 69 | In Quality Score |
| Qwen3-0.6B (Thinking) · Thinking | LiveCodeBench | 12.3 | 63 / 69 | In Quality Score |
| Qwen3-1.7B (Thinking) · Non-thinking | LiveCodeBench | 11.6 | 64 / 69 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | SWE-bench Verified | 34.4 | 65 / 68 | In Quality Score |
| Qwen 3 235b A22b · Thinking | SWE-bench Verified | 34.4 | 66 / 68 | In Quality Score |
| Qwen 3 30b A3b · Thinking (2507) | SWE-bench Verified | 22 | 68 / 68 | In Quality Score |
| Qwen3-0.6B (Thinking) · Non-thinking | LiveCodeBench | 3.6 | 68 / 69 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | SecCodeBench | 68.3 | 3 / 6 | Tracked evidence |
| Qwen 3.5 27b · Thinking | OJ-Bench | 40.1 | 4 / 19 | Tracked evidence |
| Qwen 3.6 Plus · Thinking | NL2Repo | 37.9 | 4 / 9 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | OJ-Bench | 39.5 | 5 / 19 | Tracked evidence |
| Qwen 3.6 27B · Thinking | NL2Repo | 36.2 | 5 / 9 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | Codeforces | 2146 | 6 / 47 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | SecCodeBench | 57.5 | 6 / 6 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | Codeforces | 2100 | 7 / 47 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | OJ-Bench | 36 | 7 / 19 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | OJ-Bench | 32.7 | 8 / 19 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | OJ-Bench | 29.7 | 9 / 19 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | NL2Repo | 29.4 | 9 / 9 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | Codeforces | 2028 | 10 / 47 | Tracked evidence |
| Qwen 3.6 27B · Thinking | SWE-bench Multilingual | 71.3 | 10 / 18 | Tracked evidence |
| Qwen 3.5 9B · Thinking | OJ-Bench | 29.2 | 10 / 19 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | OJ-Bench | 25.1 | 12 / 19 | Tracked evidence |
| Qwen 3 32b · Thinking | Codeforces | 1977 | 13 / 47 | Tracked evidence |
| Qwen 3.5 4B · Thinking | OJ-Bench | 24.1 | 13 / 19 | Tracked evidence |
| Qwen 3 30b A3b · Thinking | Codeforces | 1974 | 14 / 47 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | SWE-bench Multilingual | 69.3 | 14 / 18 | Tracked evidence |
| Qwen 3.5 27b · Thinking | Codeforces | 1899 | 15 / 47 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | SWE-bench Multilingual | 67.2 | 15 / 18 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | SWE-bench Multilingual | 66.7 | 16 / 18 | Tracked evidence |
| Qwen 3 8b · Thinking | Codeforces | 1785 | 18 / 47 | Tracked evidence |
| Qwen 3 14b · Thinking | Codeforces | 1766 | 19 / 47 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | OJ-Bench | 11.3 | 19 / 19 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | Codeforces | 1671 | 22 / 47 | Tracked evidence |
| Qwen 3 235b A22b · Non-Thinking | Codeforces | 1387 | 24 / 47 | Tracked evidence |
| Qwen 3 32b · Non-thinking | Codeforces | 1353 | 25 / 47 | Tracked evidence |
| Qwen 3 30b A3b · Non-thinking | Codeforces | 1267 | 28 / 47 | Tracked evidence |
| Qwen 3 14b · Non-thinking | Codeforces | 1200 | 29 / 47 | Tracked evidence |
| Qwen 3 8b · Non-thinking | Codeforces | 1110 | 32 / 47 | Tracked evidence |
| Qwen3-4B (Thinking) · Non-thinking | Codeforces | 842 | 40 / 47 | Tracked evidence |
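The Coding table mixes two kinds of Score. Most rows are percentage-style results, but the Codeforces rows (for example 2146 or 842) are ratings on a different scale. The Scoring column adds a second split: it separates rows that feed the Quality Score from rows kept only as tracked evidence. The minimal sketch below uses a handful of rows copied from the table. It groups them by the exact benchmark label, subset included, before picking a leader, so a Codeforces rating is never ranked against a SWE-bench percentage. It also assumes higher is better on every benchmark shown. The function name `best_per_benchmark` is hypothetical.

```python
from collections import defaultdict

# A few rows copied from the Coding table above:
# (model / variant, benchmark, score, scoring)
CODING_ROWS = [
    ("Qwen 3.6 27B · Thinking", "SWE-bench Verified", 77.2, "In Quality Score"),
    ("Qwen 3.5 397b A17b · Thinking", "SWE-bench Verified", 76.4, "In Quality Score"),
    ("Qwen 3.6 35B-A3B · Thinking", "SWE-bench Verified", 73.4, "In Quality Score"),
    ("Qwen3 Max · Qwen3 Max", "LiveCodeBench · v6", 85.9, "In Quality Score"),
    ("Qwen 3.6 27B · Thinking", "LiveCodeBench · v6", 83.9, "In Quality Score"),
    ("Qwen 3 235b A22b · Thinking", "Codeforces", 2146, "Tracked evidence"),
    ("Qwen 3.5 122b A10b · Thinking", "Codeforces", 2100, "Tracked evidence"),
]


def best_per_benchmark(rows, scoring_filter=None):
    """Return {benchmark: (model_variant, score)} for the top score per benchmark.

    Scores are compared only within one benchmark label (subset included),
    so ratings and percentages never meet. If scoring_filter is given,
    only rows whose Scoring value matches it are considered.
    """
    grouped = defaultdict(list)
    for model_variant, benchmark, score, scoring in rows:
        if scoring_filter is None or scoring == scoring_filter:
            grouped[benchmark].append((model_variant, score))
    # Assumes higher is better for every benchmark in the input.
    return {bench: max(entries, key=lambda e: e[1]) for bench, entries in grouped.items()}


for bench, (model, score) in best_per_benchmark(CODING_ROWS, "In Quality Score").items():
    print(f"{bench}: {model} ({score})")
```

Filtered to "In Quality Score", this sample reports Qwen 3.6 27B (Thinking) on SWE-bench Verified and Qwen3 Max on LiveCodeBench v6. The Codeforces rows drop out because they are tracked evidence only. The result reflects only the rows copied into `CODING_ROWS`, not the full table.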
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Qwen 3.6 Plus · Thinking | MCP Atlas · public_set | 74.1 | 1 / 13 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | τ²-bench · average | 86.7 | 7 / 30 | In Quality Score |
| Qwen3 Max · Qwen3 Max | τ²-bench · average | 84.6 | 10 / 30 | In Quality Score |
| Qwen 3.5 35b A3b · Thinking | τ²-bench · average | 81.2 | 11 / 30 | In Quality Score |
| Qwen 3.5 4B · Thinking | τ²-bench · average | 79.9 | 14 / 30 | In Quality Score |
| Qwen 3.6 35B-A3B · Thinking | MCP Atlas | 62.8 | 14 / 33 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | τ²-bench · airline | 58 | 15 / 29 | In Quality Score |
| Qwen 3.5 122b A10b · Thinking | τ²-bench · average | 79.5 | 16 / 30 | In Quality Score |
| Qwen 3.5 9B · Thinking | τ²-bench · average | 79.1 | 17 / 30 | In Quality Score |
| Qwen 3.5 27b · Thinking | τ²-bench · average | 79 | 18 / 30 | In Quality Score |
| Qwen 3 235b A22b · thinking-2507 | τ²-bench · retail | 71.9 | 20 / 34 | In Quality Score |
| Qwen 3 235b A22b · Thinking | τ²-bench · average | 58.5 | 23 / 30 | In Quality Score |
| Qwen3 Next 80B A3B · Thinking | τ²-bench · average | 57.4 | 24 / 30 | In Quality Score |
| Qwen 3.5 2B · Thinking | τ²-bench · average | 48.8 | 25 / 30 | In Quality Score |
| Qwen3-4B (Thinking) · Thinking | τ²-bench · average | 43.2 | 26 / 30 | In Quality Score |
| Qwen 3 235b A22b · Thinking | τ²-bench · airline | 34.7 | 27 / 29 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | τ²-bench · telecom | 22.1 | 27 / 28 | In Quality Score |
| Qwen 3 30b A3b · Thinking (2507) | τ²-bench · average | 41.9 | 28 / 30 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | τ²-bench · airline | 26.5 | 29 / 29 | In Quality Score |
| Qwen 3.5 0.8B · Thinking | τ²-bench · average | 11.6 | 30 / 30 | In Quality Score |
| Qwen 3 235b A22b · Thinking | τ²-bench · retail | 58.6 | 32 / 34 | In Quality Score |
| Qwen 3 235b A22b · Non-Thinking | τ²-bench · retail | 57 | 34 / 34 | In Quality Score |
| Qwen 3.5 397b A17b · Thinking | τ³-Bench · retail | 84.4 | 1 / 6 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | τ³-Bench · telecom | 97.8 | 2 / 6 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | τ³-Bench · airline | 81 | 2 / 6 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | BFCL v4 | 72.9 | 2 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | DeepPlanning | 34.3 | 2 / 16 | Tracked evidence |
| Qwen 3.6 Plus · Thinking | τ³-Bench | 70.7 | 3 / 10 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | WideSearch | 74 | 4 / 13 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | BFCL v4 | 72.2 | 4 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MCPMark | 46.1 | 4 / 8 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | DeepPlanning | 28.7 | 4 / 16 | Tracked evidence |
| Qwen 3.5 27b · Thinking | BFCL v4 | 68.5 | 5 / 18 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | DeepPlanning | 25.9 | 5 / 16 | Tracked evidence |
| Qwen 3.5 27b · Thinking | WideSearch | 61.1 | 6 / 13 | Tracked evidence |
| Qwen 3.5 27b · Thinking | Seal-0 | 47.2 | 6 / 16 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | MCPMark | 37 | 6 / 8 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | DeepPlanning | 24.1 | 6 / 16 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | τ³-Bench · banking | 9.8 | 6 / 6 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | BFCL v4 | 67.7 | 7 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | WideSearch | 60.5 | 7 / 13 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | Seal-0 | 46.9 | 7 / 16 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | MCPMark | 33.5 | 7 / 8 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | BFCL v4 | 67.3 | 8 / 18 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | τ³-Bench | 67.2 | 8 / 10 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | WideSearch | 60.1 | 8 / 13 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | Seal-0 | 46.9 | 8 / 16 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | DeepPlanning | 22.8 | 8 / 16 | Tracked evidence |
| Qwen 3.5 9B · Thinking | BFCL v4 | 66.1 | 9 / 18 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | WideSearch | 57.9 | 9 / 13 | Tracked evidence |
| Qwen 3.5 27b · Thinking | DeepPlanning | 22.6 | 9 / 16 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | WideSearch | 57.1 | 10 / 13 | Tracked evidence |
| Qwen 3.5 9B · Thinking | DeepPlanning | 18 | 10 / 16 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | BFCL v4 | 54.8 | 12 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | Seal-0 | 44.1 | 12 / 16 | Tracked evidence |
| Qwen 3.5 4B · Thinking | DeepPlanning | 17.6 | 12 / 16 | Tracked evidence |
| Qwen 3.5 4B · Thinking | BFCL v4 | 50.3 | 13 / 18 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | Seal-0 | 41.4 | 13 / 16 | Tracked evidence |
| Qwen 3 235b A22b · Thinking | DeepPlanning | 17.1 | 13 / 16 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | BFCL v4 | 49.7 | 14 / 18 | Tracked evidence |
| Qwen 3.5 2B · Thinking | BFCL v4 | 43.6 | 15 / 18 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | DeepPlanning | 4.9 | 15 / 16 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | OSWorld · verified | 62.2 | 16 / 27 | Tracked evidence |
| Qwen 3 30b A3b · Thinking (2507) | BFCL v4 | 42.4 | 16 / 18 | Tracked evidence |
| Qwen3 Next 80B A3B · Thinking | DeepPlanning | 0.4 | 16 / 16 | Tracked evidence |
| Qwen3-4B (Thinking) · Thinking | BFCL v4 | 39.9 | 17 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | OSWorld · verified | 58 | 18 / 27 | Tracked evidence |
| Qwen 3.6 Plus · Thinking | Toolathlon | 39.8 | 18 / 31 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | BFCL v4 | 25.3 | 18 / 18 | Tracked evidence |
| Qwen 3.5 27b · Thinking | OSWorld · verified | 56.2 | 19 / 27 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | OSWorld · verified | 54.5 | 20 / 27 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | Toolathlon | 38.3 | 20 / 31 | Tracked evidence |
| Qwen 3.5 9B · Thinking | OSWorld · verified | 41.8 | 23 / 27 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | OSWorld · verified | 38.1 | 25 / 27 | Tracked evidence |
| Qwen 3.5 4B · Thinking | OSWorld · verified | 35.6 | 26 / 27 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | OSWorld · verified | 30.6 | 27 / 27 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | Toolathlon | 26.9 | 27 / 31 | Tracked evidence |
| Qwen3 Max · Qwen3 Max | Toolathlon | 18.8 | 29 / 31 | Tracked evidence |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Qwen 3.5 27b · Thinking | CountBench | 97.8 | 1 / 23 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | VLMs Are Blind | 97 | 1 / 18 | Tracked evidence |
| Qwen 3.6 27B · Thinking | RefCOCO · avg | 92.5 | 1 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MathVista · mini | 90.3 | 1 / 36 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MathVision | 88.6 | 1 / 17 | Tracked evidence |
| Qwen 3.5 27b · Thinking | DynaMath | 87.7 | 1 / 23 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MLVU · mavg | 87.3 | 1 / 22 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | RealWorldQA | 85.3 | 1 / 24 | Tracked evidence |
| Qwen 3.6 27B · Thinking | EmbSpatialBench | 84.6 | 1 / 24 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MMStar | 83.8 | 1 / 33 | Tracked evidence |
| Qwen 3.5 27b · Thinking | LingoQA | 82 | 1 / 16 | Tracked evidence |
| Qwen3 VL 32B · Thinking | MathVerse · mini | 78.2 | 1 / 10 | Tracked evidence |
| Qwen3 VL 32B · Thinking | HallusionBench | 76.6 | 1 / 33 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | RefSpatialBench | 73.6 | 1 / 21 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | MathVision · mini | 60.5 | 1 / 10 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | BabyVision | 52.3 | 1 / 22 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | ODinW · 13 | 50.8 | 1 / 13 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | ZEROBench · sub | 41 | 1 / 23 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | CountBench | 97.8 | 2 / 23 | Tracked evidence |
| Qwen 3.6 27B · Thinking | VLMs Are Blind | 97 | 2 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | V* | 95.8 | 2 / 23 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | AI2D · test | 93.9 | 2 / 33 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | RefCOCO · avg | 92.3 | 2 / 18 | Tracked evidence |
| Qwen 3.6 27B · Thinking | VideoMME · with_sub | 87.7 | 2 / 22 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MLVU · mavg | 86.7 | 2 / 22 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | RealWorldQA | 85.1 | 2 / 24 | Tracked evidence |
| Qwen 3.5 27b · Thinking | EmbSpatialBench | 84.5 | 2 / 24 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | ChartQA · test | 84 | 2 / 10 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | SLAKE | 81.6 | 2 / 22 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | LingoQA | 81.6 | 2 / 16 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MVBench | 77.6 | 2 / 18 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | HallusionBench | 74.9 | 2 / 33 | Tracked evidence |
| Qwen3 VL 8B · Thinking | MathVerse · mini | 73.3 | 2 / 10 | Tracked evidence |
| Qwen 3.6 27B · Thinking | RefSpatialBench | 70 | 2 / 21 | Tracked evidence |
| Qwen3 VL 32B · Thinking | MathVision · mini | 58.6 | 2 / 10 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | ODinW · 13 | 47 | 2 / 13 | Tracked evidence |
| Qwen 3.6 27B · Thinking | CountBench | 97.8 | 3 / 23 | Tracked evidence |
| Qwen 3.5 27b · Thinking | VLMs Are Blind | 96.9 | 3 / 18 | Tracked evidence |
| Qwen 3.6 27B · Thinking | V* | 94.7 | 3 / 23 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MMBench · en_dev_v1_1 | 93.7 | 3 / 24 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | AI2D · test | 93.3 | 3 / 33 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | RefCOCO · avg | 92 | 3 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | VideoMME · with_sub | 87.5 | 3 / 22 | Tracked evidence |
| Qwen 3.6 27B · Thinking | MLVU · mavg | 86.6 | 3 / 22 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | DynaMath | 86.3 | 3 / 23 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | EmbSpatialBench | 84.5 | 3 / 24 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | RealWorldQA | 84.1 | 3 / 24 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | VideoMME · without_sub | 83.9 | 3 / 21 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | ChartQA · test | 83.2 | 3 / 10 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MMStar | 82.9 | 3 / 33 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | LingoQA | 80.8 | 3 / 16 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | VideoMME | 79 | 3 / 4 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MVBench | 76.6 | 3 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | LVBench | 75.5 | 3 / 18 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | HallusionBench | 74.1 | 3 / 33 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | RefSpatialBench | 69.9 | 3 / 21 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | ERQA | 67.5 | 3 / 27 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | MathVerse · mini | 64.2 | 3 / 10 | Tracked evidence |
| Qwen3 VL 8B · Thinking | MathVision · mini | 50.7 | 3 / 10 | Tracked evidence |
| Qwen 3.5 27b · Thinking | BabyVision | 44.6 | 3 / 22 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | ZEROBench · sub | 36.2 | 3 / 23 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | ZEROBench | 12 | 3 / 27 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | VLMs Are Blind | 96.7 | 4 / 18 | Tracked evidence |
| Qwen 3.5 27b · Thinking | V* | 93.7 | 4 / 23 | Tracked evidence |
| Qwen 3.5 27b · Thinking | AI2D · test | 92.9 | 4 / 33 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MMBench · en_dev_v1_1 | 92.8 | 4 / 24 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | RefCOCO · avg | 91.3 | 4 / 18 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MathVista · mini | 87.8 | 4 / 36 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MathVision | 86.2 | 4 / 17 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | MLVU · mavg | 86.2 | 4 / 22 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | DynaMath | 85.9 | 4 / 23 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | EmbSpatialBench | 84.3 | 4 / 24 | Tracked evidence |
| Qwen 3.6 27B · Thinking | RealWorldQA | 84.1 | 4 / 24 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | VideoMME · without_sub | 83.7 | 4 / 21 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MMStar | 81.9 | 4 / 33 | Tracked evidence |
| Qwen 3.5 9B · Thinking | LingoQA | 80.4 | 4 / 16 | Tracked evidence |
| Qwen 3.5 27b · Thinking | SLAKE | 80 | 4 / 22 | Tracked evidence |
| Qwen 3.6 27B · Thinking | MVBench | 75.5 | 4 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | LVBench | 74.4 | 4 / 18 | Tracked evidence |
| Qwen3 VL 8B · Thinking | HallusionBench | 73 | 4 / 33 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | ScreenSpot-Pro | 70.4 | 4 / 24 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | RefSpatialBench | 69.3 | 4 / 21 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | ODinW · 13 | 44.5 | 4 / 13 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | BabyVision | 40.2 | 4 / 22 | Tracked evidence |
| Qwen 3.5 27b · Thinking | ZEROBench · sub | 36.2 | 4 / 23 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | CountBench | 97.2 | 5 / 23 | Tracked evidence |
| Qwen 3.5 9B · Thinking | VLMs Are Blind | 93.7 | 5 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | V* | 93.2 | 5 / 23 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | MMBench · en_dev_v1_1 | 92.8 | 5 / 24 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | AI2D · test | 92.7 | 5 / 33 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | RefCOCO · avg | 91.1 | 5 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MathVista · mini | 87.4 | 5 / 36 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | VideoMME · with_sub | 87.3 | 5 / 22 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MathVision | 86 | 5 / 17 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MLVU · mavg | 85.9 | 5 / 22 | Tracked evidence |
| Qwen 3.6 27B · Thinking | DynaMath | 85.6 | 5 / 23 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | EmbSpatialBench | 84.3 | 5 / 24 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | RealWorldQA | 83.9 | 5 / 24 | Tracked evidence |
| Qwen 3.6 27B · Thinking | MMStar | 81.4 | 5 / 33 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | SLAKE | 79.9 | 5 / 22 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | LingoQA | 79.2 | 5 / 16 | Tracked evidence |
| Qwen3 VL 32B · Thinking | ChartQA · test | 79.1 | 5 / 10 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MMVU | 75.4 | 5 / 20 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MVBench | 75.2 | 5 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | HallusionBench | 71.4 | 5 / 33 | Tracked evidence |
| Qwen 3.5 27b · Thinking | ScreenSpot-Pro | 70.3 | 5 / 24 | Tracked evidence |
| Qwen 3.5 27b · Thinking | RefSpatialBench | 67.7 | 5 / 21 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | SimpleVQA | 67.1 | 5 / 29 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | ERQA | 64.8 | 5 / 27 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | MathVerse · mini | 57.4 | 5 / 10 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | MathVision · mini | 50 | 5 / 10 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | ODinW · 13 | 43.2 | 5 / 13 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | ZEROBench · sub | 34.4 | 5 / 23 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | WorldVQA | 23.5 | 5 / 5 | Tracked evidence |
| Qwen 3.5 9B · Thinking | CountBench | 97.2 | 6 / 23 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | V* | 92.7 | 6 / 23 | Tracked evidence |
| Qwen 3.5 4B · Thinking | VLMs Are Blind | 92.6 | 6 / 18 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MMBench · en_dev_v1_1 | 92.6 | 6 / 24 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | AI2D · test | 92.6 | 6 / 33 | Tracked evidence |
| Qwen 3.5 27b · Thinking | RefCOCO · avg | 90.9 | 6 / 18 | Tracked evidence |
| Qwen 3.6 27B · Thinking | MathVista · mini | 87.4 | 6 / 36 | Tracked evidence |
| Qwen 3.5 27b · Thinking | VideoMME · with_sub | 87 | 6 / 22 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | Video-MMMU | 84.7 | 6 / 28 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | EmbSpatialBench | 83.9 | 6 / 24 | Tracked evidence |
| Qwen 3.5 27b · Thinking | RealWorldQA | 83.7 | 6 / 24 | Tracked evidence |
| Qwen 3.5 27b · Thinking | VideoMME · without_sub | 82.8 | 6 / 21 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MMStar | 81 | 6 / 33 | Tracked evidence |
| Qwen 3.5 9B · Thinking | SLAKE | 79 | 6 / 22 | Tracked evidence |
| Qwen3 VL 8B · Thinking | ChartQA · test | 78.6 | 6 / 10 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MVBench | 74.8 | 6 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MMVU | 74.7 | 6 / 20 | Tracked evidence |
| Qwen 3.5 27b · Thinking | LVBench | 73.6 | 6 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MedXpertQA · mm | 70 | 6 / 31 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | ODinW · 13 | 42.6 | 6 / 13 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | BabyVision | 38.4 | 6 / 22 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | ZEROBench · sub | 34.1 | 6 / 23 | Tracked evidence |
| Qwen 3.5 27b · Thinking | ZEROBench | 10 | 6 / 27 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | CountBench | 97 | 7 / 23 | Tracked evidence |
| Qwen 3.6 27B · Thinking | MMBench · en_dev_v1_1 | 92.3 | 7 / 24 | Tracked evidence |
| Qwen 3.5 122b A10b · Non-Thinking | V* | 90.1 | 7 / 23 | Tracked evidence |
| Qwen 3.5 9B · Thinking | RefCOCO · avg | 89.7 | 7 / 18 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | VideoMME · with_sub | 86.6 | 7 / 22 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | MathVista · mini | 86.4 | 7 / 36 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MLVU · mavg | 85.6 | 7 / 22 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | DynaMath | 85 | 7 / 23 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MathVision | 83.9 | 7 / 17 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | EmbSpatialBench | 83.1 | 7 / 24 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | VideoMME · without_sub | 82.5 | 7 / 21 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | SLAKE | 78.7 | 7 / 22 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MVBench | 74.6 | 7 / 18 | Tracked evidence |
| Qwen 3.5 4B · Thinking | LingoQA | 74.4 | 7 / 16 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MMVU | 73.3 | 7 / 20 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | LVBench | 71.4 | 7 / 18 | Tracked evidence |
| Qwen 3.5 27b · Thinking | HallusionBench | 70 | 7 / 33 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | ScreenSpot-Pro | 68.6 | 7 / 24 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MedXpertQA · mm | 67.3 | 7 / 31 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | RefSpatialBench | 64.3 | 7 / 21 | Tracked evidence |
| Qwen 3.6 27B · Thinking | ERQA | 62.5 | 7 / 27 | Tracked evidence |
| Qwen 3.5 27b · Thinking | ODinW · 13 | 41.1 | 7 / 13 | Tracked evidence |
| Qwen 3.5 4B · Thinking | CountBench | 96.3 | 8 / 23 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MMBench · en_dev_v1_1 | 91.5 | 8 / 24 | Tracked evidence |
| Qwen 3.5 9B · Thinking | V* | 90.1 | 8 / 23 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | RefCOCO · avg | 89.3 | 8 / 18 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | VideoMME · with_sub | 86.6 | 8 / 22 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MathVista · mini | 86.2 | 8 / 36 | Tracked evidence |
| Qwen 3.6 27B · Thinking | Video-MMMU | 84.4 | 8 / 28 | Tracked evidence |
| Qwen 3.5 9B · Thinking | EmbSpatialBench | 83 | 8 / 24 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | VideoMME · without_sub | 82.5 | 8 / 21 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MMStar | 79.7 | 8 / 33 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | VLMs Are Blind | 79.5 | 8 / 18 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | MVBench | 74.6 | 8 / 18 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MMVU | 72.3 | 8 / 20 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | LVBench | 71.4 | 8 / 18 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | ScreenSpot-Pro | 65.6 | 8 / 24 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | RefSpatialBench | 63.5 | 8 / 21 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | ERQA | 62 | 8 / 27 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | SimpleVQA | 61.7 | 8 / 29 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | ODinW · 13 | 40.5 | 8 / 13 | Tracked evidence |
| Qwen 3.5 27b · Non-Thinking | BabyVision | 34.8 | 8 / 22 | Tracked evidence |
| Qwen 3.5 9B · Thinking | AI2D · test | 90.2 | 9 / 33 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MMBench · en_dev_v1_1 | 90.1 | 9 / 24 | Tracked evidence |
| Qwen 3.5 35b A3b · Non-Thinking | V* | 89.5 | 9 / 23 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | RefCOCO · avg | 89.2 | 9 / 18 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MathVista · mini | 85.8 | 9 / 36 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MLVU · mavg | 84.4 | 9 / 22 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | Video-MMMU | 83.7 | 9 / 28 | Tracked evidence |
| Qwen 3.5 9B · Thinking | DynaMath | 83.6 | 9 / 23 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | RealWorldQA | 81.3 | 9 / 24 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MathVision | 78.9 | 9 / 17 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MMStar | 78.7 | 9 / 33 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MVBench | 74.4 | 9 / 18 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MMVU | 71.1 | 9 / 20 | Tracked evidence |
| Qwen 3.5 9B · Thinking | LVBench | 70 | 9 / 18 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | HallusionBench | 69.8 | 9 / 33 | Tracked evidence |
| Qwen 3.5 9B · Thinking | ScreenSpot-Pro | 65.2 | 9 / 24 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | SimpleVQA | 61.3 | 9 / 29 | Tracked evidence |
| Qwen 3.5 27b · Thinking | ERQA | 60.5 | 9 / 27 | Tracked evidence |
| Qwen 3.5 9B · Thinking | RefSpatialBench | 58.5 | 9 / 21 | Tracked evidence |
| Qwen3 VL 4B · Thinking | ODinW · 13 | 39.4 | 9 / 13 | Tracked evidence |
| Qwen 3.5 122b A10b · Non-Thinking | BabyVision | 34.5 | 9 / 22 | Tracked evidence |
| Qwen 3.5 9B · Thinking | ZEROBench · sub | 31.1 | 9 / 23 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | ZEROBench | 9 | 9 / 27 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | CountBench | 93.7 | 10 / 23 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MMBench · en_dev_v1_1 | 89.7 | 10 / 24 | Tracked evidence |
| Qwen 3.5 4B · Thinking | AI2D · test | 89.6 | 10 / 33 | Tracked evidence |
| Qwen 3.5 27b · Non-Thinking | V* | 89 | 10 / 23 | Tracked evidence |
| Qwen3 VL 4B · Thinking | RefCOCO · avg | 88.2 | 10 / 18 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MathVista · mini | 85.7 | 10 / 36 | Tracked evidence |
| Qwen 3.5 9B · Thinking | VideoMME · with_sub | 84.5 | 10 / 22 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MLVU · mavg | 83.8 | 10 / 22 | Tracked evidence |
| Qwen 3.5 4B · Thinking | DynaMath | 83.3 | 10 / 23 | Tracked evidence |
| Qwen 3.5 4B · Thinking | EmbSpatialBench | 81.3 | 10 / 24 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | CharXiv Reasoning | 80.8 | 10 / 48 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | VideoMME · without_sub | 79 | 10 / 21 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MMStar | 78.3 | 10 / 33 | Tracked evidence |
| Qwen 3.5 4B · Thinking | SLAKE | 76.1 | 10 / 22 | Tracked evidence |
| Qwen 3.5 2B · Thinking | VLMs Are Blind | 75.8 | 10 / 18 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MathVision | 74.6 | 10 / 17 | Tracked evidence |
| Qwen 3.5 9B · Thinking | HallusionBench | 69.3 | 10 / 33 | Tracked evidence |
| Qwen 3.5 4B · Thinking | LVBench | 66.4 | 10 / 18 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | ScreenSpot-Pro | 62 | 10 / 24 | Tracked evidence |
| Qwen 3.5 4B · Thinking | RefSpatialBench | 54.6 | 10 / 21 | Tracked evidence |
| Qwen3 VL 2B · Thinking | ODinW · 13 | 36 | 10 / 13 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MMBench · en_dev_v1_1 | 89.4 | 11 / 24 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | AI2D · test | 89.2 | 11 / 33 | Tracked evidence |
| Qwen 3.5 9B · Non-Thinking | V* | 88.5 | 11 / 23 | Tracked evidence |
| Qwen 3.5 4B · Thinking | RefCOCO · avg | 88.1 | 11 / 18 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MathVista · mini | 85.1 | 11 / 36 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | VideoMME · with_sub | 83.8 | 11 / 22 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | DynaMath | 82.8 | 11 / 23 | Tracked evidence |
| Qwen 3.5 9B · Thinking | RealWorldQA | 80.3 | 11 / 24 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MathVision | 74.6 | 11 / 17 | Tracked evidence |
| Qwen 3.5 2B · Thinking | SLAKE | 74.4 | 11 / 22 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | VLMs Are Blind | 74.3 | 11 / 18 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | LingoQA | 66.8 | 11 / 16 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | LVBench | 63.6 | 11 / 18 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | SimpleVQA | 58.9 | 11 / 29 | Tracked evidence |
| Qwen 3.5 9B · Thinking | ERQA | 55.5 | 11 / 27 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | RefSpatialBench | 54.2 | 11 / 21 | Tracked evidence |
| Qwen 3.5 2B · Thinking | ODinW · 13 | 35.9 | 11 / 13 | Tracked evidence |
| Qwen 3.5 35b A3b · Non-Thinking | BabyVision | 29.6 | 11 / 22 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | ZEROBench · sub | 28.4 | 11 / 23 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | ZEROBench | 8 | 11 / 27 | Tracked evidence |
| Qwen 3.5 2B · Thinking | CountBench | 91.4 | 12 / 23 | Tracked evidence |
| Qwen3 VL 32B · Thinking | MathVista · mini | 83.8 | 12 / 36 | Tracked evidence |
| Qwen 3.5 27b · Thinking | Video-MMMU | 82.3 | 12 / 28 | Tracked evidence |
| Qwen3 VL 4B · Thinking | EmbSpatialBench | 80.7 | 12 / 24 | Tracked evidence |
| Qwen 3.5 4B · Thinking | RealWorldQA | 79.5 | 12 / 24 | Tracked evidence |
| Qwen 3.5 9B · Thinking | VideoMME · without_sub | 78.4 | 12 / 21 | Tracked evidence |
| Qwen3 VL 32B · Thinking | MMStar | 75.7 | 12 / 33 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | VLMs Are Blind | 72.5 | 12 / 18 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MVBench | 72 | 12 / 18 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | HallusionBench | 67.9 | 12 / 33 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MMVU | 67.8 | 12 / 20 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MedXpertQA · mm | 62.4 | 12 / 31 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | ScreenSpot-Pro | 60.5 | 12 / 24 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | SimpleVQA | 58.3 | 12 / 29 | Tracked evidence |
| Qwen3 VL 4B · Thinking | RefSpatialBench | 45.3 | 12 / 21 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | ODinW · 13 | 33.2 | 12 / 13 | Tracked evidence |
| Qwen 3.5 9B · Thinking | BabyVision | 28.6 | 12 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MMBench · en_dev_v1_1 | 88.9 | 13 / 24 | Tracked evidence |
| Qwen 3.5 4B · Non-Thinking | V* | 86.4 | 13 / 23 | Tracked evidence |
| Qwen 3.5 2B · Thinking | RefCOCO · avg | 84.8 | 13 / 18 | Tracked evidence |
| Qwen 3.5 4B · Thinking | VideoMME · with_sub | 83.5 | 13 / 22 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MLVU · mavg | 82.8 | 13 / 22 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | Video-MMMU | 82 | 13 / 28 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | EmbSpatialBench | 80.6 | 13 / 24 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | DynaMath | 80.1 | 13 / 23 | Tracked evidence |
| Qwen 3.5 27b · Thinking | CharXiv Reasoning | 79.5 | 13 / 48 | Tracked evidence |
| Qwen 3.5 4B · Thinking | VideoMME · without_sub | 76.9 | 13 / 21 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MMStar | 75.5 | 13 / 33 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MVBench | 71.2 | 13 / 18 | Tracked evidence |
| Qwen3 VL 4B · Thinking | VLMs Are Blind | 68.6 | 13 / 18 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | HallusionBench | 67.6 | 13 / 33 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MMVU | 66.1 | 13 / 20 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | LingoQA | 62 | 13 / 16 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MedXpertQA · mm | 61.4 | 13 / 31 | Tracked evidence |
| Qwen 3.5 4B · Thinking | ScreenSpot-Pro | 60.3 | 13 / 24 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | LVBench | 59.2 | 13 / 18 | Tracked evidence |
| Qwen 3.5 2B · Thinking | RefSpatialBench | 32.9 | 13 / 21 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | ODinW · 13 | 31.6 | 13 / 13 | Tracked evidence |
| Qwen 3.5 9B · Non-Thinking | BabyVision | 25.8 | 13 / 22 | Tracked evidence |
| Qwen3 VL 32B · Thinking | AI2D · test | 87.2 | 14 / 33 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | V* | 85.9 | 14 / 23 | Tracked evidence |
| Qwen3 VL 2B · Thinking | RefCOCO · avg | 84.8 | 14 / 18 | Tracked evidence |
| Qwen 3.5 2B · Thinking | EmbSpatialBench | 77.9 | 14 / 24 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | RealWorldQA | 77.4 | 14 / 24 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | MMStar | 74.3 | 14 / 33 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MVBench | 69.3 | 14 / 18 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | SLAKE | 68.8 | 14 / 22 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | HallusionBench | 66.7 | 14 / 33 | Tracked evidence |
| Qwen3 VL 4B · Thinking | ScreenSpot-Pro | 59.5 | 14 / 24 | Tracked evidence |
| Qwen 3.5 4B · Thinking | ERQA | 54 | 14 / 27 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | RefSpatialBench | 30 | 14 / 21 | Tracked evidence |
| Qwen 3.5 4B · Thinking | ZEROBench · sub | 26.3 | 14 / 23 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | BabyVision | 22.2 | 14 / 22 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | ZEROBench | 4 | 14 / 27 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | RefCOCO · avg | 84.3 | 15 / 18 | Tracked evidence |
| Qwen 3.5 4B · Thinking | V* | 84.3 | 15 / 23 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MathVista · mini | 81.9 | 15 / 36 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | Video-MMMU | 80.4 | 15 / 28 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | VideoMME · with_sub | 79.9 | 15 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MLVU · mavg | 78.9 | 15 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | VideoMME · without_sub | 73.3 | 15 / 21 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | SLAKE | 67.5 | 15 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | HallusionBench | 66 | 15 / 33 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MathVision | 65.7 | 15 / 17 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MMVU | 64.9 | 15 / 20 | Tracked evidence |
| Qwen 3.5 2B · Thinking | LVBench | 57.1 | 15 / 18 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | ERQA | 52.5 | 15 / 27 | Tracked evidence |
| Qwen3 VL 2B · Thinking | RefSpatialBench | 28.9 | 15 / 21 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | ZEROBench · sub | 23.7 | 15 / 23 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | CountBench | 90 | 16 / 23 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | AI2D · test | 86.9 | 16 / 33 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | V* | 83.2 | 16 / 23 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | MathVista · mini | 81.8 | 16 / 36 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | Video-MMMU | 80 | 16 / 28 | Tracked evidence |
| Qwen 3.6 27B · Thinking | CharXiv Reasoning | 78.4 | 16 / 48 | Tracked evidence |
| Qwen3 VL 2B · Thinking | EmbSpatialBench | 75.9 | 16 / 24 | Tracked evidence |
| Qwen 3.5 2B · Thinking | RealWorldQA | 74.5 | 16 / 24 | Tracked evidence |
| Qwen3 VL 4B · Thinking | SLAKE | 65.9 | 16 / 22 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MVBench | 64.9 | 16 / 18 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | VLMs Are Blind | 59.4 | 16 / 18 | Tracked evidence |
| Qwen 3.6 27B · Thinking | SimpleVQA | 56.1 | 16 / 29 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | ScreenSpot-Pro | 54.5 | 16 / 24 | Tracked evidence |
| Qwen3 VL 4B · Thinking | LVBench | 53.5 | 16 / 18 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | RefSpatialBench | 23.5 | 16 / 21 | Tracked evidence |
| Qwen 3.5 4B · Non-Thinking | BabyVision | 19.1 | 16 / 22 | Tracked evidence |
| Qwen3 VL 4B · Thinking | CountBench | 89.4 | 17 / 23 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MMBench · en_dev_v1_1 | 86.7 | 17 / 24 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | RefCOCO · avg | 79.3 | 17 / 18 | Tracked evidence |
| Qwen 3.6 35B-A3B · Thinking | CharXiv Reasoning | 78 | 17 / 48 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MLVU · mavg | 76.2 | 17 / 22 | Tracked evidence |
| Qwen3 VL 4B · Thinking | VideoMME · with_sub | 76 | 17 / 22 | Tracked evidence |
| Qwen3 VL 4B · Thinking | DynaMath | 74.4 | 17 / 23 | Tracked evidence |
| Qwen3 VL 4B · Thinking | RealWorldQA | 73.2 | 17 / 24 | Tracked evidence |
| Qwen 3.5 2B · Thinking | VideoMME · without_sub | 69 | 17 / 21 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MVBench | 64.5 | 17 / 18 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MMVU | 58.6 | 17 / 20 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | VLMs Are Blind | 57.3 | 17 / 18 | Tracked evidence |
| Qwen 3.5 27b · Thinking | SimpleVQA | 56 | 17 / 29 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MedXpertQA · mm | 49.9 | 17 / 31 | Tracked evidence |
| Qwen3 VL 2B · Thinking | ScreenSpot-Pro | 48.5 | 17 / 24 | Tracked evidence |
| Qwen3 VL 2B · Thinking | LVBench | 47.6 | 17 / 18 | Tracked evidence |
| Qwen3 VL 4B · Thinking | ERQA | 47.3 | 17 / 27 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | RefSpatialBench | 21.7 | 17 / 21 | Tracked evidence |
| Qwen 3.5 4B · Thinking | ZEROBench | 3 | 17 / 27 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | CountBench | 86.8 | 18 / 23 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | AI2D · test | 85 | 18 / 33 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MMBench · en_dev_v1_1 | 83.3 | 18 / 24 | Tracked evidence |
| Qwen 3.5 9B · Thinking | Video-MMMU | 78.9 | 18 / 28 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | RefCOCO · avg | 77.8 | 18 / 18 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MLVU · mavg | 75.7 | 18 / 22 | Tracked evidence |
| Qwen 3.5 2B · Thinking | VideoMME · with_sub | 75.6 | 18 / 22 | Tracked evidence |
| Qwen 3.5 2B · Thinking | DynaMath | 73.6 | 18 / 23 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MMStar | 73.2 | 18 / 33 | Tracked evidence |
| Qwen3 VL 4B · Thinking | VideoMME · without_sub | 68.9 | 18 / 21 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | SLAKE | 62.6 | 18 / 22 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MVBench | 55.8 | 18 / 18 | Tracked evidence |
| Qwen3 VL 2B · Thinking | VLMs Are Blind | 50 | 18 / 18 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MMVU | 48.9 | 18 / 20 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MedXpertQA · mm | 47.6 | 18 / 31 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | ScreenSpot-Pro | 46.5 | 18 / 24 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | LVBench | 45.1 | 18 / 18 | Tracked evidence |
| Qwen3 VL 4B · Thinking | ZEROBench · sub | 18.9 | 18 / 23 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | BabyVision | 18.6 | 18 / 22 | Tracked evidence |
| Qwen 3.5 9B · Thinking | ZEROBench | 3 | 18 / 27 | Tracked evidence |
| Qwen3 VL 4B · Thinking | AI2D · test | 84.9 | 19 / 33 | Tracked evidence |
| Qwen3 VL 2B · Thinking | CountBench | 84.1 | 19 / 23 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MathVista · mini | 79.5 | 19 / 36 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | CharXiv Reasoning | 77.5 | 19 / 48 | Tracked evidence |
| Qwen3 VL 8B · Thinking | MMStar | 72.3 | 19 / 33 | Tracked evidence |
| Qwen 3.5 4B · Thinking | HallusionBench | 65 | 19 / 33 | Tracked evidence |
| Qwen3 VL 2B · Thinking | SLAKE | 61.1 | 19 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | SimpleVQA | 54.3 | 19 / 29 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MMVU | 48.6 | 19 / 20 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MedXpertQA · mm | 42.9 | 19 / 31 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | ZEROBench · sub | 18.6 | 19 / 23 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MMBench · en_dev_v1_1 | 81.9 | 20 / 24 | Tracked evidence |
| Qwen3 VL 8B · Thinking | MathVista · mini | 79.5 | 20 / 36 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | CharXiv Reasoning | 77.2 | 20 / 48 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | Video-MMMU | 75 | 20 / 28 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MMStar | 71.7 | 20 / 33 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | RealWorldQA | 71.2 | 20 / 24 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | DynaMath | 69.6 | 20 / 23 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | EmbSpatialBench | 68.6 | 20 / 24 | Tracked evidence |
| Qwen3 VL 2B · Thinking | VideoMME · without_sub | 62.1 | 20 / 21 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | SLAKE | 59.5 | 20 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | ERQA | 45.3 | 20 / 27 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MedXpertQA · mm | 35.5 | 20 / 31 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MMVU | 34.3 | 20 / 20 | Tracked evidence |
| Qwen 3.5 2B · Thinking | ZEROBench · sub | 17.1 | 20 / 23 | Tracked evidence |
| Qwen 3.5 4B · Thinking | BabyVision | 16 | 20 / 22 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | MMBench · en_dev_v1_1 | 81.3 | 21 / 24 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | MMStar | 69.9 | 21 / 33 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MLVU · mavg | 69.2 | 21 / 22 | Tracked evidence |
| Qwen3 VL 2B · Thinking | VideoMME · with_sub | 67.9 | 21 / 22 | Tracked evidence |
| Qwen3 VL 2B · Thinking | DynaMath | 66.7 | 21 / 23 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | EmbSpatialBench | 66.4 | 21 / 24 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | VideoMME · without_sub | 57.7 | 21 / 21 | Tracked evidence |
| Qwen 3.5 9B · Thinking | SimpleVQA | 51.2 | 21 / 29 | Tracked evidence |
| Qwen3 VL 2B · Thinking | ZEROBench · sub | 13.2 | 21 / 23 | Tracked evidence |
| Qwen 3.5 2B · Thinking | ZEROBench | 1 | 21 / 27 | Tracked evidence |
| Qwen3 VL 8B · Thinking | AI2D · test | 83.9 | 22 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | CountBench | 77 | 22 / 23 | Tracked evidence |
| Qwen 3.5 4B · Thinking | Video-MMMU | 74.1 | 22 / 28 | Tracked evidence |
| Qwen3 VL 2B · Thinking | RealWorldQA | 69.5 | 22 / 24 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MLVU · mavg | 65.6 | 22 / 22 | Tracked evidence |
| Qwen3 VL 4B · Thinking | HallusionBench | 64.1 | 22 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | VideoMME · with_sub | 63.8 | 22 / 22 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | SLAKE | 54.7 | 22 / 22 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | DynaMath | 49.9 | 22 / 23 | Tracked evidence |
| Qwen3 VL 4B · Thinking | SimpleVQA | 48.8 | 22 / 29 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | ZEROBench · sub | 12.9 | 22 / 23 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | ZEROBench | 0 | 22 / 27 | Tracked evidence |
| Qwen 3.5 2B · Thinking | AI2D · test | 83.3 | 23 / 33 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MathVista · mini | 76.7 | 23 / 36 | Tracked evidence |
| Qwen 3.5 9B · Thinking | CharXiv Reasoning | 73 | 23 / 48 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MMBench · en_dev_v1_1 | 69.9 | 23 / 24 | Tracked evidence |
| Qwen3 VL 4B · Thinking | Video-MMMU | 69.4 | 23 / 28 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | CountBench | 68.6 | 23 / 23 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | RealWorldQA | 63.4 | 23 / 24 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | DynaMath | 46.5 | 23 / 23 | Tracked evidence |
| Qwen 3.5 2B · Thinking | ERQA | 43.8 | 23 / 27 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | ZEROBench · sub | 11.4 | 23 / 23 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | ZEROBench | 0 | 23 / 27 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | AI2D · test | 83 | 24 / 33 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | MathVista · mini | 76.4 | 24 / 36 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | MMBench · en_dev_v1_1 | 68 | 24 / 24 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | RealWorldQA | 61.6 | 24 / 24 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | EmbSpatialBench | 54.6 | 24 / 24 | Tracked evidence |
| Qwen3 VL 2B · Thinking | SimpleVQA | 43.6 | 24 / 29 | Tracked evidence |
| Qwen3 VL 2B · Thinking | ERQA | 41.8 | 24 / 27 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MedXpertQA · mm | 26.9 | 24 / 31 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | ZEROBench | 0 | 24 / 27 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MMStar | 68.1 | 25 / 33 | Tracked evidence |
| Qwen 3.5 2B · Thinking | Video-MMMU | 62.1 | 25 / 28 | Tracked evidence |
| Qwen 3.5 4B · Thinking | SimpleVQA | 43.4 | 25 / 29 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | ERQA | 34.5 | 25 / 27 | Tracked evidence |
| Qwen3 VL 2B · Thinking | ZEROBench | 0 | 25 / 27 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | AI2D · test | 81.5 | 26 / 33 | Tracked evidence |
| Qwen 3.5 4B · Thinking | CharXiv Reasoning | 70.8 | 26 / 48 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | MMStar | 68 | 26 / 33 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | SimpleVQA | 39.5 | 26 / 29 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | ERQA | 33 | 26 / 27 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MedXpertQA · mm | 26.3 | 26 / 31 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | ZEROBench | 0 | 26 / 27 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | MathVista · mini | 73.9 | 27 / 36 | Tracked evidence |
| Qwen3 VL 2B · Thinking | Video-MMMU | 54.1 | 27 / 28 | Tracked evidence |
| Qwen 3.5 2B · Thinking | SimpleVQA | 38.5 | 27 / 29 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | MedXpertQA · mm | 25.3 | 27 / 31 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | ERQA | 23.8 | 27 / 27 | Tracked evidence |
| Qwen3 VL 4B · Thinking | ZEROBench | 0 | 27 / 27 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MathVista · mini | 73.6 | 28 / 36 | Tracked evidence |
| Qwen 3.5 2B · Thinking | HallusionBench | 58 | 28 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | Video-MMMU | 44.3 | 28 / 28 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | SimpleVQA | 31.3 | 28 / 29 | Tracked evidence |
| Qwen3 VL 2B · Thinking | AI2D · test | 80.4 | 29 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | SimpleVQA | 30.4 | 29 / 29 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | MedXpertQA · mm | 19.1 | 29 / 31 | Tracked evidence |
| Qwen3 VL 2B · Thinking | HallusionBench | 54.9 | 30 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MedXpertQA · mm | 17.1 | 30 / 31 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | AI2D · test | 69.9 | 31 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MMStar | 58.3 | 31 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | HallusionBench | 53.1 | 31 / 33 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MedXpertQA · mm | 13 | 31 / 31 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | AI2D · test | 68.7 | 32 / 33 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | CharXiv Reasoning | 66.1 | 32 / 48 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | MMStar | 55.9 | 32 / 33 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | HallusionBench | 51.3 | 32 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MathVista · mini | 62.2 | 33 / 36 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | HallusionBench | 46.7 | 33 / 33 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | MathVista · mini | 58.6 | 34 / 36 | Tracked evidence |
| Qwen 3.5 2B · Thinking | CharXiv Reasoning | 58.8 | 37 / 48 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | CharXiv Reasoning | 56.6 | 38 / 48 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | CharXiv Reasoning | 52.6 | 41 / 48 | Tracked evidence |
| Qwen3 VL 4B · Thinking | CharXiv Reasoning | 50.3 | 42 / 48 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | CharXiv Reasoning | 41.3 | 45 / 48 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | CharXiv Reasoning | 38.2 | 46 / 48 | Tracked evidence |
| Qwen3 VL 2B · Thinking | CharXiv Reasoning | 37.1 | 47 / 48 | Tracked evidence |
Document/OCR
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Qwen 3.5 397b A17b · Thinking | OCRBench | 93.1 | 1 / 35 | Tracked evidence |
| Qwen 3.5 397b A17b · Thinking | MMLongBench-Doc | 61.5 | 2 / 22 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | OCRBench | 92.1 | 3 / 35 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | OCRBench | 91 | 4 / 35 | Tracked evidence |
| Qwen 3.5 27b · Thinking | MMLongBench-Doc | 60.2 | 4 / 22 | Tracked evidence |
| Qwen 3.5 35b A3b · Thinking | MMLongBench-Doc | 59.5 | 5 / 22 | Tracked evidence |
| Qwen3 VL 8B · Non-Thinking | OCRBench | 90 | 6 / 35 | Tracked evidence |
| Qwen 3.5 122b A10b · Thinking | MMLongBench-Doc | 59 | 6 / 22 | Tracked evidence |
| Qwen 3.5 27b · Thinking | OCRBench | 89.4 | 7 / 35 | Tracked evidence |
| Qwen 3.6 27B · Thinking | OCRBench | 89.4 | 8 / 35 | Tracked evidence |
| Qwen 3.5 9B · Thinking | MMLongBench-Doc | 57.7 | 8 / 22 | Tracked evidence |
| Qwen 3.5 9B · Thinking | OCRBench | 89.2 | 9 / 35 | Tracked evidence |
| Qwen3 VL 32B · Non-Thinking | OCRBench | 88.5 | 10 / 35 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | MMLongBench-Doc | 56.2 | 10 / 22 | Tracked evidence |
| Qwen3 VL 235B A22B · Thinking | OCRBench | 87.5 | 11 / 35 | Tracked evidence |
| Qwen 3.5 4B · Thinking | MMLongBench-Doc | 54.2 | 11 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | MMLongBench-Doc | 47.4 | 13 / 22 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | OCRBench | 85.4 | 14 / 35 | Tracked evidence |
| Qwen 3.5 4B · Thinking | OCRBench | 85 | 15 / 35 | Tracked evidence |
| Qwen 3.5 2B · Thinking | MMLongBench-Doc | 45.4 | 15 / 22 | Tracked evidence |
| Qwen3 VL 32B · Thinking | OCRBench | 85 | 16 / 35 | Tracked evidence |
| Qwen3 VL 4B · Thinking | MMLongBench-Doc | 44.4 | 16 / 22 | Tracked evidence |
| Qwen 3.5 2B · Thinking | OCRBench | 84.5 | 17 / 35 | Tracked evidence |
| Qwen 3.5 2B · Non-Thinking | MMLongBench-Doc | 38.8 | 17 / 22 | Tracked evidence |
| Qwen3 VL 30B A3B · Thinking | OCRBench | 83.9 | 18 / 35 | Tracked evidence |
| Qwen3 VL 2B · Thinking | MMLongBench-Doc | 33.8 | 19 / 22 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | MMLongBench-Doc | 33.6 | 20 / 22 | Tracked evidence |
| Qwen3 VL 8B · Thinking | OCRBench | 82 | 21 / 35 | Tracked evidence |
| Qwen3 VL 4B · Thinking | OCRBench | 80.8 | 22 / 35 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | MMLongBench-Doc | 28.1 | 22 / 22 | Tracked evidence |
| Qwen3 VL 2B · Thinking | OCRBench | 79.2 | 25 / 35 | Tracked evidence |
| Qwen 3.5 0.8B · Non-Thinking | OCRBench | 79.1 | 26 / 35 | Tracked evidence |
| Qwen 3.5 0.8B · Thinking | OCRBench | 74.5 | 31 / 35 | Tracked evidence |
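Each benchmark tracks a different number of models, so a rank of 7 / 35 is not directly comparable with 7 / 22. The minimal Python sketch below assumes the Rank column means "position / models tracked on that benchmark" and converts each cell into the share of the other tracked models that a row outranks. The helper names are hypothetical; the sample rows are copied from the tables on this page. Ties are taken as given, because the tables already break them (two OCRBench scores of 89.4 sit at ranks 7 and 8).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    variant: str
    benchmark: str
    score: float
    rank: int
    field: int  # number of models tracked on this benchmark


def parse_rank(cell: str) -> tuple[int, int]:
    """Split a 'rank / field' cell such as '7 / 35' into (7, 35)."""
    rank_text, field_text = cell.split("/")
    return int(rank_text), int(field_text)


def share_outranked(rank: int, field: int) -> float:
    """Fraction of the other tracked models this row outranks (1.0 = first place)."""
    if field < 2:
        return 1.0
    return (field - rank) / (field - 1)


def parse_row(line: str) -> Row:
    """Parse one pipe-table row: variant | benchmark | score | rank | scoring."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    variant, benchmark, score, rank_cell = cells[:4]
    rank, field = parse_rank(rank_cell)
    return Row(variant, benchmark, float(score), rank, field)


SAMPLE = [
    "| Qwen 3.5 397b A17b · Thinking | OCRBench | 93.1 | 1 / 35 | Tracked evidence |",
    "| Qwen 3.5 397b A17b · Thinking | MMLongBench-Doc | 61.5 | 2 / 22 | Tracked evidence |",
    "| Qwen 3.5 0.8B · Thinking | OCRBench | 74.5 | 31 / 35 | Tracked evidence |",
]

for line in SAMPLE:
    row = parse_row(line)
    share = share_outranked(row.rank, row.field)
    print(f"{row.variant:32} {row.benchmark:16} outranks {share:.0%} of peers")
```

Normalizing this way makes a second place among 22 models and a first place among 35 land on the same 0-to-1 scale before you compare variants across benchmarks.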
Where this family sits in the market
Several Qwen3 variants sit on the open-weights Pareto frontier, offering competitive quality at input prices under $0.50 per 1M tokens.
Chart key: the dashed line marks the Pareto frontier, meaning the models for which no other model is both cheaper and better. Thinking and non-thinking variants of the same model are connected, so the length of each connecting line shows the cost of reasoning.
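The frontier rule can be made concrete with a short Python sketch. It assumes price (USD per 1M input tokens) on one axis and quality score on the other; the model names and numbers are hypothetical placeholders, not values from this page.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    price: float    # assumed: USD per 1M input tokens
    quality: float  # assumed: quality score


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep the points that no other point beats on both price and quality."""
    # Cheapest first; among equal prices, highest quality first.
    ordered = sorted(points, key=lambda p: (p.price, -p.quality))
    frontier: list[Point] = []
    best_quality = float("-inf")
    for p in ordered:
        # Every earlier point is at least as cheap, so p survives only if it
        # is strictly better than all of them.
        if p.quality > best_quality:
            frontier.append(p)
            best_quality = p.quality
    return frontier


# Hypothetical data for illustration only.
models = [
    Point("model-a", 0.10, 70.0),
    Point("model-b", 0.15, 80.0),
    Point("model-c", 0.30, 78.0),  # dominated: model-b is cheaper and better
    Point("model-d", 0.80, 95.0),
]

for p in pareto_frontier(models):
    print(p.name, p.price, p.quality)
```

Sorting once and tracking the best quality seen so far finds the frontier in O(n log n), instead of comparing every pair of models.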
Self-hosting
These variants ship with open weights, so you can run them on your own hardware or through a hosting provider you control. Pick a variant that fits your GPU memory budget. Mixture-of-experts variants are cheaper to serve than their total parameter count suggests, because per-token compute scales with the active parameters (about 3B of 35B for Qwen 3.6 35B-A3B), but the full weights still need to fit in memory.
- Qwen3-0.6B (Thinking) · Non-thinking · open weights
- Qwen3-1.7B (Thinking) · Non-thinking · open weights
- Qwen3-4B (Thinking) · Non-thinking · open weights
- Qwen 3 8b · Non-thinking · open weights
- Qwen 3 14b · Non-thinking · open weights
- Qwen 3 30b A3b · Non-thinking · open weights
- Qwen 3 32b · Non-thinking · open weights
- Qwen 3 235b A22b · Non-thinking · open weights
- Qwen 3.5 0.8B · Thinking · open weights
- Qwen 3.5 2B · Thinking · open weights
- Qwen 3.5 4B · Thinking · open weights
- Qwen 3.5 9B · Thinking · open weights
- Qwen 3.5 27b · Thinking · open weights
- Qwen 3.5 35b A3b · Thinking · open weights
- Qwen 3.5 122b A10b · Thinking · open weights
- Qwen 3.5 397b A17b · Thinking · open weights
- Qwen 3.5 Flash · Thinking · open weights
- Qwen 3.6 27B · Thinking · open weights
- Qwen 3.6 35B-A3B · Thinking · open weights
- Qwen 3.6 Flash · Thinking · open weights
- Qwen3 Next 80B A3B · Thinking · open weights
- Qwen 3 Coder 480B A35B Instruct · Non-thinking · open weights
- Qwen3 VL 2B · Thinking · open weights
- Qwen3 VL 4B · Thinking · open weights
- Qwen3 VL 8B · Thinking · open weights
- Qwen3 VL 30B A3B · Thinking · open weights
- Qwen3 VL 32B · Thinking · open weights
- Qwen3 VL 235B A22B · Thinking · open weights
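The memory sizing described under Self-hosting is simple arithmetic. Weights take roughly total parameters × bytes per parameter, and for an MoE model that means the total count, not the active subset. This sketch assumes nominal bit widths, although real quantization formats carry some extra overhead for scales. It also ignores KV cache, activations, and runtime buffers, so read its output as a lower bound for the weights alone.

```python
def weight_memory_gb(total_params_billion: float, bits_per_param: int) -> float:
    """Approximate GB needed to hold the weights alone (1 GB = 1e9 bytes).

    Uses TOTAL parameters: every expert of an MoE model must be resident,
    even though only a few are active per token. Excludes KV cache,
    activations, and runtime overhead, which add to this figure.
    """
    bytes_total = total_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# A 35B-total / ~3B-active MoE: size memory by the 35B, not the 3B.
for bits in (16, 8, 4):
    print(f"35B weights at {bits}-bit: ~{weight_memory_gb(35, bits):.1f} GB")
# 35B weights at 16-bit: ~70.0 GB
# 35B weights at 8-bit: ~35.0 GB
# 35B weights at 4-bit: ~17.5 GB
```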
The Qwen3 family
Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.
Open weights (28)
- Qwen3-0.6B (Thinking) · 2 variants
- Qwen3-1.7B (Thinking) · 2 variants
- Qwen3-4B (Thinking) · 2 variants
- Qwen 3 8b · 2 variants
- Qwen 3 14b · 2 variants
- Qwen 3 30b A3b · 4 variants
- Qwen 3 32b · 2 variants
- Qwen 3 235b A22b · 6 variants
- Qwen 3.5 0.8B · 2 variants
- Qwen 3.5 2B · 2 variants
- Qwen 3.5 4B · 2 variants
- Qwen 3.5 9B · 2 variants
- Qwen 3.5 27b · 2 variants
- Qwen 3.5 35b A3b · 2 variants
- Qwen 3.5 122b A10b · 2 variants
- Qwen 3.5 397b A17b · 1 variant
- Qwen 3.5 Flash · 1 variant
- Qwen 3.6 27B · 1 variant
- Qwen 3.6 35B-A3B · 1 variant
- Qwen 3.6 Flash · 1 variant
- Qwen3 Next 80B A3B · 1 variant
- Qwen 3 Coder 480B A35B Instruct · 1 variant
- Qwen3 VL 2B · 1 variant
- Qwen3 VL 4B · 1 variant
- Qwen3 VL 8B · 2 variants
- Qwen3 VL 30B A3B · 1 variant
- Qwen3 VL 32B · 2 variants
- Qwen3 VL 235B A22B · 1 variant
Closed · API only (2)
- Qwen 3.6 Plus · 1 variant
- Qwen3 Max · 4 variants
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- DeepSeek: V4 Pro Thinking, R1, V3 Compared
DeepSeek: V4 Pro Thinking ranks #15 of 186 with 1.0M-token context and $0.435/$0.87 per 1M tokens. Compare V4, R1, and V3 by workload.
- Gemma: 4 31B IT (Thinking), Gemma 3 Self-Host Compared
Gemma: 4 31B IT (Thinking) ranks #34 of 186 with 262K-token context and $0.12/$0.37 per 1M tokens. Compare Gemma 4 and Gemma 3 by workload.
Caveats
What this page does not tell you, listed honestly.
- Quality score not yet computed for: Qwen 3.5 0.8B, Qwen 3.5 2B, Qwen 3.5 Flash, Qwen 3.6 Flash, Qwen3 VL 2B, Qwen3 VL 4B, Qwen3 VL 8B, Qwen3 VL 30B A3B, Qwen3 VL 32B, Qwen3 VL 235B A22B. We require a minimum level of benchmark coverage before scoring; until the gap is filled, the row shows a dash.
- No tracked API pricing for: Qwen3-0.6B (Thinking), Qwen3-1.7B (Thinking), Qwen3-4B (Thinking), Qwen 3.5 0.8B, Qwen 3.5 2B, Qwen 3.5 4B, Qwen3 VL 2B, Qwen3 VL 4B. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
- Context window not declared for: Qwen3-0.6B (Thinking), Qwen3-1.7B (Thinking), Qwen3-4B (Thinking), Qwen 3.5 0.8B, Qwen 3.5 2B, Qwen 3.5 4B, Qwen3 VL 2B, Qwen3 VL 4B.
Editor's notes
Why this family matters
Qwen3 is the broadest open-weights family currently in our index: dense models from 0.6B to 32B, mixture-of-experts builds from 30B-A3B up to 235B-A22B (and 397B-A17B in the Qwen3.5 refresh), vision-language variants, and a long-context coding-tuned variant. Apart from the API-only Qwen 3.6 Plus and Qwen3 Max, the family ships as open weights under permissive licensing. Whether any single Qwen3 variant tops the open-weights leaderboard at the moment you read this is a question for the variant table on this page. The family's structural value is the spread, and that is what makes Qwen3 a default candidate for any team that does not want to be locked to a single API.
When the binding constraint is peak score on a single reasoning benchmark and budget is unbounded, closed flagships are typically the safer pick. When deployment factors carry weight in the decision (latency floor, data sovereignty, predictable per-token cost, the option to fine-tune or self-host), an open-weights variant belongs on the shortlist. Qwen3 makes that shortlist conversation easier than most, because of the variant spread.
How the family is structured
Qwen3 ships in three lines that sit on the same page because they share generation, brand, and licence terms, not because they are interchangeable. Pick the line first, then the variant within it.
- Qwen3 text line. Dense models from `0.6B` to `32B` for self-hosting, plus mixture-of-experts builds (`30B-A3B`, `235B-A22B`, Qwen3.5 `397B-A17B`, Qwen3.6 `35B-A3B`) for production-scale serving. This is the default chat-and-tools workhorse line; if you are not sure which line applies, you want this one.
- Qwen3 Coder. A single variant (`qwen-3-coder-480b-a35b`) purpose-built for agentic coding workloads. It is priced differently from the text line; pick it when SWE-bench-class coding performance is the binding constraint, not when a chat model would also handle the occasional code question.
- Qwen3-VL. Vision-language variants (`2B`, `4B`, `8B`, `30B-A3B`, `32B`, `235B-A22B`) for image-grounded workloads. Use this line when the workload is layout-aware document extraction, image reasoning, or any task where running OCR-to-text and then a chat-tier model loses information. Caveat: our benchmark coverage on the VL line is thin compared with the text line; treat the listed variants as a shortlist to evaluate against your own data.
A second axis cuts across all three lines: most variants ship with a thinking mode and a non-thinking mode under the same model name. Thinking modes generate an explicit chain of thought before answering and typically produce more output tokens per response; non-thinking modes answer directly. The variant table on this page surfaces both modes where they exist; pick by the workload's tolerance for latency and cost-per-call, not by assuming thinking is always better. Some variants additionally show a base or instruct label: base is the foundation checkpoint without instruction-tuning (rarely the right pick for product features); instruct is the tuned version meant for direct deployment.
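The sketch below shows why the mode choice matters for cost-per-call by pricing one request both ways at list rates. It makes two assumptions: reasoning tokens are billed as output tokens, and both modes share the same per-token prices. Both are common but worth confirming with your provider. The token counts are hypothetical, and the prices are the Qwen 3.6 35B-A3B rates listed on this page ($0.14 in / $1.00 out per 1M).

```python
def cost_per_call_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """List-price cost of one call; prices are $ per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


IN_PRICE, OUT_PRICE = 0.14, 1.00  # $ per 1M tokens (Qwen 3.6 35B-A3B list price)

# Hypothetical token counts: same prompt and answer, but the thinking run
# also emits 1,500 reasoning tokens (assumed to be billed as output).
prompt, answer, reasoning = 2_000, 300, 1_500
direct = cost_per_call_usd(prompt, answer, IN_PRICE, OUT_PRICE)
thinking = cost_per_call_usd(prompt, answer + reasoning, IN_PRICE, OUT_PRICE)
print(f"direct:   ${direct:.6f}")
print(f"thinking: ${thinking:.6f} ({thinking / direct:.1f}x)")
# direct:   $0.000580
# thinking: $0.002080 (3.6x)
```

The multiplier depends entirely on how long the reasoning trace runs relative to the prompt and answer, so measure it on your own traffic before budgeting.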
Which variant to start with
If you are picking up Qwen3 for the first time, default to `qwen-qwen3-6-35b-a3b`. It is a mixture-of-experts model: 35B total parameters, only ~3B active per token, so it serves and costs like a small model on capable hardware while remaining comparable on quality to dense models 3 to 5 times larger (see scores in the variant table below).
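The "costs like a small model" claim rests on a standard rule of thumb. A forward pass costs roughly 2 FLOPs per parameter that actually participates, so an MoE's per-token compute tracks its active parameters. This rough sketch ignores the attention term that grows with context length. The dense 35B comparison point is hypothetical.

```python
def forward_gflops_per_token(params_billion: float) -> float:
    """Rule of thumb: ~2 FLOPs per participating parameter per generated token.

    Ignores attention's context-length-dependent term, so it understates
    cost at long contexts.
    """
    return 2.0 * params_billion  # 2 * (params in billions) = GFLOPs


moe = forward_gflops_per_token(3)     # 35B-A3B: ~3B active per token
dense = forward_gflops_per_token(35)  # hypothetical dense model of the same total size
print(f"MoE, ~3B active: ~{moe:.0f} GFLOPs/token")
print(f"Dense 35B:       ~{dense:.0f} GFLOPs/token ({dense / moe:.1f}x)")
# MoE, ~3B active: ~6 GFLOPs/token
# Dense 35B:       ~70 GFLOPs/token (11.7x)
```

Compute is only half the picture: all 35B parameters still have to sit in memory.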
When to deviate:
- Coding agents: use `qwen-3-coder-480b-a35b`. Purpose-built for agentic coding workloads. Pricing runs materially above the value pick (see variant table); worth it only if your workload is dominated by agentic coding loops where the SWE-bench-class score gap pays back the per-token cost.
- Self-host on a single GPU: the `8B` or `14B` dense variants. Hosted routes typically expose ~40K context, but the official Qwen3 model cards describe larger YaRN-extended windows for these models. Verify the context limit on the deployment surface you actually use. The MoE versions need either tensor-parallel inference across several GPUs or enough VRAM to hold every expert, including the ones idle on any given token.
- Long-document work: watch the provider limit, not just the model card. Our current hosted rows show many Qwen3 dense variants and some `30B-A3B` routes at ~40K tokens, even though Qwen's model cards describe larger YaRN-extended windows for several base models. `235B-A22B` is commonly exposed at 128K. The Qwen3.5 / Qwen3.6 refreshes (`397B-A17B`, `35B-A3B`) and Qwen3-Coder reach 256K-class contexts. Qwen3.6 Plus is the 1M-context option in our current data.
- You already use a closed flagship and want a fallback: start with the `235B-A22B` MoE. It is the variant most likely to be a drop-in for a GPT-class workload at a fraction of the per-token cost.
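If you want the "When to deviate" rules encoded in a router or config, a lookup with a default is enough. The workload labels here are hypothetical; the values are the variants this section names. Long-document work is deliberately left out, since the advice there is to check the provider's context limit rather than to switch variant.

```python
# Hypothetical workload labels; values are the variants this section names.
DEFAULT_PICK = "qwen-qwen3-6-35b-a3b"

DEVIATIONS = {
    "coding_agent": "qwen-3-coder-480b-a35b",
    "single_gpu_self_host": "Qwen 3 8b or Qwen 3 14b (dense)",
    "closed_flagship_fallback": "Qwen 3 235b A22b (MoE)",
}


def starting_variant(workload: str) -> str:
    """Return the suggested starting variant, or the default pick for unlisted workloads."""
    return DEVIATIONS.get(workload, DEFAULT_PICK)


print(starting_variant("coding_agent"))  # qwen-3-coder-480b-a35b
print(starting_variant("support_chat"))  # qwen-qwen3-6-35b-a3b
```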
Where the data is weak
We aggregate benchmark scores from multiple sources but coverage is not uniform. Specifically:
- The `4B` and `1.7B` size tiers have thinner benchmark coverage than their bigger siblings. Treat their listed scores as directional, not comparative.
- Several variants are missing release dates upstream. We are working on backfilling these from the registry.
- Display-name conventions across the family are not fully normalised yet ("Qwen 3 32b" for the original, "Qwen 3.5", "Qwen 3.6 35B-A3B"). When in doubt, the slug (`qwen-qwen3-32b` vs `qwen-qwen3-6-35b-a3b`) is the unambiguous identifier.
- Hosted-context vs model-card-context: many tables on this site show the context window the API actually exposes today, which is often smaller than the model card's YaRN-extended ceiling. We are planning to surface both numbers explicitly; in the meantime, treat the listed value as "what the provider serves now."
- Series-level Pareto positioning is not yet in our pipeline; per-variant benchmarks in the table are the load-bearing data.
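A small pre-flight check guards against the gap between hosted and model-card context. This sketch assumes the context limit covers prompt and output tokens together, which is true of many APIs but worth checking for yours. The limits are hypothetical: a 40,960-token hosted route in front of a model whose card describes a 131,072-token YaRN-extended window.

```python
def usable_context(provider_limit: int, model_card_limit: int) -> int:
    """The window you can rely on: never more than the provider serves."""
    return min(provider_limit, model_card_limit)


def fits(prompt_tokens: int, max_output_tokens: int,
         provider_limit: int, model_card_limit: int) -> bool:
    """True if the prompt plus reserved output fits the usable window.

    Assumes the limit counts input and output tokens together.
    """
    budget = usable_context(provider_limit, model_card_limit)
    return prompt_tokens + max_output_tokens <= budget


# Hypothetical limits: hosted route at 40,960; model card YaRN ceiling 131,072.
HOSTED, CARD = 40_960, 131_072
print(fits(60_000, 2_000, HOSTED, CARD))  # False: fits the card, not the route
print(fits(30_000, 2_000, HOSTED, CARD))  # True
```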
If you are making a procurement decision, the variant table on this page is the load-bearing artifact. Cross-check pricing against the provider's own docs before you commit. Pricing changes faster than our scrape cadence.
When to reach for which alternative
- Long-form reasoning chains as the dominant workload: before committing to a Qwen3 variant, check DeepSeek-R1's score on the same benchmark in our index. Long chain-of-thought is the workload where the ranking is most likely to flip family.
- Enterprise procurement where licence terms and US-jurisdiction hosting matter: Llama variants tend to clear those gates with fewer questions than Qwen3 does. Qwen3's licence is permissive, but the provenance conversation is structurally different and worth surfacing with your procurement team early.
- Cost ceiling is high and the only axis is peak quality on a single benchmark: that conversation lives with the closed flagships (GPT-5, Claude Opus 4). Compare scores on the specific benchmark that matters for your workload; the cross-family comparison views in our index are designed for exactly this question.
Sources worth reading
- Qwen3 official model page: release notes and intended use
- Qwen3 on Hugging Face: model cards, license, weights
- Provider docs: Alibaba Cloud DashScope: API pricing source
Recent voices
External pointers worth reading on this family. Curated, dated, attributed; we link to sources rather than reproducing them.
- Blog · Qwen Team · Qwen3 release announcement
Official launch post covering the dense 0.6B to 32B range and the 30B-A3B / 235B-A22B mixture-of-experts variants.
- HF · Qwen Team · Qwen organisation on Hugging Face
Canonical source for model cards, weights and per-variant licence information across the Qwen3 family.
- Blog · Qwen Team · Qwen3-Coder release post
Qwen3-Coder launch covering the 480B-A35B agentic-coding variant and its long-context capabilities.
- HF · Qwen Team · Qwen3.6-35B-A3B model card
Model card for the Qwen3.6 mixture-of-experts refresh; primary source for the 35B/A3B context window and licensing claims.
- GitHub · ggml-org/llama.cpp · llama.cpp PR optimizes Qwen 3.5 inference on Apple Silicon
Merged llama.cpp PR specifically tunes Qwen 3.5 kernels for M-series unified memory. Mac users running Qwen 3.5 locally should rebuild; reported decode speedups are large enough to change which variant is practical on a given Mac.
Changelog
- Data
Folded qwen3-max, qwen3-next-80b-a3b, and qwen3.6-flash into the qwen3 surface. Product decision: Qwen3-family by branding, no standalone page. Surface now owns 32 registry slugs.
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
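For intuition on how several benchmarks can collapse into one number with a coverage gate (the dash described under Caveats), here is a deliberately simple sketch. It takes an equal-weight mean over pre-normalized scores and returns None when too few benchmarks are present. It is not this site's formula. The 60% threshold, the normalization, and the benchmark keys are all assumptions.

```python
from statistics import mean
from typing import Optional


def composite_score(normalized: dict[str, float], required: set[str],
                    min_coverage: float = 0.6) -> Optional[float]:
    """Equal-weight mean of pre-normalized benchmark scores.

    Returns None (rendered as a dash) when fewer than min_coverage of the
    required benchmarks have a score. Threshold and weighting are assumptions.
    """
    present = [normalized[b] for b in required if b in normalized]
    if len(present) < min_coverage * len(required):
        return None
    return mean(present)


# Hypothetical keys and scores, already rescaled to a common 0-100 range.
REQUIRED = {"lmarena", "livebench", "swe_bench", "aider"}
full = {"lmarena": 80.0, "livebench": 70.0, "swe_bench": 60.0, "aider": 70.0}
sparse = {"lmarena": 80.0}
print(composite_score(full, REQUIRED))    # 70.0
print(composite_score(sparse, REQUIRED))  # None
```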
Author: Boris. Read the full methodology.