Mistral AI family
Mistral
Mistral: Medium 3.5 (Thinking) ranks #29 of 186 on Quality Score. Compare the chat tier and the Magistral reasoning line by price and workload.
Top in this family
Medium 3.5 (Thinking) ranks #29 of 186 on overall quality (QS 90.5) at $1.5/$7.5 per 1M tokens.
Practical pick
Medium 3.1 at $0.4/$2 per 1M tokens.
- Variants: 6
- License: Open + closed mix
- Provider: Mistral AI
★ Most teams should start here
MistralAI Mistral Medium 3
Variant: Medium 3.1
The practical default. Strong quality for everyday chat and tool-use workloads at materially lower cost than Mistral Large. Step up only when the workload visibly benefits.
- Quality Score: —
- Input: $0.400/1M
- Output: $2.00/1M
- Context: 131K
- License: Closed · API
Best variant by workload
One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.
| Workload | Best pick | Why |
|---|---|---|
| General API workhorse | MistralAI Mistral Medium 3 · Medium 3.1 ($0.400/1M in, $2.00/1M out) | Best practical quality-per-dollar in the family for chat and tool-use. The default unless your evals visibly improve under Mistral Large. |
| High-volume chat | MistralAI Small 3 · 2506 (June 2025) ($0.075/1M in, $0.200/1M out) | Cheapest production-grade Mistral tier. Use for high-volume chat where per-token cost compounds. |
| Coding agents | MistralAI Magistral Medium · Thinking | Mistral's reasoning-mode model. Use when chain-of-thought helps the workload and you want to stay within the Mistral stack. |
All variants
15 variants across 6 models. Sorted by quality score (descending).
| Variant | QS | GPQA | HLE | SWE | AIME | In $/M | Out $/M | Context | Released | Lic. |
|---|---|---|---|---|---|---|---|---|---|---|
| Mistral Medium 3.5 · Thinking | 90.5 (#29/186) | — | — | 77.6 | 86.3 | $1.5 | $7.5 | 262K | May 11, 2026 | |
| MistralAI Mistral Large · Large 3 | 65.9 (#134/186) | — | — | — | — | $0.5 | $1.5 | 262K | — | |
| MistralAI Mistral Large · Mistral Large | — | — | — | — | — | — | — | — | — | |
| MistralAI Mistral Large · 2402 (February 2024) | — | — | — | — | — | — | — | — | — | |
| MistralAI Mistral Large · 2407 (July 2024) | — | — | — | — | — | $2 | $6 | 131K | — | |
| MistralAI Mistral Large · 2411 (November 2024) | — | — | — | — | — | — | — | — | — | |
| MistralAI Magistral Small · Thinking (previous; newer: Mistral Medium 3.5) | 69.8 (#118/186) | 68.2 | — | — | 62.8 | — | — | — | Jun 10, 2025 | |
| MistralAI Magistral Medium · Thinking (previous; newer: Mistral Medium 3.5) | 68.3 (#125/186) | 70.8 | — | — | 64.9 | — | — | — | Jun 10, 2025 | |
| MistralAI Small 3 · Small 3 (previous) | 59.5 (#162/186) | 46.0 | — | — | — | $0.075 | $0.2 | 128K | Jun 1, 2025 | |
| MistralAI Mistral Medium 3 · Medium 3 (previous; newer: Mistral Medium 3.5) | 55.6 (#172/186) | 59.6 | 4.5 | — | 21.2 | $0.4 | $2 | 131K | — | |
| MistralAI Mistral Medium 3 · 2505 (May 2025) (previous; newer: Mistral Medium 3.5) | — | — | — | — | — | $0.4 | $2 | 131K | — | |
| MistralAI Mistral Medium 3 · Medium 3.1 (previous; newer: Mistral Medium 3.5) | — | — | — | — | — | $0.4 | $2 | 131K | — | |
| MistralAI Small 3 · 2501 (January 2025) (previous) | — | — | — | — | — | $0.05 | $0.08 | 33K | Jun 1, 2025 | |
| MistralAI Small 3 · 2503 (March 2025) (previous) | — | — | — | — | — | $0.351 | $0.555 | 128K | Jun 1, 2025 | |
| MistralAI Small 3 · 2506 (June 2025) (previous) | — | — | — | — | — | $0.075 | $0.2 | 128K | Jun 1, 2025 | |
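To see how the In $/M and Out $/M columns translate into a bill, here is a minimal sketch of the arithmetic. The prices are copied from the table above; the workload (500M input and 100M output tokens per month) and the names `monthly_cost` and `WORKLOAD` are hypothetical and only for illustration.

```python
# Prices are USD per 1M tokens (input, output), copied from the variant table.
PRICES = {
    "Medium 3.5 (Thinking)": (1.5, 7.5),
    "Medium 3.1": (0.4, 2.0),
    "Small 3 2506": (0.075, 0.2),
}


def monthly_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Return the USD cost of a token volume at per-1M-token list prices."""
    return (input_tokens / 1_000_000 * in_price
            + output_tokens / 1_000_000 * out_price)


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
WORKLOAD = (500_000_000, 100_000_000)

for name, (in_price, out_price) in PRICES.items():
    print(f"{name}: ${monthly_cost(*WORKLOAD, in_price, out_price):,.2f}")
```

For this made-up workload the three rows come to $1,500.00, $400.00, and $57.50 per month. Swap in your own token counts before drawing conclusions, since the input/output ratio changes the ranking for some workloads.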
Benchmark evidence
Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (32 of 45 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| MistralAI Small 3 · Small 3 | GPQA Diamond · 5_shot_cot | 46.0 | 1 / 4 | In Quality Score |
| MistralAI Small 3 · Small 3 | MMLU Pro · 5_shot_cot | 66.8 | 2 / 4 | In Quality Score |
| Mistral Medium 3.5 · Thinking | SWE-bench Verified | 77.6 | 16 / 68 | In Quality Score |
| MistralAI Magistral Medium · Thinking | LiveCodeBench | 59.4 | 22 / 69 | In Quality Score |
| Mistral Medium 3.5 · Thinking | AIME 2025 | 86.3 | 23 / 88 | In Quality Score |
| MistralAI Magistral Small · Thinking | LiveCodeBench | 55.4 | 27 / 69 | In Quality Score |
| MistralAI Magistral Medium · Thinking | Aider (Polyglot) | 47.1 | 29 / 45 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | Aider (Polyglot) | 28.9 | 36 / 45 | In Quality Score |
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| MistralAI Small 3 · Small 3 | GPQA Diamond · 5_shot_cot | 46.0 | 1 / 4 | In Quality Score |
| MistralAI Small 3 · Small 3 | MMLU Pro · 5_shot_cot | 66.8 | 2 / 4 | In Quality Score |
| Mistral Medium 3.5 · Thinking | AIME 2025 | 86.3 | 23 / 88 | In Quality Score |
| MistralAI Magistral Medium · Thinking | AIME 2025 | 64.9 | 47 / 88 | In Quality Score |
| MistralAI Magistral Small · Thinking | AIME 2025 | 62.8 | 49 / 88 | In Quality Score |
| MistralAI Mistral Large · Mistral Large | SimpleBench | 22.5 | 53 / 61 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | Humanity's Last Exam · hle_text | 4.4 | 53 / 56 | In Quality Score |
| MistralAI Mistral Large · Large 3 | SimpleBench | 20.4 | 55 / 61 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | AIME 2025 | 21.2 | 65 / 88 | In Quality Score |
| MistralAI Small 3 · Small 3 | MMLU Pro | 66.8 | 69 / 86 | In Quality Score |
| MistralAI Mistral Large · Large 3 | Arena Elo | 1415 | 73 / 158 | In Quality Score |
| MistralAI Magistral Medium · Thinking | GPQA Diamond | 70.8 | 75 / 143 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3.1 | Arena Elo | 1410 | 82 / 158 | In Quality Score |
| MistralAI Magistral Small · Thinking | GPQA Diamond | 68.2 | 85 / 143 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | Humanity's Last Exam · hle | 4.5 | 88 / 90 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | GPQA Diamond | 59.6 | 101 / 143 | In Quality Score |
| MistralAI Mistral Medium 3 · 2505 (May 2025) | Arena Elo | 1387 | 102 / 158 | In Quality Score |
| MistralAI Small 3 · 2506 (June 2025) | Arena Elo | 1357 | 119 / 158 | In Quality Score |
| MistralAI Small 3 · Small 3 | GPQA Diamond | 46 | 120 / 143 | In Quality Score |
| MistralAI Mistral Large · 2407 (July 2024) | Arena Elo | 1314 | 141 / 158 | In Quality Score |
| MistralAI Mistral Large · 2411 (November 2024) | Arena Elo | 1305 | 142 / 158 | In Quality Score |
| MistralAI Magistral Medium · Thinking | Arena Elo | 1304 | 143 / 158 | In Quality Score |
| MistralAI Small 3 · 2503 (March 2025) | Arena Elo | 1303 | 145 / 158 | In Quality Score |
| MistralAI Small 3 · 2501 (January 2025) | Arena Elo | 1274 | 149 / 158 | In Quality Score |
| MistralAI Mistral Large · 2402 (February 2024) | Arena Elo | 1241 | 152 / 158 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | Arena Elo | 1222 | 155 / 158 | In Quality Score |
| Mistral Medium 3.5 · Thinking | IFBench | 69 | 12 / 28 | Tracked evidence |
| Mistral Medium 3.5 · Thinking | BrowseComp · context_manage | 48.6 | 14 / 15 | Tracked evidence |
| MistralAI Small 3 · Small 3 | MMLU | 80.6 | 27 / 33 | Tracked evidence |
| MistralAI Magistral Medium · Thinking | AIME 2024 | 73.6 | 27 / 69 | Tracked evidence |
| MistralAI Magistral Small · Thinking | AIME 2024 | 70.7 | 29 / 69 | Tracked evidence |
| MistralAI Small 3 · Small 3 | MMMU PRO | 49.3 | 44 / 52 | Tracked evidence |
| MistralAI Mistral Medium 3 · Medium 3 | AIME 2024 | 26.8 | 52 / 69 | Tracked evidence |
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Mistral Medium 3.5 · Thinking | SWE-bench Verified | 77.6 | 16 / 68 | In Quality Score |
| MistralAI Magistral Medium · Thinking | LiveCodeBench | 59.4 | 22 / 69 | In Quality Score |
| MistralAI Magistral Small · Thinking | LiveCodeBench | 55.4 | 27 / 69 | In Quality Score |
| MistralAI Magistral Medium · Thinking | Aider (Polyglot) | 47.1 | 29 / 45 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | Aider (Polyglot) | 28.9 | 36 / 45 | In Quality Score |
| MistralAI Mistral Medium 3 · Medium 3 | LiveCodeBench | 29.1 | 49 / 69 | In Quality Score |
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Mistral Medium 3.5 · Thinking | τ³-Bench · telecom | 91.4 | 3 / 6 | Tracked evidence |
| Mistral Medium 3.5 · Thinking | τ³-Bench · retail | 76.1 | 3 / 6 | Tracked evidence |
| Mistral Medium 3.5 · Thinking | τ³-Bench · banking | 13.4 | 5 / 6 | Tracked evidence |
| Mistral Medium 3.5 · Thinking | τ³-Bench · airline | 72 | 6 / 6 | Tracked evidence |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| MistralAI Small 3 · Small 3 | ChartQA | 86.2 | 5 / 9 | Tracked evidence |
Document/OCR
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| MistralAI Small 3 · Small 3 | DocVQA | 94.1 | 3 / 8 | Tracked evidence |
Where this family sits in the market
Mistral Small 3 sits on the price-efficiency frontier within the family. Mistral Large takes the quality ceiling at proportionate cost. Magistral Medium is the entry point when explicit reasoning helps.
Chart legend: the dashed line is the Pareto frontier (no model is both cheaper and better). Thinking and non-thinking pairs of the same model are connected; the line length shows the cost of reasoning.
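The Pareto frontier idea (no model both cheaper and better) can be sketched in a few lines. This is an illustration, not the code behind the chart: the blended cost here is assumed to be the plain mean of input and output price, and only the four variants that have both a Quality Score and a price in the variant table are included. `Model` and `pareto_frontier` are hypothetical names.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float     # assumption: mean of input and output USD per 1M tokens
    quality: float  # Quality Score from the variant table


def pareto_frontier(models):
    """Keep models that no other model beats on both cost and quality."""
    frontier = []
    best_quality = float("-inf")
    # Sweep from cheapest to most expensive; on a cost tie, try the
    # higher-quality model first so the weaker one is dropped.
    for m in sorted(models, key=lambda m: (m.cost, -m.quality)):
        if m.quality > best_quality:
            frontier.append(m)
            best_quality = m.quality
    return frontier


MODELS = [
    Model("Medium 3.5 (Thinking)", (1.5 + 7.5) / 2, 90.5),
    Model("Large 3", (0.5 + 1.5) / 2, 65.9),
    Model("Small 3", (0.075 + 0.2) / 2, 59.5),
    Model("Medium 3", (0.4 + 2.0) / 2, 55.6),
]

for m in pareto_frontier(MODELS):
    print(f"{m.name}: ${m.cost:.4f}/1M blended, QS {m.quality}")
```

Under this assumed cost blend, Medium 3 drops off the frontier because Large 3 is both cheaper and higher-scoring. A different blend (for example, weighting input tokens more heavily) can change which models survive.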
Self-hosting
These variants ship with open weights, so you can run them on your own hardware or via a hosting provider you control. Pick a variant that fits your GPU memory budget; mixture-of-experts variants are cheaper to serve than their total parameter count suggests, but the full weights still need to fit in memory.
- Mistral Medium 3.5 · Thinking · open weights
- MistralAI Mistral Large · Large 3 · open weights
- MistralAI Small 3 · 2506 (June 2025) · open weights
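A rough way to check the "fits your GPU memory budget" question is to multiply parameter count by bytes per parameter. The sketch below does only that; it ignores KV cache, activations, and runtime overhead, so real requirements are higher. The 128B dense figure for Medium 3.5 comes from the editor's notes on this page; the precision table and the name `weight_memory_gb` are illustrative assumptions.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    KV cache and activation memory are NOT included.
    """
    return params_billions * BYTES_PER_PARAM[precision]


# Medium 3.5 is listed on this page as a 128B dense model.
for precision in BYTES_PER_PARAM:
    print(f"128B @ {precision}: ~{weight_memory_gb(128, precision):.0f} GB for weights")
```

For a dense model every parameter is active on every token, so this number is also a floor on what must be resident in accelerator memory across your devices.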
The Mistral family
Every variant we track in this family, grouped by license. Use this to orient before drilling into the variant table.
Open weights (3)
- Mistral Medium 3.5: 1 variant
- MistralAI Mistral Large: 5 variants
- MistralAI Small 3: 4 variants
Closed · API only (3)
- MistralAI Mistral Medium 3: 3 variants
- MistralAI Magistral Medium: 1 variant
- MistralAI Magistral Small: 1 variant
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- Llama: Muse Spark (Thinking), Llama 4 and 3 Compared
Llama: Muse Spark (Thinking) ranks #12 of 186 on Quality Score. Compare Llama 4, Llama 3, and Muse Spark by self-hosting and workload.
- DeepSeek: V4 Pro Thinking, R1, V3 Compared
DeepSeek: V4 Pro Thinking ranks #15 of 186 with 1.0M-token context and $0.435/$0.87 per 1M tokens. Compare V4, R1, and V3 by workload.
- Qwen3: Qwen 3.7 Max Preview, Qwen3.5, Qwen3.6 Compared
Qwen3: Qwen 3.7 Max Preview ranks #9/186 with 262K context at $0.78/$3.9 per 1M. Compare Qwen3, 3.5, 3.6 by workload.
Caveats
What this page does not tell you, listed honestly.
- No tracked API pricing for: MistralAI Magistral Medium, MistralAI Magistral Small. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
- Context window not declared for: MistralAI Magistral Medium, MistralAI Magistral Small.
Editor's notes
Why this family matters
Mistral is the European open-weights option for teams that want a real alternative to the US frontier labs without giving up production-grade quality or pricing transparency. With Medium 3.5 (May 2026, 128B dense, open weights under a modified MIT license with a revenue-tier paid restriction), Mistral has a credible answer to Claude Sonnet and the open-weights Chinese frontier (DeepSeek, GLM, Qwen3.5) on agentic and coding workloads.
This page covers Mistral AI's full lineup: the chat tier (Small 3, Medium 3, Medium 3.5, Mistral Large) and the Magistral reasoning brand (Magistral Medium, Magistral Small).
Mistral vs. Magistral: what's the difference?
Mistral AI ships two product brands. They are not two modes of the same model; they are distinct model lines with separate weights and separate launch announcements.
- Mistral (Small 3, Medium 3, Medium 3.5, Large) is the main chat and tool-use lineup. The headline newcomer here is Medium 3.5, which despite living under the Mistral brand is a reasoning-mode model (only a thinking variant ships). So "Mistral" today is no longer a pure chat lineup; it includes one reasoning-only flagship.
- Magistral (Medium, Small) is Mistral's dedicated reasoning brand, launched specifically for chain-of-thought workloads where the chat lineup underperforms. Smaller, more focused, separately benchmarked.
If you only need a chat-tier Mistral, ignore Medium 3.5 and Magistral and pick from Small 3 / Medium 3 / Large. If you need a reasoning model from Mistral AI, the decision is now three-way: Medium 3.5 (newest, sits on the main brand, open weights), Magistral Medium (the dedicated reasoning flagship), or Magistral Small (the cheap reasoning tier). Compare them on the specific reasoning benchmark that matters for your workload, as they do not all win the same evals.
Which variant to start with
For chat and tool-use. Default to Medium 3 when the Mistral-Cloud API is the path of least resistance. Step down to Small 3 for high-volume chat where per-token cost compounds, and up to Mistral Large only when your evals visibly improve at the price step.
For reasoning. Default to Medium 3.5 when its open weights and revenue restriction work for your deployment; it's the newest in the family and ships on the main brand. Move to Magistral Medium when the dedicated reasoning brand wins your eval, or when you want the brand-level signal that Mistral has optimized this line for chain-of-thought specifically. Use Magistral Small when the per-token cost matters and the reasoning gap to Magistral Medium is acceptable.
When to deviate:
- Coding-agent workloads: Medium 3.5 is competitive on SWE-Bench Verified (self-reported by Mistral) and lands inside the open-weights agentic cluster. Compare against Claude Sonnet 4.6 (closed) and GLM-5.1 (open) on your own coding eval before committing.
- High-volume chat: Small 3 stays the right call. Per-token economics beat the quality gap on chat-tier workloads.
- Strongest open reasoning model: compare Magistral Medium and Medium 3.5 against DeepSeek R1, Qwen3.5 thinking, and GLM-5.1 thinking on the specific reasoning benchmark that matters. Mistral's reasoning models are competitive but not always the ceiling.
- Frontier closed reasoning: when budget allows, the Claude Opus thinking variants and Gemini 3 Pro thinking sit above the open reasoning cluster on most evals. Reach for them when the eval gap is large enough to matter.
- Highest closed-weights quality in the family: Mistral Large remains the ceiling tier. Use when the workload visibly benefits and the price step is justified.
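The recurring advice above (step up only when your own eval visibly improves and the price step is justified) can be written down as an explicit rule. This is one possible sketch, not a prescribed method: the `min_gain` and `max_cost_per_point` thresholds are hypothetical knobs you would set yourself, and scores are whatever metric your eval reports.

```python
def should_step_up(base_score: float, cand_score: float,
                   base_cost: float, cand_cost: float,
                   min_gain: float = 2.0,
                   max_cost_per_point: float = float("inf")) -> bool:
    """Decide whether a pricier variant earns its place on your own eval.

    base_score / cand_score: eval results for the current and candidate variant.
    base_cost / cand_cost: projected spend for each (same unit, e.g. USD/month).
    min_gain: smallest score improvement you treat as "visible" (assumption).
    max_cost_per_point: most you will pay per point of improvement (assumption).
    """
    gain = cand_score - base_score
    if gain < min_gain:
        return False  # improvement too small to count as visible
    extra_cost = cand_cost - base_cost
    if extra_cost <= 0:
        return True   # better and not more expensive
    return extra_cost / gain <= max_cost_per_point


# Hypothetical eval: candidate scores 80 vs 70, costs $1,500 vs $400 per month.
print(should_step_up(70.0, 80.0, 400.0, 1500.0, max_cost_per_point=150.0))
```

Writing the thresholds down before running the eval keeps the decision from drifting toward whichever model looked best on a public leaderboard.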
Where the data is weak
Mistral's announcement scores are self-reported by Mistral. For Medium 3.5 specifically:
- SWE-Bench Verified is marked as self-reported by Mistral.
- BrowseComp uses context management with a discard-all strategy at 100K tokens; not directly comparable to base BrowseComp scores from other providers without normalising for scaffold.
- τ³-Bench Banking uses agentic-search retrieval and reports the highest of multiple strategies. Other domains (telecom, airline, retail) use the standard scaffold.
Public benchmark coverage on the Magistral line is thinner than on the chat-tier Mistral lineup. Treat announcement tables and Magistral positioning claims as directional signals: useful for positioning, but reproduce the benchmarks that matter for your workload before adopting.
Cost and context
Pricing and context windows for each member are in the variant table above. The Small 3 tier is the practical floor for production-grade Mistral chat economics; Medium 3 and Medium 3.5 sit in the same band as mid-tier closed-API peers; Mistral Large is the closed-weights ceiling. Magistral Medium and Small price in line with their chat-tier siblings of similar size, so the reasoning brand does not carry a separate price premium.
Sources worth reading
- Mistral API pricing: authoritative price list per model and tier
- Mistral model docs: variant identifiers, context windows, modality coverage
- Magistral reasoning announcement: release notes for the Magistral reasoning brand
- Mistral news + announcements: release notes for new generations and pricing changes
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
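The exact formula is not given on this page, so the following is only a generic illustration of how several benchmarks can be folded into one number: min-max normalise each benchmark across the models that report it, then average whatever each model has. It is not the Quality Score methodology, and `composite_scores` is a hypothetical name. The sample values are GPQA Diamond and AIME 2025 scores from the benchmark tables above.

```python
def composite_scores(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average min-max-normalised benchmark scores per model, on a 0-100 scale."""
    # Find the observed range of each benchmark across all models.
    benchmarks = {b for per_model in scores.values() for b in per_model}
    ranges = {}
    for b in benchmarks:
        values = [s[b] for s in scores.values() if b in s]
        ranges[b] = (min(values), max(values))

    result = {}
    for model, per_model in scores.items():
        normed = []
        for b, value in per_model.items():
            lo, hi = ranges[b]
            # If every model ties on a benchmark, treat it as neutral (0.5).
            normed.append(0.5 if hi == lo else (value - lo) / (hi - lo))
        result[model] = 100 * sum(normed) / len(normed) if normed else float("nan")
    return result


SAMPLE = {
    "Magistral Medium": {"GPQA Diamond": 70.8, "AIME 2025": 64.9},
    "Magistral Small": {"GPQA Diamond": 68.2, "AIME 2025": 62.8},
    "Medium 3": {"GPQA Diamond": 59.6, "AIME 2025": 21.2},
}

for model, score in composite_scores(SAMPLE).items():
    print(f"{model}: {score:.1f}")
```

Note that a min-max composite depends on which models are in the pool, which is one reason a single synthetic score should be read alongside the per-benchmark rows.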
Author: Boris. Read the full methodology.