NVIDIA Nemotron
NVIDIA Nemotron: 120B-A12B Super ranks #129 of 186 on Quality Score. Open-weights Nemotron 3 picks by workload.
Top in this family
NVIDIA Nemotron 120B-A12B Super ranks #129 of 186 on overall quality (QS 66.9) at $0.09/$0.45 per 1M tokens.
- Variants: 2
- License: Open weights
- Provider: nvidia
★ Most teams should start here
NVIDIA Nemotron 3 Super
Variant: 120B-A12B
The only Nemotron 3 model with real leaderboard coverage in our index, and the one most teams mean when they shortlist 'NVIDIA's model'. Reach for Nano Omni instead when the workload is omni-modal (image, video, audio) or document understanding, where Super does not compete.
- Quality Score: 66.9
- Input: $0.090/1M
- Output: $0.450/1M
- Context: 1.0M
- License: Open weights
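To make the list prices above concrete, here is a minimal sketch of per-request cost arithmetic. The two prices are the ones listed on this page for Nemotron 3 Super; the function name and the example token counts are illustrative assumptions, not measurements of any real workload.

```python
from decimal import Decimal

# List prices from this page for Nemotron 3 Super, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.09")
OUTPUT_USD_PER_M = Decimal("0.45")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request at the list prices above (hypothetical helper)."""
    return (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    ) / TOKENS_PER_M


# Hypothetical workload: 8,000 prompt tokens and 1,000 completion tokens.
per_request = request_cost_usd(8_000, 1_000)
print(f"per request: ${per_request}")
print(f"per 100k requests: ${per_request * 100_000}")
```

At these prices the hypothetical request costs about $0.00117, or about $117 per 100,000 requests. Output tokens cost five times as much as input tokens, so long completions dominate the bill well before long prompts do.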
Best variant by workload
One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.
| Workload | Best pick | Why |
|---|---|---|
| General API workhorse | NVIDIA Nemotron 3 Super 120B-A12B ($0.090 in / $0.450 out per 1M tokens) | The open-weights workhorse of the family. 120B total parameters with ~12B active per token, so it serves closer to a mid-size model on capable hardware while keeping open-weights flexibility (data residency, fine-tune, air-gapped inference). |
| Coding agents | NVIDIA Nemotron 3 Super 120B-A12B ($0.090 in / $0.450 out per 1M tokens) | NVIDIA reports strong LiveCodeBench results for Super on its model card. Treat that as a directional signal until our index carries an independent coding benchmark for this variant; the per-benchmark table is the load-bearing artifact, not the composite. |
| Document AI / OCR | NVIDIA Nemotron 3 Nano Omni 30B-A3B Reasoning ($0.000 in / $0.000 out per 1M tokens) | Nano Omni is the omni-modal line: text, image, video, and audio in one model. NVIDIA's card reports document-understanding scores (OCRBench v2, MMLongBench-Doc, CharXiv) that our main leaderboard does not yet track. Pick it when layout-aware OCR or mixed-media input is the binding requirement. |
| Self-host on 1 GPU | NVIDIA Nemotron 3 Nano Omni 30B-A3B Reasoning ($0.000 in / $0.000 out per 1M tokens) | At 30B total / ~3B active, Nano Omni is the smaller-footprint Nemotron and the realistic single-GPU self-host option in this family. Total weights still need to fit in memory, so size the GPU for the full parameter count, not the active subset. |
All variants
3 variants across 2 models. Sorted by quality score (descending) · Open weights.
| Variant | QS | HLE | Tau | In $/M | Out $/M | Context | Released |
|---|---|---|---|---|---|---|---|
| Nemotron 3 Super · 120B-A12B | 66.9 (#129/186) | 18.3 | 62.8 | $0.09 | $0.45 | 1.0M | — |
| Nemotron 3 Nano Omni · 30B-A3B Reasoning | — | — | — | $0 | $0 | 256K | Apr 29, 2026 |
| Nemotron 3 Nano Omni · 30B-A3B Instruct | — | — | — | $0 | $0 | 256K | Apr 29, 2026 |
Benchmark evidence
Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (11 of 22 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v5 | 81.2 | 2 / 5 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | AIME 2025 · no_tools | 90.2 | 8 / 15 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v6 | 78.7 | 17 / 40 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · airline | 56.3 | 17 / 29 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · telecom | 64.4 | 19 / 28 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | MMLU Pro | 83.7 | 31 / 86 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · retail | 62.8 | 31 / 34 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Humanity's Last Exam · tools | 22.8 | 34 / 38 | In Quality Score |
All benchmark evidence (22 rows), by capability
Reasoning
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| NVIDIA Nemotron 3 Super · 120B-A12B | AIME 2025 · no_tools | 90.2 | 8 / 15 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | MMLU Pro | 83.7 | 31 / 86 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Humanity's Last Exam · tools | 22.8 | 34 / 38 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Humanity's Last Exam · hle | 18.3 | 45 / 90 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveBench | 32.5 | 104 / 110 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Arena Elo | 1361 | 116 / 158 | In Quality Score |
Coding
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v5 | 81.2 | 2 / 5 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v6 | 78.7 | 17 / 40 | In Quality Score |
Agentic
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · airline | 56.3 | 17 / 29 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · telecom | 64.4 | 19 / 28 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · retail | 62.8 | 31 / 34 | In Quality Score |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | OSWorld | 47.4 | 6 / 10 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | OSWorld | 11.1 | 10 / 10 | Tracked evidence |
Multimodal
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | VideoMME | 72.2 | 4 / 4 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | MathVista · mini | 82.8 | 14 / 36 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | MathVista · mini | 75.5 | 25 / 36 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | CharXiv Reasoning | 63.6 | 34 / 48 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | CharXiv Reasoning | 41.3 | 44 / 48 | Tracked evidence |
Document/OCR
| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | MMLongBench-Doc | 57.5 | 9 / 22 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | MMLongBench-Doc | 38 | 18 / 22 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | OCRBench | 67.0 | 33 / 35 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | OCRBench | 54.8 | 35 / 35 | Tracked evidence |
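Ranks in the tables above come from fields of very different sizes (5 models on one benchmark, 110 on another), so raw ranks are hard to compare. A minimal sketch of one way to normalise them, assuming a simple rank-over-field-size ratio; the helper name is hypothetical and this is not the index's scoring method.

```python
def top_fraction(rank: int, field_size: int) -> float:
    """Share of the field at or above this rank; lower is better."""
    if not 1 <= rank <= field_size:
        raise ValueError("rank must be between 1 and field_size")
    return rank / field_size


# (rank, field size) pairs copied from the Super rows in the tables above.
super_ranks = {
    "MMLU Pro": (31, 86),
    "Humanity's Last Exam (hle)": (45, 90),
    "Arena Elo": (116, 158),
    "LiveBench": (104, 110),
}

# Print best relative placement first.
for name, (rank, total) in sorted(
    super_ranks.items(), key=lambda kv: top_fraction(*kv[1])
):
    print(f"{name}: top {top_fraction(rank, total):.0%}")
# MMLU Pro: top 36%
# Humanity's Last Exam (hle): top 50%
# Arena Elo: top 73%
# LiveBench: top 95%
```

The ratio ignores score gaps between neighbouring ranks, so treat it as a rough placement aid rather than a quality measure.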
Where this family sits in the market
Nemotron 3 Super sits in the open-weights reasoning band: strong on academic knowledge benchmarks, in the lower half on Arena, and near the bottom on LiveBench. It is a per-benchmark pick, not a composite-score leader. Nano Omni is absent from the quality-vs-cost frontier because its strengths are on multimodal benchmarks our main leaderboard does not track.
Chart: quality vs. cost. The dashed line is the Pareto frontier (no model is both cheaper and better). Thinking and non-thinking pairs of the same model are connected; the line length is the cost of reasoning.
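The Pareto frontier described here can be computed with a single sorted sweep. The sketch below assumes the usual dominance rule (a point is dropped if another is at least as cheap and at least as good, and strictly better on one axis); the `Point` type, the function name, and the sample data are all hypothetical and do not reproduce the chart's actual inputs.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float     # e.g. USD per 1M tokens (assumed unit)
    quality: float  # e.g. Quality Score


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the non-dominated points, ordered by ascending cost."""
    frontier: list[Point] = []
    best_quality = float("-inf")
    # Sort by ascending cost; on equal cost, the higher quality comes first.
    for p in sorted(points, key=lambda p: (p.cost, -p.quality)):
        # Keep a point only if it beats every cheaper-or-equal point seen so far.
        if p.quality > best_quality:
            frontier.append(p)
            best_quality = p.quality
    return frontier


# Hypothetical models, not rows from this index.
models = [
    Point("A", 0.10, 60.0),
    Point("B", 0.50, 66.0),
    Point("C", 0.60, 64.0),  # dominated by B: pricier and worse
    Point("D", 2.00, 80.0),
    Point("E", 3.00, 75.0),  # dominated by D
]
print([p.name for p in pareto_frontier(models)])  # ['A', 'B', 'D']
```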
Self-hosting
These variants ship with open weights, so you can run them on your own hardware or via a hosting provider you control. Pick a variant that fits your GPU memory budget; mixture-of-experts variants are cheaper to serve than their total parameter count suggests, but the full weights still need to fit in memory.
- NVIDIA Nemotron 3 Super 120B-A12B · open weights
- NVIDIA Nemotron 3 Nano Omni 30B-A3B Reasoning · open weights
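A back-of-the-envelope sketch of why the full parameter count, not the active subset, sets the memory budget. It counts weights only (no KV cache, activations, or runtime overhead), so real requirements are higher. The precisions shown are illustrative assumptions, not a statement about which formats NVIDIA ships, and the helper name is hypothetical.

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # 1e9 parameters times bytes-per-parameter is GB directly.
    return total_params_billions * bytes_per_param


# Total parameter counts from this page; active counts do not reduce
# the memory needed to hold the model.
variants = {"Nemotron 3 Super": 120, "Nemotron 3 Nano Omni": 30}

for name, total_b in variants.items():
    for bits in (16, 8, 4):  # assumed precisions, for illustration only
        print(f"{name} @ {bits}-bit: {weight_memory_gb(total_b, bits):.0f} GB")
```

Under these assumptions Super needs roughly 240 GB for 16-bit weights and 60 GB at 4-bit, while Nano Omni needs roughly 60 GB and 15 GB respectively. That gap is why Nano Omni is the single-GPU candidate in this family.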
Alternatives to consider
Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.
- Qwen3: Qwen 3.7 Max Preview, Qwen3.5, Qwen3.6 Compared
Qwen3: Qwen 3.7 Max Preview ranks #9/186 with 262K context at $0.78/$3.9 per 1M. Compare Qwen3, 3.5, 3.6 by workload.
- DeepSeek: V4 Pro Thinking, R1, V3 Compared
DeepSeek: V4 Pro Thinking ranks #15 of 186 with 1.0M-token context and $0.435/$0.87 per 1M tokens. Compare V4, R1, and V3 by workload.
- Llama: Muse Spark (Thinking), Llama 4 and 3 Compared
Llama: Muse Spark (Thinking) ranks #12 of 186 on Quality Score. Compare Llama 4, Llama 3, and Muse Spark by self-hosting and workload.
Caveats
What this page does not tell you, listed honestly.
- Quality score not yet computed for: NVIDIA Nemotron 3 Nano Omni. We require a minimum benchmark coverage before scoring; until the gap is filled the row shows a dash.
Editor's notes
NVIDIA's real bet: the open agent stack, not another chatbot
The mistake to avoid when reading this page is treating "NVIDIA" as a hardware company that also dabbles in models. Nemotron is the software half of one strategy. NVIDIA is shipping an open model family, open training and inference recipes, and deployment cookbooks, all of which happen to run best on NVIDIA GPUs. The models are the visible creature. The ecosystem is the skeleton.
So the durable thesis is not "NVIDIA has a competitive model now." It is that NVIDIA is trying to make open agent infrastructure look like an NVIDIA-native software platform: efficient, hardware-aware models designed to run where the data is, locally and privately, inside larger agent systems rather than as a standalone chatbot. The strategic read: the software lowers adoption friction, the hardware captures the demand. That framing outlasts any single benchmark.
The family is roles, not sizes
Nemotron 3 is not one model scaled up and down. NVIDIA describes it as a set of jobs in an agent system:
- Super is the text-reasoning and agent-collaboration worker.
- Ultra is the heavy planner for the hardest reasoning.
- Nano is the efficient small worker.
- Nano Omni is the perception layer: text, image, video, and audio in one model.
Our index currently tracks two of these with real data: Nemotron 3 Super (120B-A12B) and Nemotron 3 Nano Omni (30B-A3B). The other two are family context, not options you can compare here yet.
The useful mental model: Nano Omni is the eyes and ears, Super and Ultra are closer to the brain. That is why Super and Nano Omni are not ranked against each other on this page. They are not measured on the same benchmarks because they do not do the same job. If you are unsure which line applies, look at the input. Text in, reasoning out: Super. Mixed media in (scanned documents, screen recordings, audio, video): Nano Omni.
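The "look at the input" rule can be written as a small routing function. This is only a sketch of the rule of thumb stated above: the modality labels and the function name are hypothetical, and a real agent system would route on more than input type.

```python
# Hypothetical modality labels for this sketch.
MEDIA_KINDS = {"image", "video", "audio", "scanned_document", "screen_recording"}


def pick_nemotron(input_kinds: set[str]) -> str:
    """Pick a Nemotron 3 line from the kinds of input a request carries."""
    if not input_kinds:
        raise ValueError("at least one input kind is required")
    if input_kinds & MEDIA_KINDS:
        # Any mixed-media input goes to the perception model.
        return "Nemotron 3 Nano Omni"
    # Text in, reasoning out.
    return "Nemotron 3 Super"


print(pick_nemotron({"text"}))           # Nemotron 3 Super
print(pick_nemotron({"text", "audio"}))  # Nemotron 3 Nano Omni
```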
Where Nemotron 3 Super sits today
Super has dense enough coverage to position honestly. Read these as relative placement across our leaderboard, not a single verdict:
- Quality Score 66.9, rank #129 of 186.
- MMLU Pro 83.7, rank #31 of 86. The family's strongest result, and a knowledge-and-reasoning benchmark where Super is genuinely competitive.
- HLE 18.3, rank #45 of 90.
- τ²-bench (retail) 62.8, rank #31 of 34.
- Arena Elo 1361, rank #116 of 158.
- LiveBench 32.5, rank #104 of 110.
The shape of that spread is the signal: Super is strong on academic knowledge (MMLU Pro especially), in the lower half on Arena Elo, and near the bottom on LiveBench. That is a per-benchmark pick, not a composite-score leader. The structural reason it stays affordable is the architecture: Super is a mixture-of-experts model, 120B total parameters with roughly 12B active per token, so it serves closer to a mid-size model on capable hardware while keeping open-weights flexibility. NVIDIA's own model card also reports strong LiveCodeBench and GPQA Diamond numbers. Treat self-reported scores as directional until an independent run lands in our index.
Nemotron 3 Nano Omni: the perception layer
Nano Omni is the omni-modal member: text, image, video, and audio in a single 30B-A3B MoE. The word that matters is omni, not just multimodal. Traditional multimodal stacks behave like a bureaucracy: one model transcribes audio, another captions images, another samples video frames, and a language model reasons over the written reports. Every handoff loses context. Nano Omni's claim is simpler: put the evidence into one shared context before the reasoning starts.
The practical reason it exists is that agent workflows rarely begin with a clean prompt. They begin with a PDF, a dashboard, a narrated screen recording, a support call, a slide deck, or a form. Nano Omni is built to turn that mess into structured context for the rest of the agent loop. It is the sub-agent that perceives; Super and Ultra are closer to the parts that plan and decide.
A few design choices are worth knowing because they shape deployment, not just benchmarks: native audio means sound enters the shared context rather than being flattened to a transcript first, which matters when timing and tone carry the signal; dynamic image resolution preserves layout for documents and screenshots instead of forcing everything through fixed tiling; and video-token compression is designed so the model does not spend full compute rereading a static background frame after frame.
On "edge": Nano Omni is edge-friendly only in the GPU-edge or on-prem appliance sense. Even quantised it is a multi-gigabyte model, so the honest phrasing is local and private GPU deployment, not "runs anywhere." Its value is making private multimodal agents realistic for teams that cannot send documents, calls, or screen recordings to a closed API. Its context window is known (256K tokens). On hosted economics, it is currently listed on OpenRouter only as a free-tier route, so we have no production per-token price for it.
How to pick within the family
- Open-weights reasoning at API or self-host scale: Nemotron 3 Super ($0.09 input / $0.45 output per 1M, 1M-token context). The MoE active-parameter footprint means you pay and serve closer to a mid-size model than the 120B total suggests, while keeping open-weights flexibility: data residency, fine-tuning, air-gapped inference. Strongest on MMLU Pro, in the lower half on Arena.
- Perception is the hard part (documents, screens, audio, video): Nemotron 3 Nano Omni. The only model in the family that takes mixed media in one shared context. Reach for it when the binding requirement is turning messy input into structured context, not when you need the strongest text-reasoning score.
- Building an agent system: use both. Nano Omni reads the world, Super decides what to do with it. The family is designed to be composed, which is the whole point of the role split.
- Single-GPU self-host: Nano Omni is the smaller footprint. Size the GPU for the full 30B parameter count, not the roughly 3B active subset. Total weights still have to fit in memory.
Where this family is the wrong call
- You need closed-API frontier reasoning quality. Super is competitive in the open-weights band but does not clear the closed-frontier tier. If frontier quality is the requirement and open weights is not, GPT-5 or Claude Opus is the conversation. If you need open weights at the reasoning frontier, DeepSeek and the Qwen3 MoE variants sit higher on the harder modern benchmarks.
- You just want the cheapest cloud API and do not care about open weights or local deployment. The whole Nemotron value proposition is control and composition. If you are not using that, you are paying for a property you will not exercise.
- Your serving stack is built around dense inference. Both Nemotron 3 models are mixture-of-experts. If an MoE migration is not worth it yet, that is a structural mismatch, not a benchmark one.
- You need pricing certainty for Nano Omni. Its only public route is a free OpenRouter tier, so production cost is unknown. Cross-check NVIDIA's API docs before committing at scale.
Where the data is weak
- Nano Omni has no Quality Score coverage in our index. Its multimodal and document benchmarks (OCRBench, MMLongBench-Doc, VideoMME, MathVista) appear here only as tracked evidence and do not feed the composite. This page positions the model. It does not rank it, and the absence of a Quality Score row is a coverage gap on our side, not a quality statement about the model.
- Nano Omni has no production pricing. The OpenRouter listing is free-tier only. Context window is in our index, per-token cost is not.
- Super's coding and GPQA numbers lean on NVIDIA's own model card. Treat them as directional until an independent evaluation lands here.
- "Open" means open weights, datasets, and recipes, not necessarily an OSI-approved licence. Check the licence files before assuming unrestricted commercial use.
- Model cards and pricing change faster than our scrape cadence. For a procurement decision, the variant table on this page is the load-bearing artifact, and you should still verify against NVIDIA's own docs before committing.
Sources worth reading
- NVIDIA Nemotron on Hugging Face: model cards, weights, licence files
- NVIDIA Nemotron 3 Super model card: reported benchmark tables, intended use, precision notes
- NVIDIA Nemotron 3 Nano Omni model card: multimodal and document benchmark tables, the perception sub-agent framing
Changelog
- Editorial
New NVIDIA Nemotron 3 surface created as draft. Owns nemotron-3-super (open-weights reasoning MoE, full leaderboard coverage) and nemotron-3-nano-omni (omni-modal, no standard benchmark coverage yet). Two-line framing; pricing and context-window data still missing for both.
- Data
Backfilled OpenRouter enrichment for both Nemotron 3 models (stale exclusions removed). Super now has hosted pricing and a 262K context window; Nano Omni is free-tier-only on OpenRouter so it gained a 256K context window but no production price.
How we score
Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.
Author: Boris. Read the full methodology.