NVIDIA Nemotron

NVIDIA Nemotron: 120B-A12B Super ranks #129 of 186 on Quality Score. Open-weights Nemotron 3 picks by workload.

Top in this family

NVIDIA Nemotron 120B-A12B Super ranks #129 of 186 on overall quality (QS 66.9) at $0.09/$0.45 per 1M tokens.

Variants: 3 (across 2 models)
License: Open weights
Provider: nvidia

★ Most teams should start here

NVIDIA Nemotron 3 Super

Variant: 120B-A12B

The only Nemotron 3 model with real leaderboard coverage in our index, and the one most teams mean when they shortlist 'NVIDIA's model'. Reach for Nano Omni instead when the workload is omni-modal (image, video, audio) or document understanding, where Super does not compete.

Quality Score: 66.9
Input: $0.090/1M
Output: $0.450/1M
Context: 1.0M
License: Open weights
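
For direct API use, the list prices above translate into a per-request cost roughly as follows. This is a minimal sketch that assumes cost scales linearly with token counts and ignores caching, batching discounts, or provider-specific fees. The function name and the example token counts are illustrative and do not come from NVIDIA or any provider.

```python
# List prices for Nemotron 3 Super as shown on this page, in USD per 1M tokens.
INPUT_USD_PER_M = 0.09
OUTPUT_USD_PER_M = 0.45


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming simple linear pricing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical workload: 20,000 input tokens and 2,000 output tokens per call.
per_call = request_cost_usd(20_000, 2_000)
print(f"${per_call:.4f} per call, ${per_call * 10_000:.2f} per 10,000 calls")
```

With these assumed token counts the estimate is about $0.0027 per call. Output tokens cost five times as much as input tokens here, so long generations dominate the bill faster than long prompts do.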

Best variant by workload

One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.

Note — picks are framed for direct API usage where cost per million tokens is load-bearing. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.

Workload	Best pick	Price per 1M tokens (input / output)	Why
General API workhorse	NVIDIA Nemotron 3 Super 120B-A12B	$0.090 / $0.450	The open-weights workhorse of the family. 120B total parameters with ~12B active per token, so it serves closer to a mid-size model on capable hardware while keeping open-weights flexibility (data residency, fine-tune, air-gapped inference).
Coding agents	NVIDIA Nemotron 3 Super 120B-A12B	$0.090 / $0.450	NVIDIA reports strong LiveCodeBench results for Super on its model card. Treat that as a directional signal until our index carries an independent coding benchmark for this variant; the per-benchmark table is the load-bearing artifact, not the composite.
Document AI / OCR	NVIDIA Nemotron 3 Nano Omni 30B-A3B Reasoning	$0.000 / $0.000	Nano Omni is the omni-modal line: text, image, video, and audio in one model. NVIDIA's card reports document-understanding scores (OCRBench v2, MMLongBench-Doc, CharXiv) that our main leaderboard does not yet track. Pick it when layout-aware OCR or mixed-media input is the binding requirement.
Self-host on 1 GPU	NVIDIA Nemotron 3 Nano Omni 30B-A3B Reasoning	$0.000 / $0.000	At 30B total / ~3B active, Nano Omni is the smaller-footprint Nemotron and the realistic single-GPU self-host option in this family. Total weights still need to fit in memory, so size the GPU for the full parameter count, not the active subset.

All variants

3 variants across 2 models. Sorted by quality score (descending) · Open weights.

Variant	QS (rank)	HLE	τ²-bench (retail)	Input $/1M	Output $/1M	Context	Released
120B-A12B Nemotron 3 Super	66.9 #129/186	18.3	62.8	$0.09	$0.45	1.0M	—
30B-A3B Reasoning Nemotron 3 Nano Omni	—	—	—	$0	$0	256K	Apr 29, 2026
30B-A3B Instruct Nemotron 3 Nano Omni	—	—	—	$0	$0	256K	Apr 29, 2026

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (11 of 22 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.

Model / Variant	Benchmark	Score	Rank	Scoring
NVIDIA Nemotron 3 Super · 120B-A12B	LiveCodeBench · v5	81.2	2 / 5	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	AIME 2025 · no_tools	90.2	8 / 15	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	LiveCodeBench · v6	78.7	17 / 40	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	τ²-bench · airline	56.3	17 / 29	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	τ²-bench · telecom	64.4	19 / 28	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	MMLU Pro	83.7	31 / 86	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	τ²-bench · retail	62.8	31 / 34	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	Humanity's Last Exam · tools	22.8	34 / 38	In Quality Score

All benchmark evidence (22 rows)

Reasoning

Model / Variant	Benchmark	Score	Rank	Scoring
NVIDIA Nemotron 3 Super · 120B-A12B	AIME 2025 · no_tools	90.2	8 / 15	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	MMLU Pro	83.7	31 / 86	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	Humanity's Last Exam · tools	22.8	34 / 38	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	Humanity's Last Exam · hle	18.3	45 / 90	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	LiveBench	32.5	104 / 110	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	Arena Elo	1361	116 / 158	In Quality Score

Coding

Model / Variant	Benchmark	Score	Rank	Scoring
NVIDIA Nemotron 3 Super · 120B-A12B	LiveCodeBench · v5	81.2	2 / 5	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	LiveCodeBench · v6	78.7	17 / 40	In Quality Score

Agentic

Model / Variant	Benchmark	Score	Rank	Scoring
NVIDIA Nemotron 3 Super · 120B-A12B	τ²-bench · airline	56.3	17 / 29	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	τ²-bench · telecom	64.4	19 / 28	In Quality Score
NVIDIA Nemotron 3 Super · 120B-A12B	τ²-bench · retail	62.8	31 / 34	In Quality Score
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning	OSWorld	47.4	6 / 10	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct	OSWorld	11.1	10 / 10	Tracked evidence

Multimodal

Model / Variant	Benchmark	Score	Rank	Scoring
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning	VideoMME	72.2	4 / 4	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning	MathVista · mini	82.8	14 / 36	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct	MathVista · mini	75.5	25 / 36	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning	CharXiv Reasoning	63.6	34 / 48	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct	CharXiv Reasoning	41.3	44 / 48	Tracked evidence

Document/OCR

Model / Variant	Benchmark	Score	Rank	Scoring
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning	MMLongBench-Doc	57.5	9 / 22	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct	MMLongBench-Doc	38	18 / 22	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning	OCRBench	67.0	33 / 35	Tracked evidence
NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct	OCRBench	54.8	35 / 35	Tracked evidence

Where this family sits in the market

Nemotron 3 Super sits in the open-weights reasoning band: strong on academic knowledge benchmarks (MMLU Pro, #31 of 86), but in the lower part of the field on Arena Elo (#116 of 158) and LiveBench (#104 of 110). It is a per-benchmark pick, not a composite-score leader. Nano Omni is absent from the quality-vs-cost frontier because its strengths are on multimodal benchmarks our main leaderboard does not track.

[Chart: quality vs. cost, one dot per model. Legend: Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, NVIDIA, OpenAI, Qwen, xAI, Zhipu.]

Dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
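
The Pareto frontier on a quality-vs-cost chart can be computed with a single sorted sweep. The sketch below is illustrative only: the `Model` type, the function name, and the four sample points are hypothetical placeholders, not entries from our index. It uses the convention that a model is dropped if another model is at least as cheap and at least as good.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float     # e.g. USD per 1M tokens; lower is better
    quality: float  # e.g. a quality score; higher is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model matches or beats on both axes."""
    frontier: list[Model] = []
    best_quality = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, best quality first.
    for m in sorted(models, key=lambda m: (m.cost, -m.quality)):
        # A model survives only if it improves on every cheaper model's quality.
        if m.quality > best_quality:
            frontier.append(m)
            best_quality = m.quality
    return frontier


# Hypothetical sample points, for illustration only.
sample = [
    Model("A", 0.10, 50.0),
    Model("B", 0.50, 70.0),
    Model("C", 0.60, 65.0),  # costs more than B and scores lower: dominated
    Model("D", 2.00, 90.0),
]
print([m.name for m in pareto_frontier(sample)])
```

A model that sits off the frontier is not necessarily a bad pick. It only means some other model is cheaper and scores higher on the one composite axis being plotted.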

Self-hosting

These variants ship with open weights, so you can run them on your own hardware or via a hosting provider you control. Pick a variant that fits your GPU memory budget; mixture-of-experts variants are cheaper to serve than their total parameter count suggests, but the full weights still need to fit in memory.

NVIDIA Nemotron 3 Super 120B-A12B · open weights
NVIDIA Nemotron 3 Nano Omni 30B-A3B Reasoning · open weights
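
To make the "full weights must fit in memory" point concrete, here is a rough weights-only estimate. It is a sketch under stated assumptions: the nominal 120B and 30B totals from this page, common bytes-per-parameter values for each precision, and no allowance for KV cache, activations, or runtime overhead. Whether a given precision is actually published for these checkpoints is not something this page confirms, and the function name is illustrative.

```python
# Approximate storage per parameter at common precisions (assumed values).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes). Excludes KV cache."""
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    return total_params_billions * BYTES_PER_PARAM[precision]


# Size by TOTAL parameters, not the active subset: every expert stays resident.
for name, params_b in [("Nemotron 3 Super", 120), ("Nemotron 3 Nano Omni", 30)]:
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params_b, precision)
        print(f"{name} @ {precision}: ~{gb:.0f} GB of weights")
```

At 16-bit precision that is roughly 60 GB for Nano Omni and 240 GB for Super before any context is loaded, which is why the active-parameter count is the wrong number to size a GPU against.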

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Qwen3: Qwen 3.7 Max Preview, Qwen3.5, Qwen3.6 Compared
Qwen3: Qwen 3.7 Max Preview ranks #9/186 with 262K context at $0.78/$3.9 per 1M. Compare Qwen3, 3.5, 3.6 by workload.
DeepSeek: V4 Pro Thinking, R1, V3 Compared
DeepSeek: V4 Pro Thinking ranks #15 of 186 with 1.0M-token context and $0.435/$0.87 per 1M tokens. Compare V4, R1, and V3 by workload.
Llama: Muse Spark (Thinking), Llama 4 and 3 Compared
Llama: Muse Spark (Thinking) ranks #12 of 186 on Quality Score. Compare Llama 4, Llama 3, and Muse Spark by self-hosting and workload.

Caveats

What this page does not tell you, listed honestly.

Quality score not yet computed for: NVIDIA Nemotron 3 Nano Omni. We require a minimum benchmark coverage before scoring; until the gap is filled the row shows a dash.

Editor's notes

By Boris · Last verified 2026-05-14 · AI-assisted, human-reviewed

NVIDIA's real bet: the open agent stack, not another chatbot

The mistake to avoid on this page is reading "NVIDIA" as a hardware company that also dabbles in models. Nemotron is the software half of one strategy. NVIDIA is shipping an open model family, open training and inference recipes, and deployment cookbooks, all of which happen to run best on NVIDIA GPUs. The models are the visible creature. The ecosystem is the skeleton.

So the durable thesis is not "NVIDIA has a competitive model now." It is that NVIDIA is trying to make open agent infrastructure look like an NVIDIA-native software platform: efficient, hardware-aware models designed to run where the data is, locally and privately, inside larger agent systems rather than as a standalone chatbot. The strategic read: the software lowers adoption friction, the hardware captures the demand. That framing outlasts any single benchmark.

The family is roles, not sizes

Nemotron 3 is not one model scaled up and down. NVIDIA describes it as a set of jobs in an agent system:

Super is the text-reasoning and agent-collaboration worker.
Ultra is the heavy planner for the hardest reasoning.
Nano is the efficient small worker.
Nano Omni is the perception layer: text, image, video, and audio in one model.

Our index currently tracks two of these with real data: Nemotron 3 Super (120B-A12B) and Nemotron 3 Nano Omni (30B-A3B). The other two are family context, not options you can compare here yet.

The useful mental model: Nano Omni is the eyes and ears, Super and Ultra are closer to the brain. That is why Super and Nano Omni are not ranked against each other on this page. They are not measured on the same benchmarks because they do not do the same job. If you are unsure which line applies, look at the input. Text in, reasoning out: Super. Mixed media in (scanned documents, screen recordings, audio, video): Nano Omni.
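
The "look at the input" rule can be written down as a small routing function. This is a hedged sketch of the rule of thumb only: the enum, the function name, and the returned strings are illustrative display names, not API model identifiers from NVIDIA or any hosting provider.

```python
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


def pick_nemotron(input_modalities: set[Modality]) -> str:
    """Route by input type: text-only goes to Super, mixed media to Nano Omni."""
    if not input_modalities:
        raise ValueError("at least one input modality is required")
    if input_modalities == {Modality.TEXT}:
        return "Nemotron 3 Super"
    # Any image, video, or audio input needs the omni-modal line.
    return "Nemotron 3 Nano Omni"


print(pick_nemotron({Modality.TEXT}))
print(pick_nemotron({Modality.TEXT, Modality.IMAGE}))
```

In an agent system the two are not exclusive: a router like this would typically send perception work to Nano Omni and hand its structured output to Super as plain text.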

Where Nemotron 3 Super sits today

Super has dense enough coverage to position honestly. Read these as relative placement across our leaderboard, not a single verdict:

Quality Score 66.9, rank #129 of 186.
MMLU Pro 83.7, rank #31 of 86. The family's strongest result, and a knowledge-and-reasoning benchmark where Super is genuinely competitive.
HLE 18.3, rank #45 of 90.
τ²-bench (retail) 62.8, rank #31 of 34.
Arena Elo 1361, rank #116 of 158.
LiveBench 32.5, rank #104 of 110.
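
Because each benchmark ranks a different number of models, raw ranks are hard to compare side by side. One simple normalisation, sketched below, is the share of the ranked field that places below the model. The function name is illustrative, and the three placements are ranks for Nemotron 3 Super as listed on this page.

```python
def share_ranked_below(rank: int, total: int) -> float:
    """Fraction of the ranked field that places below the given rank."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# (rank, field size) for Nemotron 3 Super, taken from this page.
placements = {
    "Quality Score": (129, 186),
    "HLE": (45, 90),
    "τ²-bench (retail)": (31, 34),
}

for benchmark, (rank, total) in placements.items():
    share = share_ranked_below(rank, total)
    print(f"{benchmark}: ahead of {share:.0%} of the field")
```

Read this way, HLE at #45 of 90 is exactly mid-field, while #31 of 34 on τ²-bench retail is near the bottom even though the raw rank number looks similar to the MMLU Pro one.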

The shape of that spread is the signal: Super is strong on academic knowledge (MMLU Pro especially) and sits in the lower part of the field on Arena Elo and LiveBench. That is a per-benchmark pick, not a composite-score leader. The structural reason it stays affordable is the architecture: Super is a mixture-of-experts model, 120B total parameters with roughly 12B active per token, so it serves closer to a mid-size model on capable hardware while keeping open-weights flexibility. NVIDIA's own model card also reports strong LiveCodeBench and GPQA Diamond numbers. Treat self-reported scores as directional until an independent run lands in our index.

Nemotron 3 Nano Omni: the perception layer

Nano Omni is the omni-modal member: text, image, video, and audio in a single 30B-A3B MoE. The word that matters is omni, not just multimodal. Traditional multimodal stacks behave like a bureaucracy: one model transcribes audio, another captions images, another samples video frames, and a language model reasons over the written reports. Every handoff loses context. Nano Omni's claim is simpler: put the evidence into one shared context before the reasoning starts.

The practical reason it exists is that agent workflows rarely begin with a clean prompt. They begin with a PDF, a dashboard, a narrated screen recording, a support call, a slide deck, or a form. Nano Omni is built to turn that mess into structured context for the rest of the agent loop. It is the sub-agent that perceives; Super and Ultra are closer to the parts that plan and decide.

A few design choices are worth knowing because they shape deployment, not just benchmarks: native audio means sound enters the shared context rather than being flattened to a transcript first, which matters when timing and tone carry the signal; dynamic image resolution preserves layout for documents and screenshots instead of forcing everything through fixed tiling; and video-token compression is designed so the model does not spend full compute rereading a static background frame after frame.

On "edge": Nano Omni is edge-friendly only in the GPU-edge or on-prem appliance sense. Even quantised it is a multi-gigabyte model, so the honest phrasing is local and private GPU deployment, not "runs anywhere." Its value is making private multimodal agents realistic for teams that cannot send documents, calls, or screen recordings to a closed API. Its context window is known (256K tokens). Its hosted economics are not: it is currently listed on OpenRouter only as a free-tier route, so we have no production per-token price for it.

How to pick within the family

Open-weights reasoning at API or self-host scale: Nemotron 3 Super ($0.09 input / $0.45 output per 1M, 1M-token context). The MoE active-parameter footprint means you pay and serve closer to a mid-size model than the 120B total suggests, while keeping open-weights flexibility: data residency, fine-tuning, air-gapped inference. Strongest on MMLU Pro, lower in the field on Arena.
Perception is the hard part (documents, screens, audio, video): Nemotron 3 Nano Omni. The only model in the family that takes mixed media in one shared context. Reach for it when the binding requirement is turning messy input into structured context, not when you need the strongest text-reasoning score.
Building an agent system: use both. Nano Omni reads the world, Super decides what to do with it. The family is designed to be composed, which is the whole point of the role split.
Single-GPU self-host: Nano Omni is the smaller footprint. Size the GPU for the full 30B parameter count, not the roughly 3B active subset. Total weights still have to fit in memory.

Where this family is the wrong call

You need closed-API frontier reasoning quality. Super is competitive in the open-weights band but does not clear the closed-frontier tier. If frontier quality is the requirement and open weights is not, GPT-5 or Claude Opus is the conversation. If you need open weights at the reasoning frontier, DeepSeek and the Qwen3 MoE variants sit higher on the harder modern benchmarks.
You just want the cheapest cloud API and do not care about open weights or local deployment. The whole Nemotron value proposition is control and composition. If you are not using that, you are paying for a property you will not exercise.
Your serving stack is built around dense inference. Both Nemotron 3 models are mixture-of-experts. If an MoE migration is not worth it yet, that is a structural mismatch, not a benchmark one.
You need pricing certainty for Nano Omni. Its only public route is a free OpenRouter tier, so production cost is unknown. Cross-check NVIDIA's API docs before committing at scale.

Where the data is weak

Nano Omni has no standard-leaderboard coverage in our index. Its multimodal and document benchmarks (OCRBench, MMLongBench-Doc, VideoMME, MathVista) appear on this page only as tracked evidence and do not feed the Quality Score. This page positions the model. It does not rank it, and the absence of a Quality Score row is a coverage gap on our side, not a quality statement about the model.
Nano Omni has no production pricing. The OpenRouter listing is free-tier only. Context window is in our index, per-token cost is not.
Super's coding and GPQA numbers lean on NVIDIA's own model card. Treat them as directional until an independent evaluation lands here.
"Open" means open weights, datasets, and recipes, not necessarily an OSI-approved licence. Check the licence files before assuming unrestricted commercial use.
Model cards and pricing change faster than our scrape cadence. For a procurement decision, the variant table on this page is the load-bearing artifact, and you should still verify against NVIDIA's own docs before committing.

Sources worth reading

NVIDIA Nemotron on Hugging Face: model cards, weights, licence files
NVIDIA Nemotron 3 Super model card: reported benchmark tables, intended use, precision notes
NVIDIA Nemotron 3 Nano Omni model card: multimodal and document benchmark tables, the perception sub-agent framing

Changelog

2026-05-14 · Editorial
New NVIDIA Nemotron 3 surface created as draft. Owns nemotron-3-super (open-weights reasoning MoE, full leaderboard coverage) and nemotron-3-nano-omni (omni-modal, no standard benchmark coverage yet). Two-line framing; pricing and context-window data still missing for both.
2026-05-14 · Data
Backfilled OpenRouter enrichment for both Nemotron 3 models (stale exclusions removed). Super now has hosted pricing and a 262K context window; Nano Omni is free-tier-only on OpenRouter so it gained a 256K context window but no production price.

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
