
NVIDIA Nemotron

NVIDIA Nemotron: 120B-A12B Super ranks #129 of 186 on Quality Score. Open-weights Nemotron 3 picks by workload.

Top in this family

NVIDIA Nemotron 120B-A12B Super ranks #129 of 186 on overall quality (QS 66.9) at $0.09/$0.45 per 1M tokens.

Variants: 2
License: Open weights
Provider: nvidia

★ Most teams should start here

NVIDIA Nemotron 3 Super

Variant: 120B-A12B

The only Nemotron 3 model with real leaderboard coverage in our index, and the one most teams mean when they shortlist 'NVIDIA's model'. Reach for Nano Omni instead when the workload is omni-modal (image, video, audio) or document understanding, where Super does not compete.

Quality Score: 66.9
Input: $0.090/1M
Output: $0.450/1M
Context: 1.0M
License: Open weights

Best variant by workload

One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.

Note — picks are framed for direct API usage where cost per million tokens is load-bearing. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.

| Workload | Best pick | Why |
| --- | --- | --- |
| General API workhorse | NVIDIA Nemotron 3 Super · 120B-A12B · $0.090/1M / $0.450/1M | The open-weights workhorse of the family. 120B total parameters with ~12B active per token, so it serves closer to a mid-size model on capable hardware while keeping open-weights flexibility (data residency, fine-tune, air-gapped inference). |
| Coding agents | NVIDIA Nemotron 3 Super · 120B-A12B · $0.090/1M / $0.450/1M | NVIDIA reports strong LiveCodeBench results for Super on its model card. Treat that as a directional signal until our index carries an independent coding benchmark for this variant; the per-benchmark table is the load-bearing artifact, not the composite. |
| Document AI / OCR | NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning · $0.000/1M / $0.000/1M | Nano Omni is the omni-modal line: text, image, video, and audio in one model. NVIDIA's card reports document-understanding scores (OCRBench v2, MMLongBench-Doc, CharXiv) that our main leaderboard does not yet track. Pick it when layout-aware OCR or mixed-media input is the binding requirement. |
| Self-host on 1 GPU | NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning · $0.000/1M / $0.000/1M | At 30B total / ~3B active, Nano Omni is the smaller-footprint Nemotron and the realistic single-GPU self-host option in this family. Total weights still need to fit in memory, so size the GPU for the full parameter count, not the active subset. |
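
The note above treats cost per million tokens as load-bearing for direct API usage. A minimal sketch of that arithmetic, using the Super list prices shown on this page; the request volume and token counts are hypothetical placeholders, not measurements.

```python
# List prices for Nemotron 3 Super as shown on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.09
OUTPUT_PRICE_PER_M = 0.45


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call at the list prices above."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 1M requests per month, 2,000 tokens in and 500 out each.
monthly = 1_000_000 * request_cost_usd(2_000, 500)
print(f"${monthly:,.2f} per month")  # prints "$405.00 per month"
```

Output tokens cost five times as much as input tokens at these prices, so workloads with long generations shift the bill more than workloads with long prompts.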

All variants

3 variants across 2 models. Sorted by quality score (descending) · Open weights.

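| Variant | QS | HLE | Tau | In $/M | Out $/M | Context | Released |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 120B-A12B (Nemotron 3 Super) | 66.9 (#129/186) | 18.3 | 62.8 | $0.09 | $0.45 | 1.0M | - |
| 30B-A3B Reasoning (Nemotron 3 Nano Omni) | - | - | - | $0 | $0 | 256K | Apr 29, 2026 |
| 30B-A3B Instruct (Nemotron 3 Nano Omni) | - | - | - | $0 | $0 | 256K | Apr 29, 2026 |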

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (11 of 22 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v5 | 81.2 | 2 / 5 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | AIME 2025 · no_tools | 90.2 | 8 / 15 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v6 | 78.7 | 17 / 40 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · airline | 56.3 | 17 / 29 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · telecom | 64.4 | 19 / 28 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | MMLU Pro | 83.7 | 31 / 86 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · retail | 62.8 | 31 / 34 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Humanity's Last Exam · tools | 22.8 | 34 / 38 | In Quality Score |

Reasoning

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| NVIDIA Nemotron 3 Super · 120B-A12B | AIME 2025 · no_tools | 90.2 | 8 / 15 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | MMLU Pro | 83.7 | 31 / 86 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Humanity's Last Exam · tools | 22.8 | 34 / 38 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Humanity's Last Exam · hle | 18.3 | 45 / 90 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveBench | 32.5 | 104 / 110 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | Arena Elo | 1361 | 116 / 158 | In Quality Score |

Coding

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v5 | 81.2 | 2 / 5 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | LiveCodeBench · v6 | 78.7 | 17 / 40 | In Quality Score |

Agentic

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · airline | 56.3 | 17 / 29 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · telecom | 64.4 | 19 / 28 | In Quality Score |
| NVIDIA Nemotron 3 Super · 120B-A12B | τ²-bench · retail | 62.8 | 31 / 34 | In Quality Score |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | OSWorld | 47.4 | 6 / 10 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | OSWorld | 11.1 | 10 / 10 | Tracked evidence |

Multimodal

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | VideoMME | 72.2 | 4 / 4 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | MathVista · mini | 82.8 | 14 / 36 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | MathVista · mini | 75.5 | 25 / 36 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | CharXiv Reasoning | 63.6 | 34 / 48 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | CharXiv Reasoning | 41.3 | 44 / 48 | Tracked evidence |

Document/OCR

| Model / Variant | Benchmark | Score | Rank | Scoring |
| --- | --- | --- | --- | --- |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | MMLongBench-Doc | 57.5 | 9 / 22 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | MMLongBench-Doc | 38 | 18 / 22 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning | OCRBench | 67.0 | 33 / 35 | Tracked evidence |
| NVIDIA Nemotron 3 Nano Omni · 30B-A3B Instruct | OCRBench | 54.8 | 35 / 35 | Tracked evidence |

Where this family sits in the market

Nemotron 3 Super sits in the open-weights reasoning band: strong on academic knowledge benchmarks, mid-pack on Arena and LiveBench. It is a per-benchmark pick, not a composite-score leader. Nano Omni is absent from the quality-vs-cost frontier because its strengths are on multimodal benchmarks our main leaderboard does not track.

[Chart: quality vs. cost, one dot per model. Legend (providers): Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, nvidia, OpenAI, Qwen, xAI, Zhipu.]

Dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
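
For readers who want to reproduce a frontier like this on their own shortlist, here is a minimal sketch. The model names and numbers are hypothetical, and the dominance rule is a slightly stricter assumption than the caption: a point is dropped when another one costs no more and scores at least as well.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float     # e.g. USD per 1M output tokens (lower is better)
    quality: float  # e.g. a quality score (higher is better)


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep only points that no cheaper-or-equal point matches or beats on quality."""
    frontier: list[Point] = []
    best_quality = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, try the better one first.
    for p in sorted(points, key=lambda p: (p.cost, -p.quality)):
        if p.quality > best_quality:
            frontier.append(p)
            best_quality = p.quality
    return frontier


# Hypothetical data, for illustration only.
points = [
    Point("model-a", 0.20, 55.0),
    Point("model-b", 0.50, 68.0),
    Point("model-c", 0.80, 64.0),  # dominated by model-b
    Point("model-d", 3.00, 82.0),
    Point("model-e", 5.00, 82.0),  # same quality as model-d at higher cost
]
print([p.name for p in pareto_frontier(points)])  # ['model-a', 'model-b', 'model-d']
```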

Self-hosting

These variants ship with open weights, so you can run them on your own hardware or via a hosting provider you control. Pick a variant that fits your GPU memory budget; mixture-of-experts variants are cheaper to serve than their total parameter count suggests, but the full weights still need to fit in memory.

  • NVIDIA Nemotron 3 Super · 120B-A12B · open weights
  • NVIDIA Nemotron 3 Nano Omni · 30B-A3B Reasoning · open weights
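
A rough sketch of the sizing rule above: weight memory scales with total parameters, not active ones. The precisions and bytes-per-parameter figures are generic assumptions, not a statement of which checkpoint formats NVIDIA ships, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Assumed storage cost per parameter at common precisions (illustrative).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return total_params_billions * BYTES_PER_PARAM[precision]


# (name, total parameters in billions, active parameters in billions) from this page.
VARIANTS = [("Nemotron 3 Super", 120, 12), ("Nemotron 3 Nano Omni", 30, 3)]

for name, total_b, active_b in VARIANTS:
    for precision in BYTES_PER_PARAM:
        full = weight_memory_gb(total_b, precision)
        active = weight_memory_gb(active_b, precision)
        print(f"{name} @ {precision}: ~{full:.0f} GB to load "
              f"(the active subset alone would be ~{active:.1f} GB)")
```

The gap between the two figures on each line is the point: the active subset drives compute per token, while the full figure decides whether the model fits on the GPU at all.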

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Caveats

What this page does not tell you, listed honestly.

  • Quality score not yet computed for: NVIDIA Nemotron 3 Nano Omni. We require a minimum benchmark coverage before scoring; until the gap is filled the row shows a dash.

Editor's notes

By boris · Last verified · AI-assisted, human-reviewed

NVIDIA's real bet: the open agent stack, not another chatbot

The common mistake when reading this page is to treat NVIDIA as a hardware company that also dabbles in models. Nemotron is the software half of one strategy. NVIDIA is shipping an open model family, open training and inference recipes, and deployment cookbooks, all of which happen to run best on NVIDIA GPUs. The models are the visible creature. The ecosystem is the skeleton.

So the durable thesis is not "NVIDIA has a competitive model now." It is that NVIDIA is trying to make open agent infrastructure look like an NVIDIA-native software platform: efficient, hardware-aware models designed to run where the data is, locally and privately, inside larger agent systems rather than as a standalone chatbot. The strategic read: the software lowers adoption friction, the hardware captures the demand. That framing outlasts any single benchmark.

The family is roles, not sizes

Nemotron 3 is not one model scaled up and down. NVIDIA describes it as a set of jobs in an agent system:

  • Super is the text-reasoning and agent-collaboration worker.
  • Ultra is the heavy planner for the hardest reasoning.
  • Nano is the efficient small worker.
  • Nano Omni is the perception layer: text, image, video, and audio in one model.

Our index currently tracks two of these with real data: Nemotron 3 Super (120B-A12B) and Nemotron 3 Nano Omni (30B-A3B). The other two are family context, not options you can compare here yet.

The useful mental model: Nano Omni is the eyes and ears, Super and Ultra are closer to the brain. That is why Super and Nano Omni are not ranked against each other on this page. They are not measured on the same benchmarks because they do not do the same job. If you are unsure which line applies, look at the input. Text in, reasoning out: Super. Mixed media in (scanned documents, screen recordings, audio, video): Nano Omni.
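
The input-based rule of thumb above can be sketched as a small router. The model identifiers are the index slugs that appear in this page's changelog, not confirmed API model IDs, and the `Modality` enum is a hypothetical helper introduced for illustration.

```python
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"
    DOCUMENT = "document"  # scanned PDFs, forms, slide decks


# Slugs as used on this page; your provider or server may expose different names.
SUPER = "nemotron-3-super"
NANO_OMNI = "nemotron-3-nano-omni"


def pick_model(inputs: set[Modality]) -> str:
    """Text-only input goes to Super; any non-text media goes to Nano Omni."""
    if not inputs:
        raise ValueError("at least one input modality is required")
    return SUPER if inputs == {Modality.TEXT} else NANO_OMNI


print(pick_model({Modality.TEXT}))                  # nemotron-3-super
print(pick_model({Modality.TEXT, Modality.AUDIO}))  # nemotron-3-nano-omni
```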

Where Nemotron 3 Super sits today

Super has dense enough coverage to position honestly. Read these as relative placement across our leaderboard, not a single verdict:

  • Quality Score 66.9, rank #129 of 186.
  • MMLU Pro 83.7, rank #31 of 86. The family's strongest result, and a knowledge-and-reasoning benchmark where Super is genuinely competitive.
  • HLE 18.3, rank #45 of 90.
  • τ²-bench (retail) 62.8, rank #31 of 34.
  • Arena Elo 1361, rank #116 of 158.
  • LiveBench 32.5, rank #104 of 110.

The shape of that spread is the signal: Super is strong on academic knowledge (MMLU Pro especially) and mid-pack on Arena ELO and LiveBench. That is a per-benchmark pick, not a composite-score leader. The structural reason it stays affordable is the architecture: Super is a mixture-of-experts model, 120B total parameters with roughly 12B active per token, so it serves closer to a mid-size model on capable hardware while keeping open-weights flexibility. NVIDIA's own model card also reports strong LiveCodeBench and GPQA Diamond numbers. Treat self-reported scores as directional until an independent run lands in our index.
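
To make "relative placement" concrete, a rank can be converted into the share of the field that sits below it. A small sketch using the rank and field-size pairs from the benchmark evidence tables on this page; the helper name is ours.

```python
# (rank, field size) for Nemotron 3 Super, from the benchmark tables on this page.
RANKS = {
    "MMLU Pro": (31, 86),
    "Humanity's Last Exam (hle)": (45, 90),
    "τ²-bench retail": (31, 34),
    "Arena Elo": (116, 158),
    "LiveBench": (104, 110),
}


def share_ranked_below(rank: int, field_size: int) -> float:
    """Fraction of the field that ranks below this position."""
    return (field_size - rank) / field_size


for name, (rank, size) in RANKS.items():
    share = share_ranked_below(rank, size)
    print(f"{name}: #{rank} of {size}, ahead of {share:.0%} of the field")
```

Field sizes differ by a factor of nearly five across these benchmarks, so the share is easier to compare across rows than the raw rank.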

Nemotron 3 Nano Omni: the perception layer

Nano Omni is the omni-modal member: text, image, video, and audio in a single 30B-A3B MoE. The word that matters is omni, not just multimodal. Traditional multimodal stacks behave like a bureaucracy: one model transcribes audio, another captions images, another samples video frames, and a language model reasons over the written reports. Every handoff loses context. Nano Omni's claim is simpler: put the evidence into one shared context before the reasoning starts.

The practical reason it exists is that agent workflows rarely begin with a clean prompt. They begin with a PDF, a dashboard, a narrated screen recording, a support call, a slide deck, or a form. Nano Omni is built to turn that mess into structured context for the rest of the agent loop. It is the sub-agent that perceives; Super and Ultra are closer to the parts that plan and decide.

A few design choices are worth knowing because they shape deployment, not just benchmarks: native audio means sound enters the shared context rather than being flattened to a transcript first, which matters when timing and tone carry the signal; dynamic image resolution preserves layout for documents and screenshots instead of forcing everything through fixed tiling; and video-token compression is designed so the model does not spend full compute rereading a static background frame after frame.

On "edge": Nano Omni is edge-friendly only in the GPU-edge or on-prem appliance sense. Even quantised it is a multi-gigabyte model, so the honest phrasing is local and private GPU deployment, not "runs anywhere." Its value is making private multimodal agents realistic for teams that cannot send documents, calls, or screen recordings to a closed API. Its context window is known (256K tokens), but its hosted economics are not: it is currently listed on OpenRouter only as a free-tier route, so we have no production per-token price for it.

How to pick within the family

  • Open-weights reasoning at API or self-host scale: Nemotron 3 Super ($0.09 input / $0.45 output per 1M, 1M-token context). The MoE active-parameter footprint means you pay and serve closer to a mid-size model than the 120B total suggests, while keeping open-weights flexibility: data residency, fine-tuning, air-gapped inference. Strongest on MMLU Pro, mid-pack on Arena.
  • Perception is the hard part (documents, screens, audio, video): Nemotron 3 Nano Omni. The only model in the family that takes mixed media in one shared context. Reach for it when the binding requirement is turning messy input into structured context, not when you need the strongest text-reasoning score.
  • Building an agent system: use both. Nano Omni reads the world, Super decides what to do with it. The family is designed to be composed, which is the whole point of the role split.
  • Single-GPU self-host: Nano Omni is the smaller footprint. Size the GPU for the full 30B parameter count, not the roughly 3B active subset. Total weights still have to fit in memory.

Where this family is the wrong call

  • You need closed-API frontier reasoning quality. Super is competitive in the open-weights band but does not clear the closed-frontier tier. If frontier quality is the requirement and open weights is not, GPT-5 or Claude Opus is the conversation. If you need open weights at the reasoning frontier, DeepSeek and the Qwen3 MoE variants sit higher on the harder modern benchmarks.
  • You just want the cheapest cloud API and do not care about open weights or local deployment. The whole Nemotron value proposition is control and composition. If you are not using that, you are paying for a property you will not exercise.
  • Your serving stack is built around dense inference. Both Nemotron 3 models are mixture-of-experts. If an MoE migration is not worth it yet, that is a structural mismatch, not a benchmark one.
  • You need pricing certainty for Nano Omni. Its only public route is a free OpenRouter tier, so production cost is unknown. Cross-check NVIDIA's API docs before committing at scale.

Where the data is weak

  • Nano Omni has no standard-leaderboard coverage in our index. Its multimodal and document benchmarks (OCRBench, MMLongBench-Doc, VideoMME, MathVista) are recorded on this page as tracked evidence only; none of them feeds the Quality Score. This page positions the model. It does not rank it, and the absence of a Quality Score row is a coverage gap on our side, not a quality statement about the model.
  • Nano Omni has no production pricing. The OpenRouter listing is free-tier only. Context window is in our index, per-token cost is not.
  • Super's coding and GPQA numbers lean on NVIDIA's own model card. Treat them as directional until an independent evaluation lands here.
  • "Open" means open weights, datasets, and recipes, not necessarily an OSI-approved licence. Check the licence files before assuming unrestricted commercial use.
  • Model cards and pricing change faster than our scrape cadence. For a procurement decision, the variant table on this page is the load-bearing artifact, and you should still verify against NVIDIA's own docs before committing.

Changelog

  • Editorial

    New NVIDIA Nemotron 3 surface created as draft. Owns nemotron-3-super (open-weights reasoning MoE, full leaderboard coverage) and nemotron-3-nano-omni (omni-modal, no standard benchmark coverage yet). Two-line framing; pricing and context-window data still missing for both.

  • Data

    Backfilled OpenRouter enrichment for both Nemotron 3 models (stale exclusions removed). Super now has hosted pricing and a 262K context window; Nano Omni is free-tier-only on OpenRouter so it gained a 256K context window but no production price.

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
