launcher: add Nemotron-3-Super-120B-A12B-BF16 MTP vLLM specdec bench config by ChenhanYu · Pull Request #1714 · NVIDIA/Model-Optimizer

ChenhanYu · 2026-06-14T03:32:59Z

Adds a SPEED-bench MTP speculative-decoding YAML for NVIDIA-Nemotron-3-Super-120B-A12B-BF16 via vLLM.

Covers two splits:

- qualitative: 32 concurrent, 4096 output tokens
- throughput_32k: 8 concurrent, 80 requests, 4096 output tokens

Both tasks run with tp_size=4 on a single node with four H100 or A100 GPUs.
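
As a rough sanity check on the tp_size=4 choice, the sketch below estimates the BF16 weight footprint per GPU. It assumes a nominal 120e9 parameters (read off the "120B" in the model name, not measured from the checkpoint) and even sharding across tensor-parallel ranks, and it ignores KV cache, activations, and any MTP head weights. The function name is illustrative and not part of the launcher.

```python
def weight_gb_per_gpu(num_params: float, bytes_per_param: int, tp_size: int) -> float:
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes), assuming even sharding."""
    if tp_size < 1:
        raise ValueError("tp_size must be at least 1")
    return num_params * bytes_per_param / tp_size / 1e9


# Assumption: nominal parameter count taken from the "120B" in the model name.
NUM_PARAMS = 120e9
BF16_BYTES = 2  # BF16 stores each parameter in 2 bytes
TP_SIZE = 4     # tp_size from the PR description

per_gpu = weight_gb_per_gpu(NUM_PARAMS, BF16_BYTES, TP_SIZE)
print(f"~{per_gpu:.0f} GB of weights per GPU at tp_size={TP_SIZE}")
```

Under these assumptions the weights alone take roughly 60 GB per GPU, so an 80 GB device leaves limited headroom for KV cache, and a 40 GB A100 could not hold its shard.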

Part of OMNIML-5095 / OMNIML-5098.

Test plan

uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Nemotron-h/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_mtp_vllm.yaml --dryrun --yes -v

Summary by CodeRabbit

Release Notes

Chores
- Added a new benchmark configuration for evaluating NVIDIA Nemotron-3-Super-120B model performance using speculative decoding with vLLM. Configuration includes qualitative and throughput benchmark tasks.

copy-pr-bot · 2026-06-14T03:33:03Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-14T03:33:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 12b89ce7-4208-4e54-95d5-38bd78636a1d

📥 Commits

Reviewing files that changed from the base of the PR and between 93dd08f and a85498f.

📒 Files selected for processing (1)

tools/launcher/examples/Nemotron-h/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_mtp_vllm.yaml

📝 Walkthrough

A new YAML benchmark configuration file is added for the NVIDIA Nemotron-3-Super-120B-A12B BF16 model. It defines two pipeline tasks (task_0 for qualitative and task_1 for throughput_32k) both invoking common/specdec_bench/run.sh with MTP speculative decoding settings (draft length 3, tp_size 4, ep_size 1) under a single-node, 4-GPU Slurm job using the vLLM container.
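
To give a feel for what a draft length of 3 can buy, here is a minimal sketch of the textbook speculative-decoding estimate for tokens emitted per verification step. It assumes each drafted token is accepted independently with the same probability `alpha`, which is a simplification; it is not how SPEED-Bench measures acceptance, and the `alpha` values are hypothetical rather than results from this benchmark.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model step under an i.i.d. acceptance model.

    Equals 1 + alpha + alpha**2 + ... + alpha**draft_len: the target model always
    contributes one token, and drafted token i survives with probability alpha**i.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if alpha == 1.0:
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


DRAFT_LEN = 3  # draft length from the walkthrough above

# Hypothetical acceptance rates, for illustration only.
for alpha in (0.5, 0.7, 0.9):
    tokens = expected_tokens_per_step(alpha, DRAFT_LEN)
    print(f"alpha={alpha:.1f}: {tokens:.3f} tokens per step")
```

With a draft length of 3 the estimate is bounded between 1 token per step (nothing accepted) and 4 tokens per step (everything accepted).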

Changes

MTP vLLM Benchmark Configuration

Layer / File(s)	Summary
Benchmark YAML with two pipeline tasks and Slurm settings `tools/launcher/examples/Nemotron-h/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_mtp_vllm.yaml`	Adds a 73-line YAML defining the job name, model checkpoint path, two SPEED-Bench MTP pipeline tasks (`task_0` qualitative, `task_1` throughput_32k) with their respective concurrency and dataset split arguments, and shared Slurm/container/environment variables (draft length 3, tp_size 4, ep_size 1, single node, 4 GPUs).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

NVIDIA/Model-Optimizer#1638: Adds a specdec_bench YAML with the same two-task (qualitative + throughput_32k) vLLM structure invoking common/specdec_bench/run.sh, differing by speculative algorithm (DFlash vs MTP) and model.
NVIDIA/Model-Optimizer#1656: Adds a specdec_bench YAML for the same Nemotron-3-Super-120B-A12B BF16 model with the identical two-task pipeline structure, differing only in speculative decoding method (DFlash vs MTP).

Suggested reviewers

yeyu-nvidia
h-guo18
kevalmorabia97

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically describes the main change: adding a Nemotron-3-Super-120B-A12B-BF16 MTP vLLM speculative decoding benchmark configuration file.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR adds only a YAML configuration file with no Python code, dependencies, or security anti-patterns. Security checks apply only to Python code changes, which this PR does not contain.

…config

Adds SPEED-bench MTP speculative-decoding YAML for NVIDIA-Nemotron-3-Super-120B-A12B-BF16 via vLLM, covering the qualitative and throughput_32k splits with tp_size=4. Part of OMNIML-5095 / OMNIML-5098.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>

copy-pr-bot · 2026-06-19T03:46:05Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

codecov · 2026-06-19T03:55:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.95%. Comparing base (e012529) to head (a85498f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1714      +/-   ##
==========================================
+ Coverage   76.92%   76.95%   +0.03%     
==========================================
  Files         511      511              
  Lines       56360    56360              
==========================================
+ Hits        43356    43373      +17     
+ Misses      13004    12987      -17
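
The percentages in the diff above can be reproduced from the hit and line counts. The sketch below is only an arithmetic check. It assumes the report truncates to two decimals instead of rounding to nearest, which is what these numbers are consistent with (rounding to nearest would give 76.93% and 76.96%). The helper name is illustrative.

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (integer math avoids float error)."""
    return hits * 10000 // lines


LINES = 56360
BASE_HITS, BASE_MISSES = 43356, 13004
HEAD_HITS, HEAD_MISSES = 43373, 12987

# Hits and misses should add up to the total line count on both sides.
assert BASE_HITS + BASE_MISSES == LINES
assert HEAD_HITS + HEAD_MISSES == LINES

base = coverage_hundredths(BASE_HITS, LINES)
head = coverage_hundredths(HEAD_HITS, LINES)
print(f"base {base / 100:.2f}%  head {head / 100:.2f}%  delta {(head - base) / 100:+.2f}%")
```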

Flag	Coverage Δ
regression	`14.70% <ø> (+0.06%)`	⬆️
unit	`54.33% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

github-actions · 2026-06-19T23:18:54Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-19 23:18 UTC

ChenhanYu force-pushed the pensieve-intern/OMNIML-5095/cell-t0-d7 branch from b536d3e to a85498f Compare June 19, 2026 03:46

ChenhanYu changed the title ~~[OMNIML-5098] cell_t0_d7~~ launcher: add Nemotron-3-Super-120B-A12B-BF16 MTP vLLM specdec bench config Jun 19, 2026

ChenhanYu marked this pull request as ready for review June 19, 2026 03:46

ChenhanYu requested a review from a team as a code owner June 19, 2026 03:46

coderabbitai Bot approved these changes Jun 19, 2026

h-guo18 approved these changes Jun 19, 2026

ChenhanYu merged commit fa1d13f into main Jun 19, 2026
45 checks passed

ChenhanYu deleted the pensieve-intern/OMNIML-5095/cell-t0-d7 branch June 19, 2026 23:18

ChenhanYu mentioned this pull request Jun 20, 2026

[OMNIML-5098] cell_t0_d7 #1776

Closed
