launcher: add Nemotron-3-Super-120B-A12B-BF16 MTP vLLM specdec bench config #1714

Merged
ChenhanYu merged 1 commit into main from pensieve-intern/OMNIML-5095/cell-t0-d7 on Jun 19, 2026
Conversation

@ChenhanYu ChenhanYu commented Jun 14, 2026

Collaborator

Adds a SPEED-bench MTP speculative-decoding YAML for NVIDIA-Nemotron-3-Super-120B-A12B-BF16 via vLLM.

Covers two splits:

  • qualitative: 32 concurrent requests, 4096 output tokens
  • throughput_32k: 8 concurrent requests, 80 requests total, 4096 output tokens

Both tasks run with tp_size=4 on a single 4-GPU node (H100 or A100).
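
As a quick sanity check on the numbers above, the two tasks can be written down as plain data. This is only a sketch: the `BenchTask` class and its field names are hypothetical and do not mirror the launcher's YAML schema. Only the values (concurrency, request count, output tokens, tp_size, 4 GPUs) come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchTask:
    """One benchmark task as described in the PR text (hypothetical field names)."""
    split: str
    concurrency: int
    output_tokens: int
    num_requests: Optional[int] = None  # only stated for throughput_32k
    tp_size: int = 4


TASKS = [
    BenchTask(split="qualitative", concurrency=32, output_tokens=4096),
    BenchTask(split="throughput_32k", concurrency=8, output_tokens=4096, num_requests=80),
]

GPUS_PER_NODE = 4  # single 4-GPU node, per the description


def max_output_tokens(task: BenchTask) -> Optional[int]:
    """Upper bound on generated tokens, or None if the request count is not stated."""
    if task.num_requests is None:
        return None
    return task.num_requests * task.output_tokens


for task in TASKS:
    # Tensor parallelism must fit on the single node.
    assert task.tp_size <= GPUS_PER_NODE
    print(task.split, max_output_tokens(task))
```

For throughput_32k this gives an upper bound of 80 × 4096 = 327,680 generated tokens. The qualitative split has no request count in the description, so no bound is computed for it.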

Part of OMNIML-5095 / OMNIML-5098.

Test plan

  • uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Nemotron-h/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_mtp_vllm.yaml --dryrun --yes -v
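
If the dry run needs to be scripted, for example from a CI helper, the command above could be assembled as an argument list. This is a minimal sketch: the `dry_run_command` helper is hypothetical, and the flags and path are copied verbatim from the test plan, not checked against `slurm.py` itself.

```python
import shlex

# Path exactly as it appears in the test plan.
YAML_PATH = (
    "modules/Model-Optimizer/tools/launcher/examples/Nemotron-h/"
    "NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_mtp_vllm.yaml"
)


def dry_run_command(yaml_path: str) -> list[str]:
    """Build the dry-run invocation from the test plan as an argv list."""
    return ["uv", "run", "slurm.py", "--yaml", yaml_path, "--dryrun", "--yes", "-v"]


cmd = dry_run_command(YAML_PATH)
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting problems if the YAML path ever contains spaces.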

Summary by CodeRabbit

Release Notes

  • Chores
    • Added a new benchmark configuration for evaluating NVIDIA Nemotron-3-Super-120B model performance using speculative decoding with vLLM. Configuration includes qualitative and throughput benchmark tasks.

copy-pr-bot Bot commented Jun 14, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai Bot commented Jun 14, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 12b89ce7-4208-4e54-95d5-38bd78636a1d

📥 Commits

Reviewing files that changed from the base of the PR and between 93dd08f and a85498f.

📒 Files selected for processing (1)
  • tools/launcher/examples/Nemotron-h/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_mtp_vllm.yaml

📝 Walkthrough

A new YAML benchmark configuration file is added for the NVIDIA Nemotron-3-Super-120B-A12B BF16 model. It defines two pipeline tasks (task_0 for qualitative and task_1 for throughput_32k) both invoking common/specdec_bench/run.sh with MTP speculative decoding settings (draft length 3, tp_size 4, ep_size 1) under a single-node, 4-GPU Slurm job using the vLLM container.
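
To give a feel for what a draft length of 3 can buy, here is a rough sketch of the standard expected-tokens estimate for speculative decoding. It assumes each drafted token is accepted independently with the same probability `accept_rate`, which is a simplification. The PR reports no acceptance rates, so the value used below is purely illustrative.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int = 3) -> float:
    """Expected tokens emitted per verification step.

    Assumes i.i.d. per-token acceptance with probability `accept_rate`.
    The sum 1 + a + a^2 + ... + a^k counts the accepted draft prefix plus
    the one token the target model always contributes.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Illustrative only: a 50% acceptance rate with draft length 3.
print(expected_tokens_per_step(0.5))  # 1.875
```

Under this model, draft length 3 yields between 1 and 4 tokens per step, which is the range the benchmark's measured acceptance statistics would fall into.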

Changes

MTP vLLM Benchmark Configuration

Layer: Benchmark YAML with two pipeline tasks and Slurm settings
File: tools/launcher/examples/Nemotron-h/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_mtp_vllm.yaml
Summary: Adds a 73-line YAML defining the job name, model checkpoint path, two SPEED-Bench MTP pipeline tasks (task_0 qualitative, task_1 throughput_32k) with their respective concurrency and dataset split arguments, and shared Slurm/container/environment variables (draft length 3, tp_size 4, ep_size 1, single node, 4 GPUs).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • NVIDIA/Model-Optimizer#1638: Adds a specdec_bench YAML with the same two-task (qualitative + throughput_32k) vLLM structure invoking common/specdec_bench/run.sh, differing by speculative algorithm (DFLASH vs MTP) and model.
  • NVIDIA/Model-Optimizer#1656: Adds a specdec_bench YAML for the same Nemotron-3-Super-120B-A12B BF16 model with the identical two-task pipeline structure, differing only in speculative decoding method (DFlash vs MTP).

Suggested reviewers

  • yeyu-nvidia
  • h-guo18
  • kevalmorabia97

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately and specifically describes the main change: adding a Nemotron-3-Super-120B-A12B-BF16 MTP vLLM speculative decoding benchmark configuration file.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns: ✅ Passed. PR adds only a YAML configuration file with no Python code, dependencies, or security anti-patterns. Security checks apply only to Python code changes, which this PR does not contain.


…config

Adds SPEED-bench MTP speculative-decoding YAML for
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 via vLLM, covering the
qualitative and throughput_32k splits with tp_size=4.

Part of OMNIML-5095 / OMNIML-5098.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
@ChenhanYu ChenhanYu force-pushed the pensieve-intern/OMNIML-5095/cell-t0-d7 branch from b536d3e to a85498f Compare June 19, 2026 03:46
copy-pr-bot Bot commented Jun 19, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ChenhanYu ChenhanYu changed the title from "[OMNIML-5098] cell_t0_d7" to "launcher: add Nemotron-3-Super-120B-A12B-BF16 MTP vLLM specdec bench config" Jun 19, 2026
@ChenhanYu ChenhanYu marked this pull request as ready for review June 19, 2026 03:46
@ChenhanYu ChenhanYu requested a review from a team as a code owner June 19, 2026 03:46
codecov Bot commented Jun 19, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.95%. Comparing base (e012529) to head (a85498f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1714      +/-   ##
==========================================
+ Coverage   76.92%   76.95%   +0.03%     
==========================================
  Files         511      511              
  Lines       56360    56360              
==========================================
+ Hits        43356    43373      +17     
+ Misses      13004    12987      -17     
Flag Coverage Δ
regression 14.70% <ø> (+0.06%) ⬆️
unit 54.33% <ø> (ø)
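
The percentages in the coverage diff can be reproduced from the hit and line counts. The sketch below truncates to two decimals because that matches the reported figures here (rounding would give 76.93% and 76.96%). This is an observation from these numbers only, not a statement about how Codecov formats its reports.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage, truncated (not rounded) to two decimals."""
    return (hits * 10000 // lines) / 100


LINES = 56360
base_hits, base_misses = 43356, 13004
head_hits, head_misses = 43373, 12987

# Hits and misses should account for every line in both columns.
assert base_hits + base_misses == LINES
assert head_hits + head_misses == LINES

base = coverage_pct(base_hits, LINES)
head = coverage_pct(head_hits, LINES)
print(base, head, round(head - base, 2))
```

The 17 additional hits over 56,360 lines account for the +0.03% change, even though the PR itself adds only a YAML file.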


@ChenhanYu ChenhanYu merged commit fa1d13f into main Jun 19, 2026
45 checks passed
@ChenhanYu ChenhanYu deleted the pensieve-intern/OMNIML-5095/cell-t0-d7 branch June 19, 2026 23:18
@github-actions

Contributor
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-19 23:18 UTC
