[DNM] [testing on behalf of inferact for v0.20.2 rc] dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image#1284
Conversation
Move from vllm/vllm-openai:v0.20.1 (cu130) to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129 (vLLM nightly release-repo build on cu129).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should generally request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Enroot expects the registry and path to be separated by '#' rather than '/' in the image URL. Other entries in nvidia-master.yaml already use this form (nvcr.io#..., ghcr.io#...). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
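As context, the two reference styles differ only in the registry separator. A sketch of the corrected entry (the image tag is taken from this PR; the surrounding key layout is an assumption about nvidia-master.yaml):

```yaml
# Docker-style reference — registry joined to the path with '/':
#   image: public.ecr.aws/q9t5s3a7/vllm-release-repo:<tag>
# Enroot-style reference — registry separated with '#', matching the
# existing nvcr.io#... and ghcr.io#... entries in nvidia-master.yaml:
dsv4-fp8-h200-vllm-mtp:
  image: public.ecr.aws#q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129
```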
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25392580021
```yaml
- config-keys:
    - dsv4-fp8-h200-vllm-mtp
  description:
    - "Switch image from vllm/vllm-openai:v0.20.1 (cu130) to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129 (vLLM nightly release-repo build, cu129)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1284
```
🔴 The new perf-changelog.yaml entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER instead of the real PR number — the placeholder substitution wasn't done before push. The link will 404, breaking traceability from this changelog row to the originating PR. Fix by replacing PLACEHOLDER with 1284.
Extended reasoning...
What the bug is
perf-changelog.yaml line 2222 (the pr-link for the new dsv4-fp8-h200-vllm-mtp image-switch entry added by this PR) reads:
```
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER
```

The literal token `PLACEHOLDER` was committed instead of the real PR number 1284.
How it manifests / proof
Step-by-step verification against the tip of this branch (commit e7db559):
- `git show e7db559 -- perf-changelog.yaml` shows the diff hunk ending with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER` — i.e. the committed bytes contain the literal string `PLACEHOLDER`, not `1284`.
- Reading the working-tree file at line 2222 confirms the same: `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER`.
- All other `pr-link` entries in the same file (e.g. line 2189: `pull/1256`, line 2199: `pull/1065`, line 2209: `pull/1222`, line 2216: `pull/1279`) use real numeric PR ids, so the format and convention are unambiguous.
- Browsing `https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER` returns 404 (no such PR), so the link is dead on arrival.
Note: the rendered PR-diff block in some review tooling may post-process and display pull/1284, but the actual committed file content (the source of truth that downstream tooling and humans will read) contains PLACEHOLDER. Both this branch HEAD and the working tree confirm PLACEHOLDER.
Why existing code does not prevent it
There is no schema validation on pr-link values in this repo — the YAML is a plain document, so a bogus URL passes through silently. The author or a templating step (commonly a sed substitution that replaces PLACEHOLDER after the PR number is known) clearly intended to fill this in but missed it before pushing.
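Since the repo has no such check, a guard is easy to sketch. The script below is hypothetical — the file name, its placement, and the `PLACEHOLDER` convention are assumptions drawn from this review — but something like it could run as a pre-push hook or CI step:

```python
import pathlib
import sys

def find_placeholder_links(text: str) -> list[int]:
    """Return 1-based line numbers of pr-link entries still carrying PLACEHOLDER."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if "pr-link:" in line and "PLACEHOLDER" in line
    ]

if __name__ == "__main__":
    # Hypothetical target file; adjust the path for the actual repo layout.
    changelog = pathlib.Path("perf-changelog.yaml")
    bad_lines = find_placeholder_links(changelog.read_text())
    if bad_lines:
        print(f"unsubstituted pr-link PLACEHOLDER on lines: {bad_lines}", file=sys.stderr)
        sys.exit(1)
```

Wired into CI, this would turn the silent 404 into a failing check before the commit ever reaches master.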
Impact
- The changelog `pr-link` is the single field that ties a changelog row back to the PR that introduced the behavior change. With `PLACEHOLDER`, that traceability is broken for this entry only — humans following the link land on a 404; any tooling that resolves `pr-link` to fetch PR context (release notes, dashboards, automated changelog renderers) will fail or skip this row.
- It is a documentation/metadata defect rather than a runtime defect, so it does not break the actual sweep. But it permanently corrupts the changelog history if merged as-is.
Fix
One-line edit on line 2222:
```diff
- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1284
```

```diff
  # (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
  # --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
  dsv4-fp8-h200-vllm-mtp:
-   image: vllm/vllm-openai:v0.20.1@sha256:9eff9734a30b6713a8566217d36f8277630fd2d31cec7f0a0292835901a23aa4
+   image: public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129
```
🟡 Stale comment at lines 2633-2635 still claims the entry "Uses the canonical v0.20.1 image (the non-MTP entry above is still on the deepseekv4-cu129 tag)", but this PR switches the image on line 2637 to a vllm-release-repo nightly cu129 build pinned by SHA. Both halves of the comment are now factually wrong. Update the comment alongside the image bump so future readers aren't misled about which image actually runs.
Extended reasoning...
What's stale
Lines 2633-2635 currently read:
```yaml
# MTP variant of dsv4-fp8-h200-vllm. Uses the canonical v0.20.1 image
# (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
```

This PR changes line 2637 from `vllm/vllm-openai:v0.20.1@sha256:9eff9734...` (cu130) to `public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129`. The comment immediately above is left untouched.
Why both halves of the comment are now wrong
- "Uses the canonical v0.20.1 image" — no longer true. The entry now runs a vLLM nightly release-repo build pinned by Git commit SHA (`fbd51e3dfc902364fddab316ef1337c4f261de1a`), not the v0.20.1 release.
- "the non-MTP entry above is still on the deepseekv4-cu129 tag" — this parenthetical was meant to highlight a contrast: MTP=v0.20.1/cu130 vs non-MTP=cu129. With this PR, both entries are now on cu129, so the contrast no longer makes sense and the parenthetical reads as a non-sequitur.
Step-by-step proof
- Before this PR: `dsv4-fp8-h200-vllm-mtp.image` = `vllm/vllm-openai:v0.20.1@sha256:9eff9734...` (built against cu130). The comment's claim "canonical v0.20.1 image" matched, and the parenthetical contrast against the cu129 non-MTP entry made sense.
- This PR: line 2637 is changed in-place to `public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129`. Lines 2633-2635 are not touched (the diff window starts at line 2634, so the comment edit was missed).
- After this PR: a reader scanning the YAML sees the comment "Uses the canonical v0.20.1 image" and will reasonably believe `vllm/vllm-openai:v0.20.1` is what runs, but the actual image one line below is a vLLM nightly cu129 release-repo build. The contrast in the parenthetical is also wrong because the non-MTP entry above is also on cu129 — so MTP and non-MTP are no longer differentiated by cu version.
Impact
Documentation drift / no runtime impact — the workflow will pull and run the new image regardless of the comment. But the comment is directly above the line this PR edits and is now actively misleading. perf-changelog.yaml already records the correct change, so the in-file YAML comment is the only piece left out of sync.
Suggested fix
Update lines 2633-2635 to describe the new image, e.g.:
```yaml
# MTP variant of dsv4-fp8-h200-vllm. Uses a vLLM nightly release-repo
# build (cu129, pinned by Git commit SHA) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
```

Severity: nit — comment-only staleness with no runtime effect, but it was introduced (uncorrected) by this PR's image change and is a small comment-only edit to keep the YAML self-consistent.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25392636756
Summary
Switch the `dsv4-fp8-h200-vllm-mtp` image from `vllm/vllm-openai:v0.20.1@sha256:9eff9734...` (cu130) to `public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129` (vLLM nightly release-repo build pinned by SHA, cu129).

- Updated `.github/configs/nvidia-master.yaml` under `dsv4-fp8-h200-vllm-mtp`.
- Added a `perf-changelog.yaml` entry to trigger the new sweep.

Test plan
- Run the `dsv4-fp8-h200-vllm-mtp` workflow on an H200 runner and confirm the new image pulls cleanly.
- Compare sweep results across the `seq-len-configs` against the prior `vllm-openai:v0.20.1` numbers.

🤖 Generated with Claude Code