[DNM] [testing on behalf of inferact for v0.20.2 rc] dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image#1284
Conversation
Move from vllm/vllm-openai:v0.20.1 (cu130) to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129 (vLLM nightly release-repo build on cu129).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should generally request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Enroot expects the registry and path to be separated by '#' rather than '/' in the image URL. Other entries in nvidia-master.yaml already use this form (nvcr.io#..., ghcr.io#...). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
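As context, the two reference styles differ only in the registry separator. A sketch of the corrected entry (the image tag is taken from this PR; the surrounding key layout is an assumption about nvidia-master.yaml):

```yaml
# Docker-style reference — registry joined to the path with '/':
#   image: public.ecr.aws/q9t5s3a7/vllm-release-repo:<tag>
# Enroot-style reference — registry separated with '#', matching the
# existing nvcr.io#... and ghcr.io#... entries in nvidia-master.yaml:
dsv4-fp8-h200-vllm-mtp:
  image: public.ecr.aws#q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129
```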
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25392580021
```yaml
- config-keys:
    - dsv4-fp8-h200-vllm-mtp
  description:
    - "Switch image from vllm/vllm-openai:v0.20.1 (cu130) to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129 (vLLM nightly release-repo build, cu129)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1284
```
🔴 The new perf-changelog.yaml entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER instead of the real PR number — the placeholder substitution wasn't done before push. The link will 404, breaking traceability from this changelog row to the originating PR. Fix by replacing PLACEHOLDER with 1284.
Extended reasoning...
What the bug is
perf-changelog.yaml line 2222 (the pr-link for the new dsv4-fp8-h200-vllm-mtp image-switch entry added by this PR) reads:
```
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER
```

The literal token `PLACEHOLDER` was committed instead of the real PR number 1284.
How it manifests / proof
Step-by-step verification against the tip of this branch (commit e7db559):
- `git show e7db559 -- perf-changelog.yaml` shows the diff hunk ending with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER` — i.e. the committed bytes contain the literal string `PLACEHOLDER`, not `1284`.
- Reading the working-tree file at line 2222 confirms the same: `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER`.
- All other `pr-link` entries in the same file (e.g. line 2189: `pull/1256`, line 2199: `pull/1065`, line 2209: `pull/1222`, line 2216: `pull/1279`) use real numeric PR ids, so the format and convention are unambiguous.
- Browsing `https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER` returns 404 (no such PR), so the link is dead on arrival.
Note: the rendered PR-diff block in some review tooling may post-process and display pull/1284, but the actual committed file content (the source of truth that downstream tooling and humans will read) contains PLACEHOLDER. Both this branch HEAD and the working tree confirm PLACEHOLDER.
Why existing code does not prevent it
There is no schema validation on pr-link values in this repo — the YAML is a plain document, so a bogus URL passes through silently. The author or a templating step (commonly a sed substitution that replaces PLACEHOLDER after the PR number is known) clearly intended to fill this in but missed it before pushing.
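Since the repo has no such check, a guard is easy to sketch. The script below is hypothetical — the file name, its placement, and the `PLACEHOLDER` convention are assumptions drawn from this review — but something like it could run as a pre-push hook or CI step:

```python
import pathlib
import sys

def find_placeholder_links(text: str) -> list[int]:
    """Return 1-based line numbers of pr-link entries still carrying PLACEHOLDER."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if "pr-link:" in line and "PLACEHOLDER" in line
    ]

if __name__ == "__main__":
    # Hypothetical target file; adjust the path for the actual repo layout.
    changelog = pathlib.Path("perf-changelog.yaml")
    bad_lines = find_placeholder_links(changelog.read_text())
    if bad_lines:
        print(f"unsubstituted pr-link PLACEHOLDER on lines: {bad_lines}", file=sys.stderr)
        sys.exit(1)
```

Wired into CI, this would turn the silent 404 into a failing check before the commit ever reaches master.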
Impact
- The changelog `pr-link` is the single field that ties a changelog row back to the PR that introduced the behavior change. With `PLACEHOLDER`, that traceability is broken for this entry only — humans following the link land on a 404; any tooling that resolves `pr-link` to fetch PR context (release notes, dashboards, automated changelog renderers) will fail or skip this row.
- It is a documentation/metadata defect rather than a runtime defect, so it does not break the actual sweep. But it permanently corrupts the changelog history if merged as-is.
Fix
One-line edit on line 2222:
```diff
- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1284
```

```diff
  # (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
  # --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
  dsv4-fp8-h200-vllm-mtp:
-   image: vllm/vllm-openai:v0.20.1@sha256:9eff9734a30b6713a8566217d36f8277630fd2d31cec7f0a0292835901a23aa4
+   image: public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129
```
🟡 Stale comment at lines 2633-2635 still claims the entry "Uses the canonical v0.20.1 image (the non-MTP entry above is still on the deepseekv4-cu129 tag)", but this PR switches the image on line 2637 to a vllm-release-repo nightly cu129 build pinned by SHA. Both halves of the comment are now factually wrong. Update the comment alongside the image bump so future readers aren't misled about which image actually runs.
Extended reasoning...
What's stale
Lines 2633-2635 currently read:
```yaml
# MTP variant of dsv4-fp8-h200-vllm. Uses the canonical v0.20.1 image
# (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
```

This PR changes line 2637 from `vllm/vllm-openai:v0.20.1@sha256:9eff9734...` (cu130) to `public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129`. The comment immediately above is left untouched.
Why both halves of the comment are now wrong
- "Uses the canonical v0.20.1 image" — no longer true. The entry now runs a vLLM nightly release-repo build pinned by Git commit SHA (`fbd51e3dfc902364fddab316ef1337c4f261de1a`), not the v0.20.1 release.
- "the non-MTP entry above is still on the deepseekv4-cu129 tag" — this parenthetical was meant to highlight a contrast: MTP=v0.20.1/cu130 vs non-MTP=cu129. With this PR, both entries are now on cu129, so the contrast no longer makes sense and the parenthetical reads as a non-sequitur.
Step-by-step proof
- Before this PR: `dsv4-fp8-h200-vllm-mtp.image` = `vllm/vllm-openai:v0.20.1@sha256:9eff9734...` (built against cu130). The comment's claim "canonical v0.20.1 image" matched, and the parenthetical contrast against the cu129 non-MTP entry made sense.
- This PR: line 2637 is changed in-place to `public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129`. Lines 2633-2635 are not touched (the diff window starts at line 2634, so the comment edit was missed).
- After this PR: a reader scanning the YAML sees the comment "Uses the canonical v0.20.1 image" and will reasonably believe `vllm/vllm-openai:v0.20.1` is what runs, but the actual image one line below is a vLLM nightly cu129 release-repo build. The contrast in the parenthetical is also wrong because the non-MTP entry above is also on cu129 — so MTP and non-MTP are no longer differentiated by cu version.
Impact
Documentation drift / no runtime impact — the workflow will pull and run the new image regardless of the comment. But the comment is directly above the line this PR edits and is now actively misleading. perf-changelog.yaml already records the correct change, so the in-file YAML comment is the only piece left out of sync.
Suggested fix
Update lines 2633-2635 to describe the new image, e.g.:
```yaml
# MTP variant of dsv4-fp8-h200-vllm. Uses a vLLM nightly release-repo
# build (cu129, pinned by Git commit SHA) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
```

Severity: nit — comment-only staleness with no runtime effect, but it was introduced (uncorrected) by this PR's image change and is a small comment-only edit to keep the YAML self-consistent.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25392636756
Summary
Switch the `dsv4-fp8-h200-vllm-mtp` image from `vllm/vllm-openai:v0.20.1@sha256:9eff9734...` (cu130) to `public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129` (vLLM nightly release-repo build pinned by SHA, cu129).

- Updated `.github/configs/nvidia-master.yaml` under `dsv4-fp8-h200-vllm-mtp`.
- Added a `perf-changelog.yaml` entry to trigger the new sweep.

Test plan
- Run the `dsv4-fp8-h200-vllm-mtp` workflow on an H200 runner and confirm the new image pulls cleanly.
- Compare sweep results across the `seq-len-configs` against the prior `vllm-openai:v0.20.1` numbers.

🤖 Generated with Claude Code