Skip to content

[DNM] [testing on behalf of inferact for v0.20.2 rc] dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image#1284

Open
functionstackx wants to merge 3 commits intomainfrom
claude/dsv4-h200-mtp-vllm-release-repo-image
Open

[DNM] [testing on behalf of inferact for v0.20.2 rc] dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image#1284
functionstackx wants to merge 3 commits intomainfrom
claude/dsv4-h200-mtp-vllm-release-repo-image

Conversation

@functionstackx
Copy link
Copy Markdown
Contributor

Summary

Switch the dsv4-fp8-h200-vllm-mtp image from vllm/vllm-openai:v0.20.1@sha256:9eff9734... (cu130) to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129 (vLLM nightly release-repo build pinned by SHA, cu129).

  • Single change to .github/configs/nvidia-master.yaml under dsv4-fp8-h200-vllm-mtp.
  • Adds a perf-changelog.yaml entry to trigger the new sweep.

Test plan

  • Trigger the dsv4-fp8-h200-vllm-mtp workflow on an H200 runner and confirm the new image pulls cleanly.
  • Engine starts and the sweep completes for at least one cell from each of the two seq-len-configs.
  • Acceptance rate / throughput at matched concurrency points are at least on par with the prior vllm-openai:v0.20.1 numbers.

🤖 Generated with Claude Code

Move from vllm/vllm-openai:v0.20.1 (cu130) to
public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129
(vLLM nightly release-repo build on cu129).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Enroot expects the registry and path to be separated by '#' rather than '/'
in the image URL. Other entries in nvidia-master.yaml already use this form
(nvcr.io#..., ghcr.io#...).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Comment thread perf-changelog.yaml
Comment on lines +2218 to +2222
- config-keys:
- dsv4-fp8-h200-vllm-mtp
description:
- "Switch image from vllm/vllm-openai:v0.20.1 (cu130) to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129 (vLLM nightly release-repo build, cu129)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1284
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new perf-changelog.yaml entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER instead of the real PR number — the placeholder substitution wasn't done before push. The link will 404, breaking traceability from this changelog row to the originating PR. Fix by replacing PLACEHOLDER with 1284.

Extended reasoning...

What the bug is

perf-changelog.yaml line 2222 (the pr-link for the new dsv4-fp8-h200-vllm-mtp image-switch entry added by this PR) reads:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER

The literal token PLACEHOLDER was committed instead of the real PR number 1284.

How it manifests / proof

Step-by-step verification against the tip of this branch (commit e7db559):

  1. git show e7db559 -- perf-changelog.yaml shows the diff hunk ending with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER — i.e. the committed bytes contain the literal string PLACEHOLDER, not 1284.
  2. Reading the working-tree file at line 2222 confirms the same: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER.
  3. All other pr-link entries in the same file (e.g. line 2189: pull/1256, line 2199: pull/1065, line 2209: pull/1222, line 2216: pull/1279) use real numeric PR ids, so the format and convention are unambiguous.
  4. Browsing https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER returns 404 (no such PR), so the link is dead on arrival.

Note: the rendered PR-diff block in some review tooling may post-process and display pull/1284, but the actual committed file content (the source of truth that downstream tooling and humans will read) contains PLACEHOLDER. Both this branch HEAD and the working tree confirm PLACEHOLDER.

Why existing code does not prevent it

There is no schema validation on pr-link values in this repo — the YAML is a plain document, so a bogus URL passes through silently. The author or a templating step (commonly a sed substitution that replaces PLACEHOLDER after the PR number is known) clearly intended to fill this in but missed it before pushing.

Impact

  • The changelog pr-link is the single field that ties a changelog row back to the PR that introduced the behavior change. With PLACEHOLDER, that traceability is broken for this entry only — humans following the link land on a 404; any tooling that resolves pr-link to fetch PR context (release notes, dashboards, automated changelog renderers) will fail or skip this row.
  • It is a documentation/metadata defect rather than a runtime defect, so it does not break the actual sweep. But it permanently corrupts the changelog history if merged as-is.

Fix

One-line edit on line 2222:

-  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1284

Comment thread .github/configs/nvidia-master.yaml Outdated
Comment on lines +2634 to +2637
# (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
dsv4-fp8-h200-vllm-mtp:
image: vllm/vllm-openai:v0.20.1@sha256:9eff9734a30b6713a8566217d36f8277630fd2d31cec7f0a0292835901a23aa4
image: public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Stale comment at lines 2633-2635 still claims the entry "Uses the canonical v0.20.1 image (the non-MTP entry above is still on the deepseekv4-cu129 tag)", but this PR switches the image on line 2637 to a vllm-release-repo nightly cu129 build pinned by SHA. Both halves of the comment are now factually wrong. Update the comment alongside the image bump so future readers aren't misled about which image actually runs.

Extended reasoning...

What's stale

Lines 2633-2635 currently read:

# MTP variant of dsv4-fp8-h200-vllm. Uses the canonical v0.20.1 image
# (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.

This PR changes line 2637 from vllm/vllm-openai:v0.20.1@sha256:9eff9734... (cu130) to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129. The comment immediately above is left untouched.

Why both halves of the comment are now wrong

  1. "Uses the canonical v0.20.1 image" — no longer true. The entry now runs a vLLM nightly release-repo build pinned by Git commit SHA (fbd51e3dfc902364fddab316ef1337c4f261de1a), not the v0.20.1 release.
  2. "the non-MTP entry above is still on the deepseekv4-cu129 tag" — this parenthetical was meant to highlight a contrast: MTP=v0.20.1/cu130 vs non-MTP=cu129. With this PR, both entries are now on cu129, so the contrast no longer makes sense and the parenthetical reads as a non-sequitur.

Step-by-step proof

  • Before this PR: dsv4-fp8-h200-vllm-mtp.image = vllm/vllm-openai:v0.20.1@sha256:9eff9734... (built against cu130). The comment's claim "canonical v0.20.1 image" matched, and the parenthetical contrast against the cu129 non-MTP entry made sense.
  • This PR: line 2637 is changed in-place to public.ecr.aws/q9t5s3a7/vllm-release-repo:fbd51e3dfc902364fddab316ef1337c4f261de1a-x86_64-cu129. Lines 2633-2635 are not touched (the diff window starts at line 2634, so the comment edit was missed).
  • After this PR: a reader scanning the YAML sees the comment "Uses the canonical v0.20.1 image" and will reasonably believe vllm/vllm-openai:v0.20.1 is what runs, but the actual image one line below is a vLLM nightly cu129 release-repo build. The contrast in the parenthetical is also wrong because the non-MTP entry above is also on cu129 — so MTP and non-MTP are no longer differentiated by cu version.

Impact

Documentation drift / no runtime impact — the workflow will pull and run the new image regardless of the comment. But the comment is directly above the line this PR edits and is now actively misleading. perf-changelog.yaml already records the correct change, so the in-file YAML comment is the only piece left out of sync.

Suggested fix

Update lines 2633-2635 to describe the new image, e.g.:

# MTP variant of dsv4-fp8-h200-vllm. Uses a vLLM nightly release-repo
# build (cu129, pinned by Git commit SHA) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.

Severity: nit — comment-only staleness, no runtime effect, but it was introduced (uncorrected) by this PR's image change and is a one-line edit to keep the YAML self-consistent.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

@functionstackx functionstackx changed the title dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image [testing on behalf of inferact for v0.20.2 rc] dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image May 5, 2026
@functionstackx functionstackx changed the title [testing on behalf of inferact for v0.20.2 rc] dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image [DNM] [testing on behalf of inferact for v0.20.2 rc] dsv4-fp8-h200-vllm-mtp: switch to public.ecr.aws vllm-release-repo image May 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant