[https://nvbugs/6109745][fix] Use ignore_eos=True to prevent empty outputs from EOS sensitivity, replace exact#13678
Conversation
…al noise The flashinfer upgrade (0.6.6 -> 0.6.9) changed numerical behavior of norm/activation kernels, causing greedy decoding to produce different first tokens between TP=1 and TP=2 for this model/LoRA combination. Fix the test by: 1. Using ignore_eos=True to prevent empty output when EOS is predicted as first token due to numerical sensitivity at the EOS boundary 2. Replacing exact equality assertion with similarity-based comparison that accounts for greedy decoding cascade (once one token differs, all subsequent tokens diverge) 3. Removing the test waiver since the test now passes Signed-off-by: svc-repair-bot <svc-repair-bot@nvidia.com> Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
📝 WalkthroughWalkthroughThe changes modify test infrastructure by removing a waive directive from a test skip list and relaxing validation logic in a LoRA test utility from strict output matching to similarity-based assertions with adjusted generation parameters. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unittest/llmapi/lora_test_utils.py (1)
62-89: QA test-list update checkThis is a unittest utility behavior adjustment, so integration QA list updates under
tests/integration/test_lists/qa/are unnecessary for this PR.As per coding guidelines: “If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/llmapi/lora_test_utils.py` around lines 62 - 89, This PR only changes a unittest utility (check_phi3_lora_fused_modules_output_tp2_identical_to_tp1) and therefore does not require updates to the integration QA lists under tests/integration/test_lists/qa/; please add a short note either to the PR description or as a one-line comment near the test utility stating "No QA list updates required for unittest-only changes" so reviewers know QA list updates are unnecessary per the coding guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unittest/llmapi/lora_test_utils.py`:
- Around line 76-85: The test currently uses zip(outputs_tp1, outputs_tp2) which
silently drops any unmatched items; before computing similarity_score over
pairs, add an explicit check that the two lists have equal length (e.g., assert
len(outputs_tp1) == len(outputs_tp2), with a clear failure message referencing
TP=1 vs TP=2), or replace zip(...) with itertools.zip_longest and assert no None
values to fail when lengths differ; update the block that computes scores (the
variables outputs_tp1, outputs_tp2 and the call to similarity_score) to rely on
this strict pairing so missing outputs cannot be silently ignored.
---
Nitpick comments:
In `@tests/unittest/llmapi/lora_test_utils.py`:
- Around line 62-89: This PR only changes a unittest utility
(check_phi3_lora_fused_modules_output_tp2_identical_to_tp1) and therefore does
not require updates to the integration QA lists under
tests/integration/test_lists/qa/; please add a short note either to the PR
description or as a one-line comment near the test utility stating "No QA list
updates required for unittest-only changes" so reviewers know QA list updates
are unnecessary per the coding guideline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2a8c60f1-2ed8-426c-a66b-b9b5956ead4b
📒 Files selected for processing (2)
tests/integration/test_lists/waives.txttests/unittest/llmapi/lora_test_utils.py
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
| for i, (out_tp1, out_tp2) in enumerate(zip(outputs_tp1, outputs_tp2)): | ||
| assert out_tp1, f"Prompt {i}: TP=1 produced empty output" | ||
| assert out_tp2, f"Prompt {i}: TP=2 produced empty output" | ||
| # Verify outputs are not completely unrelated by checking at least one | ||
| # prompt pair has meaningful overlap. Greedy decoding amplifies numerical | ||
| # differences from TP splitting, so individual prompts may diverge. | ||
| scores = [ | ||
| similarity_score(out_tp1, out_tp2) | ||
| for out_tp1, out_tp2 in zip(outputs_tp1, outputs_tp2) | ||
| ] |
There was a problem hiding this comment.
Prevent silent truncation when comparing TP outputs
Line 76 and Line 84 use zip(...) without guarding equal lengths, so a TP path returning fewer outputs can be silently ignored and the test may still pass. Please make the pairing strict (or assert equal lengths first).
Suggested patch
- for i, (out_tp1, out_tp2) in enumerate(zip(outputs_tp1, outputs_tp2)):
+ for i, (out_tp1, out_tp2) in enumerate(
+ zip(outputs_tp1, outputs_tp2, strict=True)):
assert out_tp1, f"Prompt {i}: TP=1 produced empty output"
assert out_tp2, f"Prompt {i}: TP=2 produced empty output"
@@
scores = [
similarity_score(out_tp1, out_tp2)
- for out_tp1, out_tp2 in zip(outputs_tp1, outputs_tp2)
+ for out_tp1, out_tp2 in zip(outputs_tp1, outputs_tp2, strict=True)
]🧰 Tools
🪛 Ruff (0.15.12)
[warning] 76-76: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 84-84: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unittest/llmapi/lora_test_utils.py` around lines 76 - 85, The test
currently uses zip(outputs_tp1, outputs_tp2) which silently drops any unmatched
items; before computing similarity_score over pairs, add an explicit check that
the two lists have equal length (e.g., assert len(outputs_tp1) ==
len(outputs_tp2), with a clear failure message referencing TP=1 vs TP=2), or
replace zip(...) with itertools.zip_longest and assert no None values to fail
when lengths differ; update the block that computes scores (the variables
outputs_tp1, outputs_tp2 and the call to similarity_score) to rely on this
strict pairing so missing outputs cannot be silently ignored.
Summary
Test plan
Links
Summary by CodeRabbit