Skip to content

[https://nvbugs/6120535][fix] Only call _verify_ctx_response and _get_gen_request when generation will actuall#13701

Open
tensorrt-cicd wants to merge 1 commit intoNVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6120535
Open

[https://nvbugs/6120535][fix] Only call _verify_ctx_response and _get_gen_request when generation will actuall#13701
tensorrt-cicd wants to merge 1 commit intoNVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6120535

Conversation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

@tensorrt-cicd tensorrt-cicd commented May 2, 2026

Summary

  • Root cause: When a context-only request finishes during prefill (finish_reason='stop'), the C++ executor doesn't populate ctx_request_id in disaggregated_params since no KV cache transfer is needed, but _verify_ctx_response was unconditionally requiring it to be non-None.
  • Fix: Only call _verify_ctx_response and _get_gen_request when generation will actually be needed (finish_reason in "length"/"not_finished"), matching _need_gen's logic without modifying _verify_ctx_response itself (preserving all unit tests).
  • Automated fix generated by repair-bot

Test plan

  • Verify fix on the same GPU type as the original failure
  • Check for regressions in related tests

Links

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Optimized context-phase response handling in disaggregated inference to properly support various completion states and response scenarios.

…st completes

When a context-only request finishes during the context phase
(finish_reason='stop'), the C++ executor does not populate
ctx_request_id in disaggregated_params because no KV cache transfer
is needed. The _verify_ctx_response validation was called
unconditionally, causing a ValueError for these completed requests.

Only verify disaggregated params and prepare the generation request
when generation is actually needed (finish_reason is 'length' or
'not_finished'), matching the _need_gen logic that already handles
this case correctly downstream.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 2, 2026

📝 Walkthrough

Walkthrough

The PR modifies context-response handling in OpenAIDisaggregatedService._send_disagg_request_ctx_first to conditionally verify and use context responses only when they lack a finish reason or have "length"/"not_finished" reasons, allowing other finish reasons to bypass disaggregated param validation.

Changes

Context Response Handling

Layer / File(s) Summary
Conditional Logic
tensorrt_llm/serve/openai_disagg_service.py
_verify_ctx_response() and gen_req derivation now execute only when ctx_response is absent or finish_reason is "length"/"not_finished", allowing early-finish context responses to return without processing disaggregated parameters.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title is incomplete and cut off mid-sentence, making it unclear and not fully descriptive of the actual change. Complete the title to clearly describe when verification and generation-request calls are made, e.g., '[https://nvbugs/6120535][fix] Only call _verify_ctx_response and _get_gen_request when generation will actually be needed'.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The description includes all required sections (Summary, Test Coverage, Links) and clearly explains the root cause, fix, and testing approach.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/serve/openai_disagg_service.py`:
- Around line 148-153: The code dereferences ctx_response.choices[0] before
validating structure, risking IndexError; change the logic to validate the
response structure first by calling self._verify_ctx_response(ctx_response) (or
explicitly guard that ctx_response and ctx_response.choices exist and are
non-empty) before accessing choices[0] or checking its finish_reason, then
proceed to call self._get_gen_request(request, ctx_response, disagg_request_id)
only after validation passes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 61e69492-507f-4b02-a4f2-079e7efbbec9

📥 Commits

Reviewing files that changed from the base of the PR and between ce1eecc and 09fb8d2.

📒 Files selected for processing (1)
  • tensorrt_llm/serve/openai_disagg_service.py

Comment on lines +148 to +153
if not ctx_response or ctx_response.choices[0].finish_reason in (
"length",
"not_finished",
):
await self._verify_ctx_response(ctx_response)
gen_req = self._get_gen_request(request, ctx_response, disagg_request_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard choices[0] access before finish-reason check.

Line 148 dereferences ctx_response.choices[0] before structural validation. If the ctx payload is malformed (e.g., zero choices), this throws IndexError instead of the controlled ValueError from _verify_ctx_response.

Proposed fix
             ctx_response = await self._ctx_client.send_request(
                 ctx_req, server=ctx_server, hooks=hooks
             )
-            if not ctx_response or ctx_response.choices[0].finish_reason in (
+            if ctx_response and len(ctx_response.choices) != 1:
+                await self._verify_ctx_response(ctx_response)
+
+            if not ctx_response or ctx_response.choices[0].finish_reason in (
                 "length",
                 "not_finished",
             ):
                 await self._verify_ctx_response(ctx_response)
                 gen_req = self._get_gen_request(request, ctx_response, disagg_request_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/openai_disagg_service.py` around lines 148 - 153, The code
dereferences ctx_response.choices[0] before validating structure, risking
IndexError; change the logic to validate the response structure first by calling
self._verify_ctx_response(ctx_response) (or explicitly guard that ctx_response
and ctx_response.choices exist and are non-empty) before accessing choices[0] or
checking its finish_reason, then proceed to call self._get_gen_request(request,
ctx_response, disagg_request_id) only after validation passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants