[https://nvbugs/6120535][fix] Only call _verify_ctx_response and _get_gen_request when generation will actuall by tensorrt-cicd · Pull Request #13701 · NVIDIA/TensorRT-LLM

tensorrt-cicd · 2026-05-02T04:37:26Z

Summary

Root cause: When a context-only request finishes during prefill (finish_reason='stop'), the C++ executor doesn't populate ctx_request_id in disaggregated_params since no KV cache transfer is needed, but _verify_ctx_response was unconditionally requiring it to be non-None.
Fix: Only call _verify_ctx_response and _get_gen_request when generation will actually be needed (finish_reason in "length"/"not_finished"), matching _need_gen's logic without modifying _verify_ctx_response itself (preserving all unit tests).
Automated fix generated by repair-bot

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6120535

Summary by CodeRabbit

Release Notes

Bug Fixes
- Optimized context-phase response handling in disaggregated inference to properly support various completion states and response scenarios.

…st completes When a context-only request finishes during the context phase (finish_reason='stop'), the C++ executor does not populate ctx_request_id in disaggregated_params because no KV cache transfer is needed. The _verify_ctx_response validation was called unconditionally, causing a ValueError for these completed requests. Only verify disaggregated params and prepare the generation request when generation is actually needed (finish_reason is 'length' or 'not_finished'), matching the _need_gen logic that already handles this case correctly downstream. Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>

coderabbitai · 2026-05-02T04:40:52Z

📝 Walkthrough

Walkthrough

The PR modifies context-response handling in OpenAIDisaggregatedService._send_disagg_request_ctx_first to conditionally verify and use context responses only when they lack a finish reason or have "length"/"not_finished" reasons, allowing other finish reasons to bypass disaggregated param validation.

Changes

Context Response Handling

Layer / File(s)	Summary
Conditional Logic `tensorrt_llm/serve/openai_disagg_service.py`	`_verify_ctx_response()` and `gen_req` derivation now execute only when `ctx_response` is absent or `finish_reason` is `"length"`/`"not_finished"`, allowing early-finish context responses to return without processing disaggregated parameters.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title is incomplete and cut off mid-sentence, making it unclear and not fully descriptive of the actual change.	Complete the title to clearly describe when verification and generation-request calls are made, e.g., '[https://nvbugs/6120535][fix] Only call _verify_ctx_response and _get_gen_request when generation will actually be needed'.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description includes all required sections (Summary, Test Coverage, Links) and clearly explains the root cause, fix, and testing approach.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Review rate limit: 9/10 reviews remaining, refill in 6 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/serve/openai_disagg_service.py`:
- Around line 148-153: The code dereferences ctx_response.choices[0] before
validating structure, risking IndexError; change the logic to validate the
response structure first by calling self._verify_ctx_response(ctx_response) (or
explicitly guard that ctx_response and ctx_response.choices exist and are
non-empty) before accessing choices[0] or checking its finish_reason, then
proceed to call self._get_gen_request(request, ctx_response, disagg_request_id)
only after validation passes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 61e69492-507f-4b02-a4f2-079e7efbbec9

📥 Commits

Reviewing files that changed from the base of the PR and between ce1eecc and 09fb8d2.

📒 Files selected for processing (1)

tensorrt_llm/serve/openai_disagg_service.py

coderabbitai · 2026-05-02T04:40:55Z

+            if not ctx_response or ctx_response.choices[0].finish_reason in (
+                "length",
+                "not_finished",
+            ):
+                await self._verify_ctx_response(ctx_response)
+                gen_req = self._get_gen_request(request, ctx_response, disagg_request_id)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard choices[0] access before finish-reason check.

Line 148 dereferences ctx_response.choices[0] before structural validation. If the ctx payload is malformed (e.g., zero choices), this throws IndexError instead of the controlled ValueError from _verify_ctx_response.

Proposed fix

ctx_response = await self._ctx_client.send_request( ctx_req, server=ctx_server, hooks=hooks ) - if not ctx_response or ctx_response.choices[0].finish_reason in ( + if ctx_response and len(ctx_response.choices) != 1: + await self._verify_ctx_response(ctx_response) + + if not ctx_response or ctx_response.choices[0].finish_reason in ( "length", "not_finished", ): await self._verify_ctx_response(ctx_response) gen_req = self._get_gen_request(request, ctx_response, disagg_request_id)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/serve/openai_disagg_service.py` around lines 148 - 153, The code dereferences ctx_response.choices[0] before validating structure, risking IndexError; change the logic to validate the response structure first by calling self._verify_ctx_response(ctx_response) (or explicitly guard that ctx_response and ctx_response.choices exist and are non-empty) before accessing choices[0] or checking its finish_reason, then proceed to call self._get_gen_request(request, ctx_response, disagg_request_id) only after validation passes.

tensorrt-cicd requested a review from a team as a code owner May 2, 2026 04:37

tensorrt-cicd assigned wenmingw May 2, 2026

tensorrt-cicd requested a review from zhenhuaw-me May 2, 2026 04:37

github-actions Bot assigned tensorrt-cicd May 2, 2026

coderabbitai Bot reviewed May 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[https://nvbugs/6120535][fix] Only call _verify_ctx_response and _get_gen_request when generation will actuall#13701

[https://nvbugs/6120535][fix] Only call _verify_ctx_response and _get_gen_request when generation will actuall#13701
tensorrt-cicd wants to merge 1 commit intoNVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6120535

tensorrt-cicd commented May 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 2, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tensorrt-cicd commented May 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Links

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 2, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tensorrt-cicd commented May 2, 2026 •

edited by coderabbitai Bot

Loading