[https://nvbugs/6082303][fix] Treat <tool_call> as implicit end-of-reasoning in nano-v3 parser#13684
Conversation
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/llmapi/reasoning_parser.py`:
- Line 299: This file is missing the required NVIDIA copyright + Apache 2.0
header (or needs an updated year); open tensorrt_llm/llmapi/reasoning_parser.py
(you can locate it using the occurrence of self._tool_call_start =
"<tool_call>") and add the standard NVIDIA copyright header with the correct
latest modification year and Apache 2.0 license notice at the very top of the
file, or update the year in the existing header if present.
- Around line 327-341: When in_reasoning and combining self._buffer +
delta_text, detect if there's no full _tool_call_start (tool_idx == -1) but the
end of the combined string contains a partial prefix of self._tool_call_start;
extract and keep that partial suffix in self._buffer (so it isn't emitted as
reasoning_content) and treat the rest as reasoning content (or continue
buffering) before delegating to DeepSeekR1Parser.parse_delta(); update the logic
in the in_reasoning branch of the parser (the block that uses self._buffer,
_tool_call_start, and returns ReasoningParserResult) to compute the longest
suffix of combined that is a prefix of self._tool_call_start, store that suffix
in self._buffer, ensure in_reasoning stays correct, and then return/continue so
the downstream tool parser can receive a contiguous "<tool_call>" sequence; also
add a regression test that feeds split-tag sequences like ["reasoning<tool",
"_call>data"] and ["reasoning<", "tool", "_call>data"] to verify the contiguous
tag reaches the tool parser.
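The "longest suffix of combined that is a prefix of `self._tool_call_start`" computation described above can be sketched as a small helper. This is a hypothetical standalone function for illustration, not the actual TRT-LLM code:

```python
def longest_tag_prefix_suffix(combined: str, tag: str = "<tool_call>") -> str:
    """Return the longest suffix of `combined` that is a proper prefix of
    `tag`. Such a suffix must be held back in the buffer so that a tool-call
    tag split across stream deltas is never emitted as reasoning content."""
    # Only suffixes shorter than the full tag qualify; a complete
    # occurrence is handled by the parser's find() branch instead.
    max_len = min(len(combined), len(tag) - 1)
    for n in range(max_len, 0, -1):
        if tag.startswith(combined[-n:]):
            return combined[-n:]
    return ""
```

For example, `"reasoning<tool"` yields `"<tool"` (buffer it), while `"no tag here"` yields `""` (safe to emit everything as reasoning).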
In `@tests/unittest/llmapi/test_reasoning_parser.py`:
- Around line 236-238: Add or update the NVIDIA Apache-2.0 copyright header at
the top of the test_reasoning_parser.py file (the file containing TOOL_CALL and
TOOL_CALL_END) to include the NVIDIA copyright line with the current year and
the full Apache 2.0 license notice per project guidelines; ensure the header
format matches other .py files in the repo and update the year to the latest
modification year.
📒 Files selected for processing (2)
- tensorrt_llm/llmapi/reasoning_parser.py
- tests/unittest/llmapi/test_reasoning_parser.py
```python
    "force_nonempty_content", False) is True
super().__init__(reasoning_at_start=reasoning_at_start,
                 chat_template_kwargs=chat_template_kwargs)
self._tool_call_start = "<tool_call>"
```
Add the required NVIDIA Apache header/year update.
This modified Python file still has no NVIDIA copyright + Apache 2.0 header at the top. Please add it, or update the existing header year, before merge.
As per coding guidelines, "**/*.{cpp,h,cu,cuh,py}: All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."
```python
if self.in_reasoning:
    combined = self._buffer + delta_text
    tool_idx = combined.find(self._tool_call_start)
    if tool_idx != -1:
        end_idx = combined.find(self.reasoning_end)
        if end_idx == -1 or tool_idx < end_idx:
            reasoning = combined[:tool_idx]
            content = combined[tool_idx:]
            self._buffer = ""
            self.in_reasoning = False
            if self._force_nonempty_content:
                self._found_closing_tag = True
                self._accumulated_reasoning = ""
            return ReasoningParserResult(content=content,
                                         reasoning_content=reasoning)
```
Buffer partial <tool_call> prefixes before delegating to the parent parser.
If the stream arrives as ["reasoning<tool", "_call>data"] or ["reasoning<", "tool", "_call>data"], tool_idx stays -1 here and control falls through to DeepSeekR1Parser.parse_delta(), which only buffers prefixes of </think>. That still leaks the partial tool tag into reasoning_content, so the downstream tool parser never sees a contiguous <tool_call>.
Suggested fix:

```diff
 if self.in_reasoning:
     combined = self._buffer + delta_text
     tool_idx = combined.find(self._tool_call_start)
     if tool_idx != -1:
         end_idx = combined.find(self.reasoning_end)
         if end_idx == -1 or tool_idx < end_idx:
             reasoning = combined[:tool_idx]
             content = combined[tool_idx:]
             self._buffer = ""
             self.in_reasoning = False
             if self._force_nonempty_content:
                 self._found_closing_tag = True
                 self._accumulated_reasoning = ""
             return ReasoningParserResult(content=content,
                                          reasoning_content=reasoning)
+
+    last_lt = combined.rfind("<")
+    if last_lt != -1 and self._tool_call_start.startswith(
+            combined[last_lt:]):
+        self._buffer = combined[last_lt:]
+        return ReasoningParserResult(
+            reasoning_content=combined[:last_lt])
```

Please add a regression test for that split-tag shape too.
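The buffering behaviour this comment asks for can be illustrated with a self-contained sketch. `PrefixBuffer` is a hypothetical standalone class, not the TRT-LLM parser; it only demonstrates how holding back a partial tag suffix keeps `<tool_call>` contiguous for the downstream tool parser across the split-tag stream shapes mentioned above:

```python
class PrefixBuffer:
    """Minimal sketch of the suggested buffering: hold back any suffix of
    the stream that could still grow into "<tool_call>", so the downstream
    tool parser always receives the tag contiguously."""

    TAG = "<tool_call>"

    def __init__(self) -> None:
        self.buffer = ""

    def feed(self, delta: str) -> tuple[str, str]:
        """Return (reasoning_piece, content_piece) for this delta."""
        combined = self.buffer + delta
        idx = combined.find(self.TAG)
        if idx != -1:
            # Full tag present: text before it is reasoning; the tag and
            # everything after it switch to content.
            self.buffer = ""
            return combined[:idx], combined[idx:]
        # No full tag yet: keep the longest suffix that is a proper prefix
        # of the tag, emitting only the rest as reasoning.
        for n in range(min(len(combined), len(self.TAG) - 1), 0, -1):
            if self.TAG.startswith(combined[-n:]):
                self.buffer = combined[-n:]
                return combined[:-n], ""
        self.buffer = ""
        return combined, ""
```

Feeding `["reasoning<tool", "_call>data"]` yields `("reasoning", "")` then `("", "<tool_call>data")`, so the tag reaches the content side intact.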
```python
TOOL_CALL = "<tool_call>"
TOOL_CALL_END = "</tool_call>"
```
Add the required NVIDIA Apache header/year update.
This modified Python file still has no NVIDIA copyright + Apache 2.0 header at the top. Please add it, or update the existing header year, before merge.
As per coding guidelines, "**/*.{cpp,h,cu,cuh,py}: All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."
Force-pushed from 304b614 to 265f3d5 (compare)
…asoning in nano-v3 parser

When serving Nemotron-3-Super with --reasoning_parser nano-v3, the model occasionally omits </think> before generating <tool_call>. The parser stayed in reasoning mode and absorbed the tool call markup into reasoning_content, causing the downstream tool parser to never see it (~2-8% of streaming tool-call requests silently failed).

Treat <tool_call> as an implicit </think> in NemotronV3ReasoningParser, following the same pattern as KimiK2ReasoningParser which treats <|tool_calls_section_begin|> as an implicit reasoning end.

Signed-off-by: tijyojwad <1127155+tijyojwad@users.noreply.github.com>
Made-with: Cursor
Force-pushed from 265f3d5 to 1600913 (compare)
/bot run --disable-fail-fast

PR_Github #46454 [ run ] triggered by Bot. Commit:
```python
    force_nonempty_content is set. When the closing tag is found
    (in_reasoning transitions from True to False), the accumulation
    is cleared to free memory."""
    """Wraps the parent parse_delta to also treat ``<tool_call>`` as an
```
```diff
-assert result.content == content[i]
-assert result.reasoning_content == reasoning_context[i]
+assert result.content == content[i], \
+    f"Step {i}: delta={delta_text!r}, expected content={content[i]!r}, got {result.content!r}"
```
Nit: the debug strings are unnecessary as pytest will print them anyway upon failure (it overrides assert).
```python
remaining = self._buffer
self._buffer = ""
self.in_reasoning = False
tool_idx = delta_text.find(self._tool_call_start)
```
Nit: maybe leave a comment that this is for sure non-negative, since we checked the existence of self._tool_call_start at line 346?
When serving Nemotron-3-Super with --reasoning_parser nano-v3, the model occasionally omits `</think>` before generating `<tool_call>`. The parser stayed in reasoning mode and absorbed the tool call markup into reasoning_content, causing the downstream tool parser to never see it (~2-8% of streaming tool-call requests silently failed).
Treat `<tool_call>` as an implicit `</think>` in NemotronV3ReasoningParser, following the same pattern as KimiK2ReasoningParser which treats `<|tool_calls_section_begin|>` as an implicit reasoning end.
Made-with: Cursor
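The implicit-end rule described above can be illustrated with a minimal non-streaming sketch. The function name and signature here are illustrative only, not the actual NemotronV3ReasoningParser API:

```python
def parse_full(text: str,
               reasoning_end: str = "</think>",
               tool_call_start: str = "<tool_call>") -> tuple[str, str]:
    """Split full model output into (reasoning, content). If the model emits
    <tool_call> with no preceding </think>, the tag itself is treated as an
    implicit end of reasoning so the tool call lands in content."""
    end_idx = text.find(reasoning_end)
    tool_idx = text.find(tool_call_start)
    if tool_idx != -1 and (end_idx == -1 or tool_idx < end_idx):
        # Implicit end: reasoning runs up to the tag; the tag and the
        # remainder become content for the downstream tool parser.
        return text[:tool_idx], text[tool_idx:]
    if end_idx != -1:
        # Normal case: explicit closing tag splits the output.
        return text[:end_idx], text[end_idx + len(reasoning_end):]
    # No closing marker at all: the whole output is reasoning.
    return text, ""
```

With this rule, `parse_full("thinking<tool_call>…</tool_call>")` hands the complete `<tool_call>` block to the content side instead of losing it inside reasoning_content.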
Summary by CodeRabbit
New Features
- The reasoning parser now treats `<tool_call>` tags as an implicit end-of-reasoning boundary, improving handling of tool invocations within reasoning blocks in both streaming and non-streaming modes.

Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.