
[https://nvbugs/6082303][fix] Treat <tool_call> as implicit end-of-reasoning in nano-v3 parser #13684

Open

tijyojwad wants to merge 1 commit into NVIDIA:main from tijyojwad:fix/nvbug-6082303-nano-v3-tool-call-absorbed

Conversation

@tijyojwad
Collaborator

@tijyojwad tijyojwad commented May 1, 2026

When serving Nemotron-3-Super with --reasoning_parser nano-v3, the model occasionally omits </think> before generating <tool_call>. The parser stayed in reasoning mode and absorbed the tool call markup into reasoning_content, causing the downstream tool parser to never see it (~2-8% of streaming tool-call requests silently failed).

Treat <tool_call> as an implicit </think> in NemotronV3ReasoningParser, following the same pattern as KimiK2ReasoningParser, which treats <|tool_calls_section_begin|> as an implicit reasoning end.
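In non-streaming form, the fix can be sketched roughly as follows. `MiniReasoningParser` and `ReasoningResult` are illustrative stand-ins, not the TRT-LLM API:

```python
from dataclasses import dataclass

TOOL_CALL_START = "<tool_call>"
REASONING_END = "</think>"


@dataclass
class ReasoningResult:
    reasoning_content: str = ""
    content: str = ""


class MiniReasoningParser:
    """Illustrative stand-in for NemotronV3ReasoningParser (not the real class)."""

    def parse(self, text: str) -> ReasoningResult:
        tool_idx = text.find(TOOL_CALL_START)
        end_idx = text.find(REASONING_END)
        # Implicit end: if <tool_call> appears before (or without) </think>,
        # cut the reasoning there and route everything from the tag onward
        # to the content channel, so a downstream tool parser can see it.
        if tool_idx != -1 and (end_idx == -1 or tool_idx < end_idx):
            return ReasoningResult(reasoning_content=text[:tool_idx],
                                   content=text[tool_idx:])
        if end_idx != -1:
            return ReasoningResult(
                reasoning_content=text[:end_idx],
                content=text[end_idx + len(REASONING_END):])
        # No boundary at all: everything is still reasoning.
        return ReasoningResult(reasoning_content=text)
```

With this shape, a response that skips `</think>` and jumps straight to `<tool_call>` still surfaces the tag in `content` instead of silently swallowing it.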

Made-with: Cursor

Summary by CodeRabbit

  • New Features

    • Reasoning parser now recognizes <tool_call> tags as an implicit end-of-reasoning boundary, improving handling of tool invocations within reasoning blocks in both streaming and non-streaming modes.
  • Tests

    • Extended test coverage to validate tool call tag handling across various scenarios, including streaming and edge cases.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@tijyojwad tijyojwad requested a review from a team as a code owner May 1, 2026 00:39
@tijyojwad tijyojwad requested a review from mikeiovine May 1, 2026 00:39
@coderabbitai
Contributor

coderabbitai Bot commented May 1, 2026

📝 Walkthrough

Walkthrough

The NemotronV3ReasoningParser now treats <tool_call> as an implicit end-of-reasoning boundary in both streaming and full-text parsing modes. When detected, reasoning content is truncated at the tag, and text from the tag onward is reclassified as content, with special handling for buffer management and configuration flags.

Changes

Cohort / File(s) Summary
Reasoning Parser Implementation
tensorrt_llm/llmapi/reasoning_parser.py
Added logic in parse_delta and parse methods to recognize and handle <tool_call> as an implicit reasoning boundary. Streaming mode searches buffered+incoming text for the tag; full-text mode reclassifies content post-parse. Integrates with force_nonempty_content flag and buffer clearing.
Parser Test Suite
tests/unittest/llmapi/test_reasoning_parser.py
Expanded test coverage with non-streaming cases for <tool_call> without preceding </think>, parametrized streaming deltas including split tags across increments, and finish() behavior validation. Added per-step error messages for streaming assertions.
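The split-tag streaming scenarios mentioned above can be sketched like this. `StubParser` is a hypothetical stand-in for NemotronV3ReasoningParser, not the real implementation:

```python
TOOL_CALL = "<tool_call>"


class StubParser:
    """Minimal stand-in: buffers partial tags, flips to content on <tool_call>."""

    def __init__(self):
        self.in_reasoning = True
        self._buffer = ""

    def parse_delta(self, delta: str) -> tuple[str, str]:
        """Return (reasoning, content) for one streamed delta."""
        if not self.in_reasoning:
            return ("", delta)
        combined = self._buffer + delta
        idx = combined.find(TOOL_CALL)
        if idx != -1:
            # Full tag seen: end reasoning, emit the tag onward as content.
            self.in_reasoning = False
            self._buffer = ""
            return (combined[:idx], combined[idx:])
        # Hold back any trailing run that could still grow into the tag.
        last_lt = combined.rfind("<")
        if last_lt != -1 and TOOL_CALL.startswith(combined[last_lt:]):
            self._buffer = combined[last_lt:]
            return (combined[:last_lt], "")
        self._buffer = ""
        return (combined, "")


# The tag arrives split across two deltas; the contiguous "<tool_call>"
# should still reach the content channel intact.
p = StubParser()
outs = [p.parse_delta(s) for s in ["reasoning<tool", "_call>data"]]
```

Here `outs` ends up as `[("reasoning", ""), ("", "<tool_call>data")]`: the partial `<tool` is buffered rather than leaked into reasoning.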

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description explains the problem and solution clearly, but required template sections (Description, Test Coverage) are incomplete and the PR Checklist is left as template placeholders. | Fill in the Description and Test Coverage sections with specific details. Ensure the PR Checklist clearly indicates which items have been completed and which are not applicable. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is specific and directly describes the main change: treating <tool_call> as an implicit end-of-reasoning boundary in the nano-v3 parser, with proper ticket reference and type marker. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/llmapi/reasoning_parser.py`:
- Line 299: This file is missing the required NVIDIA copyright + Apache 2.0
header (or needs an updated year); open tensorrt_llm/llmapi/reasoning_parser.py
(you can locate it using the occurrence of self._tool_call_start =
"<tool_call>") and add the standard NVIDIA copyright header with the correct
latest modification year and Apache 2.0 license notice at the very top of the
file, or update the year in the existing header if present.
- Around line 327-341: When in_reasoning and combining self._buffer +
delta_text, detect if there's no full _tool_call_start (tool_idx == -1) but the
end of the combined string contains a partial prefix of self._tool_call_start;
extract and keep that partial suffix in self._buffer (so it isn't emitted as
reasoning_content) and treat the rest as reasoning content (or continue
buffering) before delegating to DeepSeekR1Parser.parse_delta(); update the logic
in the in_reasoning branch of the parser (the block that uses self._buffer,
_tool_call_start, and returns ReasoningParserResult) to compute the longest
suffix of combined that is a prefix of self._tool_call_start, store that suffix
in self._buffer, ensure in_reasoning stays correct, and then return/continue so
the downstream tool parser can receive a contiguous "<tool_call>" sequence; also
add a regression test that feeds split-tag sequences like ["reasoning<tool",
"_call>data"] and ["reasoning<", "tool", "_call>data"] to verify the contiguous
tag reaches the tool parser.

In `@tests/unittest/llmapi/test_reasoning_parser.py`:
- Around line 236-238: Add or update the NVIDIA Apache-2.0 copyright header at
the top of the test_reasoning_parser.py file (the file containing TOOL_CALL and
TOOL_CALL_END) to include the NVIDIA copyright line with the current year and
the full Apache 2.0 license notice per project guidelines; ensure the header
format matches other .py files in the repo and update the year to the latest
modification year.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8187a810-bfaa-486d-9e02-3930c8a50e0f

📥 Commits

Reviewing files that changed from the base of the PR and between 0343a9d and 304b614.

📒 Files selected for processing (2)
  • tensorrt_llm/llmapi/reasoning_parser.py
  • tests/unittest/llmapi/test_reasoning_parser.py

Review context from `tensorrt_llm/llmapi/reasoning_parser.py` (the first line is truncated by the diff view):

```python
                "force_nonempty_content", False) is True
        super().__init__(reasoning_at_start=reasoning_at_start,
                         chat_template_kwargs=chat_template_kwargs)
        self._tool_call_start = "<tool_call>"
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA Apache header/year update.

This modified Python file still has no NVIDIA copyright + Apache 2.0 header at the top. Please add it, or update the existing header year, before merge.

As per coding guidelines, "**/*.{cpp,h,cu,cuh,py}: All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."


Comment thread: `tensorrt_llm/llmapi/reasoning_parser.py` (outdated), lines +327 to +341

```python
        if self.in_reasoning:
            combined = self._buffer + delta_text
            tool_idx = combined.find(self._tool_call_start)
            if tool_idx != -1:
                end_idx = combined.find(self.reasoning_end)
                if end_idx == -1 or tool_idx < end_idx:
                    reasoning = combined[:tool_idx]
                    content = combined[tool_idx:]
                    self._buffer = ""
                    self.in_reasoning = False
                    if self._force_nonempty_content:
                        self._found_closing_tag = True
                        self._accumulated_reasoning = ""
                    return ReasoningParserResult(content=content,
                                                 reasoning_content=reasoning)
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Buffer partial <tool_call> prefixes before delegating to the parent parser.

If the stream arrives as ["reasoning<tool", "_call>data"] or ["reasoning<", "tool", "_call>data"], tool_idx stays -1 here and control falls through to DeepSeekR1Parser.parse_delta(), which only buffers prefixes of </think>. That still leaks the partial tool tag into reasoning_content, so the downstream tool parser never sees a contiguous <tool_call>.

Suggested fix
```diff
         if self.in_reasoning:
             combined = self._buffer + delta_text
             tool_idx = combined.find(self._tool_call_start)
             if tool_idx != -1:
                 end_idx = combined.find(self.reasoning_end)
                 if end_idx == -1 or tool_idx < end_idx:
                     reasoning = combined[:tool_idx]
                     content = combined[tool_idx:]
                     self._buffer = ""
                     self.in_reasoning = False
                     if self._force_nonempty_content:
                         self._found_closing_tag = True
                         self._accumulated_reasoning = ""
                     return ReasoningParserResult(content=content,
                                                  reasoning_content=reasoning)
+
+            last_lt = combined.rfind("<")
+            if last_lt != -1 and self._tool_call_start.startswith(
+                    combined[last_lt:]):
+                self._buffer = combined[last_lt:]
+                return ReasoningParserResult(
+                    reasoning_content=combined[:last_lt])
```

Please add a regression for that split-tag shape too.
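The holdback the reviewer suggests can be isolated as a small helper, sketched here under the assumption that only the `<tool_call>` tag needs buffering (`split_partial_tag` is illustrative, not part of the codebase):

```python
TOOL_CALL_START = "<tool_call>"


def split_partial_tag(combined: str) -> tuple[str, str]:
    """Split combined buffered+delta text into (safe_to_emit, hold_back).

    hold_back is the trailing run that could still grow into a full
    <tool_call> tag and must therefore stay buffered. Callers are
    expected to have handled the full-tag case (find() != -1) first.
    """
    last_lt = combined.rfind("<")
    if last_lt != -1 and TOOL_CALL_START.startswith(combined[last_lt:]):
        return combined[:last_lt], combined[last_lt:]
    return combined, ""
```

For example, `split_partial_tag("reasoning<tool")` yields `("reasoning", "<tool")`, so the partial tag stays in the buffer instead of leaking into `reasoning_content`.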


Comment on lines +236 to +238 of `tests/unittest/llmapi/test_reasoning_parser.py`:

```python
TOOL_CALL = "<tool_call>"
TOOL_CALL_END = "</tool_call>"
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA Apache header/year update.

This modified Python file still has no NVIDIA copyright + Apache 2.0 header at the top. Please add it, or update the existing header year, before merge.

As per coding guidelines, "**/*.{cpp,h,cu,cuh,py}: All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."


@tijyojwad tijyojwad force-pushed the fix/nvbug-6082303-nano-v3-tool-call-absorbed branch from 304b614 to 265f3d5 Compare May 1, 2026 00:48
[https://nvbugs/6082303][fix] Treat <tool_call> as implicit end-of-reasoning in nano-v3 parser

When serving Nemotron-3-Super with --reasoning_parser nano-v3, the model
occasionally omits </think> before generating <tool_call>. The parser
stayed in reasoning mode and absorbed the tool call markup into
reasoning_content, causing the downstream tool parser to never see it
(~2-8% of streaming tool-call requests silently failed).

Treat <tool_call> as an implicit </think> in NemotronV3ReasoningParser,
following the same pattern as KimiK2ReasoningParser which treats
<|tool_calls_section_begin|> as an implicit reasoning end.

Signed-off-by: tijyojwad <1127155+tijyojwad@users.noreply.github.com>
Made-with: Cursor
@tijyojwad tijyojwad force-pushed the fix/nvbug-6082303-nano-v3-tool-call-absorbed branch from 265f3d5 to 1600913 Compare May 1, 2026 00:54
@tijyojwad tijyojwad requested a review from 2ez4bz May 1, 2026 00:55
@tijyojwad
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46454 [ run ] triggered by Bot. Commit: 1600913 Link to invocation

Review context from `tensorrt_llm/llmapi/reasoning_parser.py` (two docstring fragments as shown by the diff view):

```python
    force_nonempty_content is set. When the closing tag is found
    (in_reasoning transitions from True to False), the accumulation
    is cleared to free memory."""
        """Wraps the parent parse_delta to also treat ``<tool_call>`` as an
```

Nit: `sed -i 's/``//g'`.

```diff
-        assert result.content == content[i]
-        assert result.reasoning_content == reasoning_context[i]
+        assert result.content == content[i], \
+            f"Step {i}: delta={delta_text!r}, expected content={content[i]!r}, got {result.content!r}"
```

Nit: the debug strings are unnecessary as pytest will print them anyway upon failure (it overrides assert).

```python
            remaining = self._buffer
            self._buffer = ""
            self.in_reasoning = False
            tool_idx = delta_text.find(self._tool_call_start)
```

Nit: maybe leave a comment that this is for sure non-negative, since we checked the existence of self._tool_call_start at line 346?



3 participants