
[https://nvbugs/6082303][fix] Treat <tool_call> as implicit end-of-reasoning in nano-v3 parser #13684

Open

tijyojwad wants to merge 1 commit into NVIDIA:main from tijyojwad:fix/nvbug-6082303-nano-v3-tool-call-absorbed

Conversation

@tijyojwad
Collaborator

@tijyojwad tijyojwad commented May 1, 2026

When serving Nemotron-3-Super with --reasoning_parser nano-v3, the model occasionally omits </think> before generating <tool_call>. The parser stayed in reasoning mode and absorbed the tool call markup into reasoning_content, causing the downstream tool parser to never see it (~2-8% of streaming tool-call requests silently failed).

Treat <tool_call> as an implicit </think> in NemotronV3ReasoningParser, following the same pattern as KimiK2ReasoningParser, which treats <|tool_calls_section_begin|> as an implicit reasoning end.
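In non-streaming form, the fix can be sketched roughly as follows. `MiniReasoningParser` and `ReasoningResult` are illustrative stand-ins, not the TRT-LLM API:

```python
from dataclasses import dataclass

TOOL_CALL_START = "<tool_call>"
REASONING_END = "</think>"


@dataclass
class ReasoningResult:
    reasoning_content: str = ""
    content: str = ""


class MiniReasoningParser:
    """Illustrative stand-in for NemotronV3ReasoningParser (not the real class)."""

    def parse(self, text: str) -> ReasoningResult:
        tool_idx = text.find(TOOL_CALL_START)
        end_idx = text.find(REASONING_END)
        # Implicit end: if <tool_call> appears before (or without) </think>,
        # cut the reasoning there and route everything from the tag onward
        # to the content channel, so a downstream tool parser can see it.
        if tool_idx != -1 and (end_idx == -1 or tool_idx < end_idx):
            return ReasoningResult(reasoning_content=text[:tool_idx],
                                   content=text[tool_idx:])
        if end_idx != -1:
            return ReasoningResult(
                reasoning_content=text[:end_idx],
                content=text[end_idx + len(REASONING_END):])
        # No boundary at all: everything is still reasoning.
        return ReasoningResult(reasoning_content=text)
```

With this shape, a response that skips `</think>` and jumps straight to `<tool_call>` still surfaces the tag in `content` instead of silently swallowing it.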

Made-with: Cursor

Summary by CodeRabbit

  • New Features

    • Reasoning parser now recognizes <tool_call> tags as an implicit end-of-reasoning boundary, improving handling of tool invocations within reasoning blocks in both streaming and non-streaming modes.
  • Tests

    • Extended test coverage to validate tool call tag handling across various scenarios, including streaming and edge cases.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@tijyojwad tijyojwad requested a review from a team as a code owner May 1, 2026 00:39
@tijyojwad tijyojwad requested a review from mikeiovine May 1, 2026 00:39
@coderabbitai
Contributor

coderabbitai Bot commented May 1, 2026

📝 Walkthrough

Walkthrough

The NemotronV3ReasoningParser now treats <tool_call> as an implicit end-of-reasoning boundary in both streaming and full-text parsing modes. When detected, reasoning content is truncated at the tag, and text from the tag onward is reclassified as content, with special handling for buffer management and configuration flags.

Changes

Cohort / File(s) Summary
Reasoning Parser Implementation
tensorrt_llm/llmapi/reasoning_parser.py
Added logic in parse_delta and parse methods to recognize and handle <tool_call> as an implicit reasoning boundary. Streaming mode searches buffered+incoming text for the tag; full-text mode reclassifies content post-parse. Integrates with force_nonempty_content flag and buffer clearing.
Parser Test Suite
tests/unittest/llmapi/test_reasoning_parser.py
Expanded test coverage with non-streaming cases for <tool_call> without preceding </think>, parametrized streaming deltas including split tags across increments, and finish() behavior validation. Added per-step error messages for streaming assertions.
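The split-tag streaming scenarios mentioned above can be sketched like this. `StubParser` is a hypothetical stand-in for NemotronV3ReasoningParser, not the real implementation:

```python
TOOL_CALL = "<tool_call>"


class StubParser:
    """Minimal stand-in: buffers partial tags, flips to content on <tool_call>."""

    def __init__(self):
        self.in_reasoning = True
        self._buffer = ""

    def parse_delta(self, delta: str) -> tuple[str, str]:
        """Return (reasoning, content) for one streamed delta."""
        if not self.in_reasoning:
            return ("", delta)
        combined = self._buffer + delta
        idx = combined.find(TOOL_CALL)
        if idx != -1:
            # Full tag seen: end reasoning, emit the tag onward as content.
            self.in_reasoning = False
            self._buffer = ""
            return (combined[:idx], combined[idx:])
        # Hold back any trailing run that could still grow into the tag.
        last_lt = combined.rfind("<")
        if last_lt != -1 and TOOL_CALL.startswith(combined[last_lt:]):
            self._buffer = combined[last_lt:]
            return (combined[:last_lt], "")
        self._buffer = ""
        return (combined, "")


# The tag arrives split across two deltas; the contiguous "<tool_call>"
# should still reach the content channel intact.
p = StubParser()
outs = [p.parse_delta(s) for s in ["reasoning<tool", "_call>data"]]
```

Here `outs` ends up as `[("reasoning", ""), ("", "<tool_call>data")]`: the partial `<tool` is buffered rather than leaked into reasoning.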

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description explains the problem and solution clearly, but required template sections (Description, Test Coverage) are incomplete and the PR Checklist is left as template placeholders. | Fill in the Description and Test Coverage sections with specific details. Ensure the PR Checklist clearly indicates which items have been completed and which are not applicable. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is specific and directly describes the main change: treating <tool_call> as an implicit end-of-reasoning boundary in the nano-v3 parser, with proper ticket reference and type marker. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/llmapi/reasoning_parser.py`:
- Line 299: This file is missing the required NVIDIA copyright + Apache 2.0
header (or needs an updated year); open tensorrt_llm/llmapi/reasoning_parser.py
(you can locate it using the occurrence of self._tool_call_start =
"<tool_call>") and add the standard NVIDIA copyright header with the correct
latest modification year and Apache 2.0 license notice at the very top of the
file, or update the year in the existing header if present.
- Around line 327-341: When in_reasoning and combining self._buffer +
delta_text, detect if there's no full _tool_call_start (tool_idx == -1) but the
end of the combined string contains a partial prefix of self._tool_call_start;
extract and keep that partial suffix in self._buffer (so it isn't emitted as
reasoning_content) and treat the rest as reasoning content (or continue
buffering) before delegating to DeepSeekR1Parser.parse_delta(); update the logic
in the in_reasoning branch of the parser (the block that uses self._buffer,
_tool_call_start, and returns ReasoningParserResult) to compute the longest
suffix of combined that is a prefix of self._tool_call_start, store that suffix
in self._buffer, ensure in_reasoning stays correct, and then return/continue so
the downstream tool parser can receive a contiguous "<tool_call>" sequence; also
add a regression test that feeds split-tag sequences like ["reasoning<tool",
"_call>data"] and ["reasoning<", "tool", "_call>data"] to verify the contiguous
tag reaches the tool parser.

In `@tests/unittest/llmapi/test_reasoning_parser.py`:
- Around line 236-238: Add or update the NVIDIA Apache-2.0 copyright header at
the top of the test_reasoning_parser.py file (the file containing TOOL_CALL and
TOOL_CALL_END) to include the NVIDIA copyright line with the current year and
the full Apache 2.0 license notice per project guidelines; ensure the header
format matches other .py files in the repo and update the year to the latest
modification year.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8187a810-bfaa-486d-9e02-3930c8a50e0f

📥 Commits

Reviewing files that changed from the base of the PR and between 0343a9d and 304b614.

📒 Files selected for processing (2)
  • tensorrt_llm/llmapi/reasoning_parser.py
  • tests/unittest/llmapi/test_reasoning_parser.py

Review context from `tensorrt_llm/llmapi/reasoning_parser.py` (the first line is truncated by the diff view):

```python
                "force_nonempty_content", False) is True
        super().__init__(reasoning_at_start=reasoning_at_start,
                         chat_template_kwargs=chat_template_kwargs)
        self._tool_call_start = "<tool_call>"
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA Apache header/year update.

This modified Python file still has no NVIDIA copyright + Apache 2.0 header at the top. Please add it, or update the existing header year, before merge.

As per coding guidelines, "**/*.{cpp,h,cu,cuh,py}: All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."


Comment thread: `tensorrt_llm/llmapi/reasoning_parser.py` (outdated), lines +327 to +341

```python
        if self.in_reasoning:
            combined = self._buffer + delta_text
            tool_idx = combined.find(self._tool_call_start)
            if tool_idx != -1:
                end_idx = combined.find(self.reasoning_end)
                if end_idx == -1 or tool_idx < end_idx:
                    reasoning = combined[:tool_idx]
                    content = combined[tool_idx:]
                    self._buffer = ""
                    self.in_reasoning = False
                    if self._force_nonempty_content:
                        self._found_closing_tag = True
                        self._accumulated_reasoning = ""
                    return ReasoningParserResult(content=content,
                                                 reasoning_content=reasoning)
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Buffer partial <tool_call> prefixes before delegating to the parent parser.

If the stream arrives as ["reasoning<tool", "_call>data"] or ["reasoning<", "tool", "_call>data"], tool_idx stays -1 here and control falls through to DeepSeekR1Parser.parse_delta(), which only buffers prefixes of </think>. That still leaks the partial tool tag into reasoning_content, so the downstream tool parser never sees a contiguous <tool_call>.

Suggested fix
```diff
         if self.in_reasoning:
             combined = self._buffer + delta_text
             tool_idx = combined.find(self._tool_call_start)
             if tool_idx != -1:
                 end_idx = combined.find(self.reasoning_end)
                 if end_idx == -1 or tool_idx < end_idx:
                     reasoning = combined[:tool_idx]
                     content = combined[tool_idx:]
                     self._buffer = ""
                     self.in_reasoning = False
                     if self._force_nonempty_content:
                         self._found_closing_tag = True
                         self._accumulated_reasoning = ""
                     return ReasoningParserResult(content=content,
                                                  reasoning_content=reasoning)
+
+            last_lt = combined.rfind("<")
+            if last_lt != -1 and self._tool_call_start.startswith(
+                    combined[last_lt:]):
+                self._buffer = combined[last_lt:]
+                return ReasoningParserResult(
+                    reasoning_content=combined[:last_lt])
```

Please add a regression for that split-tag shape too.
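The holdback the reviewer suggests can be isolated as a small helper, sketched here under the assumption that only the `<tool_call>` tag needs buffering (`split_partial_tag` is illustrative, not part of the codebase):

```python
TOOL_CALL_START = "<tool_call>"


def split_partial_tag(combined: str) -> tuple[str, str]:
    """Split combined buffered+delta text into (safe_to_emit, hold_back).

    hold_back is the trailing run that could still grow into a full
    <tool_call> tag and must therefore stay buffered. Callers are
    expected to have handled the full-tag case (find() != -1) first.
    """
    last_lt = combined.rfind("<")
    if last_lt != -1 and TOOL_CALL_START.startswith(combined[last_lt:]):
        return combined[:last_lt], combined[last_lt:]
    return combined, ""
```

For example, `split_partial_tag("reasoning<tool")` yields `("reasoning", "<tool")`, so the partial tag stays in the buffer instead of leaking into `reasoning_content`.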


Comment on lines +236 to +238 of `tests/unittest/llmapi/test_reasoning_parser.py`:

```python
TOOL_CALL = "<tool_call>"
TOOL_CALL_END = "</tool_call>"
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA Apache header/year update.

This modified Python file still has no NVIDIA copyright + Apache 2.0 header at the top. Please add it, or update the existing header year, before merge.

As per coding guidelines, "**/*.{cpp,h,cu,cuh,py}: All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."


@tijyojwad tijyojwad force-pushed the fix/nvbug-6082303-nano-v3-tool-call-absorbed branch from 304b614 to 265f3d5 Compare May 1, 2026 00:48
[https://nvbugs/6082303][fix] Treat <tool_call> as implicit end-of-reasoning in nano-v3 parser

When serving Nemotron-3-Super with --reasoning_parser nano-v3, the model
occasionally omits </think> before generating <tool_call>. The parser
stayed in reasoning mode and absorbed the tool call markup into
reasoning_content, causing the downstream tool parser to never see it
(~2-8% of streaming tool-call requests silently failed).

Treat <tool_call> as an implicit </think> in NemotronV3ReasoningParser,
following the same pattern as KimiK2ReasoningParser which treats
<|tool_calls_section_begin|> as an implicit reasoning end.

Signed-off-by: tijyojwad <1127155+tijyojwad@users.noreply.github.com>
Made-with: Cursor
@tijyojwad tijyojwad force-pushed the fix/nvbug-6082303-nano-v3-tool-call-absorbed branch from 265f3d5 to 1600913 Compare May 1, 2026 00:54
@tijyojwad tijyojwad requested a review from 2ez4bz May 1, 2026 00:55
@tijyojwad
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46454 [ run ] triggered by Bot. Commit: 1600913 Link to invocation

Review context from `tensorrt_llm/llmapi/reasoning_parser.py` (two docstring fragments as shown by the diff view):

```python
    force_nonempty_content is set. When the closing tag is found
    (in_reasoning transitions from True to False), the accumulation
    is cleared to free memory."""
        """Wraps the parent parse_delta to also treat ``<tool_call>`` as an
```

Nit: `sed -i 's/``//g'`.

```diff
-        assert result.content == content[i]
-        assert result.reasoning_content == reasoning_context[i]
+        assert result.content == content[i], \
+            f"Step {i}: delta={delta_text!r}, expected content={content[i]!r}, got {result.content!r}"
```

Nit: the debug strings are unnecessary as pytest will print them anyway upon failure (it overrides assert).

```python
            remaining = self._buffer
            self._buffer = ""
            self.in_reasoning = False
            tool_idx = delta_text.find(self._tool_call_start)
```

Nit: maybe leave a comment that this is for sure non-negative, since we checked the existence of self._tool_call_start at line 346?



3 participants