Refactor hf tool calling #455

Merged

Nash0x7E2 merged 10 commits into main from refactor-hf-tool-calling on Apr 2, 2026
Conversation

@maxkahan (Contributor) commented Mar 26, 2026

  • Refactors all tool calling in the HuggingFace plugin
  • Adds tool calling to the transformers VLM
  • Abstracts the tool-calling logic into a separate module

Summary by CodeRabbit

  • New Features

    • Added comprehensive HuggingFace tool-calling support: multi-round tool execution, streaming tool-call handling, and unified response creation for LLM and VLM workflows.
    • Exposed configurable max rounds for tool execution and cleaner event emission for streaming/non-streaming flows.
  • Examples

    • Added a standalone example demonstrating a transformers-backed agent that registers and uses async tools during a call.
  • Tests

    • Added extensive async unit tests covering tool conversion, streaming accumulation, parsing, and multi-round execution.

@coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

This PR extracts shared HuggingFace tool-calling infrastructure into new modules, refactors LLM/VLM code to use that infrastructure, adds tests covering streaming and multi‑round tool execution, and includes an example demonstrating a tool-calling agent with a local Transformers model. (47 words)

Changes

Cohort / File(s) | Summary
Shared tool-calling infra
plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py, plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
New modules: tool schema conversion (convert_tools_to_chat_completions_format), streaming/non-streaming HF response handling (create_hf_response), accumulation/finalization of streaming tool-call fragments, and the multi‑round run_tool_call_loop orchestration.
Provider LLM/VLM refactor
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py, plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py, plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py, plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
Replaced in-file tool-calling/streaming logic with calls into shared helpers; added create_response entrypoints, tool-format delegation, max_tool_rounds handling, and moved event-emission adjustments to use shared flow.
Tool-calling tests
plugins/huggingface/tests/test_hf_tool_calling.py, plugins/huggingface/tests/test_huggingface.py, plugins/huggingface/tests/test_transformers_llm.py, plugins/huggingface/tests/test_transformers_vlm.py
Added comprehensive unit tests for tool schema conversion, streaming chunk accumulation/finalization, extraction/parsing of tool calls, and multi‑round run_loop behavior; updated existing tests to exercise new tool-call flows and event emission.
Example
plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
New CLI example script demonstrating a Tool-Calling Agent backed by a local TransformersLLM, registering async tools, joining a call, and exercising STT/TTS and tool-driven follow-ups.
Test/stream stubs & helpers
plugins/huggingface/tests/... (helpers within changed tests)
Refactored test streaming helpers to SimpleNamespace-based chunks and updated error simulation to OSError in a couple of tests.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant LLM as TransformersLLM / VLM
    participant HF as HuggingFace Model (API)
    participant ToolLoop as Tool-Call Loop
    participant Tools as Registered Tools

    App->>LLM: create_response(messages, tools)
    LLM->>HF: send request (includes tools schema)
    HF-->>LLM: stream / non-stream response (deltas or full message)

    alt finish_reason == "tool_calls" or tool markers detected
        LLM->>ToolLoop: extract tool calls & run_tool_call_loop()
        ToolLoop->>Tools: invoke registered async tool functions
        Tools-->>ToolLoop: return tool outputs
        ToolLoop->>LLM: generate_followup(messages + tool outputs)
        LLM->>HF: send follow-up request
        HF-->>LLM: follow-up response (may contain more tool calls)
        loop Until no more tool calls or max_rounds
            ToolLoop->>Tools: execute additional tool calls...
        end
        ToolLoop-->>App: final LLMResponseEvent (natural-language answer)
    else no tool calls
        LLM-->>App: LLMResponseEvent (natural-language answer)
    end
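
The multi-round flow in the diagram can be sketched in plain Python. This is an illustrative outline, not the plugin's actual run_tool_call_loop implementation; the generate callback, tool registry, and message shapes here are simplified assumptions.

```python
from typing import Awaitable, Callable

async def run_tool_call_loop(
    generate: Callable[[list], Awaitable[dict]],
    tools: dict,
    messages: list,
    max_rounds: int = 3,
) -> dict:
    """Run tool rounds until the model stops requesting tools or max_rounds is hit."""
    response = await generate(messages)
    for _ in range(max_rounds):
        calls = response.get("tool_calls") or []
        if not calls:
            break  # natural-language answer: we are done
        for call in calls:
            # Invoke the registered async tool and record its output as a tool message.
            result = await tools[call["name"]](**call.get("arguments", {}))
            messages.append(
                {"role": "tool", "tool_call_id": call.get("id"), "content": str(result)}
            )
        # Follow-up request so the model can use the tool outputs.
        response = await generate(messages)
    return response
```

The key property, and the one several review comments below turn on, is that every round reuses the same request context rather than rebuilding it from defaults.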

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • aliev

Poem

The model dreams like a blackened room at noon,
Tool-keys clattering in the ribs of speech,
Each function a small shard that answers, then is gone,
I spit the cleaned result into the neat mouth of reply,
And watch the rounds bloom cold and brilliant as a bell.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 35.65%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): the title 'Refactor hf tool calling' accurately describes the main objective: refactoring tool-calling logic across the HuggingFace plugin, which is reflected throughout the changeset with new shared modules and simplified implementations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@Nash0x7E2 Nash0x7E2 self-requested a review March 26, 2026 18:03

@coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (2)
plugins/huggingface/tests/test_hf_tool_calling.py (2)

139-164: Prefer a tiny fake LLM over MagicMock/AsyncMock in these loop tests.

The contract under test is small, and a fake object with real methods would catch signature drift that these loose mocks will not.

As per coding guidelines, "Never mock in tests; use pytest for testing."

Also applies to: 210-223

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 139 - 164,
The test uses MagicMock/AsyncMock for llm (including llm._dedup_and_execute and
llm._sanitize_tool_output) which can hide signature drift; replace this with a
tiny fake LLM class that implements async def _dedup_and_execute(self, ...)
returning the two planned side-effect tuples and def _sanitize_tool_output(self,
v) returning str(v) so the test exercises real method signatures; update both
occurrences (lines around the current llm setup and the similar block at
210-223) to instantiate the fake LLM instead of MagicMock/AsyncMock.

6-14: Avoid importing private helper modules directly from tests.

These imports make internal module names part of the test contract, so a helper move or rename becomes a test-breaking API change. Please re-export the supported helpers or colocate these tests with the package-private modules instead of importing _foo modules from the external test package.

As per coding guidelines, "Never import from private modules (_foo) outside of the package's own __init__.py. Use the public re-export instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 6 - 14, Tests
are importing private helpers (_hf_tool_calling, _tool_call_loop) directly;
instead either re-export the required symbols (accumulate_tool_call_chunk,
convert_tools_to_hf_format, extract_tool_calls_from_hf_response,
finalize_pending_tool_calls, run_tool_call_loop) from the package's public API
(e.g., add them to vision_agents.plugins.huggingface.__init__ or another public
module) and update the test imports to use that public path, or move the tests
into the package-private module directory so they can import the private modules
locally; ensure all references in tests use the public names rather than
`_hf_tool_calling`/`_tool_call_loop`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 116-123: The caller-provided kwargs are not being forwarded into
the HuggingFace request parameter dicts; update the request building in both
places where request_kwargs is constructed (the initial request in the function
and inside _generate_followup) to merge the incoming **kwargs into
request_kwargs (e.g., request_kwargs.update(kwargs) or by including **kwargs
when building the dict) so options like temperature/max_tokens/provider flags
are preserved; ensure you still explicitly set/override required keys like
"model" from kwargs when needed and keep the existing keys ("messages",
"stream", "tools") intact.
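
The merge described above can be sketched as follows; the helper name and parameter list are illustrative, not the module's actual API.

```python
def build_request_kwargs(messages, model, stream, tools=None, **kwargs):
    """Start from the caller's options, then set or override the required keys."""
    request_kwargs = dict(kwargs)        # keeps temperature, max_tokens, stop, ...
    request_kwargs["messages"] = messages
    request_kwargs["model"] = model
    request_kwargs["stream"] = stream    # honor the caller's choice, never hardcode
    if tools:
        request_kwargs["tools"] = tools  # attach the schema only when tools exist
    return request_kwargs
```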

In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 95-121: The loop currently skips triples with a falsey
NormalizedToolCallItem.id so executed tools are dropped; instead, when cid is
missing generate a stable fallback id (e.g., using the loop index or uuid4) and
use that fallback for both assistant_tool_calls and tool_results so every
executed tool produces messages; remove the early continue and ensure the code
still uses tc["name"] and tc.get("arguments_json", {}) for the
assistant_tool_calls and llm._sanitize_tool_output(err if err is not None else
res) for tool_results, preserving the existing structures and returning
llm_response only after processing all triples.

In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py`:
- Around line 136-145: The refactor dropped caller-supplied generation kwargs
when routing through create_hf_response; update the call site in create_response
(where effective_model is computed) to pass through the remaining kwargs (e.g.,
temperature, max_tokens, stop) by merging kwargs into the payload passed to
create_hf_response instead of only sending messages/model/stream/tools; also
apply the same merge pattern in the tool-call follow-up builder in
_hf_tool_calling.py so the tool-call requests (in the function that builds
follow-up HF request payloads) include the original caller kwargs as well.

In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py`:
- Around line 143-165: The current success/failed branching trusts
response.exception but follow-up tool-call failures clear the exception and
return empty text, causing false VLMInferenceCompletedEvent emissions; fix by
ensuring follow-up errors propagate the exception (update the follow-up path in
_hf_tool_calling to set LLMResponseEvent.exception with the underlying error)
and make the caller here (create_response / this response handling block) treat
empty responses as failures too (change the conditional to check if
response.exception is not None OR response.text is falsy, and emit VLMErrorEvent
in that case; otherwise emit VLMInferenceCompletedEvent).
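
The suggested success/failure conditional, shown in isolation below; the function name and string return values are illustrative (the real code would emit VLMErrorEvent or VLMInferenceCompletedEvent instead).

```python
def classify_vlm_outcome(exception, text):
    """Treat both a raised exception and empty follow-up text as failures."""
    if exception is not None or not text:
        return "error"      # the caller would emit VLMErrorEvent here
    return "completed"      # the caller would emit VLMInferenceCompletedEvent here
```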

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 340-344: Detected re-entry into create_response() from inside
run_tool_call_loop via _handle_tool_calls which restarts the tool-call loop and
loses caller sampling/kwargs; instead, change the flow so that when
extract_tool_calls_from_text(result.text) yields tool_calls, you do NOT call
create_response() again. Modify _handle_tool_calls (and any helpers it uses) to
return a structured continuation (e.g., updated messages and any tool results or
a flag) back to the current run_tool_call_loop so the existing loop resumes
handling tool calls and preserves the original kwargs and max_new_tokens; remove
any calls to create_response() from within _handle_tool_calls/run_tool_call_loop
and ensure _handle_tool_calls updates messages/state and returns control to the
caller for the next iteration.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`:
- Around line 221-229: The current simple_response flow appends the user's
plain-text turn twice by adding it into image_content and then replaying the
conversation via _build_messages(); to fix, ensure the user text is only added
once: either stop appending the plain-text entry to image_content when the last
built message is already the same user turn, or add the multimodal image_content
as the single user message and avoid replaying/duplicating that same user text
in _build_messages(); locate the multimodal assembly in simple_response
(variables frames_snapshot, image_content, messages and method _build_messages)
and implement the conditional dedupe so the user text is not duplicated when
frames_snapshot is empty.
- Around line 374-379: The auto-dispatch currently returns await
self._handle_tool_calls(...) when tool calls are detected, which causes
recursive loops and loses the caller's sampling settings (e.g., max_new_tokens);
change this to use the shared non-recursive follow-up generator instead of
awaiting _handle_tool_calls directly: in the branch that detects tool_calls
(where extract_tool_calls_from_text is used) return or yield a follow-up from
the central run_tool_call_loop (or a new helper like
_enqueue_tool_calls_followup) that calls the tool-handling logic non-recursively
and forwards the original kwargs (preserving max_new_tokens and sampling
parameters) so run_tool_call_loop/create_response are not re-entered
recursively; also remove any direct create_response() calls from inside
run_tool_call_loop and refactor _handle_tool_calls to be invoked by the
follow-up orchestrator rather than awaited inline.
- Around line 302-305: The retry path in _build_processor_inputs that calls
apply_chat_template() without the tool schema must also clear the tools flag so
downstream logic doesn't treat the output as if tools were provided; update
_build_processor_inputs (where it retries without tools) to set tools_param =
None (or a clear boolean like tools_present=False) when it falls back, and
ensure the caller code that checks "if tools_param and output_text:" (or the
variable output_text) uses that updated flag so JSON-like model outputs aren't
misclassified as tool calls when no schema was shown.
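
A sketch of that fallback with an explicit flag; build_inputs and the TypeError trigger are assumptions for illustration, with apply_chat_template passed in as a callable rather than taken from a real processor.

```python
def build_inputs(apply_chat_template, messages, tools):
    """Return (prompt, tools_shown): retry without the schema and clear the flag."""
    try:
        return apply_chat_template(messages, tools=tools), tools is not None
    except TypeError:
        # The template rejected the schema. Fall back without it, and report
        # tools_shown=False so callers do not parse JSON-like output as tool
        # calls the model never actually saw.
        return apply_chat_template(messages, tools=None), False
```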

---

Nitpick comments:
In `@plugins/huggingface/tests/test_hf_tool_calling.py`:
- Around line 139-164: The test uses MagicMock/AsyncMock for llm (including
llm._dedup_and_execute and llm._sanitize_tool_output) which can hide signature
drift; replace this with a tiny fake LLM class that implements async def
_dedup_and_execute(self, ...) returning the two planned side-effect tuples and
def _sanitize_tool_output(self, v) returning str(v) so the test exercises real
method signatures; update both occurrences (lines around the current llm setup
and the similar block at 210-223) to instantiate the fake LLM instead of
MagicMock/AsyncMock.
- Around line 6-14: Tests are importing private helpers (_hf_tool_calling,
_tool_call_loop) directly; instead either re-export the required symbols
(accumulate_tool_call_chunk, convert_tools_to_hf_format,
extract_tool_calls_from_hf_response, finalize_pending_tool_calls,
run_tool_call_loop) from the package's public API (e.g., add them to
vision_agents.plugins.huggingface.__init__ or another public module) and update
the test imports to use that public path, or move the tests into the
package-private module directory so they can import the private modules locally;
ensure all references in tests use the public names rather than
`_hf_tool_calling`/`_tool_call_loop`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f40713a7-b283-4df3-8695-abd2a2d4a477

📥 Commits

Reviewing files that changed from the base of the PR and between 072ad74 and a740a8d.

⛔ Files ignored due to path filters (1)
  • plugins/huggingface/examples/transformers/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • plugins/huggingface/examples/transformers/pyproject.toml
  • plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
  • plugins/huggingface/tests/test_hf_tool_calling.py
  • plugins/huggingface/tests/test_huggingface.py
  • plugins/huggingface/tests/test_transformers_llm.py
  • plugins/huggingface/tests/test_transformers_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
💤 Files with no reviewable changes (1)
  • plugins/huggingface/examples/transformers/pyproject.toml

@maxkahan maxkahan force-pushed the refactor-hf-tool-calling branch from a740a8d to ccba289 on April 2, 2026 15:13
@maxkahan maxkahan dismissed Nash0x7E2’s stale review April 2, 2026 15:17

changes made


@coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (3)
plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py (1)

102-105: ⚠️ Potential issue | 🟠 Major

Generate a fallback tool-call id instead of dropping executed calls.

NormalizedToolCallItem.id is optional, but this branch discards any executed call whose provider omitted it. That means the tool runs, yet no assistant/tool messages are added for that call, and a round with only id-less calls returns the previous empty response instead of asking for the follow-up.

🧩 Suggested fix
-        for tc, res, err in triples:
-            cid = tc.get("id")
-            if not cid:
-                continue
+        for call_index, (tc, res, err) in enumerate(triples):
+            cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`
around lines 102 - 105, The loop over triples drops executed calls when
tc.get("id") is missing; instead generate a fallback id (e.g., UUID or a
deterministic incremental/fallback prefix) and assign it to the tool-call item
so the call is not discarded. In the for-loop handling triples (variables tc,
res, err) inside _tool_call_loop.py, replace the continue branch with code that
creates a fallback id and sets tc["id"] (or cid) before proceeding so subsequent
message creation and bookkeeping treat the call as executed; ensure the chosen
id is unique within the current session/round and used wherever id is referenced
later in this function.
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)

583-596: ⚠️ Potential issue | 🟠 Major

Carry the caller's explicit generation settings into tool follow-ups.

_generate_followup() only forwards **kwargs, but stream, max_new_tokens, temperature, and do_sample are explicit create_response() parameters, not entries in that dict. Round 2+ therefore falls back to defaults and can change sampling/output length halfway through a single response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`
around lines 583 - 596, _generate_followup currently only forwards **kwargs to
create_response which omits explicit generation args (stream, max_new_tokens,
temperature, do_sample) causing follow-ups to use defaults; update
_generate_followup (inside _handle_tool_calls) to capture and pass the caller's
explicit generation settings into the create_response call by forwarding
stream=stream, max_new_tokens=max_new_tokens, temperature=temperature,
do_sample=do_sample (sourced from the enclosing _handle_tool_calls parameters or
kwargs) so every create_response invocation (including round 2+) uses the same
sampling and length settings.
plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py (1)

116-123: ⚠️ Potential issue | 🟠 Major

Preserve caller options on both HuggingFace requests.

Both payload builders are recreated from scratch, so temperature, max_tokens, stop, provider-specific kwargs, etc. are silently dropped. The follow-up path also hardcodes stream=True, so a create_response(..., stream=False) call can switch modes after the first tool round.

Also applies to: 328-334

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
around lines 116 - 123, The payload builders recreate request_kwargs and drop
caller options (temperature, max_tokens, stop, provider-specific kwargs) and the
follow-up path hardcodes stream=True; fix by starting each payload with a
shallow copy/merge of the original caller options dict (preserve entries like
temperature, max_tokens, stop and any provider-specific kwargs) before
adding/overriding messages, model, and tools (use the existing request_kwargs
variable and only set request_kwargs["tools"] = tools if tools is present), and
ensure the follow-up code (the create_response path that currently forces
stream=True) uses the incoming stream value instead of hardcoding True so
streaming mode is preserved across rounds.
🧹 Nitpick comments (1)
plugins/huggingface/tests/test_hf_tool_calling.py (1)

6-14: Avoid binding the test suite to private HuggingFace modules.

These tests import _hf_tool_calling and _tool_call_loop directly from outside the package implementation, so an internal module move/rename becomes a test break instead of an encapsulated refactor. Re-export the helpers from the public package surface or cover them through public callers.

As per coding guidelines, "Never import from private modules (_foo) outside of the package's own __init__.py. Use the public re-export instead".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 6 - 14, The
test imports private modules (_hf_tool_calling, _tool_call_loop) directly;
change the tests to use the public package surface by re-exporting the helpers
(accumulate_tool_call_chunk, convert_tools_to_hf_format,
extract_tool_calls_from_hf_response, finalize_pending_tool_calls,
run_tool_call_loop) from the package's public __init__.py (or another public
module) and update the test imports to import these symbols from that public
module instead of from _hf_tool_calling/_tool_call_loop so future internal
refactors won't break tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 234-253: The code emits an LLMResponseCompletedEvent even when
finish_reason == "tool_calls", causing a premature "completed" event for
intermediate tool-call turns; change the logic around the
llm.events.send(LLMResponseCompletedEvent(...)) call so it is skipped when
finish_reason == "tool_calls" (i.e., only send the completed event when
finish_reason != "tool_calls"), and apply the same change in the other streaming
branch handled around the _handle_tool_calls() path (the similar block reported
at the other location). Ensure you reference the finish_reason variable and do
not call llm.events.send for the tool_calls case so the actual follow-up answer
will be the sole completed event.
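
The gating condition reduces to a one-line predicate (the name is illustrative):

```python
from typing import Optional

def should_emit_completed(finish_reason: Optional[str]) -> bool:
    """Only the final natural-language turn should produce a completed event."""
    return finish_reason != "tool_calls"
```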

In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 109-112: The logger currently emits raw tool arguments/results
(variables name, args, res, err) at info/warning level; replace those calls with
redacted summaries (e.g., show tool name and a truncated/placeholder summary of
args/res) and move the full verbatim payloads to debug-only logging. Concretely,
update the logger.warning and logger.info in _tool_call_loop.py to avoid
printing args/res directly (use a short summary or truncated length and mask
possible secrets) and add logger.debug lines that log the full args/res/err for
troubleshooting.
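
A minimal sketch of the redaction split the comment describes; the helper names and the 80-character limit are illustrative choices, not the plugin's code.

```python
import logging

logger = logging.getLogger(__name__)

def summarize(value, limit: int = 80) -> str:
    """Truncated, length-annotated summary that is safe for info-level logs."""
    text = repr(value)
    if len(text) <= limit:
        return text
    return f"{text[:limit]}... ({len(text)} chars)"

def log_tool_result(name, args, res, err):
    if err is not None:
        logger.warning("tool %s failed: %s", name, summarize(err))
    else:
        logger.info("tool %s ok (args=%s)", name, summarize(args))
    # Full verbatim payloads only at debug level.
    logger.debug("tool %s args=%r res=%r err=%r", name, args, res, err)
```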

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 357-382: The code emits a synthetic LLMResponseChunkEvent
unconditionally after tool handling; change it to respect the original stream
flag by checking the incoming stream parameter (or stored attribute) before
sending the LLMResponseChunkEvent: only send the LLMResponseChunkEvent when
stream is True, otherwise skip emitting chunk events and let the non-streaming
completion path (completion-only events) run; update the block around
suppress_events/result.text/is_tool_followup/_handle_tool_calls to gate the
event send on stream so non-stream callers do not receive chunk events.
- Around line 133-147: The current regex-based fallback (json_pattern) in the
loop that extracts tool calls only handles flat objects and fails on nested
arguments; replace the regex approach in transformers_llm.py with a balanced
JSON extractor (e.g., scan the text for '{', track brace depth to find the
matching closing brace or use json.JSONDecoder().raw_decode to locate full JSON
objects) and then json.loads each balanced substring; once parsed, keep the
existing checks that the object has "name" and "arguments" and append to
tool_calls with the same keys ("type","id","name","arguments_json"); this
ensures nested "arguments" like {"filters":{"owner":"me"}} are accepted.
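
The balanced-JSON approach using json.JSONDecoder().raw_decode can be sketched like this; it is an illustrative replacement for the regex, with the output keys taken from the comment above rather than the plugin's source.

```python
import json

def extract_tool_calls_from_text(text: str) -> list:
    """Find every balanced JSON object in free-form model output via raw_decode."""
    decoder = json.JSONDecoder()
    tool_calls, i = [], 0
    while True:
        start = text.find("{", i)
        if start == -1:
            break
        try:
            # raw_decode parses one complete value and reports where it ended,
            # so nested braces are handled without any bracket counting.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            i = start + 1  # not valid JSON here; keep scanning
            continue
        i = end
        if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
            tool_calls.append({
                "type": "function",
                "id": None,  # providers may omit ids for text-embedded calls
                "name": obj["name"],
                "arguments_json": json.dumps(obj["arguments"]),
            })
    return tool_calls
```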

---

Duplicate comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 116-123: The payload builders recreate request_kwargs and drop
caller options (temperature, max_tokens, stop, provider-specific kwargs) and the
follow-up path hardcodes stream=True; fix by starting each payload with a
shallow copy/merge of the original caller options dict (preserve entries like
temperature, max_tokens, stop and any provider-specific kwargs) before
adding/overriding messages, model, and tools (use the existing request_kwargs
variable and only set request_kwargs["tools"] = tools if tools is present), and
ensure the follow-up code (the create_response path that currently forces
stream=True) uses the incoming stream value instead of hardcoding True so
streaming mode is preserved across rounds.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 102-105: The loop over triples drops executed calls when
tc.get("id") is missing; instead generate a fallback id (e.g., UUID or a
deterministic incremental/fallback prefix) and assign it to the tool-call item
so the call is not discarded. In the for-loop handling triples (variables tc,
res, err) inside _tool_call_loop.py, replace the continue branch with code that
creates a fallback id and sets tc["id"] (or cid) before proceeding so subsequent
message creation and bookkeeping treat the call as executed; ensure the chosen
id is unique within the current session/round and used wherever id is referenced
later in this function.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 583-596: _generate_followup currently only forwards **kwargs to
create_response which omits explicit generation args (stream, max_new_tokens,
temperature, do_sample) causing follow-ups to use defaults; update
_generate_followup (inside _handle_tool_calls) to capture and pass the caller's
explicit generation settings into the create_response call by forwarding
stream=stream, max_new_tokens=max_new_tokens, temperature=temperature,
do_sample=do_sample (sourced from the enclosing _handle_tool_calls parameters or
kwargs) so every create_response invocation (including round 2+) uses the same
sampling and length settings.

---

Nitpick comments:
In `@plugins/huggingface/tests/test_hf_tool_calling.py`:
- Around line 6-14: The test imports private modules (_hf_tool_calling,
_tool_call_loop) directly; change the tests to use the public package surface by
re-exporting the helpers (accumulate_tool_call_chunk,
convert_tools_to_hf_format, extract_tool_calls_from_hf_response,
finalize_pending_tool_calls, run_tool_call_loop) from the package's public
__init__.py (or another public module) and update the test imports to import
these symbols from that public module instead of from
_hf_tool_calling/_tool_call_loop so future internal refactors won't break tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 174992df-5217-4cd7-ae4f-de9210da383f

📥 Commits

Reviewing files that changed from the base of the PR and between a740a8d and ccba289.

📒 Files selected for processing (12)
  • plugins/huggingface/examples/transformers/transformers_mcp_tool_calling_example.py
  • plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
  • plugins/huggingface/tests/test_hf_tool_calling.py
  • plugins/huggingface/tests/test_huggingface.py
  • plugins/huggingface/tests/test_transformers_llm.py
  • plugins/huggingface/tests/test_transformers_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/huggingface/tests/test_transformers_vlm.py
  • plugins/huggingface/examples/transformers/transformers_tool_calling_example.py

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)

380-380: ⚠️ Potential issue | 🟠 Major

Propagate the original generation options into tool follow-ups.

_generate_followup() only forwards kwargs, so follow-up rounds fall back to default stream=True, max_new_tokens=None, temperature=0.7, and do_sample=True. That reintroduces chunk events for non-stream callers and changes sampling/token budgets after the first tool round.

🧩 Proposed fix
-                return await self._handle_tool_calls(tool_calls, messages, kwargs)
+                return await self._handle_tool_calls(
+                    tool_calls,
+                    messages,
+                    stream=stream,
+                    max_new_tokens=max_new_tokens,
+                    temperature=temperature,
+                    do_sample=do_sample,
+                    kwargs=kwargs,
+                )
@@
     async def _handle_tool_calls(
         self,
         tool_calls: list[NormalizedToolCallItem],
         messages: list[dict[str, Any]],
-        kwargs: dict[str, Any],
+        *,
+        stream: bool,
+        max_new_tokens: int | None,
+        temperature: float,
+        do_sample: bool,
+        kwargs: dict[str, Any],
     ) -> LLMResponseEvent:
@@
             result = await self.create_response(
-                messages=current_messages, _tool_followup=True, **kwargs
+                messages=current_messages,
+                stream=stream,
+                max_new_tokens=max_new_tokens,
+                temperature=temperature,
+                do_sample=do_sample,
+                _tool_followup=True,
+                **kwargs,
             )

Also applies to: 599-614

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py` at
line 380, The follow-up generation path in _generate_followup is dropping the
caller's original generation options (stream, max_new_tokens, temperature,
do_sample, etc.), causing subsequent tool rounds to revert to defaults; update
the flow so the original generation options are captured from the initial call
(kwargs passed into the first generation) and merged/preserved when invoking
_generate_followup and _handle_tool_calls (e.g., accept an explicit gen_options
parameter or forward the original kwargs through the call chain), ensuring
_generate_followup uses those merged options instead of defaults so streaming,
sampling, and token limits remain consistent across tool follow-ups.
plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py (1)

107-110: ⚠️ Potential issue | 🟠 Major

Redact tool payloads in normal logs.

args, res, and err can contain user data or large blobs. Logging them verbatim at info/warning leaks sensitive content into routine logs and can explode log volume.

🪵 Suggested logging change
             if err is not None:
-                logger.warning("  [tool] %s(%s) failed: %s", name, args, err)
+                logger.warning("  [tool] %s failed", name)
+                logger.debug("  [tool] %s args=%r err=%r", name, args, err)
             else:
-                logger.info("  [tool] %s(%s) → %s", name, args, res)
+                logger.info("  [tool] %s succeeded", name)
+                logger.debug("  [tool] %s args=%r result=%r", name, args, res)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`
around lines 107 - 110, The current logging in the tool call loop logs raw
payloads (args, res, err) which may contain sensitive or large user data; update
the logger usage in the function/method containing the logger calls so that only
non-sensitive metadata is logged (e.g., tool name, argument/res/err types,
lengths/sizes, and a redacted placeholder like "<redacted>") instead of the full
args/res/err values; for errors include err type/message truncated or a hashed
identifier rather than full content and preserve existing logger levels
(logger.warning for errors, logger.info for successes) while removing direct
dumps of args/res/err to avoid leaking data or exploding log volume.
🧹 Nitpick comments (1)
plugins/huggingface/tests/test_transformers_llm.py (1)

301-304: Avoid hasattr() in this assertion.

You already split chunk and completed events above, so this can use direct attribute access and stay simpler.

♻️ Proposed change
-        for evt in chunk_events + completed_events:
-            assert "<tool_call>" not in (
-                evt.delta if hasattr(evt, "delta") else evt.text
-            )
+        for evt in chunk_events:
+            assert "<tool_call>" not in evt.delta
+        for evt in completed_events:
+            assert "<tool_call>" not in evt.text

As per coding guidelines, "Avoid using getattr, hasattr, delattr, setattr; prefer normal attribute access in Python".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_llm.py` around lines 301 - 304,
The assertion currently uses hasattr on evt; instead iterate the already-split
lists directly: for each evt in chunk_events assert "<tool_call>" not in
evt.delta, and for each evt in completed_events assert "<tool_call>" not in
evt.text. Update the loop that currently iterates over chunk_events +
completed_events (variables chunk_events, completed_events, and evt) to two
explicit loops using direct attribute access instead of hasattr.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 77-80: The code caches/returns a raw tool-call payload (e.g.,
follow_up containing "<tool_call>...</tool_call>") as the fallback LLM response;
update the logic around llm_response, follow_up and next_tool_calls (the
_tool_followup branch) so that whenever next_tool_calls is non-empty you do NOT
assign or persist the raw tool-call text to llm_response.text or return values:
instead strip or replace the tool-call markup with a safe placeholder or empty
final-answer text and only store the sanitized/terminal response; ensure the
same change is applied to the other similar blocks that touch follow_up and
llm_response (references: llm_response: LLMResponseEvent, follow_up,
next_tool_calls, _tool_followup) so tool markup never leaks back to the caller.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 140-159: The parser currently injects synthetic ids (uuid4) when
an extracted tool-call lacks an "id", which breaks deduping in
run_tool_call_loop; change both construction sites in transformers_llm.py so
they do NOT generate a fallback uuid: use obj.get("id") (or only include the
"id" key when obj contains it) instead of obj.get("id", str(uuid.uuid4())) and
remove the str(uuid.uuid4()) used in the later loop; keep all other fields
("type","name","arguments_json") unchanged so missing ids remain missing and
run_tool_call_loop can synthesize stable fallback ids.
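A minimal sketch of the sanitization the first inline prompt asks for, stripping Qwen-style `<tool_call>` markup before text is cached as a final answer. The helper name and the exact tag format are assumptions for illustration, not the plugin's actual code:

```python
import re

# Matches Qwen-style tool-call markup, including multi-line JSON payloads.
# The tag format is an assumption; adjust the pattern to the model's template.
_TOOL_CALL_RE = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def strip_tool_call_markup(text: str) -> str:
    """Remove <tool_call>...</tool_call> spans so raw tool markup never
    leaks back to the caller as final-answer text."""
    cleaned = _TOOL_CALL_RE.sub("", text)
    # Collapse the blank runs left behind by removed blocks.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()
```

Applied in the `_tool_followup` branch, this keeps `llm_response.text` free of tool markup whenever `next_tool_calls` is non-empty.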


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 653428c8-5007-45fd-bb3a-d283f1afa361

📥 Commits

Reviewing files that changed from the base of the PR and between ccba289 and 95c4419.

📒 Files selected for processing (3)
  • plugins/huggingface/tests/test_transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py (1)

116-123: ⚠️ Potential issue | 🟠 Major

Forward caller kwargs into both HF request builders.

create_hf_response() accepts **kwargs, but the dicts at Lines 116-123 and 330-336 never merge them. Line 332 already consults kwargs.get("model"), so generation options and provider flags are silently dropped, and model can diverge between the initial and follow-up turns.

Suggested fix
     request_kwargs: dict[str, Any] = {
+        **kwargs,
         "messages": messages,
-        "model": model_id,
+        "model": kwargs.get("model", model_id),
         "stream": stream,
     }
     if tools:
         request_kwargs["tools"] = tools
         request_kwargs: dict[str, Any] = {
+            **kwargs,
             "messages": current_messages,
             "model": kwargs.get("model", model_id),
             "stream": True,
         }

Also applies to: 330-336

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
around lines 116 - 123, The request kwargs built for HF calls in
create_hf_response are not merging the passed-in **kwargs so generation options
and provider flags (and an overriding model) are dropped; update the
request_kwargs construction in create_hf_response and the corresponding
follow-up builder to merge in kwargs (e.g., request_kwargs.update(kwargs) or
selectively copy relevant keys), ensure kwargs.get("model") overrides the local
model_id consistently, and preserve existing keys like "messages", "stream", and
"tools" while allowing generation/provider flags from **kwargs to be forwarded
to the HF request builders.
🧹 Nitpick comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py (1)

33-35: Tighten the shared helper types.

This module sits on the shared HF/plugin boundary, but these signatures still use Any for chunks, responses, and request kwargs. Concrete HF types—or small Protocol/TypedDict wrappers if the hub types are too loose—would let static checking catch missing choices, delta, and tool_calls members here.

As per coding guidelines, "Avoid using Any type in type annotations" and "Use type annotations everywhere with modern syntax: X | Y unions, dict[str, T] generics, full Callable signatures, Optional for nullable params".

Also applies to: 72-74, 99-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
around lines 33 - 35, The helper functions (e.g., accumulate_tool_call_chunk)
use broad Any types for chunks, responses, and request kwargs; tighten these by
defining small Protocols or TypedDicts that declare the exact members you access
(e.g., choices, delta, tool_calls, etc.) and use them in signatures instead of
Any (use modern syntax like dict[int, ToolChunkProtocol], Optional[...], or X |
Y for unions); update all affected functions referenced in this module to accept
those concrete types so static checkers can verify access to
choices/delta/tool_calls and avoid Any usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 289-300: The code calls _handle_tool_calls(...) after
extract_tool_calls_from_hf_response(...) but forces follow-ups to stream by
hard-coding stream=True inside _generate_followup; instead thread the original
stream boolean from the initial request into _handle_tool_calls (pass it as an
extra parameter) and propagate it into _generate_followup; then implement two
follow-up branches inside _generate_followup (or the caller): one that yields an
async iterator when stream=True (existing behavior) and one that performs a
single non-stream call and returns the full response object when stream=False
(so no chunks are emitted). Update all call sites (e.g., where
_handle_tool_calls and _generate_followup are invoked) to accept and forward the
stream flag and handle the returned type accordingly.
- Around line 340-349: The except block handling (HfHubHTTPError,
InferenceTimeoutError, OSError) currently logs and emits an LLMErrorEvent but
returns LLMResponseEvent(original=None, text=""), [] which drops the exception;
change the return so the LLMResponseEvent preserves the caught exception (e.g.,
set its exception/exception_info field to e or include it in event_data) while
keeping original=None and text="" so callers can distinguish a transport failure
from a valid empty completion; update the return in that except block (the one
that calls logger.exception and llm.events.send with plugin_name and
error_message=str(e)) to include the exception on the LLMResponseEvent.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d12c9616-00df-4cdf-978b-4555f375317a

📥 Commits

Reviewing files that changed from the base of the PR and between 95c4419 and 768365d.

📒 Files selected for processing (1)
  • plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py

Comment on lines +289 to +300
tool_calls = extract_tool_calls_from_hf_response(response)
if tool_calls:
return await _handle_tool_calls(
llm,
client,
model_id,
plugin_name,
tool_calls,
messages,
tools,
kwargs,
)

⚠️ Potential issue | 🟠 Major

Honor the original stream mode after tool execution.

Lines 289-300 can reach _handle_tool_calls() from the non-stream path, but Line 333 hard-codes stream=True for every follow-up request. That changes the event contract mid-call: a request announced as streaming=False at Lines 124-129 can still emit chunk events after the first tool round. You’ll need to thread the original stream flag into _handle_tool_calls() and add a non-stream follow-up branch; simply swapping Line 333 to stream is not enough because the rest of _generate_followup() assumes an async iterator.

Also applies to: 327-339
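The threading this comment describes might look like the sketch below, where `chat` stands in for the HF client call and both names are hypothetical; the point is that the follow-up turn honors the caller's original stream flag instead of hard-coding stream=True:

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import Any


async def generate_followup(
    chat: Callable[..., Awaitable[Any]],
    messages: list[dict[str, Any]],
    *,
    stream: bool,
) -> str:
    """Follow-up turn that preserves the stream mode of the initial request."""
    if stream:
        # Streaming branch: consume the async iterator chunk by chunk,
        # emitting chunk events as before (event emission elided here).
        chunks: AsyncIterator[str] = await chat(messages=messages, stream=True)
        return "".join([part async for part in chunks])
    # Non-stream branch: a single call, so no chunk events are emitted.
    response = await chat(messages=messages, stream=False)
    return response
```

Call sites would forward the stream value captured from the initial request, so a call announced as streaming=False never starts emitting chunk events after a tool round.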


Comment on lines +340 to +349
except (HfHubHTTPError, InferenceTimeoutError, OSError) as e:
logger.exception("Failed to get follow-up response after tool execution")
llm.events.send(
events.LLMErrorEvent(
plugin_name=plugin_name,
error_message=str(e),
event_data=e,
)
)
return LLMResponseEvent(original=None, text=""), []

⚠️ Potential issue | 🟠 Major

Preserve follow-up failures in the returned LLMResponseEvent.

The initial request failure path at Line 145 returns exception=e, but this branch drops the exception and returns an empty response instead. Awaiters cannot tell a tool-follow-up transport failure from a valid empty completion.

Suggested fix
-            return LLMResponseEvent(original=None, text=""), []
+            return LLMResponseEvent(original=None, text="", exception=e), []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        except (HfHubHTTPError, InferenceTimeoutError, OSError) as e:
            logger.exception("Failed to get follow-up response after tool execution")
            llm.events.send(
                events.LLMErrorEvent(
                    plugin_name=plugin_name,
                    error_message=str(e),
                    event_data=e,
                )
            )
-            return LLMResponseEvent(original=None, text=""), []
+            return LLMResponseEvent(original=None, text="", exception=e), []

@Nash0x7E2 Nash0x7E2 merged commit 982ee4b into main Apr 2, 2026
5 checks passed
@Nash0x7E2 Nash0x7E2 deleted the refactor-hf-tool-calling branch April 2, 2026 19:03