Conversation
📝 Walkthrough

This PR extracts shared HuggingFace tool-calling infrastructure into new modules, refactors LLM/VLM code to use that infrastructure, adds tests covering streaming and multi-round tool execution, and includes an example demonstrating a tool-calling agent with a local Transformers model.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant LLM as TransformersLLM / VLM
    participant HF as HuggingFace Model (API)
    participant ToolLoop as Tool-Call Loop
    participant Tools as Registered Tools
    App->>LLM: create_response(messages, tools)
    LLM->>HF: send request (includes tools schema)
    HF-->>LLM: stream / non-stream response (deltas or full message)
    alt finish_reason == "tool_calls" or tool markers detected
        LLM->>ToolLoop: extract tool calls & run_tool_call_loop()
        ToolLoop->>Tools: invoke registered async tool functions
        Tools-->>ToolLoop: return tool outputs
        ToolLoop->>LLM: generate_followup(messages + tool outputs)
        LLM->>HF: send follow-up request
        HF-->>LLM: follow-up response (may contain more tool calls)
        loop Until no more tool calls or max_rounds
            ToolLoop->>Tools: execute additional tool calls...
        end
        ToolLoop-->>App: final LLMResponseEvent (natural-language answer)
    else no tool calls
        LLM-->>App: LLMResponseEvent (natural-language answer)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
🧹 Nitpick comments (2)
plugins/huggingface/tests/test_hf_tool_calling.py (2)
139-164: Prefer a tiny fake `LLM` over `MagicMock`/`AsyncMock` in these loop tests.

The contract under test is small, and a fake object with real methods would catch signature drift that these loose mocks will not. As per coding guidelines, "Never mock in tests; use pytest for testing."
Also applies to: 210-223
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 139 - 164, The test uses MagicMock/AsyncMock for llm (including llm._dedup_and_execute and llm._sanitize_tool_output) which can hide signature drift; replace this with a tiny fake LLM class that implements async def _dedup_and_execute(self, ...) returning the two planned side-effect tuples and def _sanitize_tool_output(self, v) returning str(v) so the test exercises real method signatures; update both occurrences (lines around the current llm setup and the similar block at 210-223) to instantiate the fake LLM instead of MagicMock/AsyncMock.
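As a concrete sketch of that suggestion — the method names `_dedup_and_execute` and `_sanitize_tool_output` come from the review, but their exact signatures and the `(tool_call, result, error)` triple shape are assumptions — a tiny fake could look like:

```python
import asyncio

# Hypothetical fake LLM for the loop tests; signatures are assumed, not
# taken from the real TransformersLLM implementation.
class FakeLLM:
    def __init__(self, planned_triples):
        # planned_triples: the (tool_call, result, error) tuples to return
        self._planned = planned_triples

    async def _dedup_and_execute(self, tool_calls):
        # Return pre-planned triples instead of invoking real tools.
        return self._planned

    def _sanitize_tool_output(self, value):
        return str(value)
```

Unlike `MagicMock`, calling `_dedup_and_execute` with the wrong arity here raises a `TypeError`, which is exactly the signature drift the comment wants the tests to catch.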
6-14: Avoid importing private helper modules directly from tests.

These imports make internal module names part of the test contract, so a helper move or rename becomes a test-breaking API change. Please re-export the supported helpers or colocate these tests with the package-private modules instead of importing `_foo` modules from the external test package. As per coding guidelines, "Never import from private modules (`_foo`) outside of the package's own `__init__.py`. Use the public re-export instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 6 - 14, Tests are importing private helpers (_hf_tool_calling, _tool_call_loop) directly; instead either re-export the required symbols (accumulate_tool_call_chunk, convert_tools_to_hf_format, extract_tool_calls_from_hf_response, finalize_pending_tool_calls, run_tool_call_loop) from the package's public API (e.g., add them to vision_agents.plugins.huggingface.__init__ or another public module) and update the test imports to use that public path, or move the tests into the package-private module directory so they can import the private modules locally; ensure all references in tests use the public names rather than `_hf_tool_calling`/`_tool_call_loop`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 116-123: The caller-provided kwargs are not being forwarded into
the HuggingFace request parameter dicts; update the request building in both
places where request_kwargs is constructed (the initial request in the function
and inside _generate_followup) to merge the incoming **kwargs into
request_kwargs (e.g., request_kwargs.update(kwargs) or by including **kwargs
when building the dict) so options like temperature/max_tokens/provider flags
are preserved; ensure you still explicitly set/override required keys like
"model" from kwargs when needed and keep the existing keys ("messages",
"stream", "tools") intact.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 95-121: The loop currently skips triples with a falsey
NormalizedToolCallItem.id so executed tools are dropped; instead, when cid is
missing generate a stable fallback id (e.g., using the loop index or uuid4) and
use that fallback for both assistant_tool_calls and tool_results so every
executed tool produces messages; remove the early continue and ensure the code
still uses tc["name"] and tc.get("arguments_json", {}) for the
assistant_tool_calls and llm._sanitize_tool_output(err if err is not None else
res) for tool_results, preserving the existing structures and returning
llm_response only after processing all triples.
In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py`:
- Around line 136-145: The refactor dropped caller-supplied generation kwargs
when routing through create_hf_response; update the call site in create_response
(where effective_model is computed) to pass through the remaining kwargs (e.g.,
temperature, max_tokens, stop) by merging kwargs into the payload passed to
create_hf_response instead of only sending messages/model/stream/tools; also
apply the same merge pattern in the tool-call follow-up builder in
_hf_tool_calling.py so the tool-call requests (in the function that builds
follow-up HF request payloads) include the original caller kwargs as well.
In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py`:
- Around line 143-165: The current success/failed branching trusts
response.exception but follow-up tool-call failures clear the exception and
return empty text, causing false VLMInferenceCompletedEvent emissions; fix by
ensuring follow-up errors propagate the exception (update the follow-up path in
_hf_tool_calling to set LLMResponseEvent.exception with the underlying error)
and make the caller here (create_response / this response handling block) treat
empty responses as failures too (change the conditional to check if
response.exception is not None OR response.text is falsy, and emit VLMErrorEvent
in that case; otherwise emit VLMInferenceCompletedEvent).
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 340-344: Detected re-entry into create_response() from inside
run_tool_call_loop via _handle_tool_calls which restarts the tool-call loop and
loses caller sampling/kwargs; instead, change the flow so that when
extract_tool_calls_from_text(result.text) yields tool_calls, you do NOT call
create_response() again. Modify _handle_tool_calls (and any helpers it uses) to
return a structured continuation (e.g., updated messages and any tool results or
a flag) back to the current run_tool_call_loop so the existing loop resumes
handling tool calls and preserves the original kwargs and max_new_tokens; remove
any calls to create_response() from within _handle_tool_calls/run_tool_call_loop
and ensure _handle_tool_calls updates messages/state and returns control to the
caller for the next iteration.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`:
- Around line 221-229: The current simple_response flow appends the user's
plain-text turn twice by adding it into image_content and then replaying the
conversation via _build_messages(); to fix, ensure the user text is only added
once: either stop appending the plain-text entry to image_content when the last
built message is already the same user turn, or add the multimodal image_content
as the single user message and avoid replaying/duplicating that same user text
in _build_messages(); locate the multimodal assembly in simple_response
(variables frames_snapshot, image_content, messages and method _build_messages)
and implement the conditional dedupe so the user text is not duplicated when
frames_snapshot is empty.
- Around line 374-379: The auto-dispatch currently returns await
self._handle_tool_calls(...) when tool calls are detected, which causes
recursive loops and loses the caller's sampling settings (e.g., max_new_tokens);
change this to use the shared non-recursive follow-up generator instead of
awaiting _handle_tool_calls directly: in the branch that detects tool_calls
(where extract_tool_calls_from_text is used) return or yield a follow-up from
the central run_tool_call_loop (or a new helper like
_enqueue_tool_calls_followup) that calls the tool-handling logic non-recursively
and forwards the original kwargs (preserving max_new_tokens and sampling
parameters) so run_tool_call_loop/create_response are not re-entered
recursively; also remove any direct create_response() calls from inside
run_tool_call_loop and refactor _handle_tool_calls to be invoked by the
follow-up orchestrator rather than awaited inline.
- Around line 302-305: The retry path in _build_processor_inputs that calls
apply_chat_template() without the tool schema must also clear the tools flag so
downstream logic doesn't treat the output as if tools were provided; update
_build_processor_inputs (where it retries without tools) to set tools_param =
None (or a clear boolean like tools_present=False) when it falls back, and
ensure the caller code that checks "if tools_param and output_text:" (or the
variable output_text) uses that updated flag so JSON-like model outputs aren't
misclassified as tool calls when no schema was shown.
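The `_tool_call_loop.py` fix described above (assign a fallback id instead of dropping executed calls) can be sketched as follows; the `(tool_call, result, error)` triple shape and the id format are assumptions based on the review text, not the actual code:

```python
# Sketch only: give every executed tool call an id so its assistant/tool
# messages are not dropped from the follow-up conversation.
def assign_fallback_ids(triples, round_num):
    for call_index, (tc, _res, _err) in enumerate(triples):
        if not tc.get("id"):
            # Deterministic per-round fallback: unique within the round
            # and stable across reruns, unlike a random uuid4.
            tc["id"] = f"tool_call_{round_num}_{call_index}"
    return triples
```

Using the loop index rather than `uuid.uuid4()` keeps the ids reproducible in tests while still satisfying the uniqueness requirement within a round.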
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f40713a7-b283-4df3-8695-abd2a2d4a477
⛔ Files ignored due to path filters (1)
`plugins/huggingface/examples/transformers/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- plugins/huggingface/examples/transformers/pyproject.toml
- plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
- plugins/huggingface/tests/test_hf_tool_calling.py
- plugins/huggingface/tests/test_huggingface.py
- plugins/huggingface/tests/test_transformers_llm.py
- plugins/huggingface/tests/test_transformers_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
- plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
💤 Files with no reviewable changes (1)
- plugins/huggingface/examples/transformers/pyproject.toml
a740a8d to ccba289
Actionable comments posted: 4
♻️ Duplicate comments (3)
plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py (1)
102-105: ⚠️ Potential issue | 🟠 Major

Generate a fallback tool-call id instead of dropping executed calls.

`NormalizedToolCallItem.id` is optional, but this branch discards any executed call whose provider omitted it. That means the tool runs, yet no assistant/tool messages are added for that call, and a round with only id-less calls returns the previous empty response instead of asking for the follow-up.

🧩 Suggested fix

```diff
-        for tc, res, err in triples:
-            cid = tc.get("id")
-            if not cid:
-                continue
+        for call_index, (tc, res, err) in enumerate(triples):
+            cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py` around lines 102 - 105, The loop over triples drops executed calls when tc.get("id") is missing; instead generate a fallback id (e.g., UUID or a deterministic incremental/fallback prefix) and assign it to the tool-call item so the call is not discarded. In the for-loop handling triples (variables tc, res, err) inside _tool_call_loop.py, replace the continue branch with code that creates a fallback id and sets tc["id"] (or cid) before proceeding so subsequent message creation and bookkeeping treat the call as executed; ensure the chosen id is unique within the current session/round and used wherever id is referenced later in this function.

plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)
583-596: ⚠️ Potential issue | 🟠 Major

Carry the caller's explicit generation settings into tool follow-ups.

`_generate_followup()` only forwards `**kwargs`, but `stream`, `max_new_tokens`, `temperature`, and `do_sample` are explicit `create_response()` parameters, not entries in that dict. Round 2+ therefore falls back to defaults and can change sampling/output length halfway through a single response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py` around lines 583 - 596, _generate_followup currently only forwards **kwargs to create_response which omits explicit generation args (stream, max_new_tokens, temperature, do_sample) causing follow-ups to use defaults; update _generate_followup (inside _handle_tool_calls) to capture and pass the caller's explicit generation settings into the create_response call by forwarding stream=stream, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=do_sample (sourced from the enclosing _handle_tool_calls parameters or kwargs) so every create_response invocation (including round 2+) uses the same sampling and length settings.

plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py (1)
116-123: ⚠️ Potential issue | 🟠 Major

Preserve caller options on both HuggingFace requests.

Both payload builders are recreated from scratch, so `temperature`, `max_tokens`, `stop`, provider-specific kwargs, etc. are silently dropped. The follow-up path also hardcodes `stream=True`, so a `create_response(..., stream=False)` call can switch modes after the first tool round.

Also applies to: 328-334
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py` around lines 116 - 123, The payload builders recreate request_kwargs and drop caller options (temperature, max_tokens, stop, provider-specific kwargs) and the follow-up path hardcodes stream=True; fix by starting each payload with a shallow copy/merge of the original caller options dict (preserve entries like temperature, max_tokens, stop and any provider-specific kwargs) before adding/overriding messages, model, and tools (use the existing request_kwargs variable and only set request_kwargs["tools"] = tools if tools is present), and ensure the follow-up code (the create_response path that currently forces stream=True) uses the incoming stream value instead of hardcoding True so streaming mode is preserved across rounds.
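A minimal sketch of the merge that prompt describes, assuming only the key names mentioned in the review (`messages`, `model`, `stream`, `tools`) and nothing about the real payload builder:

```python
# Sketch: start from the caller's options so temperature/max_tokens/stop and
# provider-specific kwargs survive, then set the required keys on top.
def build_request_kwargs(messages, model, stream, tools, caller_kwargs):
    request_kwargs = dict(caller_kwargs)  # shallow copy preserves caller options
    request_kwargs.update({"messages": messages, "model": model, "stream": stream})
    if tools:
        request_kwargs["tools"] = tools  # only set when tools are present
    return request_kwargs
```

Because `stream` is passed through rather than hardcoded, the follow-up request keeps whatever mode the first round used.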
🧹 Nitpick comments (1)
plugins/huggingface/tests/test_hf_tool_calling.py (1)
6-14: Avoid binding the test suite to private HuggingFace modules.

These tests import `_hf_tool_calling` and `_tool_call_loop` directly from outside the package implementation, so an internal module move/rename becomes a test break instead of an encapsulated refactor. Re-export the helpers from the public package surface or cover them through public callers. As per coding guidelines, "Never import from private modules (`_foo`) outside of the package's own `__init__.py`. Use the public re-export instead".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 6 - 14, The test imports private modules (_hf_tool_calling, _tool_call_loop) directly; change the tests to use the public package surface by re-exporting the helpers (accumulate_tool_call_chunk, convert_tools_to_hf_format, extract_tool_calls_from_hf_response, finalize_pending_tool_calls, run_tool_call_loop) from the package's public __init__.py (or another public module) and update the test imports to import these symbols from that public module instead of from _hf_tool_calling/_tool_call_loop so future internal refactors won't break tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 234-253: The code emits an LLMResponseCompletedEvent even when
finish_reason == "tool_calls", causing a premature "completed" event for
intermediate tool-call turns; change the logic around the
llm.events.send(LLMResponseCompletedEvent(...)) call so it is skipped when
finish_reason == "tool_calls" (i.e., only send the completed event when
finish_reason != "tool_calls"), and apply the same change in the other streaming
branch handled around the _handle_tool_calls() path (the similar block reported
at the other location). Ensure you reference the finish_reason variable and do
not call llm.events.send for the tool_calls case so the actual follow-up answer
will be the sole completed event.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 109-112: The logger currently emits raw tool arguments/results
(variables name, args, res, err) at info/warning level; replace those calls with
redacted summaries (e.g., show tool name and a truncated/placeholder summary of
args/res) and move the full verbatim payloads to debug-only logging. Concretely,
update the logger.warning and logger.info in _tool_call_loop.py to avoid
printing args/res directly (use a short summary or truncated length and mask
possible secrets) and add logger.debug lines that log the full args/res/err for
troubleshooting.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 357-382: The code emits a synthetic LLMResponseChunkEvent
unconditionally after tool handling; change it to respect the original stream
flag by checking the incoming stream parameter (or stored attribute) before
sending the LLMResponseChunkEvent: only send the LLMResponseChunkEvent when
stream is True, otherwise skip emitting chunk events and let the non-streaming
completion path (completion-only events) run; update the block around
suppress_events/result.text/is_tool_followup/_handle_tool_calls to gate the
event send on stream so non-stream callers do not receive chunk events.
- Around line 133-147: The current regex-based fallback (json_pattern) in the
loop that extracts tool calls only handles flat objects and fails on nested
arguments; replace the regex approach in transformers_llm.py with a balanced
JSON extractor (e.g., scan the text for '{', track brace depth to find the
matching closing brace or use json.JSONDecoder().raw_decode to locate full JSON
objects) and then json.loads each balanced substring; once parsed, keep the
existing checks that the object has "name" and "arguments" and append to
tool_calls with the same keys ("type","id","name","arguments_json"); this
ensures nested "arguments" like {"filters":{"owner":"me"}} are accepted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 174992df-5217-4cd7-ae4f-de9210da383f
📒 Files selected for processing (12)
- plugins/huggingface/examples/transformers/transformers_mcp_tool_calling_example.py
- plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
- plugins/huggingface/tests/test_hf_tool_calling.py
- plugins/huggingface/tests/test_huggingface.py
- plugins/huggingface/tests/test_transformers_llm.py
- plugins/huggingface/tests/test_transformers_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
- plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/huggingface/tests/test_transformers_vlm.py
- plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
Actionable comments posted: 2
♻️ Duplicate comments (2)
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)
380-380: ⚠️ Potential issue | 🟠 Major

Propagate the original generation options into tool follow-ups.

`_generate_followup()` only forwards `kwargs`, so follow-up rounds fall back to default `stream=True`, `max_new_tokens=None`, `temperature=0.7`, and `do_sample=True`. That reintroduces chunk events for non-stream callers and changes sampling/token budgets after the first tool round.

🧩 Proposed fix
```diff
-        return await self._handle_tool_calls(tool_calls, messages, kwargs)
+        return await self._handle_tool_calls(
+            tool_calls,
+            messages,
+            stream=stream,
+            max_new_tokens=max_new_tokens,
+            temperature=temperature,
+            do_sample=do_sample,
+            kwargs=kwargs,
+        )
@@
     async def _handle_tool_calls(
         self,
         tool_calls: list[NormalizedToolCallItem],
         messages: list[dict[str, Any]],
-        kwargs: dict[str, Any],
+        *,
+        stream: bool,
+        max_new_tokens: int | None,
+        temperature: float,
+        do_sample: bool,
+        kwargs: dict[str, Any],
     ) -> LLMResponseEvent:
@@
             result = await self.create_response(
-                messages=current_messages, _tool_followup=True, **kwargs
+                messages=current_messages,
+                stream=stream,
+                max_new_tokens=max_new_tokens,
+                temperature=temperature,
+                do_sample=do_sample,
+                _tool_followup=True,
+                **kwargs,
             )
```

Also applies to: 599-614
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py` at line 380, The follow-up generation path in _generate_followup is dropping the caller's original generation options (stream, max_new_tokens, temperature, do_sample, etc.), causing subsequent tool rounds to revert to defaults; update the flow so the original generation options are captured from the initial call (kwargs passed into the first generation) and merged/preserved when invoking _generate_followup and _handle_tool_calls (e.g., accept an explicit gen_options parameter or forward the original kwargs through the call chain), ensuring _generate_followup uses those merged options instead of defaults so streaming, sampling, and token limits remain consistent across tool follow-ups.

plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py (1)
107-110: ⚠️ Potential issue | 🟠 Major

Redact tool payloads in normal logs.

`args`, `res`, and `err` can contain user data or large blobs. Logging them verbatim at info/warning leaks sensitive content into routine logs and can explode log volume.

🪵 Suggested logging change
```diff
         if err is not None:
-            logger.warning("  [tool] %s(%s) failed: %s", name, args, err)
+            logger.warning("  [tool] %s failed", name)
+            logger.debug("  [tool] %s args=%r err=%r", name, args, err)
         else:
-            logger.info("  [tool] %s(%s) → %s", name, args, res)
+            logger.info("  [tool] %s succeeded", name)
+            logger.debug("  [tool] %s args=%r result=%r", name, args, res)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py` around lines 107 - 110, The current logging in the tool call loop logs raw payloads (args, res, err) which may contain sensitive or large user data; update the logger usage in the function/method containing the logger calls so that only non-sensitive metadata is logged (e.g., tool name, argument/res/err types, lengths/sizes, and a redacted placeholder like "<redacted>") instead of the full args/res/err values; for errors include err type/message truncated or a hashed identifier rather than full content and preserve existing logger levels (logger.warning for errors, logger.info for successes) while removing direct dumps of args/res/err to avoid leaking data or exploding log volume.
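A small helper in the spirit of that prompt — entirely hypothetical, not part of the PR — that produces a redacted summary suitable for routine log lines while leaving full payloads to DEBUG-level logging:

```python
# Hypothetical redaction helper: routine logs get only a type tag and a
# truncated repr, so secrets and large blobs stay out of info/warning logs.
def summarize_payload(value, limit=80):
    text = repr(value)
    if len(text) > limit:
        text = text[:limit] + f"... (+{len(text) - limit} chars)"
    return f"{type(value).__name__}: {text}"
```

A call site could then log `summarize_payload(args)` at info level and the raw `args` only via `logger.debug`.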
🧹 Nitpick comments (1)
plugins/huggingface/tests/test_transformers_llm.py (1)
301-304: Avoid `hasattr()` in this assertion.

You already split chunk and completed events above, so this can use direct attribute access and stay simpler.
♻️ Proposed change
```diff
-    for evt in chunk_events + completed_events:
-        assert "<tool_call>" not in (
-            evt.delta if hasattr(evt, "delta") else evt.text
-        )
+    for evt in chunk_events:
+        assert "<tool_call>" not in evt.delta
+    for evt in completed_events:
+        assert "<tool_call>" not in evt.text
```

As per coding guidelines, "Avoid using `getattr`, `hasattr`, `delattr`, `setattr`; prefer normal attribute access in Python".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_transformers_llm.py` around lines 301 - 304, The assertion currently uses hasattr on evt; instead iterate the already-split lists directly: for each evt in chunk_events assert "<tool_call>" not in evt.delta, and for each evt in completed_events assert "<tool_call>" not in evt.text. Update the loop that currently iterates over chunk_events + completed_events (variables chunk_events, completed_events, and evt) to two explicit loops using direct attribute access instead of hasattr.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 77-80: The code caches/returns a raw tool-call payload (e.g.,
follow_up containing "<tool_call>...</tool_call>") as the fallback LLM response;
update the logic around llm_response, follow_up and next_tool_calls (the
_tool_followup branch) so that whenever next_tool_calls is non-empty you do NOT
assign or persist the raw tool-call text to llm_response.text or return values:
instead strip or replace the tool-call markup with a safe placeholder or empty
final-answer text and only store the sanitized/terminal response; ensure the
same change is applied to the other similar blocks that touch follow_up and
llm_response (references: llm_response: LLMResponseEvent, follow_up,
next_tool_calls, _tool_followup) so tool markup never leaks back to the caller.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 140-159: The parser currently injects synthetic ids (uuid4) when
an extracted tool-call lacks an "id", which breaks deduping in
run_tool_call_loop; change both construction sites in transformers_llm.py so
they do NOT generate a fallback uuid: use obj.get("id") (or only include the
"id" key when obj contains it) instead of obj.get("id", str(uuid.uuid4())) and
remove the str(uuid.uuid4()) used in the later loop; keep all other fields
("type","name","arguments_json") unchanged so missing ids remain missing and
run_tool_call_loop can synthesize stable fallback ids.
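A sketch of the id handling this prompt asks for: the parser carries an `id` only when the model emitted one, leaving synthesis of stable fallback ids to `run_tool_call_loop` (the helper name and dict shape here are illustrative):

```python
import json

def to_tool_call(obj: dict) -> dict:
    """Normalize an extracted tool call without minting a synthetic uuid."""
    call = {
        "type": "function",
        "name": obj["name"],
        "arguments_json": json.dumps(obj.get("arguments", {})),
    }
    if "id" in obj:
        # Only keep an id the model actually produced; a missing id stays
        # missing so downstream dedupe can derive a stable fallback.
        call["id"] = obj["id"]
    return call
```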
---
Duplicate comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`:
- Around line 107-110: The current logging in the tool call loop logs raw
payloads (args, res, err) which may contain sensitive or large user data; update
the logger usage in the function/method containing the logger calls so that only
non-sensitive metadata is logged (e.g., tool name, argument/res/err types,
lengths/sizes, and a redacted placeholder like "<redacted>") instead of the full
args/res/err values; for errors include err type/message truncated or a hashed
identifier rather than full content and preserve existing logger levels
(logger.warning for errors, logger.info for successes) while removing direct
dumps of args/res/err to avoid leaking data or exploding log volume.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Line 380: The follow-up generation path in _generate_followup is dropping the
caller's original generation options (stream, max_new_tokens, temperature,
do_sample, etc.), causing subsequent tool rounds to revert to defaults; update
the flow so the original generation options are captured from the initial call
(kwargs passed into the first generation) and merged/preserved when invoking
_generate_followup and _handle_tool_calls (e.g., accept an explicit gen_options
parameter or forward the original kwargs through the call chain), ensuring
_generate_followup uses those merged options instead of defaults so streaming,
sampling, and token limits remain consistent across tool follow-ups.
---
Nitpick comments:
In `@plugins/huggingface/tests/test_transformers_llm.py`:
- Around line 301-304: The assertion currently uses hasattr on evt; instead
iterate the already-split lists directly: for each evt in chunk_events assert
"<tool_call>" not in evt.delta, and for each evt in completed_events assert
"<tool_call>" not in evt.text. Update the loop that currently iterates over
chunk_events + completed_events (variables chunk_events, completed_events, and
evt) to two explicit loops using direct attribute access instead of hasattr.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 653428c8-5007-45fd-bb3a-d283f1afa361
📒 Files selected for processing (3)
plugins/huggingface/tests/test_transformers_llm.py
plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
Actionable comments posted: 2
♻️ Duplicate comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py (1)
116-123: ⚠️ Potential issue | 🟠 Major

Forward caller kwargs into both HF request builders.

`create_hf_response()` accepts `**kwargs`, but the dicts at Lines 116-123 and 330-336 never merge them. Line 332 already consults `kwargs.get("model")`, so generation options and provider flags are silently dropped, and `model` can diverge between the initial and follow-up turns.

Suggested fix:

```diff
 request_kwargs: dict[str, Any] = {
+    **kwargs,
     "messages": messages,
-    "model": model_id,
+    "model": kwargs.get("model", model_id),
     "stream": stream,
 }
 if tools:
     request_kwargs["tools"] = tools
```

```diff
 request_kwargs: dict[str, Any] = {
+    **kwargs,
     "messages": current_messages,
     "model": kwargs.get("model", model_id),
     "stream": True,
 }
```

Also applies to: 330-336
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py` around lines 116 - 123, The request kwargs built for HF calls in create_hf_response are not merging the passed-in **kwargs so generation options and provider flags (and an overriding model) are dropped; update the request_kwargs construction in create_hf_response and the corresponding follow-up builder to merge in kwargs (e.g., request_kwargs.update(kwargs) or selectively copy relevant keys), ensure kwargs.get("model") overrides the local model_id consistently, and preserve existing keys like "messages", "stream", and "tools" while allowing generation/provider flags from **kwargs to be forwarded to the HF request builders.
🧹 Nitpick comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py (1)
33-35: Tighten the shared helper types.

This module sits on the shared HF/plugin boundary, but these signatures still use `Any` for chunks, responses, and request kwargs. Concrete HF types, or small `Protocol`/`TypedDict` wrappers if the hub types are too loose, would let static checking catch missing `choices`, `delta`, and `tool_calls` members here.

As per coding guidelines, "Avoid using `Any` type in type annotations" and "Use type annotations everywhere with modern syntax: `X | Y` unions, `dict[str, T]` generics, full `Callable` signatures, `Optional` for nullable params".

Also applies to: 72-74, 99-108
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py` around lines 33 - 35, The helper functions (e.g., accumulate_tool_call_chunk) use broad Any types for chunks, responses, and request kwargs; tighten these by defining small Protocols or TypedDicts that declare the exact members you access (e.g., choices, delta, tool_calls, etc.) and use them in signatures instead of Any (use modern syntax like dict[int, ToolChunkProtocol], Optional[...], or X | Y for unions); update all affected functions referenced in this module to accept those concrete types so static checkers can verify access to choices/delta/tool_calls and avoid Any usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 289-300: The code calls _handle_tool_calls(...) after
extract_tool_calls_from_hf_response(...) but forces follow-ups to stream by
hard-coding stream=True inside _generate_followup; instead thread the original
stream boolean from the initial request into _handle_tool_calls (pass it as an
extra parameter) and propagate it into _generate_followup; then implement two
follow-up branches inside _generate_followup (or the caller): one that yields an
async iterator when stream=True (existing behavior) and one that performs a
single non-stream call and returns the full response object when stream=False
(so no chunks are emitted). Update all call sites (e.g., where
_handle_tool_calls and _generate_followup are invoked) to accept and forward the
stream flag and handle the returned type accordingly.
- Around line 340-349: The except block handling (HfHubHTTPError,
InferenceTimeoutError, OSError) currently logs and emits an LLMErrorEvent but
returns LLMResponseEvent(original=None, text=""), [] which drops the exception;
change the return so the LLMResponseEvent preserves the caught exception (e.g.,
set its exception/exception_info field to e or include it in event_data) while
keeping original=None and text="" so callers can distinguish a transport failure
from a valid empty completion; update the return in that except block (the one
that calls logger.exception and llm.events.send with plugin_name and
error_message=str(e)) to include the exception on the LLMResponseEvent.
---
Duplicate comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 116-123: The request kwargs built for HF calls in
create_hf_response are not merging the passed-in **kwargs so generation options
and provider flags (and an overriding model) are dropped; update the
request_kwargs construction in create_hf_response and the corresponding
follow-up builder to merge in kwargs (e.g., request_kwargs.update(kwargs) or
selectively copy relevant keys), ensure kwargs.get("model") overrides the local
model_id consistently, and preserve existing keys like "messages", "stream", and
"tools" while allowing generation/provider flags from **kwargs to be forwarded
to the HF request builders.
---
Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`:
- Around line 33-35: The helper functions (e.g., accumulate_tool_call_chunk) use
broad Any types for chunks, responses, and request kwargs; tighten these by
defining small Protocols or TypedDicts that declare the exact members you access
(e.g., choices, delta, tool_calls, etc.) and use them in signatures instead of
Any (use modern syntax like dict[int, ToolChunkProtocol], Optional[...], or X |
Y for unions); update all affected functions referenced in this module to accept
those concrete types so static checkers can verify access to
choices/delta/tool_calls and avoid Any usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d12c9616-00df-4cdf-978b-4555f375317a
📒 Files selected for processing (1)
plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
```python
tool_calls = extract_tool_calls_from_hf_response(response)
if tool_calls:
    return await _handle_tool_calls(
        llm,
        client,
        model_id,
        plugin_name,
        tool_calls,
        messages,
        tools,
        kwargs,
    )
```
Honor the original stream mode after tool execution.
Lines 289-300 can reach _handle_tool_calls() from the non-stream path, but Line 333 hard-codes stream=True for every follow-up request. That changes the event contract mid-call: a request announced as streaming=False at Lines 124-129 can still emit chunk events after the first tool round. You’ll need to thread the original stream flag into _handle_tool_calls() and add a non-stream follow-up branch; simply swapping Line 333 to stream is not enough because the rest of _generate_followup() assumes an async iterator.
Also applies to: 327-339
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
around lines 289 - 300, The code calls _handle_tool_calls(...) after
extract_tool_calls_from_hf_response(...) but forces follow-ups to stream by
hard-coding stream=True inside _generate_followup; instead thread the original
stream boolean from the initial request into _handle_tool_calls (pass it as an
extra parameter) and propagate it into _generate_followup; then implement two
follow-up branches inside _generate_followup (or the caller): one that yields an
async iterator when stream=True (existing behavior) and one that performs a
single non-stream call and returns the full response object when stream=False
(so no chunks are emitted). Update all call sites (e.g., where
_handle_tool_calls and _generate_followup are invoked) to accept and forward the
stream flag and handle the returned type accordingly.
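Threading the flag could look roughly like this; `chat_stream`/`chat_once` are hypothetical stand-ins for the client calls, since the point is only the two branches keyed off the caller's original `stream`:

```python
from typing import Any

async def generate_followup(
    client: Any, request_kwargs: dict[str, Any], stream: bool
) -> str:
    """Follow-up after tool execution that honors the original stream mode."""
    request_kwargs = {**request_kwargs, "stream": stream}
    if stream:
        # Streaming branch: keep the existing async-iterator behavior.
        text = ""
        async for delta in client.chat_stream(**request_kwargs):
            text += delta
        return text
    # Non-stream branch: a single call, no chunk events emitted.
    return await client.chat_once(**request_kwargs)
```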
```python
except (HfHubHTTPError, InferenceTimeoutError, OSError) as e:
    logger.exception("Failed to get follow-up response after tool execution")
    llm.events.send(
        events.LLMErrorEvent(
            plugin_name=plugin_name,
            error_message=str(e),
            event_data=e,
        )
    )
    return LLMResponseEvent(original=None, text=""), []
```
Preserve follow-up failures in the returned LLMResponseEvent.
The initial request failure path at Line 145 returns exception=e, but this branch drops the exception and returns an empty response instead. Awaiters cannot tell a tool-follow-up transport failure from a valid empty completion.
Suggested fix:

```diff
-    return LLMResponseEvent(original=None, text=""), []
+    return LLMResponseEvent(original=None, text="", exception=e), []
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    except (HfHubHTTPError, InferenceTimeoutError, OSError) as e:
        logger.exception("Failed to get follow-up response after tool execution")
        llm.events.send(
            events.LLMErrorEvent(
                plugin_name=plugin_name,
                error_message=str(e),
                event_data=e,
            )
        )
        return LLMResponseEvent(original=None, text="", exception=e), []
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
around lines 340 - 349, The except block handling (HfHubHTTPError,
InferenceTimeoutError, OSError) currently logs and emits an LLMErrorEvent but
returns LLMResponseEvent(original=None, text=""), [] which drops the exception;
change the return so the LLMResponseEvent preserves the caught exception (e.g.,
set its exception/exception_info field to e or include it in event_data) while
keeping original=None and text="" so callers can distinguish a transport failure
from a valid empty completion; update the return in that except block (the one
that calls logger.exception and llm.events.send with plugin_name and
error_message=str(e)) to include the exception on the LLMResponseEvent.
Summary by CodeRabbit
New Features
Examples
Tests