feat: Allow the LLM to preview images returned by tools and decide on its own whether to send them #4895
Conversation
Hey - I've found 4 issues, and left some high level feedback:
- In `ToolImageCache.get_image`, the fallback extension scan only covers `.png/.jpg/.gif/.webp/.bmp` and reconstructs MIME types as `image/{ext[1:]}`, which is inconsistent with `_get_file_extension` (e.g., `.jpeg` and `.svg` are unsupported and `.jpg` becomes `image/jpg` instead of `image/jpeg`); consider reusing a single extension/MIME map for both save and load paths to avoid mismatches.
- The `step`/`_handle_function_tools` flow uses a magic tuple form `("cached_image", cached_img)` in the async generator to pass image metadata around; introducing a small dedicated type (e.g., a dataclass or TypedDict) instead of a tagged tuple would make this extension point more robust and self-documenting; a sketch of the TypedDict variant follows this list.
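As a rough sketch of the TypedDict variant mentioned above (the `CachedImageEvent` name and `kind` key are illustrative, not existing code):
```python
from typing import Any, Literal, TypedDict


class CachedImageEvent(TypedDict):
    """Tagged event for cached images yielded by _handle_function_tools (illustrative)."""

    kind: Literal["cached_image"]
    image: Any  # cached image metadata (tool name, image_ref, ...)


# Producer side, instead of `yield ("cached_image", cached_img)`:
#     yield CachedImageEvent(kind="cached_image", image=cached_img)
# Consumer side in step(), instead of checking result[0]:
#     if isinstance(result, dict) and result.get("kind") == "cached_image":
#         cached_images.append(result["image"])
```
A dataclass or NamedTuple works the same way but allows `isinstance` checks instead of key lookups; see the individual comments below.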
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `ToolImageCache.get_image`, the fallback extension scan only covers `.png/.jpg/.gif/.webp/.bmp` and reconstructs MIME types as `image/{ext[1:]}`, which is inconsistent with `_get_file_extension` (e.g., `.jpeg` and `.svg` are unsupported and `.jpg` becomes `image/jpg` instead of `image/jpeg`); consider reusing a single extension/MIME map for both save and load paths to avoid mismatches.
- The `step`/`_handle_function_tools` flow uses a magic tuple form `("cached_image", cached_img)` in the async generator to pass image metadata around; introducing a small dedicated type (e.g., a dataclass or TypedDict) instead of a tagged tuple would make this extension point more robust and self-documenting.
## Individual Comments
### Comment 1
<location> `astrbot/core/agent/tool_image_cache.py:132` </location>
<code_context>
+ return cached
+
+ # Try to find the file directly if not in memory cache
+ for ext in [".png", ".jpg", ".gif", ".webp", ".bmp"]:
+ file_path = os.path.join(self._cache_dir, f"{image_ref}{ext}")
+ if os.path.exists(file_path):
</code_context>
<issue_to_address>
**issue (bug_risk):** Include `.jpeg` and `.svg` when probing for cached files to avoid lookup failures after restart.
The fallback search here only checks `['.png', '.jpg', '.gif', '.webp', '.bmp']`, but `_get_file_extension` can also produce `.jpeg` and `.svg`. After a restart, those files won’t be found on disk and thus won’t be served. Please include `.jpeg` and `.svg` in this list (or derive the list from `_get_file_extension`’s mapping) so all saved images are recoverable.
</issue_to_address>
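To make the shared-map idea concrete, a minimal sketch (the names and the exact set of supported types are illustrative, not the current `ToolImageCache` API):
```python
import os

# One mapping shared by the save path (_get_file_extension) and the load/probe path
# (get_image), so every extension that can be written can also be found on disk and
# served with the correct MIME type.
EXT_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
    ".svg": "image/svg+xml",
}


def probe_cached_file(cache_dir: str, image_ref: str) -> tuple[str, str] | None:
    """Return (file_path, mime_type) for a cached image, or None if nothing is on disk."""
    for ext, mime in EXT_TO_MIME.items():
        file_path = os.path.join(cache_dir, f"{image_ref}{ext}")
        if os.path.exists(file_path):
            return file_path, mime
    return None
```
The save path can then derive its extension from the inverse of the same mapping, so the two code paths cannot drift apart.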
### Comment 2
<location> `astrbot/core/astr_main_agent_resources.py:436-437` </location>
<code_context>
+ except Exception as e:
+ errors.append(f"Failed to load image '{image_ref}': {e}")
+
+ if not components:
+ return f"error: No valid images to send. Errors: {'; '.join(errors)}"
+
+ # Send the images
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Clarify behavior when only a caption is provided but all image_refs are invalid.
Because the caption is added to `components`, the tool reports success even when all `image_ref`s fail and no images are actually sent. Consider either treating `sent_count == 0` as an error when any `image_ref`s were provided, or adjusting the message to explicitly state that only the caption was sent and no images were delivered, so downstream consumers and logs don’t misinterpret the outcome.
Suggested implementation:
```python
# If we have nothing at all to send (no caption, no images), bail out
if not components:
return f"error: No valid images to send. Errors: {'; '.join(errors)}"
# If a caption was provided but all image_refs failed to load, treat this as an error
# so downstream consumers don't misinterpret the result as images being delivered.
if sent_count == 0 and errors:
return (
"error: All provided image references failed to load; "
"no images were sent. "
f"Errors: {'; '.join(errors)}"
)
# Send the images
```
This change assumes:
1. `sent_count` is initialized before the image-processing loop (e.g. `sent_count = 0`).
2. `errors` is a list collecting any failures for image refs.
If `errors` can contain non-image-related issues, you may want to further narrow the condition to only treat image-loading failures as triggering this block (for example by tracking a separate `image_ref_errors` list or a boolean flag like `had_image_ref_failures` inside the loop).
</issue_to_address>
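A sketch of the "treat zero loaded images as an error" option, with a hypothetical `load_image_component` helper standing in for the real loading code:
```python
def load_image_component(image_ref: str) -> object:
    """Hypothetical stand-in for the real cached-image lookup used by the tool."""
    raise FileNotFoundError(image_ref)


def send_cached_images(image_refs: list[str], caption: str | None) -> str:
    """Illustrative sketch that counts loaded images separately from the caption."""
    components: list[object] = [caption] if caption else []
    errors: list[str] = []
    loaded_count = 0

    for image_ref in image_refs:
        try:
            components.append(load_image_component(image_ref))
            loaded_count += 1
        except Exception as e:
            errors.append(f"Failed to load image '{image_ref}': {e}")

    if not components:
        return f"error: No valid images to send. Errors: {'; '.join(errors)}"
    if image_refs and loaded_count == 0:
        # Only the caption survived: report an error rather than a misleading success.
        return (
            "error: All provided image references failed to load; no images were sent. "
            f"Errors: {'; '.join(errors)}"
        )
    # ... actually send `components` here ...
    return f"Sent {loaded_count} image(s) to the user."
```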
### Comment 3
<location> `astrbot/core/agent/runners/tool_loop_agent_runner.py:287-290` </location>
<code_context>
tool_call_result_blocks = []
+ cached_images = [] # Collect cached images for LLM visibility
async for result in self._handle_function_tools(self.req, llm_resp):
if isinstance(result, list):
tool_call_result_blocks = result
+ elif isinstance(result, tuple) and result[0] == "cached_image":
+ # Collect cached image info
+ cached_images.append(result[1])
</code_context>
<issue_to_address>
**suggestion:** Using a raw tuple discriminator for cached image events makes the generator protocol brittle.
`_handle_function_tools` is now yielding three payload shapes and uses a magic-string tuple discriminator for cached images: `("cached_image", cached_img)`. This will be fragile if more tuple-shaped events are added. Consider introducing a small wrapper type (e.g. `CachedImageEvent`), or at least a typed `NamedTuple`/`Enum` for the discriminator, so `step()` pattern matching is safer and better supported by type-checkers.
Suggested implementation:
```python
from typing import Any, NamedTuple
from astrbot.core.agent.message import ImageURLPart, TextPart, ThinkPart
from astrbot.core.agent.tool import ToolSet
from astrbot.core.agent.tool_image_cache import tool_image_cache
from astrbot.core.message.components import Json
from astrbot.core.message.message_event_result import (
MessageChain,
```
```python
llm_resp, _ = await self._resolve_tool_exec(llm_resp)
class CachedImageEvent(NamedTuple):
"""Wrapper type for cached image events yielded by _handle_function_tools."""
image: Any
tool_call_result_blocks = []
cached_images = [] # Collect cached images for LLM visibility
async for result in self._handle_function_tools(self.req, llm_resp):
if isinstance(result, list):
tool_call_result_blocks = result
elif isinstance(result, CachedImageEvent):
# Collect cached image info
cached_images.append(result.image)
elif isinstance(result, MessageChain):
if result.type is None:
# should not happen
```
To fully implement this refactor you should also:
1. Update `_handle_function_tools` (in this file or wherever it is defined) so that it yields `CachedImageEvent(image=cached_img)` instead of the current `("cached_image", cached_img)` tuple.
2. If `_handle_function_tools` is used elsewhere, adjust any other consumers that currently expect a `("cached_image", ...)` tuple to instead check for `CachedImageEvent` and use the `.image` attribute.
3. Consider moving `CachedImageEvent` to module scope (outside the method) if you want to reuse the type in multiple methods or modules; in that case, remove the nested class and define `class CachedImageEvent(NamedTuple): image: Any` at the top level.
</issue_to_address>
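If the event type is promoted to module scope as suggested in point 3, a minimal sketch could look like this:
```python
from typing import Any, NamedTuple


class CachedImageEvent(NamedTuple):
    """Module-level wrapper for cached-image events yielded by _handle_function_tools."""

    image: Any  # cached image metadata passed back to step()


# Producer, instead of `yield ("cached_image", cached_img)`:
#     yield CachedImageEvent(image=cached_img)
# Consumer in step():
#     elif isinstance(result, CachedImageEvent):
#         cached_images.append(result.image)
```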
### Comment 4
<location> `astrbot/core/agent/runners/tool_loop_agent_runner.py:359` </location>
<code_context>
self,
req: ProviderRequest,
llm_response: LLMResponse,
- ) -> T.AsyncGenerator[MessageChain | list[ToolCallMessageSegment], None]:
+ ) -> T.AsyncGenerator[
+ MessageChain | list[ToolCallMessageSegment] | tuple[str, T.Any], None
</code_context>
<issue_to_address>
**issue (complexity):** Consider introducing a typed ToolLoopEvent and extracting the cached-image appending logic into a helper method to simplify control flow and improve type clarity without changing behavior.
You can reduce the new complexity without changing behavior by introducing a typed event wrapper and extracting the image-append logic.
### 1. Replace heterogeneous generator output with a typed event
Instead of yielding `MessageChain | list[ToolCallMessageSegment] | tuple[str, Any]`, define a small event type:
```python
from dataclasses import dataclass
from typing import Literal
@dataclass
class ToolLoopEvent:
kind: Literal["tool_results", "message", "cached_image"]
payload: T.Any # or a more precise union if you prefer
```
Update `_handle_function_tools` to yield `ToolLoopEvent`:
```python
async def _handle_function_tools(
self,
req: ProviderRequest,
llm_response: LLMResponse,
) -> T.AsyncGenerator[ToolLoopEvent, None]:
...
# when you previously had: `tool_call_result_blocks = result`
yield ToolLoopEvent(kind="tool_results", payload=tool_call_result_blocks)
# when you previously had: `yield MessageChain(...)`
yield ToolLoopEvent(kind="message", payload=MessageChain(...))
# when you previously had: `yield ("cached_image", cached_img)`
yield ToolLoopEvent(kind="cached_image", payload=cached_img)
```
Then `step` becomes a simple, typed dispatch:
```python
tool_call_result_blocks = []
cached_images = []
async for event in self._handle_function_tools(self.req, llm_resp):
if event.kind == "tool_results":
tool_call_result_blocks = event.payload
elif event.kind == "cached_image":
cached_images.append(event.payload)
elif event.kind == "message":
result = event.payload
if result.type is None:
continue
ar_type = "tool_call_result" if result.type == "tool_direct_result" else result.type
yield AgentResponse(
type=ar_type,
result=MessageEventResult(
type=ar_type,
inputs=self.req.to_message_event_inputs(),
chain=result,
),
)
```
This removes the magic `"cached_image"` tuple tag and preserves all existing behavior.
### 2. Extract cached-image appending logic from `step`
The block that inspects `modalities`, fetches base64, and appends a user message can be moved into a helper to keep `step` focused:
```python
def _append_cached_images_to_context(self, cached_images: list[CachedImage]) -> None:
if not cached_images:
return
modalities = self.provider.provider_config.get("modalities", [])
if "image" not in modalities:
return
image_parts = []
for cached_img in cached_images:
img_data = tool_image_cache.get_image_base64(cached_img.image_ref)
if not img_data:
continue
base64_data, mime_type = img_data
image_parts.append(
TextPart(
text=(
f"[Image from tool '{cached_img.tool_name}', "
f"ref='{cached_img.image_ref}']"
)
)
)
image_parts.append(
ImageURLPart(
image_url=ImageURLPart.ImageURL(
url=f"data:{mime_type};base64,{base64_data}",
id=cached_img.image_ref,
)
)
)
if image_parts:
self.run_context.messages.append(
Message(role="user", content=image_parts)
)
logger.debug(
f"Appended {len(cached_images)} cached image(s) to context for LLM review"
)
```
`step` then only needs:
```python
# after tool_calls_result and run_context.messages are updated
self._append_cached_images_to_context(cached_images)
self.req.append_tool_calls_result(tool_calls_result)
```
This keeps all current functionality (caching, context appending, logging) but makes the control flow and types significantly clearer and easier to extend.
</issue_to_address>
Documentation Updates
1 document(s) were updated by changes in this PR: pr4697的改动
@@ -58,6 +58,23 @@
---
+## 3. The send_message_to_user Tool
+#### Tool Image Caching and LLM Preview Flow
+
+The proactive agent system now supports a tool image cache. When a tool returns images, they are cached locally (`data/temp/tool_images/`) instead of being sent to the user directly. The LLM can preview these images in later turns and decide on its own whether to send them to the user via the `send_tool_image` tool.
+
+- Image caching: images returned by each tool call are named with a unique ID and index, multiple images can be cached, and cache entries are valid for 1 hour.
+- LLM preview: multimodal models receive the images' base64 data in the context and can describe and analyze the image content in their replies.
+- Sending flow: if the LLM decides to send images to the user, it can call the `send_tool_image` tool, specifying the paths of the images to send (multiple images supported) and a caption. After sending, the cached images are cleaned up automatically.
+
+##### Usage of the send_tool_image Tool
+- Parameters:
+  - `paths`: list of image paths to send (cached image paths are available from the context)
+  - `caption`: optional caption text to send to the user
+- Typical scenario: the LLM analyzes the image content first and, once it is confirmed, proactively pushes the images to the user, improving safety and interaction flexibility.
+
+> Note: the send_tool_image tool is only available when function tools are enabled, and cached images expire and are cleaned up automatically.
+
### 4. Tool Registration and Configuration Loading
#### Logic Improvements
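Based on the `paths` and `caption` parameters documented in the diff above, a tool call issued by the LLM might look roughly like the following; the wire format depends on the provider integration and the path shown is a made-up placeholder:
```python
# Hypothetical function-call payload, for illustration only.
send_tool_image_call = {
    "name": "send_tool_image",
    "arguments": {
        # Cached image paths are taken from the tool-image references in the context.
        "paths": ["data/temp/tool_images/<tool_call_id>_0.png"],
        "caption": "Here is the chart produced by the previous tool call.",
    },
}
```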
Motivation / 动机
Previously, the LLM tool flow could only send images from tool results directly to the user; there was no way for the LLM to look at them first, which was quite inconvenient.
Modifications / 改动点
Added `astrbot/core/agent/tool_image_cache.py` (243 lines):
- `ToolImageCache` class
- caches images under `data/temp/tool_images/{tool_call_id}_{index}.{ext}`

Modified `astrbot/core/agent/runners/tool_loop_agent_runner.py`:
- `_handle_function_tools()`: caches `ImageContent` and `BlobResourceContents` images instead of sending them directly
- `step()`: checks the `modalities` configuration and injects base64 images into the user message for multimodal models
- return type widened to `AsyncGenerator[... | tuple[str, T.Any], None]` to pass cache metadata through

Modified `astrbot/core/astr_main_agent_resources.py`:
- added the `SendToolImageTool` class (87 lines)

Modified `astrbot/core/astr_main_agent.py`:
- adds `SEND_TOOL_IMAGE_TOOL` to the main Agent

Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
- I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.

Summary by Sourcery
Introduce a tool image caching flow so LLMs can review tool-generated images before selectively sending them to users.

New Features:
- `send_tool_image` function tool that lets the LLM select images from the cache to send to the user, with an optional caption.

Enhancements:
- the `send_tool_image` tool is made available whenever function tools are enabled.