
feat: Allow the LLM to preview images returned by tools and decide whether to send them #4895

Merged
Soulter merged 6 commits into AstrBotDevs:master from advent259141:image-tool on Feb 8, 2026

Conversation


advent259141 (Member) commented Feb 5, 2026

Motivation

Previously, LLM function tools could only send images from tool results directly to the user; there was no way for the LLM to look at them first, which was quite inconvenient.

Modifications

  1. Add astrbot/core/agent/tool_image_cache.py (243 lines)

    • Implements a singleton image cache manager, ToolImageCache
    • Supports saving, fetching, deleting, and automatic cleanup of expired images
    • Cache path: data/temp/tool_images/{tool_call_id}_{index}.{ext}
    • Expiry: 3600 seconds (1 hour)
    • A minimal sketch of this cache follows after this list
  2. Modify astrbot/core/agent/runners/tool_loop_agent_runner.py

    • _handle_function_tools(): caches ImageContent / BlobResourceContents images instead of sending them to the user directly
    • step(): checks the modalities config and injects base64 images into a user message for multimodal models
    • Return type updated to AsyncGenerator[... | tuple[str, T.Any], None] to pass cache metadata along
  3. Modify astrbot/core/astr_main_agent_resources.py

    • Adds a SendToolImageTool class (87 lines)
    • Lets the LLM selectively send cached images (multiple images plus an optional caption); a hypothetical handler sketch is also included below
    • Automatically deletes cached images once they have been sent
  4. Modify astrbot/core/astr_main_agent.py

    • Imports and registers SEND_TOOL_IMAGE_TOOL on the main agent
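
For orientation, here is a minimal sketch of the cache manager described in item 1. The class name, cache path, and expiry come from the description above; the method names and everything else are assumptions, and the real tool_image_cache.py will differ in detail.

```python
import base64
import mimetypes
import os
import time

CACHE_DIR = "data/temp/tool_images"  # cache location from the PR description
TTL_SECONDS = 3600  # images expire after 1 hour


class ToolImageCache:
    """Sketch of a singleton-style cache for images returned by function tools."""

    def __init__(self) -> None:
        os.makedirs(CACHE_DIR, exist_ok=True)

    def save_image(self, tool_call_id: str, index: int, data: bytes, mime_type: str) -> str:
        # File naming follows the PR: {tool_call_id}_{index}.{ext}
        ext = mimetypes.guess_extension(mime_type) or ".png"
        image_ref = f"{tool_call_id}_{index}"
        with open(os.path.join(CACHE_DIR, f"{image_ref}{ext}"), "wb") as f:
            f.write(data)
        return image_ref

    def get_image_base64(self, image_ref: str) -> tuple[str, str] | None:
        # Return (base64_data, mime_type) for a cached image, or None if missing or expired.
        for name in os.listdir(CACHE_DIR):
            if not name.startswith(image_ref):
                continue
            path = os.path.join(CACHE_DIR, name)
            if time.time() - os.path.getmtime(path) > TTL_SECONDS:
                os.remove(path)  # lazy cleanup of expired entries
                return None
            mime = mimetypes.guess_type(path)[0] or "image/png"
            with open(path, "rb") as f:
                return base64.b64encode(f.read()).decode(), mime
        return None

    def delete_image(self, image_ref: str) -> None:
        # Drop a cached image, e.g. after it has been sent to the user.
        for name in os.listdir(CACHE_DIR):
            if name.startswith(image_ref):
                os.remove(os.path.join(CACHE_DIR, name))


tool_image_cache = ToolImageCache()  # module-level singleton, as described above
```

And a hypothetical handler for the send_tool_image flow from item 3, building on the cache sketch above. The argument names and the plain dict "components" are placeholders; the real SendToolImageTool uses AstrBot's message components and platform adapter.

```python
async def send_tool_image(image_refs: list[str], caption: str | None = None) -> str:
    """Hypothetical handler: send selected cached images to the user, with an optional caption."""
    components: list[dict] = []
    errors: list[str] = []

    if caption:
        components.append({"type": "plain", "text": caption})

    sent_count = 0
    for image_ref in image_refs:
        cached = tool_image_cache.get_image_base64(image_ref)
        if cached is None:
            errors.append(f"Failed to load image '{image_ref}'")
            continue
        base64_data, mime_type = cached
        components.append({"type": "image", "mime": mime_type, "base64": base64_data})
        sent_count += 1

    if sent_count == 0:
        return f"error: No valid images to send. Errors: {'; '.join(errors)}"

    # ... hand `components` to the platform adapter to deliver to the user ...

    for image_ref in image_refs:
        tool_image_cache.delete_image(image_ref)  # cached images are removed after sending

    return f"Sent {sent_count} image(s) to the user."
```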

Screenshots or Test Results


Checklist

  • 😊 If there are new features added in the PR, I have discussed them with the authors through issues/emails, etc.
  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 I have ensured that no new dependencies are introduced, or, if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Introduce a tool image caching flow so LLMs can review tool-generated images before selectively sending them to users.

New Features:

  • Add a singleton tool image cache to persist tool-generated images on disk with lookup and expiry support.
  • Introduce a send_tool_image function tool that lets LLMs choose which cached images to send to users with optional captions.

Enhancements:

  • Update the tool loop runner to cache image outputs from tools, expose them back to multimodal models as image content, and avoid sending them directly to users.
  • Register the new send_tool_image tool in the main agent so it is available whenever function tools are enabled.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 5, 2026

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 4 issues, and left some high level feedback:

  • In ToolImageCache.get_image, the fallback extension scan only covers .png/.jpg/.gif/.webp/.bmp and reconstructs MIME types as image/{ext[1:]}, which is inconsistent with _get_file_extension (e.g., .jpeg and .svg are unsupported and .jpg becomes image/jpg instead of image/jpeg); consider reusing a single extension/MIME map for both save and load paths to avoid mismatches.
  • The step/_handle_function_tools flow uses a magic tuple form ("cached_image", cached_img) in the async generator to pass image metadata around; introducing a small dedicated type (e.g., a dataclass or TypedDict) instead of a tagged tuple would make this extension point more robust and self-documenting.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `ToolImageCache.get_image`, the fallback extension scan only covers `.png/.jpg/.gif/.webp/.bmp` and reconstructs MIME types as `image/{ext[1:]}`, which is inconsistent with `_get_file_extension` (e.g., `.jpeg` and `.svg` are unsupported and `.jpg` becomes `image/jpg` instead of `image/jpeg`); consider reusing a single extension/MIME map for both save and load paths to avoid mismatches.
- The `step`/`_handle_function_tools` flow uses a magic tuple form `("cached_image", cached_img)` in the async generator to pass image metadata around; introducing a small dedicated type (e.g., a dataclass or TypedDict) instead of a tagged tuple would make this extension point more robust and self-documenting.

## Individual Comments

### Comment 1
<location> `astrbot/core/agent/tool_image_cache.py:132` </location>
<code_context>
+            return cached
+
+        # Try to find the file directly if not in memory cache
+        for ext in [".png", ".jpg", ".gif", ".webp", ".bmp"]:
+            file_path = os.path.join(self._cache_dir, f"{image_ref}{ext}")
+            if os.path.exists(file_path):
</code_context>

<issue_to_address>
**issue (bug_risk):** Include `.jpeg` and `.svg` when probing for cached files to avoid lookup failures after restart.

The fallback search here only checks `['.png', '.jpg', '.gif', '.webp', '.bmp']`, but `_get_file_extension` can also produce `.jpeg` and `.svg`. After a restart, those files won’t be found on disk and thus won’t be served. Please include `.jpeg` and `.svg` in this list (or derive the list from `_get_file_extension`’s mapping) so all saved images are recoverable.
</issue_to_address>
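
A sketch of the shared extension/MIME mapping this comment suggests, assuming the extensions listed above; the function names here are illustrative and not necessarily those in the PR:

```python
import os

# Single source of truth for extension <-> MIME mapping, reused by both the
# save path (_get_file_extension equivalent) and the fallback file probe.
EXT_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
    ".svg": "image/svg+xml",
}
MIME_TO_EXT = {mime: ext for ext, mime in EXT_TO_MIME.items()}  # later entries win, so image/jpeg -> .jpeg


def get_file_extension(mime_type: str) -> str:
    # Save path: pick the extension for a MIME type, defaulting to .png.
    return MIME_TO_EXT.get(mime_type, ".png")


def find_cached_file(cache_dir: str, image_ref: str) -> tuple[str, str] | None:
    # Load path: probe every extension the save path could have produced and
    # return (file_path, mime_type) derived from the same map, or None.
    for ext, mime_type in EXT_TO_MIME.items():
        file_path = os.path.join(cache_dir, f"{image_ref}{ext}")
        if os.path.exists(file_path):
            return file_path, mime_type
    return None
```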

### Comment 2
<location> `astrbot/core/astr_main_agent_resources.py:436-437` </location>
<code_context>
+            except Exception as e:
+                errors.append(f"Failed to load image '{image_ref}': {e}")
+
+        if not components:
+            return f"error: No valid images to send. Errors: {'; '.join(errors)}"
+
+        # Send the images
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Clarify behavior when only a caption is provided but all image_refs are invalid.

Because the caption is added to `components`, the tool reports success even when all `image_ref`s fail and no images are actually sent. Consider either treating `sent_count == 0` as an error when any `image_ref`s were provided, or adjusting the message to explicitly state that only the caption was sent and no images were delivered, so downstream consumers and logs don’t misinterpret the outcome.

Suggested implementation:

```python
        # If we have nothing at all to send (no caption, no images), bail out
        if not components:
            return f"error: No valid images to send. Errors: {'; '.join(errors)}"

        # If a caption was provided but all image_refs failed to load, treat this as an error
        # so downstream consumers don't misinterpret the result as images being delivered.
        if sent_count == 0 and errors:
            return (
                "error: All provided image references failed to load; "
                "no images were sent. "
                f"Errors: {'; '.join(errors)}"
            )

        # Send the images

```

This change assumes:
1. `sent_count` is initialized before the image-processing loop (e.g. `sent_count = 0`).
2. `errors` is a list collecting any failures for image refs.

If `errors` can contain non-image-related issues, you may want to further narrow the condition to only treat image-loading failures as triggering this block (for example by tracking a separate `image_ref_errors` list or a boolean flag like `had_image_ref_failures` inside the loop).
</issue_to_address>

### Comment 3
<location> `astrbot/core/agent/runners/tool_loop_agent_runner.py:287-290` </location>
<code_context>

             tool_call_result_blocks = []
+            cached_images = []  # Collect cached images for LLM visibility
             async for result in self._handle_function_tools(self.req, llm_resp):
                 if isinstance(result, list):
                     tool_call_result_blocks = result
+                elif isinstance(result, tuple) and result[0] == "cached_image":
+                    # Collect cached image info
+                    cached_images.append(result[1])
</code_context>

<issue_to_address>
**suggestion:** Using a raw tuple discriminator for cached image events makes the generator protocol brittle.

`_handle_function_tools` is now yielding three payload shapes and uses a magic-string tuple discriminator for cached images: `("cached_image", cached_img)`. This will be fragile if more tuple-shaped events are added. Consider introducing a small wrapper type (e.g. `CachedImageEvent`), or at least a typed `NamedTuple`/`Enum` for the discriminator, so `step()` pattern matching is safer and better supported by type-checkers.

Suggested implementation:

```python
from typing import Any, NamedTuple

from astrbot.core.agent.message import ImageURLPart, TextPart, ThinkPart
from astrbot.core.agent.tool import ToolSet
from astrbot.core.agent.tool_image_cache import tool_image_cache
from astrbot.core.message.components import Json
from astrbot.core.message.message_event_result import (
    MessageChain,

```

```python
                llm_resp, _ = await self._resolve_tool_exec(llm_resp)

            class CachedImageEvent(NamedTuple):
                """Wrapper type for cached image events yielded by _handle_function_tools."""
                image: Any

            tool_call_result_blocks = []
            cached_images = []  # Collect cached images for LLM visibility
            async for result in self._handle_function_tools(self.req, llm_resp):
                if isinstance(result, list):
                    tool_call_result_blocks = result
                elif isinstance(result, CachedImageEvent):
                    # Collect cached image info
                    cached_images.append(result.image)
                elif isinstance(result, MessageChain):
                    if result.type is None:
                        # should not happen

```

To fully implement this refactor you should also:

1. Update `_handle_function_tools` (in this file or wherever it is defined) so that it yields `CachedImageEvent(image=cached_img)` instead of the current `("cached_image", cached_img)` tuple.
2. If `_handle_function_tools` is used elsewhere, adjust any other consumers that currently expect a `("cached_image", ...)` tuple to instead check for `CachedImageEvent` and use the `.image` attribute.
3. Consider moving `CachedImageEvent` to module scope (outside the method) if you want to reuse the type in multiple methods or modules; in that case, remove the nested class and define `class CachedImageEvent(NamedTuple): image: Any` at the top level.
</issue_to_address>

### Comment 4
<location> `astrbot/core/agent/runners/tool_loop_agent_runner.py:359` </location>
<code_context>
         self,
         req: ProviderRequest,
         llm_response: LLMResponse,
-    ) -> T.AsyncGenerator[MessageChain | list[ToolCallMessageSegment], None]:
+    ) -> T.AsyncGenerator[
+        MessageChain | list[ToolCallMessageSegment] | tuple[str, T.Any], None
</code_context>

<issue_to_address>
**issue (complexity):** Consider introducing a typed ToolLoopEvent and extracting the cached-image appending logic into a helper method to simplify control flow and improve type clarity without changing behavior.

You can reduce the new complexity without changing behavior by introducing a typed event wrapper and extracting the image-append logic.

### 1. Replace heterogeneous generator output with a typed event

Instead of yielding `MessageChain | list[ToolCallMessageSegment] | tuple[str, Any]`, define a small event type:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ToolLoopEvent:
    kind: Literal["tool_results", "message", "cached_image"]
    payload: T.Any  # or a more precise union if you prefer
```

Update `_handle_function_tools` to yield `ToolLoopEvent`:

```python
async def _handle_function_tools(
    self,
    req: ProviderRequest,
    llm_response: LLMResponse,
) -> T.AsyncGenerator[ToolLoopEvent, None]:
    ...

    # when you previously had: `tool_call_result_blocks = result`
    yield ToolLoopEvent(kind="tool_results", payload=tool_call_result_blocks)

    # when you previously had: `yield MessageChain(...)`
    yield ToolLoopEvent(kind="message", payload=MessageChain(...))

    # when you previously had: `yield ("cached_image", cached_img)`
    yield ToolLoopEvent(kind="cached_image", payload=cached_img)
```

Then `step` becomes a simple, typed dispatch:

```python
tool_call_result_blocks = []
cached_images = []

async for event in self._handle_function_tools(self.req, llm_resp):
    if event.kind == "tool_results":
        tool_call_result_blocks = event.payload
    elif event.kind == "cached_image":
        cached_images.append(event.payload)
    elif event.kind == "message":
        result = event.payload
        if result.type is None:
            continue
        ar_type = "tool_call_result" if result.type == "tool_direct_result" else result.type
        yield AgentResponse(
            type=ar_type,
            result=MessageEventResult(
                type=ar_type,
                inputs=self.req.to_message_event_inputs(),
                chain=result,
            ),
        )
```

This removes the magic `"cached_image"` tuple tag and preserves all existing behavior.

### 2. Extract cached-image appending logic from `step`

The block that inspects `modalities`, fetches base64, and appends a user message can be moved into a helper to keep `step` focused:

```python
def _append_cached_images_to_context(self, cached_images: list[CachedImage]) -> None:
    if not cached_images:
        return

    modalities = self.provider.provider_config.get("modalities", [])
    if "image" not in modalities:
        return

    image_parts = []
    for cached_img in cached_images:
        img_data = tool_image_cache.get_image_base64(cached_img.image_ref)
        if not img_data:
            continue
        base64_data, mime_type = img_data

        image_parts.append(
            TextPart(
                text=(
                    f"[Image from tool '{cached_img.tool_name}', "
                    f"ref='{cached_img.image_ref}']"
                )
            )
        )
        image_parts.append(
            ImageURLPart(
                image_url=ImageURLPart.ImageURL(
                    url=f"data:{mime_type};base64,{base64_data}",
                    id=cached_img.image_ref,
                )
            )
        )

    if image_parts:
        self.run_context.messages.append(
            Message(role="user", content=image_parts)
        )
        logger.debug(
            f"Appended {len(cached_images)} cached image(s) to context for LLM review"
        )
```

`step` then only needs:

```python
# after tool_calls_result and run_context.messages are updated
self._append_cached_images_to_context(cached_images)
self.req.append_tool_calls_result(tool_calls_result)
```

This keeps all current functionality (caching, context appending, logging) but makes the control flow and types significantly clearer and easier to extend.
</issue_to_address>


@dosubot dosubot bot added the area:core The bug / feature is about astrbot's core, backend label Feb 5, 2026
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 8, 2026
@Soulter Soulter merged commit 952023d into AstrBotDevs:master Feb 8, 2026
6 checks passed

dosubot bot commented Feb 8, 2026

Documentation Updates

1 document(s) were updated by changes in this PR:

Changes from pr4697
View Changes
@@ -58,6 +58,23 @@
 
 ---
 
+## 3. send_message_to_user tool
+#### Tool image caching and the LLM preview flow
+
+The main agent system now supports a tool image cache. When a tool returns images, they are cached locally (`data/temp/tool_images/`) instead of being sent to the user directly. The LLM can preview these images in subsequent turns and decide on its own whether to send them to the user via the `send_tool_image` tool.
+
+- Image caching: images returned by each tool call are named with a unique ID and index; multiple images are supported, and cached images expire after 1 hour.
+- LLM preview: multimodal models receive the images' base64 data in the context and can describe or analyze the image content in their replies.
+- Sending flow: if the LLM decides to send an image to the user, it can call the `send_tool_image` tool with the paths of the images to send (multiple images supported) and an optional caption. Cached images are cleaned up automatically after sending.
+
+##### send_tool_image usage
+- Parameters:
+  - `paths`: list of image paths to send (cached image paths are available from the context)
+  - `caption`: optional caption text sent to the user
+- Typical scenario: the LLM analyzes the image content first and only pushes it to the user after confirming it, improving safety and interaction flexibility.
+
+> Note: the send_tool_image tool is only available when function tools are enabled, and cached images expire and are cleaned up automatically.
+
 ### 4. Tool registration and configuration loading

 #### Logic improvements
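
For illustration, a call to the send_tool_image tool matching the parameters described in the doc snippet above might look like the following. The argument names (paths, caption) are taken from that text; the concrete path and payload shape are hypothetical and are defined by SendToolImageTool in practice.

```python
# Hypothetical tool call the LLM could emit after previewing a cached image.
tool_call = {
    "name": "send_tool_image",
    "arguments": {
        # example cached path following data/temp/tool_images/{tool_call_id}_{index}.{ext}
        "paths": ["data/temp/tool_images/call_abc123_0.png"],
        "caption": "Here is the chart generated by the plotting tool.",
    },
}
```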
