Multimodal ToolCallSummaryMessage & FunctionExecutionResult Support
#6381
Replies: 3 comments 2 replies
You may want to take a look at the new |
It seems that current OpenAI models do not support function-call results that contain images, so I will close this discussion.
I'm leaving this comment because others may be struggling with this as I was. For an AutoGen agent to correctly process the multimodal output of an MCP tool implemented with FastMCP, you need to use MCPWorkbench. MCPWorkbench handles multimodal output properly because it does not assume the tool's output is text.
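To illustrate the difference described in this comment, here is a minimal, library-independent sketch. `TextPart`, `ImagePart`, `flatten_to_text`, and `keep_parts` are hypothetical names invented for this example; they are not part of AutoGen, FastMCP, or MCPWorkbench. The sketch only contrasts a handler that assumes text output with one that passes image parts through unchanged.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextPart:
    """Hypothetical text part of a tool result."""
    text: str


@dataclass
class ImagePart:
    """Hypothetical image part of a tool result (raw bytes plus MIME type)."""
    data: bytes
    mime_type: str = "image/png"


Part = Union[TextPart, ImagePart]


def flatten_to_text(parts: list[Part]) -> str:
    """Text-only handling: every image part is reduced to a placeholder string."""
    return "\n".join(
        p.text if isinstance(p, TextPart) else "<image omitted>" for p in parts
    )


def keep_parts(parts: list[Part]) -> list[Union[str, ImagePart]]:
    """Multimodal handling: text becomes str, image parts are kept as they are."""
    return [p.text if isinstance(p, TextPart) else p for p in parts]
```

With the text-only handler the model never sees the image; with the second handler the image part survives and can be forwarded to a vision-capable model.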
It seems that the return value of function calling currently supports only the `str` type. For VLMs, however, multimodal function calling is necessary and important. A simple example: it would let the model select images from the file system on its own and analyze them. The `content` of `ToolCallSummaryMessage` and `FunctionExecutionResult` should be `str | agent_core.Image | list[str | agent_core.Image]`.
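A minimal sketch of the requested content type, written with stand-in classes so that it runs without the framework installed. `Image`, `ToolContent`, `ProposedFunctionExecutionResult`, and `as_parts` are hypothetical names used only for illustration; this is not the library's actual definition.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Image:
    """Hypothetical stand-in for the framework's image type."""
    data: bytes


# The union proposed in this post: a string, an image, or a mixed list of both.
ToolContent = Union[str, Image, List[Union[str, Image]]]


@dataclass
class ProposedFunctionExecutionResult:
    """Stand-in result type whose content is no longer restricted to str."""
    call_id: str
    content: ToolContent


def as_parts(content: ToolContent) -> List[Union[str, Image]]:
    """Normalize any allowed content shape to a flat list of parts."""
    if isinstance(content, list):
        return list(content)
    return [content]
```

Normalizing to a list means downstream code only has to handle one shape, whether the tool returned plain text, a single image, or a mix of both.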