
Collaborator

@idosal idosal commented Dec 10, 2025

The spec defines ui/message's content as a single content block. Accepting an array would support additional use cases (e.g., prompting the agent with a text message and an image). This is already how the ext-apps SDK implements it.

See #48
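For illustration, a minimal TypeScript sketch of a multi-modal ui/message request under the array form. The `TextContent`/`ImageContent` types are local stand-ins for the MCP SDK's `ContentBlock`, `buildUiMessage` is a hypothetical helper, any `params` fields other than `content` are omitted, and the base64 image data is a placeholder.

```typescript
// Local stand-ins for two MCP content block shapes (assumption: a real
// implementation would import ContentBlock from the MCP SDK instead).
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextContent | ImageContent;

// ui/message request whose content is an array of blocks.
// Other params fields are omitted in this sketch.
interface UiMessageRequest {
  jsonrpc: "2.0";
  id: number;
  method: "ui/message";
  params: { content: ContentBlock[] };
}

// Hypothetical helper: prompts the agent with a text message plus an image.
function buildUiMessage(id: number, text: string, pngBase64: string): UiMessageRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "ui/message",
    params: {
      content: [
        { type: "text", text },
        { type: "image", data: pngBase64, mimeType: "image/png" },
      ],
    },
  };
}

const request = buildUiMessage(1, "What is shown in this chart?", "iVBORw0KGgo=");
console.log(JSON.stringify(request.params.content.map((block) => block.type)));
// prints ["text","image"]
```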


pkg-pr-new bot commented Dec 10, 2025

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/ext-apps@119

commit: ee557a4

Copilot AI left a comment

Pull request overview

This PR aligns the ui/message specification with the existing implementation in the ext-apps SDK by changing the content parameter from a single object to an array of content blocks. This enables multi-modal messages such as combining text and images in a single message.

Key changes:

  • Updated ui/message content parameter to accept an array of content blocks instead of a single object
  • Maintains consistency with the existing TypeScript implementation which already uses ContentBlock[] from the MCP SDK
  • Enables the use case mentioned in issue #48 for multi-modal messages

Comment on lines +509 to +514
content: [
{
type: "text",
text: string
}
]
Copilot AI Dec 10, 2025

The specification should document what content types are supported in the content array. While the PR description mentions supporting images (e.g., "prompting the agent with a text message and an image"), the spec only shows a single text content block.

Consider adding:

  1. A reference to the MCP SDK's ContentBlock type definition
  2. Documentation of supported content types (text, image, etc.)
  3. An example showing multiple content items in the array (e.g., text + image)

This would help implementers understand the full capabilities of this API and align with the stated goal of supporting additional use cases like multi-modal messages.
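One way to address the first three suggestions, sketched under assumptions: a local mirror of the MCP ContentBlock variants (text, image, audio, resource_link, resource) with only the required fields kept, plus an example array holding text and an image. The union and the hypothetical `describeBlock` helper are illustrative, not the SDK's actual definitions.

```typescript
// Local mirror of the MCP ContentBlock union (assumption: fields trimmed to
// the required ones; the SDK's types carry more optional fields).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }
  | { type: "audio"; data: string; mimeType: string }
  | { type: "resource_link"; uri: string; name: string }
  | { type: "resource"; resource: { uri: string; text?: string; blob?: string } };

// Hypothetical summarizer. The exhaustive switch makes the compiler flag
// any variant added to the union but not handled here.
function describeBlock(block: ContentBlock): string {
  switch (block.type) {
    case "text":
      return `text(${block.text.length} chars)`;
    case "image":
    case "audio":
      return `${block.type}(${block.mimeType})`;
    case "resource_link":
      return `link(${block.uri})`;
    case "resource":
      return `resource(${block.resource.uri})`;
    default: {
      const unreachable: never = block;
      return unreachable;
    }
  }
}

// Multiple items in one ui/message content array: text plus an image.
const content: ContentBlock[] = [
  { type: "text", text: "Summarize this screenshot" },
  { type: "image", data: "iVBORw0KGgo=", mimeType: "image/png" },
];
console.log(content.map(describeBlock).join(", "));
// prints text(25 chars), image(image/png)
```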

Collaborator Author

@ochafik I see that the SDK accepts ContentBlock as-is. Some of these types are really cool and open up interesting use cases, but I'm not sure hosts would know how to support them (e.g., audio, embedded resources), so they may require negotiation. Should we limit it to text (and perhaps image, if we add negotiation) for now?
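A minimal sketch of the narrower option, assuming the host enforces an allowlist of block types on incoming ui/message requests. `ALLOWED_TYPES` and `unsupportedTypes` are hypothetical names, and the block shape is reduced to its `type` field.

```typescript
// Simplified block shape: only the `type` discriminator matters for this check.
type ContentBlock = { type: string; [field: string]: unknown };

// Hypothetical host policy: accept text now; "image" could be added once
// negotiation exists.
const ALLOWED_TYPES: ReadonlySet<string> = new Set(["text"]);

// Returns the distinct block types in `content` that the host does not accept,
// so the host can reject the whole message rather than drop blocks silently.
function unsupportedTypes(content: ContentBlock[], allowed: ReadonlySet<string>): string[] {
  const rejected = content.map((block) => block.type).filter((type) => !allowed.has(type));
  return Array.from(new Set(rejected));
}

const incoming: ContentBlock[] = [
  { type: "text", text: "Play this clip" },
  { type: "audio", data: "UklGRg==", mimeType: "audio/wav" },
  { type: "audio", data: "UklGRg==", mimeType: "audio/wav" },
];
console.log(JSON.stringify(unsupportedTypes(incoming, ALLOWED_TYPES))); // prints ["audio"]
```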

Collaborator

@idosal let's just make it all negotiable:

export interface McpUiHostCapabilities {
  /** @description Optional support for ui/message and its content types */
  messages?: {
    text?: {};
    image?: {};
    audio?: {};
    resource?: {};
    resourceLink?: {};
  };
  ...
}

cc @antonpk1 FYI
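A sketch of how an app might consult the proposed capability before sending a ui/message. It assumes an absent `messages` field means the host does not support ui/message at all, which is one reading of the `@description` comment; `canSend` and the type-to-key mapping (needed because the block type `resource_link` maps to the key `resourceLink`) are hypothetical.

```typescript
// Mirrors the proposed messages capability; the other host capability
// fields (the "..." in the proposal) are omitted.
interface McpUiHostCapabilities {
  messages?: {
    text?: {};
    image?: {};
    audio?: {};
    resource?: {};
    resourceLink?: {};
  };
}

type MessageCapabilityKey = keyof NonNullable<McpUiHostCapabilities["messages"]>;

// Hypothetical mapping from content block `type` values to capability keys.
const CAPABILITY_FOR_TYPE: Record<string, MessageCapabilityKey> = {
  text: "text",
  image: "image",
  audio: "audio",
  resource: "resource",
  resource_link: "resourceLink",
};

// True only if the host advertised every block type used in the message.
// Assumption: a missing `messages` object means ui/message is unsupported.
function canSend(content: { type: string }[], caps: McpUiHostCapabilities): boolean {
  const messages = caps.messages;
  if (messages === undefined) return false;
  return content.every((block) => {
    const key = CAPABILITY_FOR_TYPE[block.type];
    return key !== undefined && messages[key] !== undefined;
  });
}

const hostCaps: McpUiHostCapabilities = { messages: { text: {}, image: {} } };
console.log(canSend([{ type: "text" }, { type: "image" }], hostCaps)); // true
console.log(canSend([{ type: "text" }, { type: "audio" }], hostCaps)); // false
```

Checking all blocks up front could let an app fall back (for example, to a text-only message) instead of sending something the host would reject.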
