Conversation

@ochafik
Collaborator

@ochafik ochafik commented Dec 9, 2025

Summary

  • Add Playwright E2E tests with screenshot golden testing for all MCP App servers
  • Add interaction tests for basic servers (Send Message, Send Log, Open Link buttons)
  • Add wiki-explorer-server to E2E test suite
  • Enable parallel test execution (4 workers locally, 2 in CI) for faster runs (~28s)
  • Use .default() for tool schema defaults (wiki-explorer URL, threejs code/height) so basic-host can auto-populate input fields
  • Mask dynamic content (timestamps, charts, 3D canvas, force graphs) for stable screenshots

Changes

E2E Tests (tests/e2e/servers.spec.ts)

  • Screenshot tests for all 9 servers with dynamic content masking
  • Interaction tests verifying host callbacks (message, log, open link)
  • Uses nested iframe locators for MCP App structure

Playwright Config

  • Parallel execution with configurable workers
  • 30s per-test timeout
  • Platform-agnostic snapshots
  • HTML reporter locally, list reporter in CI
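
The settings listed above could be expressed roughly as follows. This is a minimal sketch rather than the PR's actual `playwright.config.ts`: the helper `buildE2EConfig`, the `E2EConfig` interface, and the exact snapshot path template are assumptions made for illustration. The option names themselves (`fullyParallel`, `workers`, `timeout`, `reporter`, `snapshotPathTemplate`) are standard Playwright test options, and in a real config the returned object would be passed to `defineConfig` from `@playwright/test`.

```typescript
// Sketch only: mirrors the bullet list above as a plain object so it can be
// read (and type-checked) without Playwright installed.
interface E2EConfig {
  testDir: string;
  fullyParallel: boolean;
  forbidOnly: boolean;
  retries: number;
  workers: number;
  timeout: number;
  reporter: "html" | "list";
  snapshotPathTemplate: string;
}

// Hypothetical helper; `isCI` would typically be `!!process.env.CI`.
function buildE2EConfig(isCI: boolean): E2EConfig {
  return {
    testDir: "./tests/e2e",
    fullyParallel: true,
    forbidOnly: isCI,
    retries: isCI ? 2 : 0,
    // 4 workers locally, 2 in CI.
    workers: isCI ? 2 : 4,
    // 30s per-test timeout, in milliseconds.
    timeout: 30_000,
    // HTML report locally, plain list output in CI.
    reporter: isCI ? "list" : "html",
    // Assumed template: it contains no {platform} or {projectName} token,
    // so the same golden file is used on macOS and Linux.
    snapshotPathTemplate: "{testDir}/{testFileName}-snapshots/{arg}{ext}",
  };
}
```

Leaving the platform token out of the template means one golden image per test, which only works if the comparison tolerance absorbs cross-platform rendering differences.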

Tool Schema Improvements

  • wiki-explorer-server: Default URL https://en.wikipedia.org/wiki/Model_Context_Protocol
  • threejs-server: Default code (rotating cube) and height (400px)
  • basic-host: Auto-populates input field with tool defaults from JSON Schema
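
As an illustration of the basic-host behaviour described above, the defaults can be read straight off the tool's `inputSchema.properties`. The sketch below is an assumption about how that might look, not the code from this PR: `extractDefaults`, `ToolInputSchema`, and `wikiSchema` are hypothetical names, and only the `default` keyword of JSON Schema is relied upon.

```typescript
// Minimal subset of JSON Schema needed for this sketch.
interface JsonSchemaProperty {
  type?: string;
  default?: unknown;
}

interface ToolInputSchema {
  properties?: Record<string, JsonSchemaProperty>;
}

// Hypothetical helper: collect every property that declares a `default`.
function extractDefaults(schema: ToolInputSchema): Record<string, unknown> {
  const defaults: Record<string, unknown> = {};
  for (const [name, prop] of Object.entries(schema.properties ?? {})) {
    if (prop.default !== undefined) {
      defaults[name] = prop.default;
    }
  }
  return defaults;
}

// Example schema resembling the wiki-explorer tool input (shape assumed).
const wikiSchema: ToolInputSchema = {
  properties: {
    url: {
      type: "string",
      default: "https://en.wikipedia.org/wiki/Model_Context_Protocol",
    },
  },
};

// The host could then serialize the result into its input field.
const initialInput = JSON.stringify(extractDefaults(wikiSchema), null, 2);
```

Properties without a `default` are simply skipped, so a tool with no defaults yields an empty object and the input field can stay blank.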

Test plan

  • npm test - Unit tests pass
  • npm run test:e2e - 26 E2E tests pass in ~28s
  • Screenshots match golden images with dynamic content masked

🤖 Generated with Claude Code

- Add E2E tests for all 8 MCP server examples
- Screenshot golden images for visual regression testing
- CI workflow for running E2E tests
- npm scripts: test:e2e, test:e2e:update, test:e2e:ui

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@pkg-pr-new

pkg-pr-new bot commented Dec 9, 2025

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/ext-apps@115

commit: 53012ae

ochafik and others added 6 commits December 9, 2025 21:46
When no code is provided, show a rotating green cube demo.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Set minimal read-only permissions for security best practices.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Playwright requires setTimeout to be called within a test or describe block.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove explicit setTimeout calls (default 30s is sufficient)
- Replace waitForTimeout(5000/6000) with proper waitForAppLoad()
- Wait for inner iframe visibility instead of fixed delays
- Keep only 500ms stabilization for screenshot animations
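
A condition-based wait of the kind described above might look like the following. This is a sketch under assumptions: the real `waitForAppLoad()` is not shown in this conversation, the `"iframe"` selectors are placeholders, and the `*Like` interfaces are structural subsets of Playwright's `Page`, `FrameLocator`, and `Locator` so the example type-checks without Playwright installed (real Playwright objects should satisfy them).

```typescript
// Structural subsets of Playwright types, declared locally for this sketch.
interface LocatorLike {
  first(): LocatorLike;
  waitFor(options: { state: "visible"; timeout?: number }): Promise<void>;
}

interface FrameLocatorLike {
  first(): FrameLocatorLike;
  locator(selector: string): LocatorLike;
}

interface PageLike {
  frameLocator(selector: string): FrameLocatorLike;
}

// Hypothetical implementation: wait until the iframe nested inside the
// host's outer iframe is visible, instead of sleeping for a fixed 5-6s.
async function waitForAppLoad(
  page: PageLike,
  timeoutMs = 30_000,
): Promise<void> {
  // Assumption: the host page embeds one outer iframe, and the MCP App
  // itself renders in an iframe inside it.
  const outerFrame = page.frameLocator("iframe").first();
  const innerIframe = outerFrame.locator("iframe").first();
  await innerIframe.waitFor({ state: "visible", timeout: timeoutMs });
}
```

Waiting on visibility returns as soon as the app is ready, so fast servers are not penalized by a delay sized for the slowest one.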

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ochafik ochafik changed the title from "Add Playwright E2E tests with screenshot golden testing" to "tests: add Playwright E2E tests with screenshot golden testing" Dec 9, 2025
@ochafik ochafik marked this pull request as ready for review December 9, 2025 21:23
Use updateSnapshots: 'missing' in CI to handle cross-platform
screenshot differences (macOS vs Linux).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ochafik ochafik marked this pull request as draft December 9, 2025 21:27
ochafik and others added 5 commits December 9, 2025 22:29
- Remove -chromium-darwin suffix from snapshot filenames
- Configure snapshotPathTemplate for cross-platform compatibility
- Increase tolerance to 5% for rendering differences between platforms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
May help with Playwright test discovery issues in CI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Use npm ci for exact package versions
- Use --reporter=list instead of html to avoid potential re-evaluation issues
- Only upload test-results on failure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Run bun test only on src/ directory to avoid running Playwright tests
with Bun's test runner.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ochafik ochafik marked this pull request as ready for review December 9, 2025 21:49
@ochafik ochafik requested a review from idosal December 10, 2025 01:32
Test that clicking buttons in the MCP App triggers the corresponding
host callbacks:
- Send Message → host logs "Message from MCP App"
- Send Log → host logs "Log message from MCP App"
- Open Link → host logs "Open link request"

Tests both React and Vanilla JS implementations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@jonathanhefner
Member

I will take a closer look tomorrow (sorry!), but how does this handle examples that have dynamic output? For example, the basic-server-react and basic-server-vanillajs examples output the server time, and the system-monitor-server example streams non-deterministic updates.

Add Playwright mask option to handle servers with dynamic/random content:
- basic-react/vanillajs: mask server time display
- system-monitor: mask CPU chart, memory stats, uptime
- cohort-heatmap: mask heatmap grid (random data)
- customer-segmentation: mask scatter chart (random data)

This addresses PR feedback about handling examples with non-deterministic
output. Masking replaces the previous 10% tolerance with proper exclusion
of dynamic elements, allowing tighter 1% tolerance for the rest of the UI.
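
The per-server masking described in this commit could be organized as a lookup table that feeds Playwright's `toHaveScreenshot` options. The sketch below is illustrative only: every selector is hypothetical (the real tests may target different elements), as are the names `MASK_SELECTORS`, `MaskablePage`, and `screenshotOptions`. The `mask` and `maxDiffPixelRatio` option names are the ones Playwright's screenshot assertion accepts.

```typescript
// Hypothetical selectors for the dynamic regions of each server's UI.
const MASK_SELECTORS: Record<string, string[]> = {
  "basic-react": ["#server-time"],
  "basic-vanillajs": ["#server-time"],
  "system-monitor": ["#cpu-chart", "#memory-stats", "#uptime"],
  "cohort-heatmap": [".heatmap-grid"],
  "customer-segmentation": [".scatter-chart"],
};

// Anything with a `locator(selector)` method works here, e.g. a Playwright
// Page or FrameLocator; the generic keeps the sketch free of imports.
interface MaskablePage<L> {
  locator(selector: string): L;
}

// Build screenshot options: masked regions are excluded from the diff,
// which allows a tight 1% tolerance for everything else.
function screenshotOptions<L>(
  page: MaskablePage<L>,
  serverKey: string,
): { mask: L[]; maxDiffPixelRatio: number } {
  const selectors = MASK_SELECTORS[serverKey] ?? [];
  return {
    mask: selectors.map((selector) => page.locator(selector)),
    maxDiffPixelRatio: 0.01,
  };
}

// Intended use inside a test (not executed here):
//   await expect(page).toHaveScreenshot(`${key}.png`,
//     screenshotOptions(appFrame, key));
```

Servers with fully deterministic output have no entry in the table and get an empty mask list.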

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ochafik
Collaborator Author

ochafik commented Dec 10, 2025

@jonathanhefner it was using a 10% pixel tolerance; I've now tightened it to 1% by using per-test masks for the variable parts (they show up in purple)

Member

@jonathanhefner jonathanhefner left a comment

Very cool idea! 😎

CONTRIBUTING.md Outdated
npm run test:e2e:update -- --grep "Three.js"
```

**Note**: Golden screenshots are platform-specific (e.g., `*-chromium-darwin.png` for macOS). CI runs on Linux, so you may need to update screenshots in both environments or run in a container.
Member

@jonathanhefner jonathanhefner Dec 10, 2025

Instead of having platform-specific golden screenshots, what do you think about comparing screenshots of the current commit with the last known good commit? That way, both screenshots would be using the same OS, browser, installed fonts, etc.

It might also allow dropping the masks in some cases, e.g., "Hostname" and "Platform" for the system-monitor example.

Collaborator Author

Sorry, I already made them platform-agnostic; will update the text.

fullyParallel: false, // Run tests sequentially to share server
forbidOnly: !!process.env.CI,
retries: process.env.CI ? 2 : 0,
workers: 1, // Single worker since we share the server
Member

This means all E2E tests are run sequentially (not in parallel), right?

Collaborator Author

ah yes. no real point to avoid (host) server interaction, and tests are independent (different mcp servers). Not sure if they'd use distinct headless browsers, will experiment.

Comment on lines 22 to 39
const SERVERS = [
  { key: "basic-react", index: 0, name: "Basic MCP App Server (React-based)" },
  {
    key: "basic-vanillajs",
    index: 1,
    name: "Basic MCP App Server (Vanilla JS)",
  },
  { key: "budget-allocator", index: 2, name: "Budget Allocator Server" },
  { key: "cohort-heatmap", index: 3, name: "Cohort Heatmap Server" },
  {
    key: "customer-segmentation",
    index: 4,
    name: "Customer Segmentation Server",
  },
  { key: "scenario-modeler", index: 5, name: "SaaS Scenario Modeler" },
  { key: "system-monitor", index: 6, name: "System Monitor Server" },
  { key: "threejs", index: 7, name: "Three.js Server" },
];
Member

@jonathanhefner jonathanhefner Dec 10, 2025

Possibly for a v2, what do you think about having each example define its own test/e2e/ directory? That might make it easier for each example to define its own testing parameters. For example, I could add a SEED env variable just for the customer-segmentation example, which would sidestep the issue of random data.

Also, possibly for a v2 or even v3, if we factor out test helpers for each example to use in its own test/e2e/ directory, we could possibly offer those test helpers as part of the Apps SDK so that other developers can test their own apps more easily.

Collaborator Author

At this scale I'd rather keep things simple, centralized, and one-size-fits-all (e.g. define the same SEED for all, or just take the cue from the CI env that the seed needs to be predictable, and have some unified way to disable animations, as in the wiki stuff, etc.)
Was hoping the examples don't get complex enough that any of them requires specialized handling. For instance I've added defaults to relevant args schemas (wiki & three.js)
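
For illustration, a shared seed of the kind discussed here could be wired in with a small seeded generator. Nothing below is part of this PR: `createRandom` is a hypothetical helper, the `SEED` variable is only proposed in this thread, and mulberry32 is just one well-known small PRNG that would fit.

```typescript
// mulberry32: a small, fast 32-bit PRNG; the same seed always yields the
// same sequence of numbers in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: callers would pass e.g. `process.env.SEED`.
// With no (or an unparsable) seed it falls back to Math.random, so normal
// runs keep their random data and only seeded runs become reproducible.
function createRandom(seed: string | undefined): () => number {
  if (seed === undefined) return Math.random;
  const parsed = Number.parseInt(seed, 10);
  return Number.isNaN(parsed) ? Math.random : mulberry32(parsed);
}
```

An example server would then draw its sample data from `createRandom(...)` instead of calling `Math.random` directly, which would make the generated charts identical across runs whenever the seed is set.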

ochafik and others added 3 commits December 11, 2025 14:09
- Add default URL "https://en.wikipedia.org/wiki/Model_Context_Protocol" to
  wiki-explorer-server's get-first-degree-links tool inputSchema
- Update basic-host to automatically populate input field with tool defaults
  extracted from inputSchema.properties
- Add wiki-explorer-server to E2E test suite with dynamic masking for the
  force-directed graph

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
This exposes the default code and height values in the JSON Schema,
allowing basic-host to auto-populate the input field with defaults.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
ochafik and others added 2 commits December 11, 2025 15:14
- Enable fullyParallel with 4 workers locally (2 in CI)
- Add 30s timeout per test
- Mask threejs canvas for stable screenshots
- Update snapshots to reflect default values in input fields

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Same check as in CI to catch issues before push.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
ochafik and others added 4 commits December 11, 2025 15:30
Font rendering differs between macOS and Linux, causing ~5% pixel differences.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
3 participants