Merged
Changes from 1 commit
Commits
26 commits
70c1215
Add Playwright E2E tests with screenshot golden testing
ochafik Dec 9, 2025
b790f3d
Add default demo code for Three.js example
ochafik Dec 9, 2025
3469766
Add E2E testing documentation to CONTRIBUTING.md
ochafik Dec 9, 2025
ca62dfa
Update Three.js golden screenshot with 3D cube
ochafik Dec 9, 2025
585b4d2
Add explicit permissions to CI workflow
ochafik Dec 9, 2025
6db453a
Fix test.setTimeout() to be inside describe blocks
ochafik Dec 9, 2025
0623088
Simplify E2E tests - remove excessive timeouts
ochafik Dec 9, 2025
fd790a1
Auto-generate missing snapshots in CI
ochafik Dec 9, 2025
42665c8
Use platform-agnostic golden screenshots
ochafik Dec 9, 2025
0d05cf8
Refactor tests to use forEach instead of for-of loop
ochafik Dec 9, 2025
f62e4f4
Use npm ci and list reporter in CI
ochafik Dec 9, 2025
95aa92f
Exclude e2e tests from bun test
ochafik Dec 9, 2025
9aa7727
Fix prettier formatting
ochafik Dec 9, 2025
20ef214
Add interaction tests for basic server apps
ochafik Dec 10, 2025
6cd4d1a
Mask dynamic content in E2E screenshot tests
ochafik Dec 10, 2025
13dde97
Merge origin/main into ochafik/e2e-tests
ochafik Dec 11, 2025
4b73b5f
Add wiki-explorer to E2E tests with default URL param
ochafik Dec 11, 2025
d3ff6da
Use .default() for threejs tool schema instead of .optional()
ochafik Dec 11, 2025
e6e6e7a
Enable parallel E2E tests with timeouts and canvas masking
ochafik Dec 11, 2025
da89aa9
Add pre-commit check for private registry URLs in package-lock.json
ochafik Dec 11, 2025
6c44f8e
Increase screenshot diff tolerance to 6% for cross-platform rendering
ochafik Dec 11, 2025
277a004
Revert pre-commit artifactory check (moved to separate PR #133)
ochafik Dec 11, 2025
4a0c3bc
Format threejs server.ts
ochafik Dec 11, 2025
53012ae
Update CONTRIBUTING.md to reflect platform-agnostic screenshots
ochafik Dec 11, 2025
23d0e2e
Merge branch 'main' into ochafik/e2e-tests
ochafik Dec 13, 2025
0b890bd
fix(e2e): use factory pattern for MCP servers to support parallel con…
ochafik Dec 13, 2025
29 changes: 29 additions & 0 deletions .github/workflows/ci.yml
@@ -30,3 +30,32 @@
       - run: npm test

       - run: npm run prettier
+
+  e2e:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+
+      - run: npm install
+
+      - name: Install Playwright browsers
+        run: npx playwright install --with-deps chromium
+
+      - name: Run E2E tests
+        run: npm run test:e2e
+
+      - name: Upload Playwright report
+        uses: actions/upload-artifact@v4
+        if: ${{ !cancelled() }}
+        with:
+          name: playwright-report
+          path: playwright-report/
+          retention-days: 7
5 changes: 5 additions & 0 deletions .gitignore
@@ -6,3 +6,8 @@ bun.lockb
 .vscode/
 docs/api/
 tmp/
+intermediate-findings/
+
+# Playwright
+playwright-report/
+test-results/
64 changes: 64 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

8 changes: 6 additions & 2 deletions package.json
@@ -35,6 +35,9 @@
     "prepack": "npm run build",
     "build:all": "npm run build && npm run examples:build",
     "test": "bun test",
+    "test:e2e": "playwright test",
+    "test:e2e:update": "playwright test --update-snapshots",
+    "test:e2e:ui": "playwright test --ui",
     "examples:build": "bun examples/run-all.ts build",
     "examples:start": "NODE_ENV=development npm run build && bun examples/run-all.ts start",
     "examples:dev": "NODE_ENV=development bun examples/run-all.ts dev",
@@ -48,10 +51,11 @@
   },
   "author": "Olivier Chafik",
   "devDependencies": {
+    "@playwright/test": "^1.52.0",
     "@types/bun": "^1.3.2",
-    "bun": "^1.3.2",
     "@types/react": "^19.2.2",
     "@types/react-dom": "^19.2.2",
+    "bun": "^1.3.2",
     "concurrently": "^9.2.1",
     "cors": "^2.8.5",
     "esbuild": "^0.25.12",
@@ -71,8 +75,8 @@
   "optionalDependencies": {
     "@rollup/rollup-darwin-arm64": "^4.53.3",
     "@rollup/rollup-darwin-x64": "^4.53.3",
-    "@rollup/rollup-linux-x64-gnu": "^4.53.3",
     "@rollup/rollup-linux-arm64-gnu": "^4.53.3",
+    "@rollup/rollup-linux-x64-gnu": "^4.53.3",
     "@rollup/rollup-win32-x64-msvc": "^4.53.3"
   }
 }
43 changes: 43 additions & 0 deletions playwright.config.ts
@@ -0,0 +1,43 @@
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests/e2e",
  fullyParallel: false, // Run tests sequentially to share server
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: 1, // Single worker since we share the server
Member:
This means all E2E tests are run sequentially (not in parallel), right?

Collaborator Author:
Ah yes. There's no real point in avoiding (host) server interaction, and the tests are independent (different MCP servers). Not sure whether they'd use distinct headless browsers; I'll experiment.

  reporter: "html",
  use: {
    baseURL: "http://localhost:8080",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
  },
  projects: [
    {
      name: "chromium",
      use: {
        ...devices["Desktop Chrome"],
        launchOptions: {
          // Use system Chrome on macOS for stability, default chromium in CI
          ...(process.platform === "darwin" ? { channel: "chrome" } : {}),
        },
      },
    },
  ],
  // Run examples server before tests
  webServer: {
    command: "npm run examples:start",
    url: "http://localhost:8080",
    reuseExistingServer: !process.env.CI,
    timeout: 120000,
  },
  // Snapshot configuration
  expect: {
    toHaveScreenshot: {
      // Allow 2% pixel difference for dynamic content (timestamps, etc.)
      maxDiffPixelRatio: 0.02,
      // Animation stabilization
      animations: "disabled",
    },
  },
});
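For a sense of scale, here is a rough sketch of the pixel budget these ratios allow. It assumes the 1280x720 viewport of Playwright's "Desktop Chrome" device descriptor, a viewport-only (not full-page) screenshot, and that a comparison fails once the number of differing pixels exceeds width x height x maxDiffPixelRatio. The helper names are hypothetical, and the per-pixel color `threshold` that decides whether an individual pixel counts as different is ignored.

```typescript
// Hypothetical helper (not part of the PR): approximate pixel budget for a
// maxDiffPixelRatio, assuming a screenshot fails once the number of differing
// pixels exceeds width * height * ratio. The per-pixel `threshold` is ignored.
interface Viewport {
  width: number;
  height: number;
}

function allowedDiffPixels(viewport: Viewport, maxDiffPixelRatio: number): number {
  if (maxDiffPixelRatio < 0 || maxDiffPixelRatio > 1) {
    throw new RangeError("maxDiffPixelRatio must be between 0 and 1");
  }
  // Floor so the budget is a whole number of pixels.
  return Math.floor(viewport.width * viewport.height * maxDiffPixelRatio);
}

function withinBudget(diffPixels: number, viewport: Viewport, maxDiffPixelRatio: number): boolean {
  return diffPixels <= allowedDiffPixels(viewport, maxDiffPixelRatio);
}

// Assumed viewport: 1280x720, from the "Desktop Chrome" device descriptor.
const desktopChrome: Viewport = { width: 1280, height: 720 };

console.log(allowedDiffPixels(desktopChrome, 0.02)); // 18432 (config-wide 2%)
console.log(allowedDiffPixels(desktopChrome, 0.1)); // 92160 (10% override in servers.spec.ts)
```

Under these assumptions, the per-assertion `maxDiffPixelRatio: 0.1` in the spec file lets roughly five times as many pixels differ as the config-wide 2% default.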
88 changes: 88 additions & 0 deletions tests/e2e/servers.spec.ts
@@ -0,0 +1,88 @@
import { test, expect } from "@playwright/test";

// Server configurations
const SERVERS = [
  { key: "basic-react", index: 0, name: "Basic MCP App Server (React-based)" },
  {
    key: "basic-vanillajs",
    index: 1,
    name: "Basic MCP App Server (Vanilla JS)",
  },
  { key: "budget-allocator", index: 2, name: "Budget Allocator Server" },
  { key: "cohort-heatmap", index: 3, name: "Cohort Heatmap Server" },
  {
    key: "customer-segmentation",
    index: 4,
    name: "Customer Segmentation Server",
  },
  { key: "scenario-modeler", index: 5, name: "SaaS Scenario Modeler" },
  { key: "system-monitor", index: 6, name: "System Monitor Server" },
  { key: "threejs", index: 7, name: "Three.js Server" },
];
Comment on lines 24 to 42

jonathanhefner (Member), Dec 10, 2025:
Possibly for a v2, what do you think about having each example define its own test/e2e/ directory? That might make it easier for each example to define its own testing parameters. For example, I could add a SEED env variable just for the customer-segmentation example, which would sidestep the issue of random data.

Also, possibly for a v2 or even v3, if we factor out test helpers for each example to use in its own test/e2e/ directory, we could offer those test helpers as part of the Apps SDK so that other developers can test their own apps more easily.

Collaborator Author:
At this scale I'd rather keep things simple, centralized, and one-size-fits-all (e.g. define the same SEED for all examples, or just take the cue from the CI env that the seed needs to be predictable, and have some unified way to disable animations, as in the wiki-explorer example, etc.).
I was hoping the examples won't get complex enough that any of them requires specialized handling. For instance, I've added defaults to the relevant args schemas (wiki & three.js).
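One way the "same SEED for all, cued by CI" idea could look, as a minimal sketch: a seeded mulberry32 generator that example servers could use instead of Math.random when generating mock data. The SEED/CI handling, the fallback seed of 42, and the names resolveSeed and makeRandom are all hypothetical; nothing here is part of the PR.

```typescript
// Hypothetical sketch (not part of the PR): deterministic mock data for screenshot tests.
type Env = Record<string, string | undefined>;

// Hypothetical fixed seed used when CI is set and no explicit SEED is given.
const DEFAULT_CI_SEED = 42;

// mulberry32: a small 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// An explicit SEED wins; otherwise CI gets a fixed seed; otherwise no seed.
function resolveSeed(env: Env): number | undefined {
  const explicit = env.SEED !== undefined ? Number.parseInt(env.SEED, 10) : Number.NaN;
  if (Number.isFinite(explicit)) return explicit;
  if (env.CI) return DEFAULT_CI_SEED;
  return undefined;
}

// Seeded generator when a seed is resolved, Math.random for local exploration.
function makeRandom(env: Env): () => number {
  const seed = resolveSeed(env);
  return seed === undefined ? Math.random : mulberry32(seed);
}
```

A server could call makeRandom(process.env) once at startup and draw every random value from the returned function, so CI screenshots see identical data on each run while local runs keep varying.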


// Increase timeout for iframe-heavy tests
test.setTimeout(90000);

test.describe("Host UI", () => {
  test("initial state shows controls", async ({ page }) => {
    await page.goto("/");
    await expect(page.locator("label:has-text('Server')")).toBeVisible();
    await expect(page.locator("label:has-text('Tool')")).toBeVisible();
    await expect(page.locator('button:has-text("Call Tool")')).toBeVisible();
  });

  test("screenshot of initial state", async ({ page }) => {
    await page.goto("/");
    await page.waitForTimeout(1000);
    await expect(page).toHaveScreenshot("host-initial.png");
  });
});

// Generate tests for each server
for (const server of SERVERS) {
  test.describe(`${server.name}`, () => {
    test(`loads app UI`, async ({ page }) => {
      await page.goto("/");

      // Select server
      const serverSelect = page.locator("select").first();
      await serverSelect.selectOption({ index: server.index });

      // Click Call Tool
      await page.click('button:has-text("Call Tool")');

      // Wait for outer iframe
      await page.waitForSelector("iframe", { timeout: 10000 });

      // Wait for content to load (generous timeout for nested iframes)
      await page.waitForTimeout(5000);

      // Verify iframe structure exists
      const outerFrame = page.frameLocator("iframe").first();
      await expect(outerFrame.locator("iframe")).toBeVisible({
        timeout: 10000,
      });
    });

    test(`screenshot matches golden`, async ({ page }) => {
      await page.goto("/");

      // Select server
      const serverSelect = page.locator("select").first();
      await serverSelect.selectOption({ index: server.index });

      // Click Call Tool
      await page.click('button:has-text("Call Tool")');

      // Wait for app to fully load
      await page.waitForSelector("iframe", { timeout: 10000 });
      await page.waitForTimeout(6000); // Extra time for nested iframe content

      // Take screenshot
      await expect(page).toHaveScreenshot(`${server.key}.png`, {
        maxDiffPixelRatio: 0.1, // 10% tolerance for dynamic content
        timeout: 10000,
      });
    });
  });
}
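The generated tests lean on two implicit invariants of SERVERS: each index must match the option's position in the host's server dropdown, and each key must be unique because it becomes the golden file name (`${server.key}.png`). Below is a hypothetical check for both, assuming the array lists servers in dropdown order, which the hard-coded indices 0 to 7 already imply. The sample table deliberately contains a bad index and is not the real SERVERS list.

```typescript
// Hypothetical sanity check for a SERVERS-style table (not part of the PR).
interface ServerConfig {
  key: string; // golden screenshot name: `${key}.png`
  index: number; // option index passed to selectOption({ index })
  name: string; // test.describe title
}

// Returns human-readable problems; an empty list means the table looks consistent.
function checkServerTable(servers: readonly ServerConfig[]): string[] {
  const problems: string[] = [];
  const seenKeys = new Set<string>();
  servers.forEach((server, position) => {
    // Assumes the table mirrors the dropdown order, so index should equal array position.
    if (server.index !== position) {
      problems.push(`${server.key}: index ${server.index} != position ${position}`);
    }
    // Duplicate keys would make two tests share one golden screenshot file.
    if (seenKeys.has(server.key)) {
      problems.push(`duplicate key: ${server.key}`);
    }
    seenKeys.add(server.key);
  });
  return problems;
}

const sample: ServerConfig[] = [
  { key: "basic-react", index: 0, name: "Basic MCP App Server (React-based)" },
  { key: "basic-vanillajs", index: 1, name: "Basic MCP App Server (Vanilla JS)" },
  { key: "threejs", index: 7, name: "Three.js Server" }, // index does not match position
];

console.log(checkServerTable(sample).join("\n")); // threejs: index 7 != position 2
```

Playwright's selectOption also accepts `{ label }`, so selecting by server.name could drop the index field altogether; whether the host's option labels match these names exactly is an assumption.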