feat(browser): add AgentCore Gateway + BrowserCustom for PR screenshots#37
feat(browser): add AgentCore Gateway + BrowserCustom for PR screenshots#37MichaelWalker-git wants to merge 9 commits intomainfrom
Conversation
|
Thanks @MichaelWalker-git ! QQ:
|
…PR screenshots Add a new AgentBrowser CDK construct that wires AgentCore BrowserCustom and Gateway L2 constructs so agents can capture screenshots of PRs and attach them as visual proof. The browser-tool Lambda handler starts a headless browser session via CDP over WebSocket, navigates to the PR page, captures a screenshot, uploads to S3, and returns a presigned URL. Two access paths: - Internal: agent post-hooks invoke the Lambda directly via boto3 - External: Gateway exposes the same Lambda as an MCP tool endpoint with Cognito client-credentials auth for future external agents The feature is fail-open throughout — screenshot failure never blocks PR creation or task completion. Feature-gated by the BROWSER_TOOL_FUNCTION_NAME env var.
The test polls /ping until 503 (set when _background_pipeline_failed becomes True) then immediately asserts write_terminal was called. But write_terminal runs after the flag is set in the same except block, creating a race. Fix by joining background threads before asserting.
305c3dc to
7d1c947
Compare
|
Thanks @krokoko! Failing build: Fixed — race condition in Deployment: Tested successfully in dev account (098305555551, us-east-1). All new resources created cleanly — BrowserCustom, Gateway, S3 screenshot bucket, Lambda, Cognito User Pool, and GatewayTarget. Stack outputs confirm BrowserId, GatewayUrl, and TokenEndpoint are provisioned. |
|
Review: I think I misunderstand the flow, is it:
Also in agent browser, the client bypasses the gateway entirely. It calls lambda directly via boto3. Since this addition doesn't use the gateway at all, I would remove it from this PR and we can add it later when we expose a tool via MCP (this creates now a second cognito user pool, ...) Other automated issues (to see if relevant based on above)
An attacker controlling task input (or any authenticated Gateway caller) could navigate the browser to:
Combined with the presigned URL being auto-published to the PR body, this creates a full exfiltration chain where sensitive data is automatically made public on GitHub. Required fix: URL allowlist in the Lambda handler itself (not just the Python caller). At minimum: HTTPS-only, reject RFC1918/link-local/metadata IPs, and ideally restrict to github.com since that's the stated purpose.
} catch { If StopBrowserSessionCommand fails, browser sessions leak silently. Over time: cost escalation, service quota exhaustion, and zero observability
|
…simplification - Remove Gateway from AgentBrowser construct (unused; agent calls Lambda directly via boto3, avoids extra Cognito user pool) - Add URL allowlist in browser-tool Lambda: HTTPS-only, github.com only, reject RFC1918/link-local/IMDS to prevent SSRF - Use discriminated union type for BrowserToolResponse - Sanitize taskId in S3 key path to [a-zA-Z0-9_-] - Add persistent WebSocket error handler, wrap JSON.parse in try/catch, replace non-null assertions with descriptive error checks - Log session cleanup failures instead of silently swallowing - Increase presigned URL expiry to 7 days (bounded by credential lifetime) - Distinguish config errors (ERROR) from transient errors (WARN) in browser.py; check FunctionError on Lambda invoke response - Check gh pr edit return code; deduplicate ## Screenshots section on retry - Log exception details in pipeline.py screenshot catch block - Add 9 tests for capture_pr_screenshots and _append_screenshots_to_pr
82b0af6 to
7395436
Compare
|
Thanks for the thorough review @krokoko! All issues addressed in the latest push: Architecture (Gateway removal): Agreed — removed the Gateway entirely. The agent calls Lambda directly via boto3; no MCP overhead or extra Cognito pool. The browser-during-development vision (VPC mode, dev server screenshots) makes sense for a future iteration with ECS/EC2 compute. This PR stays scoped to PR screenshot capture as a simpler starting point. Issues fixed:
Full build passes (746 CDK tests, 293 agent tests, ESLint, synth). |
Summary
AgentBrowserCDK construct wiring AgentCore BrowserCustom and Gateway L2 constructs for headless browser screenshotsbrowser-toolLambda handler: starts CDP session over WebSocket, navigates to PR page, captures screenshot, uploads to S3, returns presigned URLbrowser.pymodule invokes the Lambda directly via boto3 (fail-open, feature-gated byBROWSER_TOOL_FUNCTION_NAME)ensure_pr()and appends## Screenshotssection to PR bodyArchitecture
New files
cdk/src/handlers/browser-tool.tscdk/src/constructs/agent-browser.tsagent/src/browser.pycapture_screenshot()fail-opencdk/test/constructs/agent-browser.test.tscdk/test/handlers/browser-tool.test.tsagent/tests/test_browser.pyModified files
cdk/src/stacks/agent.tsagent/src/models.pyscreenshot_urlsfield toTaskResultagent/src/post_hooks.pycapture_pr_screenshots(),_append_screenshots_to_pr()agent/src/pipeline.pyensure_pr()cdk/package.jsonws,@types/ws,@aws-sdk/client-s3,@aws-sdk/s3-request-presignercdk/test/stacks/agent.test.tsDesign decisions
clientCredentialsflow (M2M), separate from TaskApi's Cognitowspackage bundled by esbuild into the LambdaBROWSER_TOOL_FUNCTION_NAMEenv var is setTest plan
mise //cdk:compile— TypeScript compiles with zero errorsmise //cdk:test— All 680 CDK tests pass (43 suites), including 13 new testspython -m pytest agent/tests/test_browser.py— All 4 Python tests passnew_task, verify PR has## Screenshotssection with working image link