grove-platform · cbullinger · Apr 18, 2026 · Apr 20, 2026 · Apr 21, 2026 · Apr 21, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -128,7 +128,7 @@ jobs:
             --region $REGION \
             --project $PROJECT_ID \
             --allow-unauthenticated \
-            --set-env-vars="^|^CONFIG_REPO_OWNER=grove-platform|CONFIG_REPO_NAME=github-copier|CONFIG_REPO_BRANCH=main|PEM_NAME=CODE_COPIER_PEM|WEBHOOK_SECRET_NAME=webhook-secret|MONGO_URI_SECRET_NAME=mongo-uri|WEBSERVER_PATH=/events|MAIN_CONFIG_FILE=.copier/main.yaml|USE_MAIN_CONFIG=true|DEPRECATION_FILE=deprecated_examples.json|COMMITTER_NAME=GitHub Copier App|COMMITTER_EMAIL=bot@mongodb.com|GOOGLE_CLOUD_PROJECT_ID=github-copy-code-examples|COPIER_LOG_NAME=code-copier-log|AUDIT_ENABLED=false|METRICS_ENABLED=true|GITHUB_APP_ID=${{ secrets.APP_ID }}|INSTALLATION_ID=${{ secrets.INSTALLATION_ID }}" \
+            --set-env-vars="^|^CONFIG_REPO_OWNER=grove-platform|CONFIG_REPO_NAME=github-copier|CONFIG_REPO_BRANCH=main|PEM_NAME=CODE_COPIER_PEM|WEBHOOK_SECRET_NAME=webhook-secret|MONGO_URI_SECRET_NAME=mongo-uri|WEBSERVER_PATH=/events|MAIN_CONFIG_FILE=.copier/main.yaml|USE_MAIN_CONFIG=true|DEPRECATION_FILE=deprecated_examples.json|COMMITTER_NAME=GitHub Copier App|COMMITTER_EMAIL=bot@mongodb.com|GOOGLE_CLOUD_PROJECT_ID=github-copy-code-examples|COPIER_LOG_NAME=code-copier-log|AUDIT_ENABLED=true|METRICS_ENABLED=true|OPERATOR_UI_ENABLED=true|OPERATOR_AUTH_REPO=grove-platform/github-copier|OPERATOR_REPO_SLUG=grove-platform/github-copier|LLM_PROVIDER=anthropic|LLM_BASE_URL=https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic|LLM_MODEL=claude-haiku-4-5|ANTHROPIC_API_KEY_SECRET_NAME=anthropic-api-key|GITHUB_APP_ID=${{ secrets.APP_ID }}|INSTALLATION_ID=${{ secrets.INSTALLATION_ID }}" \
             --set-build-env-vars="VERSION=${{ steps.version.outputs.tag }}" \
             --tag="${{ steps.version.outputs.traffic_tag }}" \
             --max-instances=10 \

diff --git a/AGENT.md b/AGENT.md
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,35 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+### Added
+
+- **Operator UI — comprehensive writer + operator dashboard** at `/operator/` (`OPERATOR_UI_ENABLED=true`). Five tabs (Overview, Webhooks, Audit, Workflows, System), sticky status bar, dark mode, keyboard shortcuts, shareable URLs, and a writer/operator mode toggle persisted to localStorage.
+- **GitHub PAT authentication** — users sign in with their personal access token; role is derived from their permission on `OPERATOR_AUTH_REPO` (admin/maintain → operator, write/triage/read → writer). Operator actions (replay, release, AI settings) require an explicit admin or maintain grant, since most writers have `write` on the auth repo. Replay additionally enforces read access on the source repo for that specific delivery.
+- **AI rule suggester** — paste a source path and desired target state, receive a suggested workflow rule with self-verification via the in-process `PatternMatcher`. Two providers supported:
+  - **Anthropic (hosted)** — default for Cloud Run. API key loaded from Secret Manager via `ANTHROPIC_API_KEY_SECRET_NAME`. No infra required; operators switch between Haiku / Sonnet / Opus from the UI.
+  - **Ollama (local)** — for dev or self-hosted deployments. UI manages connection, model pulls, deletes, and active-model switching without a redeploy.
+- **Writer-facing features** — workflow browser with per-rule coverage, PR lookup by URL, recent copies feed, file match tester (with clear button and Python-style `(?P<name>)` regex translation for in-browser use), PR timeline, and in-app help overlay.
+- **Per-delivery log viewer** — context-tagged ring buffer captures logs per webhook delivery, surfaced in an audit drawer alongside the trace and outcome summary.
+- **Audit event enrichment** — `processed_ok` traces now include destination repo(s), files matched / uploaded / failed, and commit SHA.
+- **Startup banner** — Operator UI, auth repo, AI provider, AI model, and AI base URL are now surfaced when the app boots (local and Cloud Run).
+
+### Changed
+
+- **MongoDB audit logging enabled in production** — the Cloud Run deploy previously forced `AUDIT_ENABLED=false`; it is now `true`, aligning with the v0.3.0 "enabled by default" change.
+- **Operator auth hardened** — token-based auth (`OPERATOR_UI_TOKEN`) removed entirely; GitHub PAT is the only supported mechanism. `OPERATOR_UI_ENABLED=true` now requires `OPERATOR_AUTH_REPO` at config load (validated in `validateOperatorAuth`).
+- **`createPullRequest` skipped for empty commits** — `commitFilesToBranch` now returns an `errTreeUnchanged` sentinel so `addFilesViaPR` no longer calls the GitHub PR API with an unchanged tree (previously 422'd).
+- **MongoDB driver v2 ObjectID decoding** — audit reads set `ObjectIDAsHexString: true` to avoid "error decoding key `_id`" on queries.
+
+### Fixed
+
+- **gosec G107 / G704 SSRF findings** — GitHub API URL construction in `services/operator_auth.go` now validates path components against strict RE2-compatible whitelists (`ghUsernameRe`, `ghRepoNameRe`) and escapes them with `url.PathEscape` before request construction; `slack_notifier.go` `#nosec` annotation extended to cover `NewRequestWithContext`.
+- **Keyboard-shortcut overlay wouldn't close** — `.help-bg[hidden]` now wins over the base `display:flex`.
+- **File match tester returned no matches for Java files** — JavaScript `RegExp` does not support Python-style `(?P<name>)` named groups; the tester now rewrites `(?P<` → `(?<` before compilation.
+
+### Security
+
+- **Token auth removed** — the operator UI no longer accepts a shared bearer token; all access is per-user via GitHub PAT with repo-scoped permission checks.
+
 ## [v0.3.0] - 2026-04-14
 
 ### Changed

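The SSRF fix listed under "Fixed" (allowlist validation, then `url.PathEscape`) can be illustrated with a minimal Go sketch. Everything here is an assumption made for illustration: the regex bodies, the `permissionURL` helper, and the choice of the collaborator-permission endpoint are not taken from `services/operator_auth.go`, whose actual patterns are not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
)

// Assumed patterns: the diff names ghUsernameRe and ghRepoNameRe but does not
// show them. These follow GitHub's documented naming limits loosely.
var (
	ghUsernameRe = regexp.MustCompile(`^[A-Za-z0-9](?:[A-Za-z0-9-]{0,37}[A-Za-z0-9])?$`)
	ghRepoNameRe = regexp.MustCompile(`^[A-Za-z0-9._-]{1,100}$`)
)

var errBadComponent = errors.New("invalid owner, repo, or user name")

// permissionURL is a hypothetical helper: it rejects any path component that
// fails the allowlist, then escapes each component before building the URL.
func permissionURL(owner, repo, user string) (string, error) {
	if !ghUsernameRe.MatchString(owner) || !ghUsernameRe.MatchString(user) {
		return "", errBadComponent
	}
	// "." and ".." pass the character class, so reject them explicitly.
	if !ghRepoNameRe.MatchString(repo) || repo == "." || repo == ".." {
		return "", errBadComponent
	}
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/collaborators/%s/permission",
		url.PathEscape(owner), url.PathEscape(repo), url.PathEscape(user)), nil
}

func main() {
	u, err := permissionURL("grove-platform", "github-copier", "octocat")
	fmt.Println(u, err)

	// A traversal attempt is rejected before any request is constructed.
	_, err = permissionURL("grove-platform", "../secrets", "octocat")
	fmt.Println(err)
}
```

Validating first and escaping second is deliberately redundant: the allowlist already excludes `/`, `?`, and `#`, and the escape step guards against a future loosening of the patterns.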
diff --git a/README.md b/README.md
@@ -29,6 +29,13 @@ A GitHub app that automatically copies code examples and files from source repos
 - **Development Tools** - Dry-run mode, CLI validation, enhanced logging
 - **Thread-Safe** - Concurrent webhook processing with proper state management
 
+### Operator UI
+- **Web dashboard at `/operator/`** - Five-tab UI (Overview, Webhooks, Audit, Workflows, System) with dark mode, keyboard shortcuts, and shareable URLs
+- **GitHub PAT authentication** - Users sign in with their personal access token; role is derived from their permission on a configured auth repo (`admin`/`maintain` → operator, `write`/`triage`/`read` → writer)
+- **Per-repo replay authorization** - Replay requires the caller's PAT to have read access to the source repo of the webhook being replayed
+- **Writer-facing tools** - Workflow browser, PR lookup, recent copies feed, file match tester, audit drawer, per-delivery log viewer
+- **AI rule suggester** - Paste a source/target pair; get a generated copier rule self-verified against the in-process pattern matcher. Two providers: [Anthropic](https://www.anthropic.com/) (hosted, default in prod via the Grove Foundry APIM gateway) or [Ollama](https://ollama.com) (local, for dev)
+
 ## 🚀 Quick Start
 
 ### Prerequisites
@@ -385,6 +392,47 @@ Get performance metrics:
 curl http://localhost:8080/metrics
 ```
 
+## Operator UI
+
+The operator UI is a web dashboard served from `/operator/` for diagnosing webhook processing, replaying failed deliveries, browsing workflows, and generating copier rules with AI assistance.
+
+### Enabling the UI
+
+Set the following env vars (`OPERATOR_REPO_SLUG` is optional):
+
+```yaml
+OPERATOR_UI_ENABLED: "true"
+OPERATOR_AUTH_REPO: "your-org/some-repo"  # user permissions here determine role
+OPERATOR_REPO_SLUG: "your-org/some-repo"  # optional; enables audit-row deep links
+```
+
+**Startup fails** if `OPERATOR_UI_ENABLED=true` without `OPERATOR_AUTH_REPO` — this prevents an accidentally-open operator UI.
+
+### Authentication and roles
+
+Each user authenticates with their own **GitHub Personal Access Token**. Paste the PAT into the sign-in prompt; the server checks the user's permission on `OPERATOR_AUTH_REPO` and assigns a role:
+
+| GitHub permission | Operator UI role | Can do |
+|---|---|---|
+| `admin` / `maintain` | **operator** | View everything; replay deliveries; cut release tags; change AI settings |
+| `write` / `triage` / `read` | **writer** | View workflows, audit, and recent copies; use the file match tester and AI rule suggester |
+| None | **denied** | 401 Unauthorized |
+
+`write` maps to writer (not operator) so typical docs contributors with repo write access can't replay deliveries or cut releases — those need an explicit `admin` / `maintain` grant.
+
+On top of the role, **replay is repo-scoped**: the user's PAT must also have read access to the source repo of the webhook being replayed.
+
+### AI rule suggester
+
+The operator UI includes an LLM-backed helper that takes a source/target file pair and returns a generated copier workflow rule, self-verified against the in-process pattern matcher before display.
+
+Two providers are supported via `LLM_PROVIDER`:
+
+- **`anthropic`** (default in Cloud Run): calls the Anthropic Messages API. For MongoDB deployments this routes through the Grove Foundry APIM gateway — set `LLM_BASE_URL=https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic` and load the gateway key from Secret Manager via `ANTHROPIC_API_KEY_SECRET_NAME`.
+- **`ollama`** (default for local dev): runs against a local Ollama instance at `http://localhost:11434`. Connect, pull models, and switch the active model from the UI's System → AI settings panel without a redeploy.
+
+Smoke-test the LLM provider end-to-end with [`cmd/test-llm`](cmd/test-llm/README.md).
+
 ## Audit Logging
 
 When enabled, all operations are logged to MongoDB:
@@ -598,4 +646,6 @@ See [DEPLOYMENT.md](./docs/DEPLOYMENT.md) for the complete deployment and rollba
 
 - **[Config Validator](cmd/config-validator/README.md)** - CLI tool for validating configs
 - **[Test Webhook](cmd/test-webhook/README.md)** - CLI tool for testing webhooks
+- **[Test PEM](cmd/test-pem/README.md)** - CLI tool for verifying the GitHub App private key
+- **[Test LLM](cmd/test-llm/README.md)** - CLI tool for smoke-testing the AI rule suggester's LLM provider
 - **[Scripts](scripts/README.md)** - Helper scripts for deployment, testing, and releases
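The permission-to-role table in the README's "Authentication and roles" section amounts to a small lookup. A minimal Go sketch follows, assuming hypothetical names (`Role`, `roleForPermission`); the project's actual mapping code is not shown in this diff and may be structured differently.

```go
package main

import "fmt"

// Role is a hypothetical type for the three outcomes in the README table.
type Role string

const (
	RoleOperator Role = "operator"
	RoleWriter   Role = "writer"
	RoleDenied   Role = "denied"
)

// roleForPermission maps a GitHub repository permission on the auth repo to
// an operator UI role, following the README table. Unknown or empty
// permissions fall through to denied.
func roleForPermission(perm string) Role {
	switch perm {
	case "admin", "maintain":
		return RoleOperator
	case "write", "triage", "read":
		return RoleWriter
	default:
		return RoleDenied
	}
}

func main() {
	for _, p := range []string{"admin", "maintain", "write", "triage", "read", "none"} {
		fmt.Printf("%-8s -> %s\n", p, roleForPermission(p))
	}
}
```

Defaulting to denied means a permission string GitHub might add later is treated as no access until it is mapped explicitly.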
diff --git a/app.go b/app.go
@@ -63,6 +63,15 @@ func main() {
 		os.Exit(1)
 	}
 
+	// Anthropic API key is only needed when the operator UI's AI suggester uses
+	// the anthropic provider. Failure to load is non-fatal — the UI will show
+	// "not configured" and writers can still use every other feature.
+	if config.OperatorUIEnabled && config.LLMProvider == "anthropic" {
+		if err := services.LoadAnthropicAPIKey(ctx, config); err != nil {
+			fmt.Printf("⚠️  Anthropic API key not loaded: %v (AI suggester will be disabled)\n", err)
+		}
+	}
+
 	// Override dry-run from command line
 	if dryRun {
 		config.DryRun = true
@@ -136,15 +145,35 @@ func printBanner(config *configs.Config, container *services.ServiceContainer) {
 	fmt.Printf("║  Version:      %-48s║\n", version)
 	fmt.Printf("║  Port:         %-48s║\n", config.Port)
 	fmt.Printf("║  Webhook Path: %-48s║\n", config.WebserverPath)
-	fmt.Printf("║  Config File:  %-48s║\n", config.EffectiveConfigFile())
+	fmt.Printf("║  Config File:  %-48s║\n", truncMiddle(config.EffectiveConfigFile(), 48))
 	fmt.Printf("║  Dry Run:      %-48v║\n", config.DryRun)
 	fmt.Printf("║  Audit Log:    %-48v║\n", config.AuditEnabled)
 	fmt.Printf("║  Metrics:      %-48v║\n", config.MetricsEnabled)
 	fmt.Printf("║  Slack:        %-48v║\n", config.SlackEnabled)
+	fmt.Printf("║  Operator UI:  %-48v║\n", config.OperatorUIEnabled)
+	if config.OperatorUIEnabled {
+		fmt.Printf("║    Auth Repo:  %-48s║\n", truncMiddle(config.OperatorAuthRepo, 48))
+		fmt.Printf("║    AI Provider:%-48s║\n", truncMiddle(config.LLMProvider, 48))
+		fmt.Printf("║    AI Model:   %-48s║\n", truncMiddle(config.LLMModel, 48))
+		fmt.Printf("║    AI URL:     %-48s║\n", truncMiddle(config.LLMBaseURL, 48))
+	}
 	fmt.Println("╚════════════════════════════════════════════════════════════════╝")
 	fmt.Println()
 }
 
+// truncMiddle shortens s to at most max bytes, replacing the middle with "..."
+// when too long. Assumes ASCII input: it slices bytes, while fmt's %-Ns width counts runes.
+func truncMiddle(s string, max int) string {
+	if len(s) <= max {
+		return s
+	}
+	if max < 6 {
+		return s[:max]
+	}
+	keep := (max - 3) / 2
+	return s[:keep] + "..." + s[len(s)-(max-3-keep):]
+}
+
 func validateConfiguration(container *services.ServiceContainer) error {
 	ctx := context.Background()
 	_, err := container.ConfigLoader.LoadConfig(ctx, container.Config)
@@ -155,24 +184,22 @@ func startWebServer(config *configs.Config, container *services.ServiceContainer
 	// Create HTTP handler with all routes
 	mux := http.NewServeMux()
 
-	// Webhook endpoint
-	mux.HandleFunc(config.WebserverPath, func(w http.ResponseWriter, r *http.Request) {
-		handleWebhook(w, r, config, container)
-	})
-
-	// Liveness probe — lightweight, always 200 if process is running
+	// Register built-in paths before the configurable webhook route. ServeMux matches by pattern, not
+	// registration order, and panics on a duplicate, so a WEBSERVER_PATH that collides with a built-in fails at startup.
 	mux.HandleFunc("/health", services.HealthHandler(container.StartTime, version))
-
-	// Readiness probe — checks GitHub auth, MongoDB connectivity
 	mux.HandleFunc("/ready", services.ReadinessHandler(container))
-
-	// Metrics endpoint (if enabled)
 	if config.MetricsEnabled {
 		mux.HandleFunc("/metrics", services.MetricsHandler(container.MetricsCollector, container.FileStateService))
 	}
-
-	// Config diagnostic endpoint — shows resolved config with secrets redacted
 	mux.HandleFunc("/config", services.ConfigDiagnosticHandler(container, version))
+	if config.OperatorUIEnabled {
+		services.RegisterOperatorRoutes(mux, config, container, version)
+	}
+
+	// GitHub webhook (configurable path, typically /events)
+	mux.HandleFunc(config.WebserverPath, func(w http.ResponseWriter, r *http.Request) {
+		handleWebhook(w, r, config, container)
+	})
 
 	// Info endpoint
 	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
@@ -189,6 +216,9 @@ func startWebServer(config *configs.Config, container *services.ServiceContainer
 		if config.MetricsEnabled {
 			_, _ = fmt.Fprintf(w, "Metrics: /metrics\n")
 		}
+		if config.OperatorUIEnabled {
+			_, _ = fmt.Fprintf(w, "Operator UI: /operator/ (authenticate with a GitHub PAT; role from %s)\n", config.OperatorAuthRepo)
+		}
 	})
 
 	// Create server

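To see how `truncMiddle` behaves on the banner's longest value, here is a standalone copy of the helper from the hunk above with a small driver. The parameter is renamed to `limit` (an editorial choice, not part of the PR) so it does not shadow Go 1.21's builtin `max`; the driver and its sample inputs are illustrative only.

```go
package main

import "fmt"

// truncMiddle mirrors the helper added in app.go: it keeps the head and tail
// of s and replaces the middle with "..." so the result is at most limit
// bytes. It slices by byte, so it assumes ASCII input.
func truncMiddle(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	if limit < 6 {
		return s[:limit]
	}
	keep := (limit - 3) / 2
	return s[:keep] + "..." + s[len(s)-(limit-3-keep):]
}

func main() {
	long := "https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic"
	for _, v := range []string{".copier/main.yaml", long} {
		out := truncMiddle(v, 48)
		// Same format verb as the banner: left-justified, padded to 48.
		fmt.Printf("║    AI URL:     %-48s║ (len %d)\n", out, len(out))
	}
}
```

With a limit of 48 the helper keeps 22 leading and 23 trailing bytes, so the scheme and the final path segment of the gateway URL both stay visible.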
diff --git a/cmd/test-llm/README.md b/cmd/test-llm/README.md
@@ -0,0 +1,79 @@
+# test-llm
+
+Smoke-test the operator UI's LLM client against the configured provider.
+
+## Purpose
+
+Verify end-to-end that:
+
+- The provider URL is reachable from your machine and an API key is configured
+- Auth headers are accepted (direct Anthropic API or APIM-fronted gateway)
+- The active model responds to a real rule-suggester prompt and returns valid JSON
+
+Useful after rotating `ANTHROPIC_API_KEY`, changing `LLM_BASE_URL`, or pointing at a new gateway.
+
+## Build
+
+```bash
+go build -o test-llm ./cmd/test-llm
+```
+
+## Usage
+
+```bash
+./test-llm [-env <path>] [-timeout <duration>]
+```
+
+The tool reads standard env vars — `LLM_PROVIDER`, `LLM_BASE_URL`, `LLM_MODEL`, `ANTHROPIC_API_KEY` — from the process environment. Use `-env` to load a `.env`-style file first. Inline env vars on the command line override file values.
+
+## Examples
+
+Smoke-test against the local `.env.test`:
+
+```bash
+./test-llm -env .env.test
+```
+
+Override the key without editing the env file:
+
+```bash
+ANTHROPIC_API_KEY='sk-...' ./test-llm -env .env.test
+```
+
+Test Ollama locally:
+
+```bash
+LLM_PROVIDER=ollama LLM_BASE_URL=http://localhost:11434 LLM_MODEL=qwen2.5-coder:7b ./test-llm
+```
+
+## Output
+
+On success:
+
+```
+Provider: anthropic
+Base URL: https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic
+Model:    claude-haiku-4-5
+API key:  sk-a…xyz9
+
+✅ Ping OK
+✅ ListModels: 3 models
+   - claude-opus-4-7
+   - claude-sonnet-4-6
+   - claude-haiku-4-5-20251001
+✅ GenerateJSON parsed OK:
+   {
+     "transform_type": "move",
+     "transform_from": "agg/python/models",
+     ...
+   }
+
+🎉 All checks passed — the LLM provider is reachable and usable.
+```
+
+## Exit Codes
+
+| Code | Meaning                              |
+|------|--------------------------------------|
+| 0    | All checks passed                    |
+| 1    | Any failure (auth, network, parsing) |
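The sample output above prints the API key in masked form (`sk-a…xyz9`). A minimal Go sketch of one way to produce that shape follows; `maskKey` and its thresholds are hypothetical, inferred only from the sample output, and the real `cmd/test-llm` code is not shown in this diff.

```go
package main

import "fmt"

// maskKey is a hypothetical helper: it shows the first and last four bytes of
// a secret and hides the rest. Keys of eight bytes or fewer are fully hidden
// so that masking never reveals the whole value. Assumes ASCII keys.
func maskKey(key string) string {
	if key == "" {
		return "(not set)"
	}
	if len(key) <= 8 {
		return "****"
	}
	return key[:4] + "…" + key[len(key)-4:]
}

func main() {
	fmt.Println("API key: ", maskKey("sk-ant-example-key-xyz9"))
	fmt.Println("API key: ", maskKey("short"))
	fmt.Println("API key: ", maskKey(""))
}
```

Printing a masked key is usually enough to confirm which credential was loaded after a rotation without writing the secret to a terminal or CI log.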