20 commits
575a7c8
feat(operator): add audit wiring and token-protected operator UI
cbullinger Apr 18, 2026
539d43c
feat(operator): comprehensive operator UI with writer-facing features
cbullinger Apr 20, 2026
58975f4
feat(operator): optional GitHub PAT auth with role + per-repo access …
cbullinger Apr 21, 2026
ef97673
refactor(operator): remove token auth mode, require GitHub PAT + auth…
cbullinger Apr 21, 2026
b9ad455
feat(operator): AI rule suggester with full UI-based Ollama management
cbullinger Apr 21, 2026
35ae635
security: fix gosec G107/G704 SSRF findings on PR branch
cbullinger Apr 21, 2026
82b1c52
chore(startup): show operator UI and AI settings in banner
cbullinger Apr 21, 2026
e4536da
chore(deploy): wire operator UI, enable audit, populate changelog
cbullinger Apr 21, 2026
f381804
docs: restore v0.3.0 heading in CHANGELOG
cbullinger Apr 21, 2026
0e863ee
feat(operator): Anthropic LLM provider for AI rule suggester
cbullinger Apr 21, 2026
4677718
fix(operator): route Anthropic client through Grove Foundry APIM gateway
cbullinger Apr 21, 2026
d5d284f
chore(tools): add test-llm smoke-test CLI
cbullinger Apr 21, 2026
89cb362
refactor(operator): require admin or maintain for operator role
cbullinger Apr 22, 2026
27adad3
feat(operator): harden AI suggester prompt with worked examples
cbullinger Apr 22, 2026
daae2c9
security(operator): fail closed on permission check; hash PATs in cache
cbullinger Apr 22, 2026
5e8ab67
perf(operator): rate-limit /suggest-rule and cache /llm/status ping
cbullinger Apr 22, 2026
9b23db5
review: address review-medium and polish items
cbullinger Apr 22, 2026
14c9d29
test(operator): cover verifySuggestedRule, llm client dispatch, helpers
cbullinger Apr 22, 2026
89ecf99
docs: document operator UI, PAT auth, and AI rule suggester
cbullinger Apr 22, 2026
338a39c
test: pass context.Background() instead of nil to LLM client
cbullinger Apr 22, 2026
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -128,7 +128,7 @@ jobs:
--region $REGION \
--project $PROJECT_ID \
--allow-unauthenticated \
- --set-env-vars="^|^CONFIG_REPO_OWNER=grove-platform|CONFIG_REPO_NAME=github-copier|CONFIG_REPO_BRANCH=main|PEM_NAME=CODE_COPIER_PEM|WEBHOOK_SECRET_NAME=webhook-secret|MONGO_URI_SECRET_NAME=mongo-uri|WEBSERVER_PATH=/events|MAIN_CONFIG_FILE=.copier/main.yaml|USE_MAIN_CONFIG=true|DEPRECATION_FILE=deprecated_examples.json|COMMITTER_NAME=GitHub Copier App|COMMITTER_EMAIL=bot@mongodb.com|GOOGLE_CLOUD_PROJECT_ID=github-copy-code-examples|COPIER_LOG_NAME=code-copier-log|AUDIT_ENABLED=false|METRICS_ENABLED=true|GITHUB_APP_ID=${{ secrets.APP_ID }}|INSTALLATION_ID=${{ secrets.INSTALLATION_ID }}" \
+ --set-env-vars="^|^CONFIG_REPO_OWNER=grove-platform|CONFIG_REPO_NAME=github-copier|CONFIG_REPO_BRANCH=main|PEM_NAME=CODE_COPIER_PEM|WEBHOOK_SECRET_NAME=webhook-secret|MONGO_URI_SECRET_NAME=mongo-uri|WEBSERVER_PATH=/events|MAIN_CONFIG_FILE=.copier/main.yaml|USE_MAIN_CONFIG=true|DEPRECATION_FILE=deprecated_examples.json|COMMITTER_NAME=GitHub Copier App|COMMITTER_EMAIL=bot@mongodb.com|GOOGLE_CLOUD_PROJECT_ID=github-copy-code-examples|COPIER_LOG_NAME=code-copier-log|AUDIT_ENABLED=true|METRICS_ENABLED=true|OPERATOR_UI_ENABLED=true|OPERATOR_AUTH_REPO=grove-platform/github-copier|OPERATOR_REPO_SLUG=grove-platform/github-copier|LLM_PROVIDER=anthropic|LLM_BASE_URL=https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic|LLM_MODEL=claude-haiku-4-5|ANTHROPIC_API_KEY_SECRET_NAME=anthropic-api-key|GITHUB_APP_ID=${{ secrets.APP_ID }}|INSTALLATION_ID=${{ secrets.INSTALLATION_ID }}" \
--set-build-env-vars="VERSION=${{ steps.version.outputs.tag }}" \
--tag="${{ steps.version.outputs.traffic_tag }}" \
--max-instances=10 \
236 changes: 146 additions & 90 deletions AGENT.md

Large diffs are not rendered by default.

29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,35 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

### Added

- **Operator UI — comprehensive writer + operator dashboard** at `/operator/` (`OPERATOR_UI_ENABLED=true`). Five tabs (Overview, Webhooks, Audit, Workflows, System), sticky status bar, dark mode, keyboard shortcuts, shareable URLs, and a writer/operator mode toggle persisted to localStorage.
- **GitHub PAT authentication** — users sign in with their personal access token; role is derived from their permission on `OPERATOR_AUTH_REPO` (admin/maintain → operator, write/triage/read → writer). Operator actions (replay, release, AI settings) require an explicit admin or maintain grant, since most writers have `write` on the auth repo. Replay additionally enforces read access on the source repo for that specific delivery.
- **AI rule suggester** — paste a source path and desired target state, receive a suggested workflow rule with self-verification via the in-process `PatternMatcher`. Two providers supported:
  - **Anthropic (hosted)** — default for Cloud Run. API key loaded from Secret Manager via `ANTHROPIC_API_KEY_SECRET_NAME`. No infra required; operators switch between Haiku / Sonnet / Opus from the UI.
  - **Ollama (local)** — for dev or self-hosted deployments. UI manages connection, model pulls, deletes, and active-model switching without a redeploy.
- **Writer-facing features** — workflow browser with per-rule coverage, PR lookup by URL, recent copies feed, file match tester (with clear button and Python-style `(?P<name>)` regex translation for in-browser use), PR timeline, and in-app help overlay.
- **Per-delivery log viewer** — context-tagged ring buffer captures logs per webhook delivery, surfaced in an audit drawer alongside the trace and outcome summary.
- **Audit event enrichment** — `processed_ok` traces now include destination repo(s), files matched / uploaded / failed, and commit SHA.
- **Startup banner** — Operator UI, auth repo, AI model, and AI base URL are now surfaced when the app boots (local and Cloud Run).
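
The per-delivery log viewer above is described as a context-tagged ring buffer. A minimal sketch of that idea follows; the `ring` and `deliveryLogs` types, their methods, and the capacity are hypothetical names chosen for illustration, not the project's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// ring is a fixed-capacity circular buffer of log lines (hypothetical type).
type ring struct {
	lines []string // backing store; its length is the capacity (must be > 0)
	next  int      // index the next line is written to
	full  bool     // true once the buffer has wrapped at least once
}

// add stores a line, overwriting the oldest one when the buffer is full.
func (r *ring) add(line string) {
	r.lines[r.next] = line
	r.next = (r.next + 1) % len(r.lines)
	if r.next == 0 {
		r.full = true
	}
}

// snapshot returns a copy of the retained lines, oldest first.
func (r *ring) snapshot() []string {
	if !r.full {
		return append([]string(nil), r.lines[:r.next]...)
	}
	out := append([]string(nil), r.lines[r.next:]...)
	return append(out, r.lines[:r.next]...)
}

// deliveryLogs keeps one ring per webhook delivery ID (hypothetical type).
type deliveryLogs struct {
	mu    sync.Mutex
	size  int
	rings map[string]*ring
}

func newDeliveryLogs(size int) *deliveryLogs {
	return &deliveryLogs{size: size, rings: make(map[string]*ring)}
}

// Append records one log line under the given delivery ID.
func (d *deliveryLogs) Append(deliveryID, line string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	r, ok := d.rings[deliveryID]
	if !ok {
		r = &ring{lines: make([]string, d.size)}
		d.rings[deliveryID] = r
	}
	r.add(line)
}

// Lines returns the retained lines for a delivery, or nil if none exist.
func (d *deliveryLogs) Lines(deliveryID string) []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	if r, ok := d.rings[deliveryID]; ok {
		return r.snapshot()
	}
	return nil
}

func main() {
	logs := newDeliveryLogs(2)
	logs.Append("delivery-1", "received push event")
	logs.Append("delivery-1", "matched 3 files")
	logs.Append("delivery-1", "opened PR")
	fmt.Println(logs.Lines("delivery-1"))
}
```

With a capacity of 2, the third `Append` overwrites the oldest line, so memory per delivery stays bounded however chatty a delivery is.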

### Changed

- **MongoDB audit logging enabled in production** — the Cloud Run deploy previously forced `AUDIT_ENABLED=false`; it is now `true`, aligning with the v0.3.0 "enabled by default" change.
- **Operator auth hardened** — token-based auth (`OPERATOR_UI_TOKEN`) removed entirely; GitHub PAT is the only supported mechanism. `OPERATOR_UI_ENABLED=true` now requires `OPERATOR_AUTH_REPO` at config load (validated in `validateOperatorAuth`).
- **`createPullRequest` skipped for empty commits** — `commitFilesToBranch` now returns an `errTreeUnchanged` sentinel so `addFilesViaPR` no longer calls the GitHub PR API with an unchanged tree (which previously returned HTTP 422).
- **MongoDB driver v2 ObjectID decoding** — audit reads set `ObjectIDAsHexString: true` to avoid "error decoding key `_id`" on queries.
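
The `errTreeUnchanged` change above follows Go's sentinel-error pattern. The sketch below illustrates it under assumptions: only the sentinel's name comes from this changelog, while `commitFiles`, `openPRIfChanged`, and their SHA-comparison logic are simplified, hypothetical stand-ins for the real functions.

```go
package main

import (
	"errors"
	"fmt"
)

// errTreeUnchanged mirrors the sentinel named in the changelog.
var errTreeUnchanged = errors.New("tree unchanged")

// commitFiles is a hypothetical stand-in: it reports errTreeUnchanged
// (wrapped with context) when the new tree SHA equals the base tree SHA.
func commitFiles(baseTreeSHA, newTreeSHA string) error {
	if baseTreeSHA == newTreeSHA {
		return fmt.Errorf("commit skipped: %w", errTreeUnchanged)
	}
	return nil
}

// openPRIfChanged reports whether a pull request should be opened.
// An unchanged tree is not treated as a failure; the PR call is skipped.
func openPRIfChanged(baseTreeSHA, newTreeSHA string) (bool, error) {
	err := commitFiles(baseTreeSHA, newTreeSHA)
	if errors.Is(err, errTreeUnchanged) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	fmt.Println(openPRIfChanged("abc123", "abc123"))
	fmt.Println(openPRIfChanged("abc123", "def456"))
}
```

Using `errors.Is` rather than `==` keeps the check working even when the sentinel is wrapped with extra context along the way.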

### Fixed

- **gosec G107 / G704 SSRF findings** — GitHub API URL construction in `services/operator_auth.go` now validates path components against strict RE2-compatible whitelists (`ghUsernameRe`, `ghRepoNameRe`) and escapes them with `url.PathEscape` before request construction; `slack_notifier.go` `#nosec` annotation extended to cover `NewRequestWithContext`.
- **Keyboard-shortcut overlay wouldn't close** — `.help-bg[hidden]` now wins over the base `display:flex`.
- **File match tester returned no matches for Java files** — JavaScript `RegExp` does not support Python-style `(?P<name>)` named groups; the tester now rewrites `(?P<` → `(?<` before compilation.
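
The SSRF fix above validates path components and then escapes them. A hedged sketch of that approach follows. The regular expressions are assumptions (the project's real `ghUsernameRe` and `ghRepoNameRe` may differ), `permissionURL` is a hypothetical helper, and the collaborator-permission endpoint is assumed to be the one the operator auth check calls.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
)

// Assumed patterns for illustration only; RE2-compatible (no lookarounds).
var (
	ghUsernameRe = regexp.MustCompile(`^[A-Za-z0-9](?:[A-Za-z0-9-]{0,37}[A-Za-z0-9])?$`)
	ghRepoNameRe = regexp.MustCompile(`^[A-Za-z0-9._-]{1,100}$`)
)

// permissionURL builds the URL only from validated, escaped components, so
// caller-supplied values cannot alter the host or the shape of the path.
func permissionURL(owner, repo, username string) (string, error) {
	if !ghUsernameRe.MatchString(owner) || !ghUsernameRe.MatchString(username) {
		return "", errors.New("invalid owner or username")
	}
	if !ghRepoNameRe.MatchString(repo) || repo == "." || repo == ".." {
		return "", errors.New("invalid repository name")
	}
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/collaborators/%s/permission",
		url.PathEscape(owner), url.PathEscape(repo), url.PathEscape(username)), nil
}

func main() {
	u, err := permissionURL("grove-platform", "github-copier", "cbullinger")
	fmt.Println(u, err)
	_, err = permissionURL("grove-platform", "../secrets", "cbullinger")
	fmt.Println(err)
}
```

Validation rejects anything containing `/`, and `url.PathEscape` acts as a second layer in case a pattern is loosened later.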

### Security

- **Token auth removed** — the operator UI no longer accepts a shared bearer token; all access is per-user via GitHub PAT with repo-scoped permission checks.

## [v0.3.0] - 2026-04-14

### Changed
50 changes: 50 additions & 0 deletions README.md
@@ -29,6 +29,13 @@ A GitHub app that automatically copies code examples and files from source repos
- **Development Tools** - Dry-run mode, CLI validation, enhanced logging
- **Thread-Safe** - Concurrent webhook processing with proper state management

### Operator UI
- **Web dashboard at `/operator/`** - Five-tab UI (Overview, Webhooks, Audit, Workflows, System) with dark mode, keyboard shortcuts, and shareable URLs
- **GitHub PAT authentication** - Users sign in with their personal access token; role is derived from their permission on a configured auth repo (`admin`/`maintain` → operator, `write`/`triage`/`read` → writer)
- **Per-repo replay authorization** - Replay requires the caller's PAT to have read access to the source repo of the webhook being replayed
- **Writer-facing tools** - Workflow browser, PR lookup, recent copies feed, file match tester, audit drawer, per-delivery log viewer
- **AI rule suggester** - Paste a source/target pair; get a generated copier rule self-verified against the in-process pattern matcher. Two providers: [Anthropic](https://www.anthropic.com/) (hosted, default in prod via the Grove Foundry APIM gateway) or [Ollama](https://ollama.com) (local, for dev)

## 🚀 Quick Start

### Prerequisites
@@ -385,6 +392,47 @@ Get performance metrics:
curl http://localhost:8080/metrics
```

## Operator UI

The operator UI is a web dashboard served from `/operator/` for diagnosing webhook processing, replaying failed deliveries, browsing workflows, and generating copier rules with AI assistance.

### Enabling the UI

Set the following env vars (`OPERATOR_REPO_SLUG` is optional):

```yaml
OPERATOR_UI_ENABLED: "true"
OPERATOR_AUTH_REPO: "your-org/some-repo" # user permissions here determine role
OPERATOR_REPO_SLUG: "your-org/some-repo" # optional; enables audit-row deep links
```

**Startup fails** if `OPERATOR_UI_ENABLED=true` without `OPERATOR_AUTH_REPO` — this prevents an accidentally-open operator UI.
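
As an illustration of this fail-fast rule, here is a minimal sketch of such a check. The `Config` struct is a hypothetical subset of the real config, the function body is an assumption, and the `owner/repo` format check is an extra that the source does not confirm.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Config holds only the fields this sketch needs (hypothetical subset).
type Config struct {
	OperatorUIEnabled bool
	OperatorAuthRepo  string // expected form: "owner/repo"
}

// validateOperatorAuth rejects a config that enables the UI without an auth repo.
func validateOperatorAuth(c Config) error {
	if !c.OperatorUIEnabled {
		return nil // nothing to validate when the UI is off
	}
	if c.OperatorAuthRepo == "" {
		return errors.New("OPERATOR_UI_ENABLED=true requires OPERATOR_AUTH_REPO")
	}
	// Assumption: the value must look like owner/repo with exactly one slash.
	owner, repo, ok := strings.Cut(c.OperatorAuthRepo, "/")
	if !ok || owner == "" || repo == "" || strings.Contains(repo, "/") {
		return fmt.Errorf("OPERATOR_AUTH_REPO must be owner/repo, got %q", c.OperatorAuthRepo)
	}
	return nil
}

func main() {
	fmt.Println(validateOperatorAuth(Config{OperatorUIEnabled: true}))
	fmt.Println(validateOperatorAuth(Config{OperatorUIEnabled: true, OperatorAuthRepo: "your-org/some-repo"}))
}
```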

### Authentication and roles

Each user authenticates with their own **GitHub Personal Access Token**. Paste the PAT into the sign-in prompt; the server checks the user's permission on `OPERATOR_AUTH_REPO` and assigns a role:

| GitHub permission | Operator UI role | Can do |
|---|---|---|
| `admin` / `maintain` | **operator** | View everything; replay deliveries; cut release tags; change AI settings |
| `write` / `triage` / `read` | **writer** | View workflows, audit, recent copies, file match tester, AI rule suggester |
| None | **denied** | Nothing; requests return 401 Unauthorized |

`write` maps to writer (not operator) so typical docs contributors with repo write access can't replay deliveries or cut releases — those need an explicit `admin` / `maintain` grant.

On top of the role, **replay is repo-scoped**: the user's PAT must also have read access to the source repo of the webhook being replayed.
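
The role table and the repo-scoped replay rule can be sketched as two small functions. The type and function names here are hypothetical; only the permission-to-role mapping comes from the table above.

```go
package main

import "fmt"

// Role is the operator UI role derived from a GitHub permission.
type Role string

const (
	RoleOperator Role = "operator"
	RoleWriter   Role = "writer"
	RoleDenied   Role = "denied"
)

// roleFor maps a GitHub permission on the auth repo to a UI role.
// Unknown or empty permissions fail closed to RoleDenied.
func roleFor(permission string) Role {
	switch permission {
	case "admin", "maintain":
		return RoleOperator
	case "write", "triage", "read":
		return RoleWriter
	default:
		return RoleDenied
	}
}

// canReplay combines the role check with the per-repo check: the caller must
// be an operator and must also have read access to the delivery's source repo.
func canReplay(role Role, hasSourceRepoRead bool) bool {
	return role == RoleOperator && hasSourceRepoRead
}

func main() {
	fmt.Println(roleFor("write"), canReplay(roleFor("write"), true))
	fmt.Println(roleFor("admin"), canReplay(roleFor("admin"), false))
	fmt.Println(roleFor("maintain"), canReplay(roleFor("maintain"), true))
}
```

Keeping the default branch as `denied` means a new or unexpected permission string never grants access by accident.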

### AI rule suggester

The operator UI includes an LLM-backed helper that takes a source/target file pair and returns a generated copier workflow rule, self-verified against the in-process pattern matcher before display.

Two providers are supported via `LLM_PROVIDER`:

- **`anthropic`** (default in Cloud Run): calls the Anthropic Messages API. For MongoDB deployments this routes through the Grove Foundry APIM gateway — set `LLM_BASE_URL=https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic` and load the gateway key from Secret Manager via `ANTHROPIC_API_KEY_SECRET_NAME`.
- **`ollama`** (default for local dev): runs against a local Ollama instance at `http://localhost:11434`. Connect, pull models, and switch the active model from the UI's System → AI settings panel without a redeploy.

Smoke-test the LLM provider end-to-end with [`cmd/test-llm`](cmd/test-llm/README.md).

## Audit Logging

When enabled, all operations are logged to MongoDB:
@@ -598,4 +646,6 @@ See [DEPLOYMENT.md](./docs/DEPLOYMENT.md) for the complete deployment and rollba

- **[Config Validator](cmd/config-validator/README.md)** - CLI tool for validating configs
- **[Test Webhook](cmd/test-webhook/README.md)** - CLI tool for testing webhooks
- **[Test PEM](cmd/test-pem/README.md)** - CLI tool for verifying the GitHub App private key
- **[Test LLM](cmd/test-llm/README.md)** - CLI tool for smoke-testing the AI rule suggester's LLM provider
- **[Scripts](scripts/README.md)** - Helper scripts for deployment, testing, and releases
56 changes: 43 additions & 13 deletions app.go
@@ -63,6 +63,15 @@ func main() {
os.Exit(1)
}

// Anthropic API key is only needed when the operator UI's AI suggester uses
// the anthropic provider. Failure to load is non-fatal — the UI will show
// "not configured" and writers can still use every other feature.
if config.OperatorUIEnabled && config.LLMProvider == "anthropic" {
if err := services.LoadAnthropicAPIKey(ctx, config); err != nil {
fmt.Printf("⚠️ Anthropic API key not loaded: %v (AI suggester will be disabled)\n", err)
}
}

// Override dry-run from command line
if dryRun {
config.DryRun = true
@@ -136,15 +145,35 @@ func printBanner(config *configs.Config, container *services.ServiceContainer) {
fmt.Printf("║ Version: %-48s║\n", version)
fmt.Printf("║ Port: %-48s║\n", config.Port)
fmt.Printf("║ Webhook Path: %-48s║\n", config.WebserverPath)
fmt.Printf("║ Config File: %-48s║\n", config.EffectiveConfigFile())
fmt.Printf("║ Config File: %-48s║\n", truncMiddle(config.EffectiveConfigFile(), 48))
fmt.Printf("║ Dry Run: %-48v║\n", config.DryRun)
fmt.Printf("║ Audit Log: %-48v║\n", config.AuditEnabled)
fmt.Printf("║ Metrics: %-48v║\n", config.MetricsEnabled)
fmt.Printf("║ Slack: %-48v║\n", config.SlackEnabled)
fmt.Printf("║ Operator UI: %-48v║\n", config.OperatorUIEnabled)
if config.OperatorUIEnabled {
fmt.Printf("║ Auth Repo: %-48s║\n", truncMiddle(config.OperatorAuthRepo, 48))
fmt.Printf("║ AI Provider:%-48s║\n", truncMiddle(config.LLMProvider, 48))
fmt.Printf("║ AI Model: %-48s║\n", truncMiddle(config.LLMModel, 48))
fmt.Printf("║ AI URL: %-48s║\n", truncMiddle(config.LLMBaseURL, 48))
}
fmt.Println("╚════════════════════════════════════════════════════════════════╝")
fmt.Println()
}

// truncMiddle shortens s to max bytes, replacing the middle with "..." when
// too long. Uses ASCII so Go's byte-count-based %-Ns padding stays aligned.
func truncMiddle(s string, max int) string {
if len(s) <= max {
return s
}
if max < 6 {
return s[:max]
}
keep := (max - 3) / 2
return s[:keep] + "..." + s[len(s)-(max-3-keep):]
}
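
`truncMiddle` above slices by byte, which is fine for ASCII values such as repo slugs and URLs. Go's `fmt` measures `%-48s` width in runes rather than bytes, so a rune-aware variant would stay aligned for non-ASCII input too. The sketch below is one possible variant; `truncMiddleRunes` and the `limit` parameter name are hypothetical and not part of this PR.

```go
package main

import "fmt"

// truncMiddleRunes limits s to at most limit runes, replacing the middle
// with "..." when s is too long. It never splits a multi-byte character.
func truncMiddleRunes(s string, limit int) string {
	r := []rune(s)
	if len(r) <= limit {
		return s
	}
	if limit < 6 {
		return string(r[:limit]) // too narrow for a readable "head...tail"
	}
	keep := (limit - 3) / 2   // runes kept from the start
	tail := limit - 3 - keep  // runes kept from the end
	return string(r[:keep]) + "..." + string(r[len(r)-tail:])
}

func main() {
	long := "https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic"
	fmt.Printf("[%-48s]\n", truncMiddleRunes(long, 48))
	fmt.Printf("[%-48s]\n", truncMiddleRunes("claude-haiku-4-5", 48))
}
```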

func validateConfiguration(container *services.ServiceContainer) error {
ctx := context.Background()
_, err := container.ConfigLoader.LoadConfig(ctx, container.Config)
@@ -155,24 +184,22 @@ func startWebServer(config *configs.Config, container *services.ServiceContainer
// Create HTTP handler with all routes
mux := http.NewServeMux()

// Webhook endpoint
mux.HandleFunc(config.WebserverPath, func(w http.ResponseWriter, r *http.Request) {
handleWebhook(w, r, config, container)
})

// Liveness probe — lightweight, always 200 if process is running
// Register built-in paths before the configurable webhook route so a mis-set
// WEBSERVER_PATH can never shadow /health, /ready, /metrics, /config, or /operator.
mux.HandleFunc("/health", services.HealthHandler(container.StartTime, version))

// Readiness probe — checks GitHub auth, MongoDB connectivity
mux.HandleFunc("/ready", services.ReadinessHandler(container))

// Metrics endpoint (if enabled)
if config.MetricsEnabled {
mux.HandleFunc("/metrics", services.MetricsHandler(container.MetricsCollector, container.FileStateService))
}

// Config diagnostic endpoint — shows resolved config with secrets redacted
mux.HandleFunc("/config", services.ConfigDiagnosticHandler(container, version))
if config.OperatorUIEnabled {
services.RegisterOperatorRoutes(mux, config, container, version)
}

// GitHub webhook (configurable path, typically /events)
mux.HandleFunc(config.WebserverPath, func(w http.ResponseWriter, r *http.Request) {
handleWebhook(w, r, config, container)
})

// Info endpoint
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
@@ -189,6 +216,9 @@ func startWebServer(config *configs.Config, container *services.ServiceContainer
if config.MetricsEnabled {
_, _ = fmt.Fprintf(w, "Metrics: /metrics\n")
}
if config.OperatorUIEnabled {
_, _ = fmt.Fprintf(w, "Operator UI: /operator/ (authenticate with a GitHub PAT; role from %s)\n", config.OperatorAuthRepo)
}
})

// Create server
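
One detail on the route ordering in `startWebServer`: `http.ServeMux` does not match by registration order, and it panics when the same pattern is registered twice. A `WEBSERVER_PATH` that collides with a built-in route would therefore likely surface as a startup panic rather than silent shadowing. The sketch below shows one way to turn that panic into an error; `buildMux` and its route list are hypothetical, not code from this PR.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildMux registers the built-in routes first, then the configurable
// webhook path, and returns an error instead of panicking when the webhook
// path is empty or conflicts with a built-in route.
func buildMux(webhookPath string) (mux *http.ServeMux, err error) {
	defer func() {
		if r := recover(); r != nil {
			mux, err = nil, fmt.Errorf("invalid WEBSERVER_PATH %q: %v", webhookPath, r)
		}
	}()
	mux = http.NewServeMux()
	ok := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }
	for _, p := range []string{"/health", "/ready", "/metrics", "/config"} {
		mux.HandleFunc(p, ok)
	}
	mux.HandleFunc(webhookPath, ok) // panics on a duplicate or empty pattern
	return mux, nil
}

func main() {
	_, err := buildMux("/events")
	fmt.Println("events:", err)
	_, err = buildMux("/health")
	fmt.Println("health:", err)
}
```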
79 changes: 79 additions & 0 deletions cmd/test-llm/README.md
@@ -0,0 +1,79 @@
# test-llm

Smoke-test the operator UI's LLM client against the configured provider.

## Purpose

Verify end-to-end that:

- The provider URL is reachable from your machine and the API key is set
- Auth headers are accepted (direct Anthropic API or APIM-fronted gateway)
- The active model responds to a real rule-suggester prompt and returns valid JSON

Useful after rotating `ANTHROPIC_API_KEY`, changing `LLM_BASE_URL`, or pointing at a new gateway.

## Build

```bash
go build -o test-llm ./cmd/test-llm
```

## Usage

```bash
./test-llm [-env <path>] [-timeout <duration>]
```

The tool reads standard env vars — `LLM_PROVIDER`, `LLM_BASE_URL`, `LLM_MODEL`, `ANTHROPIC_API_KEY` — from the process environment. Use `-env` to load a `.env`-style file first. Inline env vars on the command line override file values.
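
The precedence rule (process environment wins over the `-env` file) can be sketched as follows. `parseEnvFile` and `resolve` are hypothetical helpers, and the parser is deliberately simplified, with no quoting or `export` handling; the tool's real loader may behave differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseEnvFile reads KEY=VALUE lines, skipping blank lines and # comments.
func parseEnvFile(content string) map[string]string {
	vals := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if k, v, ok := strings.Cut(line, "="); ok {
			vals[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return vals
}

// resolve returns the effective value for key: a value already present in
// the process environment overrides the one loaded from the file.
func resolve(key string, procEnv, fileVals map[string]string) string {
	if v, ok := procEnv[key]; ok {
		return v
	}
	return fileVals[key]
}

func main() {
	file := parseEnvFile("# test settings\nLLM_MODEL=claude-haiku-4-5\nANTHROPIC_API_KEY=from-file\n")
	proc := map[string]string{"ANTHROPIC_API_KEY": "from-shell"}
	fmt.Println(resolve("ANTHROPIC_API_KEY", proc, file))
	fmt.Println(resolve("LLM_MODEL", proc, file))
}
```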

## Examples

Smoke-test against the local `.env.test`:

```bash
./test-llm -env .env.test
```

Override the key without editing the env file:

```bash
ANTHROPIC_API_KEY='sk-...' ./test-llm -env .env.test
```

Test Ollama locally:

```bash
LLM_PROVIDER=ollama LLM_BASE_URL=http://localhost:11434 LLM_MODEL=qwen2.5-coder:7b ./test-llm
```

## Output

On success:

```
Provider: anthropic
Base URL: https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic
Model: claude-haiku-4-5
API key: sk-a…xyz9

✅ Ping OK
✅ ListModels: 3 models
- claude-opus-4-7
- claude-sonnet-4-6
- claude-haiku-4-5-20251001
✅ GenerateJSON parsed OK:
{
"transform_type": "move",
"transform_from": "agg/python/models",
...
}

🎉 All checks passed — the LLM provider is reachable and usable.
```
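
The `API key:` line in the output shows a masked value. A masking helper along these lines would produce that shape; `maskKey` and its length threshold are assumptions for illustration, not the tool's actual code.

```go
package main

import "fmt"

// maskKey keeps the first and last four characters of a secret and hides
// the rest. Values of eight characters or fewer are hidden entirely so a
// short key is never mostly revealed.
func maskKey(key string) string {
	if len(key) <= 8 {
		return "****"
	}
	return key[:4] + "…" + key[len(key)-4:]
}

func main() {
	fmt.Println("API key:", maskKey("sk-abcdefxyz9"))
}
```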

## Exit Codes

| Code | Meaning |
|------|--------------------------------------|
| 0 | All checks passed |
| 1 | Any failure (auth, network, parsing) |