feat(operator): operator UI, GitHub PAT auth, AI rule suggester#7
Open
cbullinger wants to merge 20 commits into main from feat/operator-ui-audit
Conversation
Wire MongoDB copy audit events after target uploads; add /operator dashboard with audit feed, deployment metadata, and optional tag API.

Made-with: Cursor
Backend:
- Add workflow config API (GET /operator/api/workflows) serving parsed config
- Add webhook replay API (POST /operator/api/replay) with safety guards (in-flight dedup, merged-PR check, synthetic delivery ID)
- Add per-delivery log viewer (GET /operator/api/logs) with context-tagged log capture via DeliveryLogBuffer ring buffer
- Add PR number and file path filters to audit query API
- Fix ObjectID decoding for mongo-driver v2 (ObjectIDAsHexString)
- Fix constant-time token comparison leaking token length
- Fix empty-commit PR creation bug (errTreeUnchanged sentinel)
- Add CommitSHA to WebhookTraceEntry and CopierFileMeta
- Enrich webhook trace detail with target repos and file counts
- Add uptime + MongoDB health check to deployment API

Frontend — operator UI rewrite:
- 5-tab layout: Overview, Webhooks, Audit, Workflows, System
- Writer/Operator mode toggle (hides infrastructure tabs for writers)
- Sticky status bar: version, uptime, MongoDB health, last webhook, connection
- Dark mode with full CSS variable theming
- Collapsible sections with localStorage persistence
- Grouped metrics with delta indicators, sparklines, health accent borders
- Grouped deployment cards with copy-to-clipboard
- Structured detail badges in webhook trace Detail column
- Audit event detail drawer with all fields + replay button
- File match tester: client-side glob/regex/move/copy matching against loaded workflow config with target path computation
- PR lookup combining traces + audit into unified timeline view
- Recent copies card-based feed
- Workflow browser with search and visual cards
- In-app help / getting started documentation
- CSV export for audit events
- Toast notifications for background auto-refresh errors
- Connection heartbeat (60s ping with disconnected indicator)
- Auto-refresh with persisted checkbox state
- Shareable state URLs via history.replaceState
- Deep-linkable filters, tabs, and mode (?tab=audit&pr=123&mode=writer)
- Keyboard shortcuts (1-5 tabs, R refresh, D dark, T time, W mode, ? help)
- Empty state messages, expandable errors, GitHub links throughout
- Responsive audit table (hides SHA/Type columns on narrow screens)
- Token show/hide toggle, release confirmation dialog, inline favicon
- Relative/absolute time toggle with periodic update
…checks

Adds a second authentication mode (OPERATOR_AUTH_MODE=github) that validates the bearer token as a GitHub Personal Access Token instead of a shared secret. The existing shared-token mode remains the default for backward compatibility.

Backend:
- New config: OPERATOR_AUTH_MODE (token|github), OPERATOR_AUTH_REPO
- operator_auth.go: GitHub PAT validation with GET /user + per-user permission check on OPERATOR_AUTH_REPO. Maps GitHub permissions to operator/writer/denied roles. 5-minute cache keyed by PAT to avoid hitting GitHub on every request.
- operator_ui.go: wrapAPI supports both auth modes, wrapOperatorOnly enforces the operator role on write endpoints (replay, release).
- Per-repo permission check on POST /operator/api/replay — the user's PAT must have at least read access to the source repo being replayed (in github mode only; token mode unchanged). Cached per token+repo pair.
- New endpoints: GET /operator/api/me returns the authenticated user + role; GET /operator/api/repo-permission?repos=... batch-checks repo read access for frontend button state.
- /operator/api/status now includes auth_mode so the frontend can adapt.

Frontend:
- User chip in top bar shows GitHub avatar + username + role suffix
- Dynamic token input label: "GitHub token" in github mode
- body.role-writer hides elements with .operator-only class (replay, release)
- Replay buttons pre-check per-repo access via fetchRepoPermissions and disable with an explanatory tooltip for inaccessible source repos
- Token clear resets user state and permission cache
… repo

Simplifies the operator UI authentication to a single path: GitHub PAT validation with role assignment from OPERATOR_AUTH_REPO permissions.

Prior: two auth modes (shared secret via OPERATOR_UI_TOKEN, or GitHub PAT via OPERATOR_AUTH_MODE=github). The shared-secret mode had no user identity, no per-repo restrictions, and a single token that could be leaked.

Now: OPERATOR_UI_ENABLED=true always uses GitHub PAT auth. Each user authenticates with their personal token, and OPERATOR_AUTH_REPO (required) determines their role (operator for write/admin, writer for read/triage).

Config:
- Remove OPERATOR_UI_TOKEN env var and Config.OperatorUIToken field
- Remove OPERATOR_AUTH_MODE env var and Config.OperatorAuthMode field
- validateOperatorAuth now requires OPERATOR_AUTH_REPO in owner/repo format when the UI is enabled — startup fails with a clear error otherwise

Backend (operator_ui.go):
- Remove operatorAuthOK() and crypto/subtle import (no shared secret to compare)
- wrapAPI: always validate bearer token as GitHub PAT, attach user to context
- wrapOperatorOnly: always enforce operator role
- handleOperatorStatus: return auth_repo instead of auth_mode
- handleMe: user always present in context; 500 if somehow missing
- handleRepoPermission: always checks real permissions via ghCache
- Replay handler: unconditionally enforces source-repo permission check
- ghCache always initialized (no longer gated on mode)

Frontend (index.html):
- Token input label always "GitHub Personal Access Token" with auth repo shown
- Remove conditional label/placeholder logic based on auth_mode
- Status check reads auth_repo and displays it in the token input label

Startup:
- app.go root page shows "authenticate with a GitHub PAT; role from <repo>"
- operator_ui.go startup log includes auth_repo
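The fail-fast startup check described above can be sketched roughly as follows. This is a minimal illustration, not the actual `validateOperatorAuth`: the function signature and the `authRepoRe` pattern are assumptions, though the behavior (hard failure when the UI is enabled without a valid `owner/repo` auth repo) matches the commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// authRepoRe approximates GitHub's owner/repo naming rules; the real
// validation in configs/environment.go may differ in detail.
var authRepoRe = regexp.MustCompile(`^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/[A-Za-z0-9._-]+$`)

// validateOperatorAuth fails startup when the UI is enabled but no auth
// repo is configured, or the configured repo isn't in owner/repo form.
func validateOperatorAuth(uiEnabled bool, authRepo string) error {
	if !uiEnabled {
		return nil // UI off: no auth repo needed
	}
	if authRepo == "" {
		return fmt.Errorf("OPERATOR_UI_ENABLED=true requires OPERATOR_AUTH_REPO (owner/repo)")
	}
	if !authRepoRe.MatchString(authRepo) {
		return fmt.Errorf("OPERATOR_AUTH_REPO %q is not in owner/repo format", authRepo)
	}
	return nil
}

func main() {
	fmt.Println(validateOperatorAuth(true, "grove-platform/github-copier"))
}
```

Failing at boot rather than at first request means a misconfigured deploy can never come up with an accidentally-open operator UI.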
Adds an LLM-powered rule suggester in the Workflows tab. Given a source
file path and a desired target file path, the LLM proposes a copier
workflow rule and the server verifies the rule actually produces the
target before returning it to the user.
All Ollama management (connect/disconnect, list installed models,
pull/delete, switch active model, change base URL) happens from the UI.
No LLM_ENABLED env var or restart required — availability is detected
at runtime via an Ollama ping.
Backend:
- services/llm_client.go: pluggable LLMClient interface with an Ollama
implementation. Mutex-protected mutable state (active model, base URL)
so operators can switch models at runtime. Streams pull progress as
NDJSON, exposes ListModels/Ping/DeleteModel/PullModel.
- services/operator_suggest_rule.go: POST /operator/api/suggest-rule
prompts the LLM with a bounded schema (move/copy/glob/regex) and
verifies the generated rule against the user's source/target using the
existing PatternMatcher + PathTransformer. Returns YAML + explanation
+ verification status + computed_path so writers see exactly what the
rule would produce.
- services/operator_llm_admin.go: four admin endpoints
GET /operator/api/llm/status reachability, models, active model
POST /operator/api/llm/settings set active model / base URL at runtime
POST /operator/api/llm/pull stream NDJSON progress from Ollama
DELETE /operator/api/llm/model remove a model
Write operations require operator role.
- operator_ui.go: always instantiate the LLM client (no env gate), expose
llm_available via /operator/api/status.
- configs/environment.go: LLM_BASE_URL and LLM_MODEL are now optional
initial defaults; LLM_ENABLED is removed entirely.
Frontend (index.html):
- AI settings panel in the Workflows tab (operator-only): connection
status chip, installed-models list with delete buttons, active-model
selector, pull-model input with live streaming progress bar, editable
base URL. Shows install instructions with a link to ollama.com/download
when Ollama isn't reachable.
- AI rule suggester panel: source + target inputs (plus optional target
repo), generate button with long-running spinner, result card with
verification badge (green or red), computed path on failure, YAML
code block with copy-to-clipboard, and the LLM's explanation.
- Pre-fills the AI suggester's source from the file match tester if set.
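The pluggable client and runtime-mutable settings described above can be sketched as below. This is an illustrative shape only: the method set follows the commit message, but the exact signatures in services/llm_client.go and the `mutableSettings` name are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// LLMClient is a sketch of the pluggable interface: Ping for runtime
// availability detection, ListModels for the UI, GenerateJSON for the
// rule suggester. Signatures are assumed, not copied from the source.
type LLMClient interface {
	Ping(ctx context.Context) error
	ListModels(ctx context.Context) ([]string, error)
	GenerateJSON(ctx context.Context, system, user string) (string, error)
}

// mutableSettings mirrors the mutex-protected mutable state so operators
// can switch the active model or base URL without a restart.
type mutableSettings struct {
	mu      sync.Mutex
	baseURL string
	model   string
}

func (s *mutableSettings) SetActiveModel(m string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.model = m
}

func (s *mutableSettings) ActiveModel() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.model
}

func main() {
	s := &mutableSettings{baseURL: "http://localhost:11434", model: "qwen2.5-coder:7b"}
	s.SetActiveModel("llama3.1:8b")
	fmt.Println(s.ActiveModel())
}
```

The mutex matters because the admin endpoints mutate this state while request handlers read it concurrently.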
Gosec CI was failing on 3 SSRF findings via taint analysis:

operator_auth.go (2 findings): ghAPIGetRepoPermission constructs a GitHub API URL using the repo owner/name and username. While the host is hardcoded, gosec flags the user-derived path components.
- Add strict input validation: ghUsernameRe and ghRepoNameRe whitelists enforce GitHub's username and repo name rules before URL construction (using RE2-compatible syntax — no lookahead)
- Use url.PathEscape on path components as defense in depth
- Add #nosec G107 G704 with justification on the Request and Do calls

slack_notifier.go (1 finding): sendPayload had an existing nosec on the Do call but gosec now also flags NewRequestWithContext on the line above.
- Extend the nosec annotation to cover the Request call

No behavioral change — only input validation tightening and nosec coverage on calls where the URL is either hardcoded (GitHub API) or comes from trusted server config (Slack webhook).
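The validate-then-escape pattern above can be sketched like this. The regex patterns and the `permissionURL` helper name are assumptions for illustration; the real whitelists in operator_auth.go may be stricter.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// Whitelists approximating GitHub's naming rules, written in
// RE2-compatible syntax (no lookahead). Illustrative, not the
// exact patterns from operator_auth.go.
var (
	ghUsernameRe = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9-]{0,37}$`)
	ghRepoNameRe = regexp.MustCompile(`^[A-Za-z0-9._-]{1,100}$`)
)

// permissionURL validates each user-derived path component against a
// whitelist, then escapes it with url.PathEscape as defense in depth,
// before building the hardcoded-host GitHub API URL.
func permissionURL(owner, repo, user string) (string, error) {
	if !ghUsernameRe.MatchString(owner) || !ghUsernameRe.MatchString(user) {
		return "", fmt.Errorf("invalid GitHub username")
	}
	if !ghRepoNameRe.MatchString(repo) {
		return "", fmt.Errorf("invalid GitHub repo name")
	}
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/collaborators/%s/permission",
		url.PathEscape(owner), url.PathEscape(repo), url.PathEscape(user)), nil
}

func main() {
	u, err := permissionURL("grove-platform", "github-copier", "cbullinger")
	fmt.Println(u, err)
}
```

Rejecting before constructing the URL is what satisfies the taint analysis: the tainted input never reaches the request builder unvalidated.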
Surface OPERATOR_UI_ENABLED, auth repo, AI model, and AI base URL in the startup banner so local dev runs make the active operator/AI configuration visible at a glance. Adds a truncMiddle helper (ASCII ellipsis) to keep long paths and URLs from breaking the banner's byte-count-aligned padding.
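A helper in the spirit of truncMiddle might look like the sketch below. The split arithmetic is assumed (the real helper's head/tail balance may differ); the key property from the commit message is the ASCII "..." joiner, which keeps byte counts predictable for the banner's aligned padding.

```go
package main

import "fmt"

// truncMiddle keeps the head and tail of a long string and joins them
// with an ASCII "..." so padding math stays byte-aligned. Assumes ASCII
// input, as the banner values (paths, URLs) are; multi-byte input could
// be cut mid-rune by the byte slicing here.
func truncMiddle(s string, max int) string {
	if len(s) <= max || max < 5 {
		return s
	}
	keep := max - 3 // room left after the "..."
	head := keep / 2
	tail := keep - head
	return s[:head] + "..." + s[len(s)-tail:]
}

func main() {
	fmt.Println(truncMiddle("https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic", 30))
}
```

Truncating the middle rather than the end keeps both the scheme/host and the trailing path segment visible, which is what you want when scanning a startup banner.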
Prep the Cloud Run deploy for the operator UI + AI rule suggester work:
- CI deploy env vars (ci.yml):
- OPERATOR_UI_ENABLED=true
- OPERATOR_AUTH_REPO=grove-platform/github-copier (required when UI is on)
- OPERATOR_REPO_SLUG=grove-platform/github-copier (for audit-row deep links)
- AUDIT_ENABLED flipped false → true, aligning with the v0.3.0 "enabled by
default" CHANGELOG entry; the operator UI's Audit tab is inert without it
and MONGO_URI_SECRET_NAME is already wired.
- env-cloudrun.yaml: drop the stale OPERATOR_UI_TOKEN comment (token auth
mode was removed earlier on this branch) and document the PAT/auth-repo
model plus optional LLM_* overrides.
- CHANGELOG.md: populate [Unreleased] so release.sh has content to extract
when the next tag is cut. Covers the operator UI, PAT auth, AI rule
suggester, per-delivery logs, empty-commit fix, audit-decode fix, and the
gosec SSRF hardening.
LLM_* deliberately left unset in the deploy — defaults point at localhost,
which is unreachable from Cloud Run; operators can configure a real endpoint
from the UI at runtime.
The previous commit's edit accidentally dropped the '## [v0.3.0] - 2026-04-14' heading, leaving its Changed/Fixed/Security sections orphaned under [Unreleased]. Re-add the heading so the release history stays intact.
Adds an Anthropic Messages API client alongside the existing Ollama client so
the AI rule suggester works on Cloud Run without standing up a model-serving
VM. Ollama stays as the local/dev option.
Backend
- New services/llm_anthropic.go implementing LLMClient against /v1/messages
and /v1/models. Pins anthropic-version=2023-06-01 and sends a JSON-only
guardrail in the system prompt (Anthropic has no native JSON mode); strips
``` code fences from responses as a defensive post-process.
- llm_client.go: add ErrModelManagementNotSupported sentinel; change
NewLLMClient to take LLMClientOptions (provider, base URL, model, API key)
and dispatch anthropic/ollama.
- operator_llm_admin.go: include provider + supports_model_mgmt in the /llm/status
response so the UI can branch; map the sentinel to 400 on pull/delete.
- configs/environment.go: add AnthropicAPIKey and AnthropicAPIKeySecretName
fields + env vars; per-provider LLM_BASE_URL / LLM_MODEL defaults (Anthropic
defaults to api.anthropic.com + claude-haiku-4-5-20251001).
- github_auth.go: LoadAnthropicAPIKey follows the existing Secret Manager
pattern used for the webhook secret and Mongo URI.
- app.go: call LoadAnthropicAPIKey on boot (non-fatal — UI shows
"not configured" if missing); banner now prints AI Provider.
UI
- web/operator/index.html: renderAISettings branches on provider/
supports_model_mgmt to hide Ollama-only pull/delete sections for hosted
providers; separate onboarding text for each. Status line labels the
provider ("Anthropic connected at…" vs "Ollama connected at…").
Deploy config
- ci.yml + env-cloudrun.yaml: select anthropic, default to
claude-haiku-4-5-20251001, load the API key from Secret Manager via
ANTHROPIC_API_KEY_SECRET_NAME=anthropic-api-key. Operators can still switch
the active model from the UI at runtime.
Before deploy, a one-time step is needed on the prod project:
gcloud secrets create anthropic-api-key --data-file=<key-file>
gcloud secrets add-iam-policy-binding anthropic-api-key \
--member=serviceAccount:<runtime-sa> --role=roles/secretmanager.secretAccessor
The Anthropic access we have is an APIM-fronted gateway, not the native Anthropic API. Two changes needed to match the gateway's contract:
- Auth header: gateway uses "api-key" (Azure API Management convention), not Anthropic's native "x-api-key". Send both — direct API ignores "api-key", gateway ignores "x-api-key", so one client works with either.
- Ping: switch from GET /v1/models (not exposed by the gateway) to a minimal POST /v1/messages with max_tokens=1. Cost ≈ 1 input + 1 output token per refresh. ListModels already falls back to a static list when /v1/models fails, so this keeps the UI usable.

Deploy config:
- LLM_BASE_URL points at the Grove Foundry gateway.
- LLM_MODEL uses the aliased "claude-haiku-4-5" (verified to work through the gateway with a smoke-test curl).
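The dual-header trick can be sketched as below. The `newAnthropicRequest` helper name is an assumption; the headers and the pinned `anthropic-version` value come from the commits above.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newAnthropicRequest sets both auth headers: the native Anthropic API
// reads "x-api-key", the APIM gateway reads "api-key", and each ignores
// the other, so one client works against either endpoint.
func newAnthropicRequest(baseURL, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/messages", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("x-api-key", apiKey) // native Anthropic convention
	req.Header.Set("api-key", apiKey)   // Azure API Management convention
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Ping body: a minimal request capped at one output token, as the
	// gateway-compatible reachability check above describes.
	body := []byte(`{"model":"claude-haiku-4-5","max_tokens":1,"messages":[{"role":"user","content":"ping"}]}`)
	req, _ := newAnthropicRequest("https://api.anthropic.com", "sk-example", body)
	fmt.Println(req.URL)
}
```

Because the ping is a real billable call, the 30-second ping cache introduced later in this PR is what keeps dashboard polling cheap.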
Small CLI for verifying LLM provider connectivity end-to-end — calls Ping, ListModels, and a real rule-suggester GenerateJSON against the configured provider. Useful after rotating ANTHROPIC_API_KEY, changing LLM_BASE_URL, or pointing at a new APIM gateway. Mirrors the shape of the existing cmd/test-pem and cmd/test-webhook tools.
Narrow the operator role mapping so write access no longer grants operator privileges. Most writers on docs repos have write on the auth repo, which would have given them replay and release capability — the point of the role split is to gate those actions behind an explicit grant.

- admin / maintain → operator (was: admin / maintain / write)
- write / triage / read → writer (was: read / triage)
- none → denied

Updates env-cloudrun.yaml and CHANGELOG to match. Frontend has no hardcoded permission labels; it just branches on the role string, so no UI change.
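The narrowed mapping is small enough to sketch in full. Function and role-string names are illustrative, but the permission-to-role table matches the commit message exactly.

```go
package main

import "fmt"

// roleFor maps a GitHub permission string on OPERATOR_AUTH_REPO to an
// operator-UI role. Note that "write" deliberately lands on writer, not
// operator: replay/release stay behind an explicit admin/maintain grant.
func roleFor(permission string) string {
	switch permission {
	case "admin", "maintain":
		return "operator"
	case "write", "triage", "read":
		return "writer"
	default:
		return "denied"
	}
}

func main() {
	for _, p := range []string{"admin", "maintain", "write", "triage", "read", "none"} {
		fmt.Printf("%s → %s\n", p, roleFor(p))
	}
}
```

Keeping the mapping as a single switch means the later 404→denied / 5xx→writer error handling composes with it cleanly: errors pick a role before this table is ever consulted.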
Rewrite the rule-suggester system prompt around the actual single-rule output
shape (llmSuggestedRule): one transform_type, one rule per request. The
previous prompt described the types accurately but gave the model no concrete
examples to anchor the decision between move / copy / glob / regex.
New prompt adds:
- Explicit response-shape template with field-level "for move or copy" /
"for glob or regex" guidance.
- Default rules for destination_branch ("main") and commit_strategy
("pull_request" unless clearly a direct commit).
- Four worked Input/Output examples — one per transform type — using
realistic paths (mflix-java-spring prefix rename, mflix README copy,
agg/python glob with ${relative_path}, tutorials/v2 regex with named
captures).
Also exports the prompt as services.SuggestRuleSystemPrompt and wires it
into cmd/test-llm so the smoke test exercises the exact prompt writers hit
via the UI. Verified end-to-end against the Grove Foundry APIM gateway:
claude-haiku-4-5 now correctly picks "glob" with ${relative_path} for a
.py-in-directory example (previously chose "move" with the exact file path).
Two security fixes called out in PR review:

1. Soft-fail on permission check granted writer on 404 → auth bypass. Previously, ANY error from /repos/.../collaborators/.../permission (including GitHub's 404 for "not a collaborator") returned the default writer role. Combined with --allow-unauthenticated on the Cloud Run service, any valid GitHub PAT got read access to audit logs, webhook traces, workflows, logs, and the AI suggester (real token cost). Now: only transient 5xx responses keep the writer role (so a GitHub outage doesn't lock every operator out). 404/401/403/network/parse errors all surface an error AND set RoleDenied. The distinction is carried by a new ghAPIError type that exposes StatusCode and IsTransient().

2. Raw PATs were live in the heap as cache keys for the 5-min TTL. A memory dump of the running Cloud Run instance would have leaked every active operator's token. Cache now keys on SHA-256 hex digests via hashToken(). Cache API unchanged — hashing happens at the boundary.

Also introduces githubAPIBaseURL (package var) so tests can point the GitHub client at an httptest.Server. Not user-controllable, so SSRF posture is unchanged (gosec clean).

Tests: full role-mapping matrix, explicit 404→denied and 5xx→writer cases, and an assertion that raw PATs never appear as map keys.
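The boundary-hashing fix is a one-liner worth seeing concretely. `hashToken` is the name from the commit; the cache shape here is illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashToken digests a PAT at the cache boundary so raw tokens never
// become map keys: a heap dump of the running instance exposes only
// SHA-256 hex digests, which can't be reversed into live credentials.
func hashToken(pat string) string {
	sum := sha256.Sum256([]byte(pat))
	return hex.EncodeToString(sum[:])
}

func main() {
	cache := map[string]string{}
	cache[hashToken("ghp_example")] = "operator" // key is a digest, not the PAT
	fmt.Println(len(hashToken("ghp_example")))   // 64 hex chars
}
```

The cache API stays unchanged because hashing happens on the way in; callers still pass the raw PAT and never see the digest.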
Two LLM cost controls flagged in PR review:

1. /operator/api/suggest-rule was unbounded per PAT — combined with the earlier soft-fail auth hole, any valid GitHub user could have spent Anthropic tokens at will. Adds a fixed-window rate limiter keyed by hashed PAT at 30 calls/hour/user. Denied requests return 429 with a Retry-After header. Normal usage is well under the cap.

2. /operator/api/llm/status was issuing a real /v1/messages call on every refresh (1 input + 1 output Anthropic token). Writers poll this tab for the connection indicator, so on a busy dashboard that added up. Cache the ping outcome for 30s; settings mutations (SetActiveModel / SetBaseURL) invalidate the entry so operators see fresh state after a change.

tokenBucket and llmPingCache are small internal types in operator_ratelimit.go and operator_ui.go respectively. Eviction of rate-limit buckets is opportunistic (soft cap at 256) — matches the existing ghAuthCache pattern.
- handleRepoPermission: return {allowed, error?} per repo instead of a
raw bool map. The frontend now shows GitHub rate-limit / transient
errors in replay-button tooltips instead of generic "no access". UI
keeps boolean-shape compat for partial rollouts. (review #7)
- githubCreateVersionTag: apply the same ghUsernameRe / ghRepoNameRe
whitelist + url.PathEscape treatment ghAPIGetRepoPermission already
uses. Also add a ghBranchNameRe check. Env-vars only today, but keeps
the SSRF hardening story consistent across the package. (review #8)
- sharedGithubHTTPClient package var replaces per-call allocation in
githubHTTPClient(). (review minor)
- ReleaseAPIMode typed string with ReleaseAPIDisabled /
ReleaseAPITagCreateEnabled constants. (review minor)
- truncate() is now rune-aware so multi-byte LLM output isn't cut
mid-glyph in error logs. (review minor)
- Replay delivery IDs append a 3-byte random suffix so two replays in
the same millisecond can't collide on the trace ring. (review minor)
- Anthropic fallback model list trimmed to one stable alias
(claude-haiku-4-5). Listing rotating dated IDs here ships dead
dropdown options on every Claude release. (review #9)
- Normalize the Anthropic default model to the aliased form
("claude-haiku-4-5") in configs/environment.go and llm_anthropic.go
so every default path matches ci.yml and env-cloudrun.yaml. (review #10)
- AI settings UI hint calls out that settings are process-global,
affect all operators, and revert on restart. Fixes the "two operators
clobber each other silently" footgun. (review #5)
Addresses PR review #6 — the security-critical surface had zero tests. Adds table-driven coverage for:
- verifySuggestedRule, the invariant the AI suggester depends on to decide whether to show a "not verified" warning. Covers every transform type (move/copy/glob/regex), pattern mismatches, target mismatches, and invalid regex.
- NewLLMClient dispatch: empty/ollama/anthropic/unsupported, plus the "anthropic provider requires ANTHROPIC_API_KEY" guard.
- anthropic client getters/setters and ErrModelManagementNotSupported returned by PullModel / DeleteModel.
- stripJSONFences edge cases (fenced, unfenced, nested JSON).
- Rune-aware truncate (multi-byte glyphs not cut mid-byte).

The earlier operator_auth_test.go covers validateGitHubPAT role mapping and the 404→denied / 5xx→writer auth semantics; this completes the review's minimum bar.
Bring the doc set up-to-date with what this PR ships so new devs and operators aren't figuring out a live feature set from source.

- README.md: new "Operator UI" section under Monitoring covering enable flags, role mapping (admin/maintain → operator, write/triage/read → writer), per-repo replay authorization, and AI-suggester providers. Enhanced Features list gains an "Operator UI" group. Tools list gains test-llm and test-pem entries.
- docs/CONFIG-REFERENCE.md: new "Operator UI" and "AI Rule Suggester (LLM)" env-var tables covering OPERATOR_UI_ENABLED, OPERATOR_AUTH_REPO, OPERATOR_REPO_SLUG, OPERATOR_RELEASE_*, LLM_PROVIDER, LLM_BASE_URL, LLM_MODEL, ANTHROPIC_API_KEY, ANTHROPIC_API_KEY_SECRET_NAME. Calls out the 30/hour/PAT rate limit on /suggest-rule.
- docs/DEPLOYMENT.md: Secret Manager step #4 for anthropic-api-key plus the IAM binding; pre-deploy checklist gains the operator-UI auth repo bullet; post-deploy smoke test for the operator UI + AI settings.
- docs/LOCAL-TESTING.md: "Optional (for Operator UI + AI rule suggester)" env-var block and a step-by-step "Testing the Operator UI Locally" section that points at cmd/test-llm for provider verification.
- docs/FAQ.md: new "Operator UI" section (what it is, who can access, how the AI suggester works, how to debug "not connected").
- AGENT.md: full rewrite. Expanded file map covers all operator_*.go, llm_*.go, web/operator/index.html embed, webhook_trace_buffer, and log_buffer. New sections on authorization model, security posture (auth fail-closed, PAT hashing, SSRF defense-in-depth, LLM cost cap), and edit patterns for operator UI / LLM provider work. Key doc table rebuilt with clickable links.
CI staticcheck flagged SA1012 — passing nil as a context argument, even where the function signature permits it, is a lint error. My local golangci-lint run was pulling a cached rule set that didn't enforce this; CI's fresh download did. Swap the two nil args in TestAnthropicClient_ModelManagementNotSupported for a shared context.Background(). No behavior change.
Summary
Builds out the operator UI into a production-ready diagnostic + self-service tool, gated by GitHub PAT authentication with repo-derived roles, and adds an AI rule suggester backed by the hosted Anthropic API through the Grove Foundry APIM gateway.
Commits on feat/operator-ui-audit → main (includes the cmd/test-llm smoke-test CLI).

Configuration

- `OPERATOR_UI_ENABLED` — `true` to mount `/operator/` routes
- `OPERATOR_AUTH_REPO` — `owner/repo`; the user's permission on this repo determines their role
- `OPERATOR_REPO_SLUG` — repo slug for audit-row deep links
- `OPERATOR_RELEASE_GITHUB_TOKEN` — needs `contents:write` to create release tags from the UI
- `LLM_PROVIDER` — `anthropic` (default in prod) or `ollama` (local)
- `LLM_BASE_URL` — `https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic`
- `LLM_MODEL` — `claude-haiku-4-5` for the gateway, `qwen2.5-coder:7b` for Ollama
- `ANTHROPIC_API_KEY` / `ANTHROPIC_API_KEY_SECRET_NAME`

Startup fails if `OPERATOR_UI_ENABLED=true` but `OPERATOR_AUTH_REPO` is missing — prevents an accidentally-open operator UI. Missing `ANTHROPIC_API_KEY` is non-fatal: the UI shows "not configured" and every other feature keeps working.

Authentication & authorization
Each user authenticates with their personal GitHub PAT. Their permission on `OPERATOR_AUTH_REPO` determines their role:

- `admin` / `maintain` → operator
- `write` / `triage` / `read` → writer

`write` is deliberately mapped to writer, not operator: most writers on docs repos have `write` on the auth repo, so mapping `write → operator` would give every writer replay and release capability. Operator actions require an explicit `admin` or `maintain` grant.

On top of the role, replay is repo-scoped: the user's PAT must also have read access to the source repo of the webhook being replayed. This prevents someone who has operator access to the copier from replaying webhooks for private source repos they can't read. Checked on every replay call and cached per (token, repo) with a 5-minute TTL.
What's new
Backend APIs
- `GET /operator/api/workflows` — parsed copier config for the workflow browser
- `POST /operator/api/replay` — replay a webhook delivery (in-flight dedup, merged-PR check, per-repo permission check)
- `GET /operator/api/logs?delivery_id=X` — per-delivery log viewer via context-tagged ring buffer
- `GET /operator/api/me` — authenticated user: `{login, avatar_url, role}`
- `GET /operator/api/repo-permission?repos=...` — batch permission check for frontend button state
- `POST /operator/api/suggest-rule` — LLM-generated copier rule from a source→target example, self-verified against the in-process `PatternMatcher`
- `GET /operator/api/llm/status` — provider, reachability, active model, installed models, `supports_model_mgmt` flag
- `POST /operator/api/llm/settings` — switch active model / base URL at runtime (in-memory, resets on restart)
- `POST /operator/api/llm/pull` (Ollama only) — NDJSON progress streaming for model pulls
- `DELETE /operator/api/llm/model?name=` (Ollama only) — remove a pulled model
- `GET /operator/api/audit/events` — PR number and file path filters; `CommitSHA` added to `WebhookTraceEntry` and `CopierFileMeta`

AI rule suggester
- `LLMClient` interface with concrete `ollamaClient` and `anthropicClient` implementations; `NewLLMClient` dispatches on `LLM_PROVIDER`.
- The Anthropic client talks to `/v1/messages` through the Grove Foundry APIM gateway. Sends both `x-api-key` (native Anthropic) and `api-key` (Azure APIM) headers so the same client works with either endpoint. Ping uses a 1-token `/v1/messages` call (the gateway doesn't expose `/v1/models`). `ListModels` falls back to a static list when the endpoint isn't reachable.
- Every suggested rule is verified against the in-process `PatternMatcher` before display; mismatches surface a warning.

Bug fixes
- ObjectID decoding for mongo-driver v2 (`ObjectIDAsHexString`) — audit queries work again
- Empty-commit PR creation (`errTreeUnchanged` sentinel) — skips cleanly instead of 422'ing on empty branches
- `.help-bg`: `[hidden]` now wins over the base `display:flex`
- JavaScript `RegExp` doesn't support Python-style `(?P<name>)`; the frontend rewrites `(?P<` → `(?<` before compilation

Frontend
- Deep-linkable filters, tabs, and mode (`?tab=audit&pr=123&mode=writer`)
- Keyboard shortcuts: `1-5` tabs, `R` refresh, `D` dark mode, `T` time format, `W` writer/operator, `?` help

Security
- User-derived GitHub path components are validated against whitelists (`ghUsernameRe`, `ghRepoNameRe`) and escaped with `url.PathEscape`
- `slack_notifier.go`: `#nosec` annotation extended to cover `NewRequestWithContext`

Deploy prep
- `.github/workflows/ci.yml` — adds operator UI + AI env vars to the Cloud Run deploy, flips `AUDIT_ENABLED` to `true` (aligning with the v0.3.0 "enabled by default" CHANGELOG entry)
- `env-cloudrun.yaml` — parity with the CI deploy for the manual fallback path
- `CHANGELOG.md` — `[Unreleased]` populated so `scripts/release.sh vX.Y.Z` can extract release notes cleanly
- `cmd/test-llm` — smoke-test CLI to verify provider connectivity end-to-end after a key rotation or gateway change

One-time deploy prep (before cutting the next tag)
Create the Anthropic gateway key secret on the prod GCP project and grant the Cloud Run runtime service account read access:
After merge, `./scripts/release.sh v0.4.0` cuts the tag and the Cloud Run deploy picks up the new provider.

Test plan
Local dev
- `OPERATOR_UI_ENABLED=true` without `OPERATOR_AUTH_REPO` — startup fails with a clear error
- `OPERATOR_UI_ENABLED=true` + `OPERATOR_AUTH_REPO=<repo>` — server starts, `/operator/` loads
- `cmd/test-llm -env .env.test` — Ping, ListModels, and a real suggester prompt all succeed

Hosted (Cloud Run) — after merge
- Create the `anthropic-api-key` secret and grant Secret Manager access to the runtime SA (see above)
- `./scripts/release.sh v0.4.0` → verify Cloud Run deploys the new revision
- `https://<service>/operator/` — all 5 tabs load with a valid PAT