fix: env-file ownership + response panic recovery + oauth header injection by nnemirovsky · Pull Request #38 · nnemirovsky/sluice

nnemirovsky · 2026-05-07T08:24:36Z

Summary

Production-deployment fixes that turned up while validating v0.13.1 against a live Hermes deployment running through sluice.

1. Chown env file back to the agent runtime user (the original motivation)

Sluice writes phantom tokens into the agent's env file via docker exec running the image's USER (root for the upstream openclaw and hermes images). The result was a root-owned env file, blocking the agent's own write paths — most visibly, hermes claw migrate failed with Permission denied after a sluice restart. The script now stat | chowns the file back to the parent directory's owner, so a non-root agent can keep editing the file. Works on both GNU and BSD stat (Linux containers + tart macOS guests) via a -c then -f fallback.

2. Response handler panic recovery

The OpenAI Codex device-code OAuth flow surfaced a runtime error: invalid memory address or nil pointer dereference panic from somewhere downstream of the OAuth response handler. go-mitmproxy's outer recover swallowed it, abandoned the response body, and the agent received empty bytes — Hermes failed with JSONDecodeError even though sluice had successfully phantom-swapped the tokens.

Response and StreamResponseModifier now have deferred recover guards that log the full stack trace and fall back to safe defaults. Real tokens cannot leak — the OAuth handler has its own snapshot/rollback that runs BEFORE the panic, so f.Response.Body is either fully phantom-swapped or untouched at the time of the recover. StreamResponseModifier also gets a defensive nil-input guard plus a buffered-body capture so a panic after io.ReadAll drains the input still hands the agent the buffered bytes (rather than an empty reader from the now-drained stream).

3. OAuth-vs-static dispatch for header injection (HTTP/1+2 and HTTP/3)

Reproduced live: when an OAuth-type credential has a header binding with a {value} template, the previous code was substituting the entire JSON envelope into the header (Bearer {"access_token":"<jwt>","refresh_token":"...","token_url":"..."}), and upstream APIs reject that with 401 "Could not parse your authentication token". This bug had been latent since the bidirectional OAuth phantom swap was introduced in v0.6.0; it only manifested for OAuth credentials with header-binding templates, and the typical pre-v0.6.0 setup used static credentials so it never surfaced.

Fix: a new extractInjectableSecret helper consults the OAuthIndex (rebuilt from credential_meta on startup and SIGHUP) to dispatch by credential metadata type, not by JSON shape inference. Static credentials whose value happens to be OAuth-shaped JSON are NOT misclassified. OAuth credentials parse via vault.ParseOAuth and yield just access_token. Server.UpdateOAuthIndex now mirrors the index into the QUIC proxy too so HTTP/3 header injection follows the same dispatch as HTTP/1+2; the index is built once during setup and applied to both the addon and the QUIC proxy after both are initialized so the startup ordering bug from the first iteration is gone.

Test plan

go build ./...
go test ./... — 2447 passed across 13 packages, including new focused tests for: chown step under both content branches, panic recovery in Response (test injection point), panic recovery in StreamResponseModifier (nil input + buffered body), extractInjectableSecret dispatching on metadata for static / OAuth-shaped-static / oauth-with-leading-whitespace / corrupted-payload / nil-index cases.
gofumpt -l clean
golangci-lint run ./... clean
live OAuth flow against auth.openai.com and auth.anthropic.com returns a phantom-swapped body without the JSONDecodeError or Could not parse your authentication token failures that happened under v0.13.1.
live OAuth-managed Codex (gpt-5.5) and Claude Code (Pro/Max OAuth) both confirmed end-to-end through the deployed sluice on the production host.
CI: test, e2e-linux, e2e-macos, gofumpt, golangci-lint, openapi

Sluice's docker exec runs the env-injection script as the image's USER (root for the upstream openclaw and hermes images). The awk pre-pass renames the file (new file inherits the writer's UID) and the heredoc append writes the new managed block; both leave the file owned by root. The agent runtime later runs as a non-root user (UID 10000 for the upstream hermes image, configurable via HERMES_UID), and a root-owned env file blocks agent-side write paths like `hermes claw migrate` or in-agent secret edits with EACCES. Inheriting the parent directory's owner is portable: every agent's entrypoint chowns its data dir to its runtime user before sluice ever exec's in. The chown is best-effort — a `stat` or `chown` failure does not abort the script (a read-only env file still satisfies sluice's own usage), so containers without GNU stat degrade gracefully. The chown step is appended on the SAME line as the heredoc `<<` operator with `&&` short-circuiting, because shell will not let `&&` at the start of a new line attach to a command whose heredoc body sits in between. New unit test asserts the chown is present both when the managed block has content and when it is empty.

Copilot

Pull request overview

Fixes agent env-file permission regressions caused by docker exec running as the image USER (often root), which can leave ~/<profile>/.env root-owned and block agent-side writes after restart (e.g., hermes claw migrate).

Changes:

Appends a best-effort “derive parent dir owner → chown env file” step to the generated env injection script.
Ensures the ownership fix runs even when no managed env entries are emitted (empty managed block case).
Adds a focused unit test asserting the script contains the stat + chown step for both “has values” and “no values” cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
internal/container/types.go	Adds a best-effort ownership correction step to the generated env injection script (including empty-block scenario).
internal/container/agent_profile_test.go	Adds a test to assert the generated script includes the `stat` + `chown` ownership fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Three fixes that turned up while validating v0.13.1's OAuth flow on the live deployment. - proxy/addon: add a top-level recover in Response and StreamResponseModifier with full stack capture. The OAuth Codex device-code flow surfaced a "nil pointer dereference" panic from somewhere downstream that go-mitmproxy's outer recover swallowed, leaving the agent reading an empty response body. The panic source itself remains unidentified, but escaping it into go-mitmproxy means losing the response — both handlers now log the stack and fall back to safe defaults (Response leaves f.Response unchanged at the panic point; StreamResponseModifier hands back the original input reader). Real tokens cannot leak: any panic in Response runs after processOAuthResponseIfMatching's own snapshot/rollback, so f.Response.Body is either fully phantom-swapped or untouched at the time of the panic. - proxy/addon: defensive nil check on the input reader for StreamResponseModifier. go-mitmproxy normally passes a non-nil reader, but a nil here causes io.LimitReader + io.ReadAll to nil-deref on the first Read call. When in is nil we now return nil, which makes go-mitmproxy's reply() skip its copyStream branch and fall through to writing f.Response.Body (already phantom-swapped by the Response handler). - container/types: portable stat fallback. The chown step from the earlier commit only worked under GNU stat (Linux containers alpine/debian); BSD stat (macOS guests booted via the tart backend) uses `-f`. We now try `-c` first and fall back to `-f` on non-zero exit so the chown succeeds across both userlands.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

- Add a test-only responsePanicHook field to SluiceAddon. When non-nil it fires between processOAuthResponseIfMatching and the DLP scan, letting tests force the downstream-of-OAuth panic shape we observed in production without engineering a malformed Flow that triggers a real nil deref. Always nil in production. - TestAddonResponse_RecoversFromDownstreamPanic uses the hook to panic between OAuth swap and DLP scan, then asserts: the test itself doesn't see the panic propagate, real tokens stayed out of f.Response.Body, and the phantom token from the swap is still present. - TestAddonStreamResponseModifier_HandlesNilInput passes nil to StreamResponseModifier and asserts no panic + the handler returns nil so go-mitmproxy's reply() falls through to writing f.Response.Body. - Drop the line-broken hyphen in the Response handler's comment ("processOAuthResponseIfMatching" was wrapping with a hyphen inside the identifier).

OAuth credentials are stored in the vault as JSON-marshalled OAuthCredential structs (access_token + refresh_token + token_url + expires_at). The header-injection path in addon.injectHeader and quic.go was substituting the binding's `{value}` placeholder with the entire JSON blob, producing requests like Authorization: Bearer {"access_token":"<jwt>","refresh_token":...} that upstream APIs reject with 401 "Could not parse your authentication token" (reproduced live against chatgpt.com via the OpenAI Codex device-code OAuth flow). Static credentials are stored as plain strings so they were unaffected; the bug only manifested for OAuth-type credentials with a header binding that uses a template. This bug has been latent since the bidirectional OAuth phantom swap was introduced (commit 0be82d4); before that, OAuth tokens were typically loaded into the vault as raw strings via `sluice cred update` and the JSON envelope shape never reached the injection path. The buildPhantomPairs / phantom-swap path was already OAuth-aware via vault.IsOAuth, so phantom-token replace in the request body kept working; only the header-injection path was broken. Fix: detect a JSON envelope by shape (first byte `{`, parses as OAuthCredential, has non-empty access_token) and substitute just the access_token. Static credentials and any other plain string pass through unchanged. Both the buffered-HTTP path (internal/proxy/addon.go) and the QUIC/HTTP3 path (internal/proxy/quic.go) call the same helper. New test asserts pass-through for static/empty/malformed values and access_token extraction for an OAuth blob.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

The previous extractInjectableSecret inferred OAuth-vs-static from the secret's JSON shape (first byte `{`, parses as OAuthCredential). Two issues with that: - A static credential whose value happened to be OAuth-shaped JSON (e.g. an upstream that already stores access_token+token_url) would be silently treated as OAuth, with only access_token injected. Other code paths in sluice already key on credential_meta.cred_type to avoid this exact misclassification. - Leading whitespace or newlines before the brace bypassed the shape check entirely, even though json.Unmarshal would still parse it. The OAuth envelope would round-trip through the injection unchanged. Fix: dispatch on the credential's metadata type via the OAuthIndex.Has(credName) lookup. The OAuthIndex is rebuilt from credential_meta on startup and SIGHUP, so the injection path now asks the same source-of-truth as the response interceptor. A new free function extractInjectableSecret(idx, name, secret) takes the index explicitly so the QUIC path (which carries its own pointer) shares the same logic with the addon. Also wires the QUIC proxy with its own oauthIndex pointer plus a SetOAuthIndex setter; Server.UpdateOAuthIndex now mirrors index updates into both the addon and the QUIC proxy so HTTP/3 header injection follows the same dispatch rules as HTTP/1+2. Test coverage exercises: - Static cred with plain string -> pass-through. - Static cred whose value is OAuth-shaped JSON -> still pass-through (because it's not in the OAuth index). - OAuth cred with normal JSON -> access_token extraction. - OAuth cred with leading whitespace -> still extracts access_token (json.Unmarshal handles whitespace). - OAuth cred with corrupted vault payload -> falls back to raw secret with a diagnostic log line. - nil index -> every credential treated as static (early-startup path before UpdateOAuthIndex fires).

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

- proxy/addon: extractInjectableSecret log uses generic [INJECT] prefix because the helper now serves both the HTTP/1+2 addon path (internal/proxy/addon.go) and the HTTP/3 (QUIC) path (internal/proxy/quic.go). [ADDON-INJECT] would mislead a reader who saw the line in a deployment that uses HTTP/3 exclusively. - proxy/addon: StreamResponseModifier now returns http.NoBody (an empty reader) instead of nil when the input reader is nil. The nil case only fires under f.Stream=true (where the Response addon callback was skipped, so f.Response.Body is empty), so a literal nil return left reply() with no body source AND no body bytes — undefined-length responses trip some HTTP clients. http.NoBody is well-framed: reply() copies its EOF immediately and the agent sees a zero-byte body. The nil-input test was updated to assert the new behavior.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

The deferred recover on StreamResponseModifier previously fell back to `out = in` on panic. After io.ReadAll consumes the input stream (line 901), `in` is drained — a panic later in swapOAuthTokens or the body-replacement steps would hand the agent an empty reader even though we already had the bytes in memory. Capture the bytes into a closure-scoped `bufferedBody` slice immediately after the read completes. The recover prefers bufferedBody if it is set, otherwise falls back to `in` (the read hasn't started yet, stream is intact). Either way the agent gets a usable body.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Two PR #38 round 6 follow-ups. Server now caches the latest credential_meta slice it saw via UpdateOAuthIndex. setupInjection re-applies the cache to the QUIC proxy right after assigning s.quicProxy, so an UpdateOAuthIndex call that arrives before the proxy exists is not silently dropped. Without the cache, a future reorder of the startup sequence would re-introduce the OAuth-envelope header injection bug for HTTP/3 traffic. Response addon also gets a nil-flow guard mirroring StreamResponseModifier so the deferred recover does not dereference a nil flow when tests build a minimal Response addon path.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Copilot round 7: the deferred recover in StreamResponseModifier returned the buffered raw upstream bytes when a panic fired after io.ReadAll. For an OAuth-matched token endpoint those bytes contain real access and refresh tokens, so a panic in swapOAuthTokens (or anywhere between the read and the swap) would have leaked real credentials to the agent. Replace bufferedBody with safeFallback. We assign it ONLY after a successful swap, when the buffer is the phantom-only modified body. A panic before that point falls back to http.NoBody. The agent sees an empty 2xx token response and surfaces the failure as a parse error, which is strictly preferable to handing over a real bearer. The non-panic error path (swap parse failure on a non-OAuth body, e.g. an HTML error page from a misconfigured token endpoint) keeps the historical pass-through behavior but no longer assigns safeFallback so a late panic would still hit http.NoBody rather than echo whatever that body was.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Copilot round 8: the nil-input guard ran AFTER the early return for nil flow / nil request, so a call where both `f` and `in` were nil would skip the guard, leave `out = in` (nil), and trip a nil io.Reader nil-deref in go-mitmproxy's downstream copy. Move the nil-input check to the top of the function so http.NoBody is returned regardless of the flow's shape.

Copilot