fix: env-file ownership + response panic recovery + oauth header injection#38
Merged
Conversation
Sluice's docker exec runs the env-injection script as the image's USER (root for the upstream openclaw and hermes images). The awk pre-pass renames the file (new file inherits the writer's UID) and the heredoc append writes the new managed block; both leave the file owned by root. The agent runtime later runs as a non-root user (UID 10000 for the upstream hermes image, configurable via HERMES_UID), and a root-owned env file blocks agent-side write paths like `hermes claw migrate` or in-agent secret edits with EACCES. Inheriting the parent directory's owner is portable: every agent's entrypoint chowns its data dir to its runtime user before sluice ever exec's in. The chown is best-effort — a `stat` or `chown` failure does not abort the script (a read-only env file still satisfies sluice's own usage), so containers without GNU stat degrade gracefully. The chown step is appended on the SAME line as the heredoc `<<` operator with `&&` short-circuiting, because shell will not let `&&` at the start of a new line attach to a command whose heredoc body sits in between. New unit test asserts the chown is present both when the managed block has content and when it is empty.
There was a problem hiding this comment.
Pull request overview
Fixes agent env-file permission regressions caused by docker exec running as the image USER (often root), which can leave ~/<profile>/.env root-owned and block agent-side writes after restart (e.g., hermes claw migrate).
Changes:
- Appends a best-effort “derive parent dir owner → chown env file” step to the generated env injection script.
- Ensures the ownership fix runs even when no managed env entries are emitted (empty managed block case).
- Adds a focused unit test asserting the script contains the
stat+chownstep for both “has values” and “no values” cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/container/types.go | Adds a best-effort ownership correction step to the generated env injection script (including empty-block scenario). |
| internal/container/agent_profile_test.go | Adds a test to assert the generated script includes the stat + chown ownership fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three fixes that turned up while validating v0.13.1's OAuth flow on the live deployment. - proxy/addon: add a top-level recover in Response and StreamResponseModifier with full stack capture. The OAuth Codex device-code flow surfaced a "nil pointer dereference" panic from somewhere downstream that go-mitmproxy's outer recover swallowed, leaving the agent reading an empty response body. The panic source itself remains unidentified, but escaping it into go-mitmproxy means losing the response — both handlers now log the stack and fall back to safe defaults (Response leaves f.Response unchanged at the panic point; StreamResponseModifier hands back the original input reader). Real tokens cannot leak: any panic in Response runs after processOAuthResponseIfMatching's own snapshot/rollback, so f.Response.Body is either fully phantom-swapped or untouched at the time of the panic. - proxy/addon: defensive nil check on the input reader for StreamResponseModifier. go-mitmproxy normally passes a non-nil reader, but a nil here causes io.LimitReader + io.ReadAll to nil-deref on the first Read call. When in is nil we now return nil, which makes go-mitmproxy's reply() skip its copyStream branch and fall through to writing f.Response.Body (already phantom-swapped by the Response handler). - container/types: portable stat fallback. The chown step from the earlier commit only worked under GNU stat (Linux containers alpine/debian); BSD stat (macOS guests booted via the tart backend) uses `-f`. We now try `-c` first and fall back to `-f` on non-zero exit so the chown succeeds across both userlands.
- Add a test-only responsePanicHook field to SluiceAddon. When
non-nil it fires between processOAuthResponseIfMatching and the
DLP scan, letting tests force the downstream-of-OAuth panic
shape we observed in production without engineering a malformed
Flow that triggers a real nil deref. Always nil in production.
- TestAddonResponse_RecoversFromDownstreamPanic uses the hook to
panic between OAuth swap and DLP scan, then asserts: the test
itself doesn't see the panic propagate, real tokens stayed out
of f.Response.Body, and the phantom token from the swap is
still present.
- TestAddonStreamResponseModifier_HandlesNilInput passes nil to
StreamResponseModifier and asserts no panic + the handler
returns nil so go-mitmproxy's reply() falls through to writing
f.Response.Body.
- Drop the line-broken hyphen in the Response handler's comment
("processOAuthResponseIfMatching" was wrapping with a hyphen
inside the identifier).
OAuth credentials are stored in the vault as JSON-marshalled
OAuthCredential structs (access_token + refresh_token + token_url +
expires_at). The header-injection path in addon.injectHeader and
quic.go was substituting the binding's `{value}` placeholder with
the entire JSON blob, producing requests like
Authorization: Bearer {"access_token":"<jwt>","refresh_token":...}
that upstream APIs reject with 401 "Could not parse your
authentication token" (reproduced live against chatgpt.com via
the OpenAI Codex device-code OAuth flow). Static credentials are
stored as plain strings so they were unaffected; the bug only
manifested for OAuth-type credentials with a header binding that
uses a template.
This bug has been latent since the bidirectional OAuth phantom
swap was introduced (commit 0be82d4); before that, OAuth tokens
were typically loaded into the vault as raw strings via `sluice
cred update` and the JSON envelope shape never reached the
injection path. The buildPhantomPairs / phantom-swap path was
already OAuth-aware via vault.IsOAuth, so phantom-token replace
in the request body kept working; only the header-injection path
was broken.
Fix: detect a JSON envelope by shape (first byte `{`, parses as
OAuthCredential, has non-empty access_token) and substitute just
the access_token. Static credentials and any other plain string
pass through unchanged. Both the buffered-HTTP path
(internal/proxy/addon.go) and the QUIC/HTTP3 path
(internal/proxy/quic.go) call the same helper.
New test asserts pass-through for static/empty/malformed values
and access_token extraction for an OAuth blob.
The previous extractInjectableSecret inferred OAuth-vs-static from
the secret's JSON shape (first byte `{`, parses as OAuthCredential).
Two issues with that:
- A static credential whose value happened to be OAuth-shaped JSON
(e.g. an upstream that already stores access_token+token_url)
would be silently treated as OAuth, with only access_token
injected. Other code paths in sluice already key on
credential_meta.cred_type to avoid this exact misclassification.
- Leading whitespace or newlines before the brace bypassed the
shape check entirely, even though json.Unmarshal would still
parse it. The OAuth envelope would round-trip through the
injection unchanged.
Fix: dispatch on the credential's metadata type via the
OAuthIndex.Has(credName) lookup. The OAuthIndex is rebuilt from
credential_meta on startup and SIGHUP, so the injection path now
asks the same source-of-truth as the response interceptor. A new
free function extractInjectableSecret(idx, name, secret) takes the
index explicitly so the QUIC path (which carries its own pointer)
shares the same logic with the addon.
Also wires the QUIC proxy with its own oauthIndex pointer plus a
SetOAuthIndex setter; Server.UpdateOAuthIndex now mirrors index
updates into both the addon and the QUIC proxy so HTTP/3 header
injection follows the same dispatch rules as HTTP/1+2.
Test coverage exercises:
- Static cred with plain string -> pass-through.
- Static cred whose value is OAuth-shaped JSON -> still
pass-through (because it's not in the OAuth index).
- OAuth cred with normal JSON -> access_token extraction.
- OAuth cred with leading whitespace -> still extracts
access_token (json.Unmarshal handles whitespace).
- OAuth cred with corrupted vault payload -> falls back to
raw secret with a diagnostic log line.
- nil index -> every credential treated as static (early-startup
path before UpdateOAuthIndex fires).
- proxy/addon: extractInjectableSecret log uses generic [INJECT] prefix because the helper now serves both the HTTP/1+2 addon path (internal/proxy/addon.go) and the HTTP/3 (QUIC) path (internal/proxy/quic.go). [ADDON-INJECT] would mislead a reader who saw the line in a deployment that uses HTTP/3 exclusively. - proxy/addon: StreamResponseModifier now returns http.NoBody (an empty reader) instead of nil when the input reader is nil. The nil case only fires under f.Stream=true (where the Response addon callback was skipped, so f.Response.Body is empty), so a literal nil return left reply() with no body source AND no body bytes — undefined-length responses trip some HTTP clients. http.NoBody is well-framed: reply() copies its EOF immediately and the agent sees a zero-byte body. The nil-input test was updated to assert the new behavior.
The deferred recover on StreamResponseModifier previously fell back to `out = in` on panic. After io.ReadAll consumes the input stream (line 901), `in` is drained — a panic later in swapOAuthTokens or the body-replacement steps would hand the agent an empty reader even though we already had the bytes in memory. Capture the bytes into a closure-scoped `bufferedBody` slice immediately after the read completes. The recover prefers bufferedBody if it is set, otherwise falls back to `in` (the read hasn't started yet, stream is intact). Either way the agent gets a usable body.
Two PR #38 round 6 follow-ups. Server now caches the latest credential_meta slice it saw via UpdateOAuthIndex. setupInjection re-applies the cache to the QUIC proxy right after assigning s.quicProxy, so an UpdateOAuthIndex call that arrives before the proxy exists is not silently dropped. Without the cache, a future reorder of the startup sequence would re-introduce the OAuth-envelope header injection bug for HTTP/3 traffic. Response addon also gets a nil-flow guard mirroring StreamResponseModifier so the deferred recover does not dereference a nil flow when tests build a minimal Response addon path.
Copilot round 7: the deferred recover in StreamResponseModifier returned the buffered raw upstream bytes when a panic fired after io.ReadAll. For an OAuth-matched token endpoint those bytes contain real access and refresh tokens, so a panic in swapOAuthTokens (or anywhere between the read and the swap) would have leaked real credentials to the agent. Replace bufferedBody with safeFallback. We assign it ONLY after a successful swap, when the buffer is the phantom-only modified body. A panic before that point falls back to http.NoBody. The agent sees an empty 2xx token response and surfaces the failure as a parse error, which is strictly preferable to handing over a real bearer. The non-panic error path (swap parse failure on a non-OAuth body, e.g. an HTML error page from a misconfigured token endpoint) keeps the historical pass-through behavior but no longer assigns safeFallback so a late panic would still hit http.NoBody rather than echo whatever that body was.
Copilot round 8: the nil-input guard ran AFTER the early return for nil flow / nil request, so a call where both `f` and `in` were nil would skip the guard, leave `out = in` (nil), and trip a nil io.Reader nil-deref in go-mitmproxy's downstream copy. Move the nil-input check to the top of the function so http.NoBody is returned regardless of the flow's shape.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Production-deployment fixes that turned up while validating v0.13.1 against a live Hermes deployment running through sluice.
1. Chown env file back to the agent runtime user (the original motivation)
Sluice writes phantom tokens into the agent's env file via
docker execrunning the image's USER (root for the upstream openclaw and hermes images). The result was a root-owned env file, blocking the agent's own write paths — most visibly,hermes claw migratefailed withPermission deniedafter a sluice restart. The script nowstat | chowns the file back to the parent directory's owner, so a non-root agent can keep editing the file. Works on both GNU and BSD stat (Linux containers + tart macOS guests) via a-cthen-ffallback.2. Response handler panic recovery
The OpenAI Codex device-code OAuth flow surfaced a
runtime error: invalid memory address or nil pointer dereferencepanic from somewhere downstream of the OAuth response handler. go-mitmproxy's outer recover swallowed it, abandoned the response body, and the agent received empty bytes — Hermes failed withJSONDecodeErroreven though sluice had successfully phantom-swapped the tokens.ResponseandStreamResponseModifiernow have deferred recover guards that log the full stack trace and fall back to safe defaults. Real tokens cannot leak — the OAuth handler has its own snapshot/rollback that runs BEFORE the panic, sof.Response.Bodyis either fully phantom-swapped or untouched at the time of the recover.StreamResponseModifieralso gets a defensive nil-input guard plus a buffered-body capture so a panic afterio.ReadAlldrains the input still hands the agent the buffered bytes (rather than an empty reader from the now-drained stream).3. OAuth-vs-static dispatch for header injection (HTTP/1+2 and HTTP/3)
Reproduced live: when an OAuth-type credential has a header binding with a
{value}template, the previous code was substituting the entire JSON envelope into the header (Bearer {"access_token":"<jwt>","refresh_token":"...","token_url":"..."}), and upstream APIs reject that with401 "Could not parse your authentication token". This bug had been latent since the bidirectional OAuth phantom swap was introduced in v0.6.0; it only manifested for OAuth credentials with header-binding templates, and the typical pre-v0.6.0 setup used static credentials so it never surfaced.Fix: a new
extractInjectableSecrethelper consults theOAuthIndex(rebuilt fromcredential_metaon startup and SIGHUP) to dispatch by credential metadata type, not by JSON shape inference. Static credentials whose value happens to be OAuth-shaped JSON are NOT misclassified. OAuth credentials parse viavault.ParseOAuthand yield justaccess_token.Server.UpdateOAuthIndexnow mirrors the index into the QUIC proxy too so HTTP/3 header injection follows the same dispatch as HTTP/1+2; the index is built once during setup and applied to both the addon and the QUIC proxy after both are initialized so the startup ordering bug from the first iteration is gone.Test plan
go build ./...go test ./...— 2447 passed across 13 packages, including new focused tests for: chown step under both content branches, panic recovery in Response (test injection point), panic recovery in StreamResponseModifier (nil input + buffered body),extractInjectableSecretdispatching on metadata for static / OAuth-shaped-static / oauth-with-leading-whitespace / corrupted-payload / nil-index cases.gofumpt -lcleangolangci-lint run ./...cleanauth.openai.comandauth.anthropic.comreturns a phantom-swapped body without the JSONDecodeError orCould not parse your authentication tokenfailures that happened under v0.13.1.