refactor(redis): bound rangeList / readValueAt VerifyLeaderForKey to per-call ctx#752
refactor(redis): bound rangeList / readValueAt VerifyLeaderForKey to per-call ctx#752bootjp wants to merge 1 commit into
Conversation
…per-call ctx Follow-up to PR #749 (Redis adapter audit). Before: rangeList and readValueAt called VerifyLeaderForKey(r.handlerContext(), key). handlerContext is the long-lived server baseCtx with no deadline; a stalled ReadIndex would block the call until the verifyLeaderEngineCtx defense-in-depth tripped its 5 s default deadline. After: both functions accept a ctx parameter and forward it to VerifyLeaderForKey, same shape as the keys() and FLUSHDB fixes in PR #749 r1. Callers wrap r.handlerContext() with redisDispatchTimeout (LRANGE) or thread the EXEC dispatch ctx through txnContext (load → readValueAt inside a transaction). txnContext gains a ctx field carrying the EXEC-scoped dispatch ctx so reads invoked inside the transaction respect the caller's deadline. The struct holds it across the EXEC body (documented in-place with the //nolint:containedctx rationale — EXEC is a value-typed wrapper around a single in-flight client command, ctx must travel with it). This is a tightening, not a correctness fix: the verifyLeaderEngineCtx default-deadline guard already bounded these calls at 5 s. The change brings them under the same per-command dispatch budget as the rest of the Redis adapter so a slow leader probe can't blow past the user-visible command timeout the way a 5 s fallback could. Test: go test -race -count=1 -short ./adapter -- 531s, all green.
📝 WalkthroughWalkthroughThis PR threads per-request dispatch contexts through transaction execution and command handlers so that reads and leader verification operations honor the caller's ChangesContext Threading for Deadline-Aware Operations
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements context propagation for Redis command execution, specifically updating rangeList and readValueAt to honor caller deadlines. The review feedback recommends consistently using the provided context for all internal operations within these methods, such as store.GetAt and keyTypeAt, rather than falling back to context.Background(). Additionally, the feedback emphasizes acquiring MVCC timestamps after the linearizable read fence to maintain consistency.
| } | ||
|
|
||
| func (r *RedisServer) rangeList(key []byte, startRaw, endRaw []byte) ([]string, error) { | ||
| func (r *RedisServer) rangeList(ctx context.Context, key []byte, startRaw, endRaw []byte) ([]string, error) { |
There was a problem hiding this comment.
While ctx is now passed to rangeList, it is only used for the VerifyLeaderForKey call. Other context-aware operations in this method, such as keyTypeAt (line 3260), resolveListMeta (line 3279), and fetchListRange (line 3292), still use context.Background(). To fully adhere to the goal of bounding the command execution and ensuring prompt cancellation, these calls should also use the provided ctx. Additionally, per the rules for linearizable reads in MVCC, ensure the timestamp for the read is acquired after the linearizable read fence (VerifyLeaderForKey) to guarantee a consistent snapshot and prevent stale data reads.
References
- When designing interfaces, use context.Context for managing deadlines and cancellation instead of separate timeout parameters.
- When implementing linearizable reads in an MVCC system, ensure that the timestamp used for the MVCC read is acquired after the linearizable read fence has completed to guarantee a consistent snapshot view, especially for rangeList.
| } | ||
|
|
||
| func (r *RedisServer) readValueAt(key []byte, readTS uint64) ([]byte, error) { | ||
| func (r *RedisServer) readValueAt(ctx context.Context, key []byte, readTS uint64) ([]byte, error) { |
There was a problem hiding this comment.
Similar to rangeList, readValueAt now receives a ctx but continues to use context.Background() for hasExpired (line 3522) and store.GetAt (line 3538). These should be updated to use the passed ctx to ensure the entire read path respects the dispatch deadline. Furthermore, ensure that the timestamp used for the MVCC read is acquired after the linearizable read fence has completed to guarantee a consistent snapshot view and prevent stale data reads, as required for readValueAt.
References
- When designing interfaces, use context.Context for managing deadlines and cancellation instead of separate timeout parameters.
- When implementing linearizable reads in an MVCC system, ensure that the timestamp used for the MVCC read is acquired after the linearizable read fence has completed to guarantee a consistent snapshot view, especially for readValueAt.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@adapter/redis.go`:
- Around line 1764-1770: The load() method should defensively handle a nil
txnContext.ctx to avoid panics in tests that construct txnContext literals
without ctx; inside load() (the caller used by stageKeyDeletion() which loops
and calls readValueAt()), check if t.ctx is nil and substitute
context.Background() (or another appropriate base context) before passing it
into readValueAt(); this mirrors the pattern used by
streamDeletions/loadListState and ensures mocks that call context.WithTimeout
won't panic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 764ffb47-a72f-456a-a177-f2a989b81f30
📒 Files selected for processing (2)
adapter/redis.goadapter/redis_retry_test.go
| server *RedisServer | ||
| // ctx is the per-EXEC dispatch context (redisDispatchTimeout-bounded | ||
| // at the call site in runTransaction). Plumbed through so reads | ||
| // inside the EXEC such as load() → readValueAt() respect the | ||
| // caller's deadline rather than falling back to handlerContext + | ||
| // the verifyLeaderEngineCtx safety net. | ||
| ctx context.Context //nolint:containedctx // EXEC is a long-lived value type that wraps a single client command, ctx must travel with it. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find every txnContext composite literal in the repo and dump 12 lines after
# so we can see which fields are initialised.
rg -nPU --type=go -A 12 '(?s)&?txnContext\s*\{' adapter
echo '---'
# Cross-check call sites of stageKeyDeletion and load(internalKey) reachable
# from those fixtures.
rg -n --type=go '\bstageKeyDeletion\s*\(|\.load\s*\(' adapterRepository: bootjp/elastickv
Length of output: 4263
🏁 Script executed:
# Check if loadListState calls into stageKeyDeletion or load
rg -n --type=go -A 15 'func \(t \*txnContext\) loadListState' adapter/redis.go | head -30
# Also check stageKeyDeletion to see what it does
rg -n --type=go -A 20 'func \(t \*txnContext\) stageKeyDeletion' adapter/redis.go | head -40Repository: bootjp/elastickv
Length of output: 1429
🏁 Script executed:
rg -n --type=go -A 50 'func \(t \*txnContext\) stageKeyDeletion' adapter/redis.go | head -60Repository: bootjp/elastickv
Length of output: 1953
🏁 Script executed:
rg -n --type=go -A 8 'func \(t \*txnContext\) load\(key \[\]byte\)' adapter/redis.go | head -20Repository: bootjp/elastickv
Length of output: 477
🏁 Script executed:
rg -n --type=go -A 25 'func \(t \*txnContext\) load\(key \[\]byte\)' adapter/redis.go | grep -A 25 "1847:"Repository: bootjp/elastickv
Length of output: 1009
🏁 Script executed:
rg -n --type=go -A 40 'func \(t \*txnContext\) load\(key \[\]byte\)' adapter/redis.go | tail -20Repository: bootjp/elastickv
Length of output: 639
🏁 Script executed:
rg -n --type=go 'func.*readValueAt' adapter/redis.goRepository: bootjp/elastickv
Length of output: 165
🏁 Script executed:
sed -n '1877,1883p' adapter/redis.go | cat -nRepository: bootjp/elastickv
Length of output: 302
Add a nil-safe fallback for t.ctx in load() to support test fixtures that omit the field.
Test fixtures in redis_txn_test.go (lines 37, 73, 108, 146) construct txnContext literals without ctx. When stageKeyDeletion() is called—for example, during key deletion in a transaction—it invokes t.load(internalKey) in a loop (lines 2380–2392), which passes t.ctx to readValueAt() at line 1879. If ctx is nil, any coordinator implementation or mock that uses context.WithTimeout(ctx, …) would panic.
Currently, test fixtures only call loadListState(), which uses context.Background() internally, so the issue is latent. However, this matches the existing pattern used for streamDeletions (line 2399): "test fixtures that build a minimal txnContext literal without this field still work." A one-line defensive fallback in load() preserves that design principle and adds robustness:
Suggested fix
} else {
var err error
- val, err = t.server.readValueAt(t.ctx, storageKey, t.startTS)
+ ctx := t.ctx
+ if ctx == nil {
+ ctx = context.Background()
+ }
+ val, err = t.server.readValueAt(ctx, storageKey, t.startTS)
if err != nil && !errors.Is(err, store.ErrKeyNotFound) {
return nil, errors.WithStack(err)
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@adapter/redis.go` around lines 1764 - 1770, The load() method should
defensively handle a nil txnContext.ctx to avoid panics in tests that construct
txnContext literals without ctx; inside load() (the caller used by
stageKeyDeletion() which loops and calls readValueAt()), check if t.ctx is nil
and substitute context.Background() (or another appropriate base context) before
passing it into readValueAt(); this mirrors the pattern used by
streamDeletions/loadListState and ensures mocks that call context.WithTimeout
won't panic.
Summary
rangeListandreadValueAtnow accept actx context.Contextparameter and forward it toVerifyLeaderForKeyinstead of using the long-livedr.handlerContext().lrangewrapsr.handlerContext()withredisDispatchTimeoutand passes it down. Inside MULTI/EXEC,txnContextcarries the EXEC-scoped dispatch ctx (set at construction inrunTransaction) soload() → readValueAt()respects the caller's deadline.verifyLeaderEngineCtxdefault-deadline guard introduced in PR refactor(kv): plumb caller context through write + verify-leader paths #749 r1 already bounded these paths at 5 s. This brings them under the same per-command dispatch budget as the rest of the adapter so a slow leader probe can't outlive the user-visible command timeout via the 5 s fallback.Background
PR #749 review (gemini/Codex) flagged that
keys()andFLUSHDBhad escaped the dispatch deadline by feedingr.handlerContext()(server baseCtx, no deadline) intoVerifyLeader*. Those were fixed inline. Two remaining call sites —rangeList(LRANGE leader path) andreadValueAt(single-key reads from EXEC bodies and string GETs) — were called out as a non-blocking follow-up because the newverifyLeaderEngineCtx5 s default already protected against the unbounded-wait regression. This PR tightens them to match.Why
txnContext.ctxtxnContextis the value type that wraps one in-flight EXEC body.load()is called repeatedly during EXEC and may transitively callreadValueAt(). Plumbing ctx through each method would mean threading it through everyworking[…]access — the struct is exactly the right scope to carry the EXEC dispatch ctx, so it gets actxfield with a//nolint:containedctxannotation explaining the rationale.Test plan
go test -race -count=1 -short ./adapter— 531s, all greengo vet ./...— cleanSummary by CodeRabbit