feat(encryption): FSM-internal Raft entry types (Stage 4) by bootjp · Pull Request #748 · bootjp/elastickv

bootjp · 2026-05-09T20:17:21Z

Summary

Stage 4 of the data-at-rest encryption rollout (design doc: docs/design/2026_04_29_proposed_data_at_rest_encryption.md, §4.1 writer registry, §5.2 rotation, §5.6 bootstrap, §6.3 fail-closed apply, §11.3 reserved opcodes). Adds the byte-tag dispatch and HaltApply propagation seam for three new FSM-internal Raft entry types:

0x03 OpRegistration — writer-registry insert / re-register
0x04 OpBootstrap — initial DEK install + active pointers + batch writer-registry rows
0x05 OpRotation — DEK rotation + active-pointer update + proposer registration

Stage 4 is inert in production: with the new EncryptionApplier option not wired (NewKvFSMWithHLC default), every existing FSM behaviour is unchanged byte-for-byte. The encryption opcodes only land once Stage 5/6/7 plug in concrete handlers.

Out of scope (deferred)

Stage 5: EncryptionAdmin gRPC + CLI (proposes 0x03/0x04/0x05 entries)
Stage 6: 3-phase rollout cluster flags (enable-storage-envelope, enable-raft-envelope rotation sub-tags)
Stage 7: Writer registry persistence + deterministic-nonce factory (consumes internal/encryption/registry.go)
Stage 8: Snapshot header v2 (round-trip encrypted entries through MVCC stream)
Stage 9: KMS-backed wrappers, compression, retire/rewrite, full Jepsen

Test plan

go test -race ./internal/encryption/... — codec round-trip, byte-layout pin, malformed/version/sub-tag rejection
go test -race ./internal/raftengine/etcd/... — HaltApply seam halts apply loop without advancing setApplied
go test -race ./kv/... — per-opcode dispatch, no-applier fail-closed, handler-error wrapping, decode-failure halt, legacy opcodes unaffected
golangci-lint run clean (0 issues)
Wider regression: go test -race ./internal/... ./store/... ./kv/... — every existing test passes byte-for-byte; no FSM caller wires WithEncryption in Stage 4
CI verifies build / golangci-lint / test / Jepsen

Self-review (CLAUDE.md 5 passes)

Data loss — encryption opcodes that fail to apply (malformed payload or handler error) halt without advancing setApplied; next restart replays. Aligned with Stage 3's ErrRaftUnwrapFailed halt invariant.
Concurrency — f.encryption is set once at FSM construction and read-only thereafter; same single-writer discipline as f.hlc. The HaltApply interface check on the apply hot path is a single type assertion, lock-free.
Performance — cipher-disabled (no applier) and non-encryption opcode paths are pre-existing; the new opcode tags add a switch case + one type assertion per applied entry. Negligible.
Consistency — byte-tag dispatch is mutually exclusive; legacy opcodes (0x00/0x01/0x02) remain on their original code paths verbatim. Wire format fails closed on any version / sub-tag / length mismatch.
Test coverage — 35+ unit tests across four packages plus the existing wider suite continuing to pass with no changes.

Summary by CodeRabbit

New Features
- Added support for encryption operations including registration, bootstrap, and key rotation with robust error handling.
Bug Fixes
- Implemented fail-closed error handling for encryption apply operations to prevent data advancement on encryption failures.
Tests
- Added comprehensive test coverage for encryption wire format, registry operations, and apply loop halt behavior.

Stage 4 of the data-at-rest encryption rollout (design doc §4.1 writer registry, §5.2 rotation, §5.6 bootstrap, §6.3 fail-closed apply, §11.3 reserved opcodes). Adds the byte-tag dispatch and HaltApply seam for the three new FSM-internal Raft entry types: 0x03 OpRegistration — writer-registry insert / re-register 0x04 OpBootstrap — initial DEK install + active pointers + batch writer-registry rows 0x05 OpRotation — DEK rotation + active-pointer update + proposer registration Stage 4 ships the WIRE FORMAT, the DISPATCH SEAM, and the HaltApply propagation contract. Stage 5/6/7 plug in concrete EncryptionApplier implementations that mutate the keystore + sidecar + writer-registry on apply. The Stage 4 default (no EncryptionApplier wired) is fail-closed: any encryption opcode arriving at the FSM halts the apply loop with ErrEncryptionApply rather than silently advancing setApplied. Wire format (internal/encryption/fsmwire). Hand-rolled binary with leading version byte, NOT proto3: - State machine determinism requires byte-for-byte reproducibility; proto3's lenient unknown-field handling would let a future leader-built field silently drop on a stale follower. - Schema is small (~30 lines per opcode); maintenance cost is negligible. - Matches in-house style (envelope.go, raft_envelope.go, raft_payload_wrapper.go all hand-roll binary). Writer registry codec (internal/encryption/registry.go). Reserves the `!encryption|writers|<be4 dek_id>|<be2 uint16(node_id)>` Pebble key prefix. Stage 4 ships the codec only; Stage 7's deterministic nonce factory consumes registry rows for cross-replica nonce uniqueness. ErrEncryptionApply sentinel moved to internal/encryption (was proposed for internal/raftengine/etcd, but kv → engine import cycle blocked it: engine_test imports kv as a fake FSM). internal/raftengine/etcd re-exports the sentinel as ErrEncryptionApply for in-package errors.Is matching. HaltApply seam in internal/raftengine/etcd: applyNormalCommitted now type-asserts the FSM Apply return for `interface{ HaltApply() error }`. Non-nil HaltApply.err propagates as a halt — same fail-closed shape as the Stage 3 ErrRaftUnwrapFailed path. setApplied is NOT called; runLoop's fatal-error path takes the process down. The StateMachine.Apply contract (`Apply(data) any`) is unchanged — no ripple into hashicorp engine, no test churn. kv/fsm_encryption.go dispatches the 0x03/0x04/0x05 opcodes to the configured EncryptionApplier (or fails closed when nil), wraps any error with ErrEncryptionApply, and packs it in haltApplyResponse so the engine's HaltApply seam recognises it. Constructor: NewKvFSMWithHLC gains a variadic FSMOption argument. WithEncryption(applier) is the only option Stage 4 introduces; existing 30+ NewKvFSMWithHLC callsites compile unchanged. Test coverage: - internal/encryption/registry_test.go: round-trip + byte-layout pin + malformed rejection for keys and values. - internal/encryption/fsmwire/wire_test.go: round-trip per opcode + byte-layout pin + version-byte / sub-tag rejection + truncated-input rejection across every field boundary. - internal/raftengine/etcd/encryption_test.go: HaltApply seam halts apply loop without advancing setApplied; nil HaltApply advances normally. - kv/fsm_encryption_test.go: per-opcode happy path; no-applier fail-closed on each opcode; applier-side error wraps ErrEncryptionApply; truncated decode wraps both ErrFSMWireMalformed and ErrEncryptionApply; legacy 0x00/0x01/ 0x02 opcodes do NOT route through the encryption dispatcher. Self-review (CLAUDE.md 5 passes): 1. Data loss: encryption opcodes that fail to apply (malformed payload, applier error) halt without advancing setApplied; next restart replays. Aligned with Stage 3 ErrRaftUnwrapFailed halt invariant. 2. Concurrency: f.encryption is set once at FSM construction and read-only thereafter; same single-writer discipline as f.hlc. The HaltApply interface check on the apply hot path is one type assertion, lock-free. 3. Performance: cipher-disabled (no applier wired) and non- encryption opcode paths are pre-existing — only the new opcode tags add a switch case. Hot path overhead = 1 type assertion per applied entry. 4. Consistency: byte-tag dispatch is mutually exclusive (each opcode has its own case); legacy opcodes (0x00/0x01/0x02) remain in their original code paths verbatim. Wire format fails closed on any version / sub-tag / length mismatch. 5. Test coverage: 35+ unit tests across four packages plus the existing wider test suite which continues to pass with no changes (no FSM caller wires WithEncryption in Stage 4).

coderabbitai · 2026-05-09T20:17:28Z

📝 Walkthrough

Walkthrough

This PR implements Stage-4 encryption opcode handling in the Raft FSM with a fail-closed halt mechanism. It introduces a versioned binary wire format for three encryption payloads (registration, bootstrap, rotation), a registry codec for persistent state, a halt-apply seam in the engine's apply loop, and encryption handlers integrated into the FSM via configurable options.

Changes

Encryption FSM Opcode Handling

Layer / File(s)	Summary
Error Contract `internal/encryption/errors.go`	Introduces `ErrEncryptionApply` sentinel for signaling fatal encryption failures to the raft apply loop.
Wire Format Constants & Codec `internal/encryption/fsmwire/wire.go`	Defines opcodes (0x03–0x05), wire version, payload types (Registration, Bootstrap, Rotation with batch registry support), and encoding/decoding functions with DoS protections (batch count cap before allocation, uint64 overflow checks).
Wire Format Tests `internal/encryption/fsmwire/wire_test.go`	Covers round-trip encode/decode, exact byte layout pinning, malformed/truncated/trailing-byte rejection, batch-count overflow protection, and opcode distinctness.
Registry Codec `internal/encryption/registry.go`	Implements Pebble key/value codec for writer-registry rows: `RegistryKey(dekID, nodeID16)` and `RegistryValue(fullNodeID, firstSeenEpoch, lastSeenEpoch)` with fixed-size big-endian layouts and sentinel errors.
Registry Tests `internal/encryption/registry_test.go`	Validates round-trip encode/decode, byte layout pinning, prefix correctness, and rejection of malformed keys/values.
Engine Halt-Apply Integration `internal/raftengine/etcd/encryption.go`, `internal/raftengine/etcd/engine.go`	Engine's `applyNormalCommitted` now inspects FSM apply responses for `HaltApply() error` interface; non-nil errors halt without advancing applied index. Re-exports `ErrEncryptionApply` to avoid import cycles.
Engine Halt Tests `internal/raftengine/etcd/encryption_test.go`	Validates halt prevents applied/appliedIndex advancement while FSM.Apply is invoked; nil HaltApply allows normal progression.
FSM Encryption Routing `kv/fsm.go`	Adds `FSMOption` pattern and `WithEncryption` to configure encryption applier. New `applyReservedOpcode` dispatcher handles HLC and encryption opcode ranges before legacy raft-request path.
FSM Encryption Handlers `kv/fsm_encryption.go`	Defines `EncryptionApplier` interface; implements `applyEncryption`, `applyRegistration`, `applyBootstrap`, `applyRotation` to decode fsmwire payloads, dispatch to applier, and halt on error by wrapping with `ErrEncryptionApply`.
FSM Tests `kv/fsm_encryption_test.go`	Tests happy-path dispatch, applier error wrapping, decode failures, no-applier fail-closed behavior, future/reserved opcode halting, legacy opcode bypass, and exhaustive range validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

bootjp/elastickv#719: Modifies internal/encryption package and introduces encryption primitives used by this FSM/raft integration PR.
bootjp/elastickv#707: Implements the data-at-rest encryption design containing the wire formats, registry, FSM opcodes, and halt semantics realized in this PR.

Poem

🐰 Hop-hop, the encryption opcodes now dance!
Wire frames spin in encrypted trance,
Registry tracks each writer's stance,
Halt seams halt when errors entrance—
Fail-closed by default, not by chance! 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 65.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and accurately summarizes the main change: implementation of FSM-internal Raft entry types for Stage 4 of the encryption rollout, with opcodes 0x03-0x05 (registration, bootstrap, rotation).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/encryption-fsm-entries

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request implements the Stage 4 encryption infrastructure, introducing a hand-rolled binary wire format for FSM-internal encryption opcodes and a writer registry layout for Pebble storage. It also establishes a "HaltApply" seam in the Raft engine, enabling the state machine to signal a fatal fail-closed halt during application. Review feedback highlights critical improvements for the binary decoder, including preventing potential OOM attacks by validating remaining buffer size before slice allocation, fixing a 32-bit compilation error in integer bounds checking, and correcting a potential truncation issue in buffer size comparisons on 64-bit systems.

gemini-code-assist · 2026-05-09T20:20:01Z

+	if !ok {
+		return nil, errors.Wrap(ErrFSMWireMalformed, "bootstrap: batch count truncated")
+	}
+	regs := make([]RegistrationPayload, count)


This allocation is vulnerable to a Denial of Service (OOM) attack. A malicious leader or a malformed Raft entry could specify a massive count (up to 2^32-1) while providing a tiny payload. Since RegistrationPayload is 14 bytes, a large count could trigger a multi-gigabyte allocation before the loop even attempts to read the data. You should verify that the remaining buffer size is sufficient to satisfy the requested count before allocating the slice.

if uint64(r.remaining()) < uint64(count)*uint64(registrationSize) { return nil, errors.Wrap(ErrFSMWireMalformed, "bootstrap: batch count exceeds remaining bytes") } regs := make([]RegistrationPayload, count)

References

When deserializing data with a size header, pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation and the entire data must be materialized in memory. Avoid adding complexity with chunked reads if it doesn't meaningfully change the peak memory bound.

gemini-code-assist · 2026-05-09T20:20:01Z

+	const maxU32 = int(^uint32(0))
+	if n < 0 || n > maxU32 {
+		panic(errors.Newf("fsmwire: length %d does not fit in uint32", n))
+	}
+	return uint32(n) //nolint:gosec // bounds checked above


The maxU32 constant definition will cause a compilation error on 32-bit platforms because ^uint32(0) (which is 4,294,967,295) overflows the signed int type (max 2,147,483,647). Even if it compiled, the logic n > maxU32 would be broken because maxU32 would be interpreted as -1 due to two's complement representation, causing the function to panic for all non-negative inputs. A more portable approach is to use an explicit uint64 check or math.MaxUint32.

Suggested change

const maxU32 = int(^uint32(0))

if n < 0 || n > maxU32 {

panic(errors.Newf("fsmwire: length %d does not fit in uint32", n))

}

return uint32(n) //nolint:gosec // bounds checked above

if n < 0 || uint64(n) > 0xffffffff {

panic(errors.Newf("fsmwire: length %d does not fit in uint32", n))

}

return uint32(n) //nolint:gosec // bounds checked above

gemini-code-assist · 2026-05-09T20:20:02Z

+	// past len(src)) into uint32 with a saturation guard so a
+	// 32-bit platform's int can never overflow the comparison.
+	rem := r.remaining()
+	if rem < 0 || uint32(rem) < n { //nolint:gosec // rem >= 0 enforced by the rem < 0 branch


The comparison uint32(rem) < n is incorrect on 64-bit platforms when the remaining buffer size exceeds 2^32-1 bytes. In such cases, uint32(rem) will truncate the value, potentially causing the check to falsely fail (e.g., if rem is 4.3 GiB, uint32(rem) might be small, making it appear as if there is insufficient space for n). While Raft entries are typically much smaller, this codec should be robust for all valid slice sizes. Using uint64 for the comparison ensures correctness across platforms.

Suggested change

if rem < 0 || uint32(rem) < n { //nolint:gosec // rem >= 0 enforced by the rem < 0 branch

if rem < 0 || uint64(rem) < uint64(n) { //nolint:gosec // rem >= 0 enforced by the rem < 0 branch

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8213d9932

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-09T20:23:44Z

+	if !ok {
+		return nil, errors.Wrap(ErrFSMWireMalformed, "bootstrap: batch count truncated")
+	}
+	regs := make([]RegistrationPayload, count)


Validate bootstrap batch count before allocating

readRegistrationBatch trusts the decoded count and immediately does make([]RegistrationPayload, count), so a malformed bootstrap payload can request billions of rows and trigger an out-of-memory panic before any ErrFSMWireMalformed is returned. I reproduced this with a minimal payload whose batch count is 0xffffffff, which crashes in makeslice at this line. Because Raft entries are external input to the decoder, this bypasses the intended fail-closed error path and turns decode errors into process-killing memory exhaustion.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-09T20:23:44Z

+// hundred-byte / hundred-entry scale; a uint32 overflow indicates a
+// caller bug, not adversarial input, so panic is appropriate.
+func safeU32(n int) uint32 {
+	const maxU32 = int(^uint32(0))


Remove 32-bit overflow in safeU32 max constant

The constant int(^uint32(0)) overflows int on 32-bit targets, so this package does not compile for GOARCH=386 (build error: constant 4294967295 overflows int). That breaks cross-compilation and contradicts the surrounding comments that discuss 32-bit behavior. Use a width-agnostic bound check (for example compare via uint64/math.MaxUint32) instead of converting max uint32 to int at compile time.

Useful? React with 👍 / 👎.

…mini) Three findings on the Stage 4 fsmwire codec, all fixed in this commit: 1. P1 / security-high (codex + gemini): readRegistrationBatch trusted the decoded uint32 count and immediately did `make([]RegistrationPayload, count)`. A malformed bootstrap payload that names count=0xFFFFFFFF would allocate 4G * 14 = ~56 GiB before the loop attempted the first row read — the apply path OOMs instead of returning ErrFSMWireMalformed. Add a bounds check `count * registrationSize <= remaining` computed in uint64 (overflow-safe) before the make. 2. P2 / high (codex + gemini): safeU32 used `int(^uint32(0))` as the bound. The constant 4,294,967,295 overflows int on GOARCH=386, so cross-compilation to 32-bit fails before any runtime check fires. Replace with a width-agnostic `uint64(n) > math.MaxUint32` test. 3. medium (gemini): readLenU32Bytes compared `uint32(rem) < n`, which on a 64-bit reader with rem > 2^32 silently truncates rem to a small uint32 and falsely reports "insufficient space". Compare in uint64. New regression test TestBootstrap_RejectsHugeBatchCount rewrites the on-the-wire batch-count prefix to 0xFFFFFFFF and asserts the decoder returns ErrFSMWireMalformed instead of OOM-killing the process. Caller audit (semantic change: previously-OOM-on-malformed path becomes an error return): - readRegistrationBatch is called only by DecodeBootstrap, which propagates the error to its sole caller in kv/fsm_encryption.go. The kvFSM dispatcher already wraps with ErrEncryptionApply and halts the apply loop — no change to the existing fail-closed contract. - safeU32 is called only by appendLenU32Bytes (encoder path, leader-only); panic is preserved as before for the >uint32-max path that today is unreachable in practice. - readLenU32Bytes is called by both decoders (bootstrap + rotation) and by the encoder via appendLenU32Bytes. The new uint64 comparison is strictly more accurate; tests over the existing happy / truncated / version paths continue to pass. Cross-compilation gate: `GOOS=linux GOARCH=386 go build ./internal/ encryption/fsmwire/...` is now clean.

bootjp · 2026-05-10T14:20:41Z

@claude review

PR748 round-1: codex P1 + gemini security-high/high/medium 計 3 件を反映しました (commit 0f8d46c)。

P1 / security-high (codex + gemini): readRegistrationBatch の OOM DoS。make([]RegistrationPayload, count) が count=0xFFFFFFFF で 56 GiB alloc → process OOM。count * registrationSize <= remaining を uint64 で bound check してから make。

P2 / high (codex + gemini): safeU32 の int(^uint32(0)) は GOARCH=386 で compile-error。uint64(n) > math.MaxUint32 に置換、GOOS=linux GOARCH=386 go build 通過確認済。

medium (gemini): readLenU32Bytes の uint32(rem) < n は 64-bit reader で rem > 2^32 のとき truncate。uint64 比較に変更。

caller audit (意味論変更: previously-OOM path → error return):

readRegistrationBatch の唯一の caller は DecodeBootstrap → kv/fsm_encryption.go applyBootstrap で ErrEncryptionApply wrap、HaltApply 経路で halt。fail-closed 契約は変更なし。
safeU32 は encoder の appendLenU32Bytes のみ (leader-only)、panic 動作は維持。
readLenU32Bytes は両 decoder + encoder。uint64 比較で精度向上、既存 happy/truncated/version test 全 pass。

新規テスト TestBootstrap_RejectsHugeBatchCount: on-wire の batch-count prefix を 0xFFFFFFFF に書き換え → ErrFSMWireMalformed 返却 (pre-fix なら OOM)。

make test/lint green、386 cross-build clean。

claude · 2026-05-10T14:20:59Z

Claude finished @bootjp's task in 4m 12s —— View job

Round 2 Review

Todo

Load task context
Read wire.go — verify OOM fix, safeU32 fix, readLenU32Bytes fix
Read wire_test.go — verify new TestBootstrap_RejectsHugeBatchCount test
Read kv/fsm_encryption.go and fsm.go
Read registry.go, raftengine/etcd/encryption.go, engine.go (HaltApply seam)
Post comprehensive review

Fix Verification

P1 / security-high — OOM DoS in `readRegistrationBatch` ✅

wire.go:322-350 — fix is correct. The bound check:

need := uint64(count) * uint64(registrationSize)
if uint64(rem) < need {
    return nil, errors.Wrapf(...)
}
regs := make([]RegistrationPayload, count)

The multiplication uint64(count) * uint64(registrationSize) cannot overflow uint64: max value is (2^32-1) × 14 ≈ 6×10¹⁰ ≪ 2^64. Both sides of the comparison are uint64 so no truncation. The rem < 0 guard is dead code (the reader invariant guarantees off <= len(src)), but it's a safe defensive belt-and-suspenders check and consistent with readLenU32Bytes.

TestBootstrap_RejectsHugeBatchCount (wire_test.go:148–174) correctly hand-crafts a bootstrap with count=0xFFFFFFFF and verifies ErrFSMWireMalformed is returned rather than OOM. The offset arithmetic (const countOffset = 1 + 4 + 4 + 0 + 4 + 4 + 0 = 17) is correct for the empty-wrapped-DEK case used.

P2 / high — `safeU32` compile error on GOARCH=386 ✅

wire.go:558-562 — fix is correct:

func safeU32(n int) uint32 {
    if n < 0 || uint64(n) > math.MaxUint32 { //nolint:gosec
        panic(...)
    }
    return uint32(n) //nolint:gosec // bounds checked above
}

math.MaxUint32 is the constant 1<<32 - 1 (untyped, valid on all platforms). The || short-circuits so uint64(n) is only evaluated when n >= 0, making the conversion safe. The nolint directive comment accurately describes the reasoning. The previous int(^uint32(0)) would produce a compile-time overflow on 32-bit targets.

medium — `readLenU32Bytes` uint32 truncation ✅

wire.go:502-521 — fix is correct:

rem := r.remaining()
if rem < 0 || uint64(rem) < uint64(n) { //nolint:gosec
    return nil, false
}
end := r.off + int(n) //nolint:gosec // n bounded by rem above; rem fits int

The comment justifying int(n): after the check, uint64(n) <= uint64(rem) and rem is a non-negative int, so n <= rem <= math.MaxInt, meaning int(n) is safe on every platform Go supports. The subsequent r.off + int(n) <= len(r.src) follows from int(n) <= rem = len(r.src) - r.off. All arithmetic is sound.

Remaining Observations (non-blocking)

1. DecodeRegistryKey uses a manual prefix loop (registry.go:89-93) instead of bytes.HasPrefix or a slice comparison. Functionally identical but slightly less idiomatic. Not a bug.

2. TestApply_LegacyOpcodesUnaffected only covers 0x02 (fsm_encryption_test.go:232-246). Testing 0x00/0x01 would require a valid proto payload and a wired store, so excluding them is the right call given newFSMWithFake doesn't wire a store. The test correctly verifies the encryption applier receives no calls on an HLC entry, which is the dispatch property under test.

3. fakeApplier.lastReg/lastBootstrap/lastRotation are not atomic (fsm_encryption_test.go:26-28). Each test creates its own fakeApplier and calls Apply synchronously before reading these fields, so there is no actual race. But if a future test runs Apply concurrently it would need explicit synchronization. Worth noting for Stage 5+ when real concurrent dispatch tests land.

4. HaltApply success-path nil contract (fsm_encryption.go:92-94): The success branch return nil emits an untyped nil (not a typed *haltApplyResponse(nil)), so the engine's type assertion response.(interface{ HaltApply() error }) correctly returns ok=false. The haltErr constructor is always called with non-nil errors in current code. The comment in haltErr documents the invariant. ✅

5. resolveProposal is not called on halt (engine.go:2066-2067): When applyNormalCommitted returns early on a HaltApply error, the proposer's pending-proposal entry is never resolved and will time out. This is the correct behavior for a §6.3 fatal-apply — the operator must investigate rather than the proposer retrying silently.

Verdict

All three Round 1 findings are fixed correctly. The uint64 bound checks, the portable safeU32, and the uint64 truncation fix all verify cleanly against the code. The new TestBootstrap_RejectsHugeBatchCount regression test directly exercises the pre-fix OOM path. The HaltApply seam (applyNormalCommitted → HaltApply() interface check → error propagation without setApplied) is correctly implemented and tested from both the engine side (TestApplyCommitted_HaltApply_DoesNotAdvanceApplied) and the FSM side (TestApply_NoApplierWired, TestApply_HandlerError_HaltsWithEncryptionApply, TestApply_DecodeFailure_Halts). No further changes required for Round 2.

LGTM — Stage 4 dispatch seam and security fixes are correct.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f8d46cf9e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-10T14:26:11Z

+		case fsmwire.OpRegistration, fsmwire.OpBootstrap, fsmwire.OpRotation:
+			return f.applyEncryption(data[0], data[1:])
+		}


Route unknown encryption opcodes through halt path

The new opcode gate only dispatches 0x03/0x04/0x05 to applyEncryption, so any future encryption opcode (for example 0x06) falls through to legacy protobuf decoding instead of the fail-closed halt path. In this codebase, legacy decode failures are returned as ordinary FSM responses, and applyNormalCommitted still advances setApplied unless the response implements HaltApply, so a mixed-version follower can acknowledge and skip a committed encryption entry rather than stopping for replay. That creates a real divergence risk during rolling upgrades where newer leaders emit additional internal opcodes.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-10T14:26:11Z

+	purpose, ok := r.readByte()
+	if !ok {
+		return 0, 0, nil, errors.Wrap(ErrFSMWireMalformed, "rotation: purpose truncated")
+	}
+	wrapped, ok := r.readLenU32Bytes()
+	if !ok {
+		return 0, 0, nil, errors.Wrap(ErrFSMWireMalformed, "rotation: wrapped truncated")
+	}
+	return dekID, Purpose(purpose), wrapped, nil


Reject unknown rotation purpose values during decode

The rotation decoder validates version and sub-tag but accepts any raw purpose byte and casts it directly to Purpose, so malformed entries like purpose=0xFF are treated as syntactically valid and passed to the applier. Because this wire format is intended to fail closed on malformed state-machine input, allowing out-of-enum purpose values weakens that invariant and can let invalid committed payloads advance setApplied if an applier does not independently re-validate purpose.

Useful? React with 👍 / 👎.

…e (PR748 codex P1+P2) Two PR748 review findings, both with caller audits. P1 (codex, orange): The Apply dispatcher only routed the three implemented opcodes (0x03 / 0x04 / 0x05) to the encryption fail- closed halt path, so a future encryption opcode (e.g. 0x06) emitted by a newer leader against a stale follower would fall through to decodeLegacyRaftRequest's proto3 unmarshal, fail with a plain `error`, and — because plain errors do NOT implement the HaltApply interface — let applyNormalCommitted advance setApplied past the entry. That is the exact silent-skip-on-malformed behaviour §6.3 added the integrity tag and HaltApply seam to detect. The fix routes the closed range [OpEncryptionMin, OpEncryptionMax] through applyEncryption. applyEncryption's default case already wraps unknown opcodes with ErrEncryptionApply (haltApplyResponse), so adding a new opcode without registering its dispatch case still halts cleanly. Range bound 0x07 (NOT 0x0F): proto3 wire tags for field 1 occupy 0x08..0x0D (field 1 with wire types varint / fixed64 / length- delim / start-group / end-group / fixed32). Routing those through the encryption dispatcher would short-circuit the legacy proto3 fallback in decodeLegacyRaftRequest for any RaftCommand/Request payload whose first encoded field is field 1 — for example `Request.is_txn = true` marshals as `0x08 0x01`, which would collide with a future opcode dispatch. Bytes 0x03..0x07 are SAFE because they encode either field 0 (proto3 disallows field 0) or reserved/invalid wire types (0x06/0x07 = wire types 6/7 marked reserved in proto3), so no valid proto3 marshal output starts with them. Bytes 0x08+ would need a different dispatch shape (e.g. a 2-byte sentinel) before they could be routed safely. Caller audit (semantic change: previously-fall-through path becomes halt for 0x06/0x07): - Apply's only caller is internal/raftengine/etcd. applyNormalCommitted, which already inspects HaltApply. Strictly more conservative — bytes that previously returned a plain proto3 decode error now halt the apply loop instead of advancing setApplied. Same fail-closed shape as Stage 3. - applyEncryption's default case fails closed for any byte not yet implemented; this PR widens the dispatch but does NOT change applyEncryption's contract. - decodeLegacyRaftRequest is now unreachable for 0x03..0x07. It remains the sole path for legacy proto3 entries with first byte 0x08+ (which is the production case for is_txn = true messages). - Test sites in fsm_occ_test.go / fsm_txn_test.go / transaction_batch_test.go that call fsm.Apply with raw proto.Marshal output continue to work because their first bytes are field-1 (0x08+) or field-2 (0x10) tags — outside the 0x03..0x07 encryption range. P2 (codex, yellow): readRotationBody cast the raw purpose byte straight to `Purpose` without validating against the §5.1 enum {storage=1, raft=2}. A malformed entry with purpose=0xFF would decode without error and forward an out-of-enum value to the applier; if the applier did not independently re-validate, the malformed entry could advance setApplied — the same fail-closed contract violation as P1. Validate at decode time and surface ErrFSMWireMalformed. Caller audit: - DecodeRotation has one non-test caller, kv/fsm_encryption.go's applyRotation, which already wraps decode errors with ErrEncryptionApply via Mark+Wrap. Strictly more conservative: previous payloads that decoded silently as Purpose(0xFF) now halt the apply loop. No applier-side semantic regression because no Stage 4 applier exists yet. Refactor: extracted applyReservedOpcode helper from Apply to keep cyclomatic complexity under the cyclop budget after the dispatch widening. The HLC-lease (0x02) and encryption-range (0x03..0x07) checks now live in one place, with handled=false signalling fall-through to the legacy proto3 path. New tests: - TestApply_ReservedRange_FutureOpcodes_Halt: sweeps 0x03..0x07 with no applier wired and asserts every byte produces a haltApplyResponse wrapping ErrEncryptionApply. - TestApply_AboveReservedRange_FallsThroughToLegacy: byte 0x10 (above the range) must NOT route through HaltApply and must NOT touch the encryption applier. - TestRotation_RejectsUnknownPurpose: 0x00 / 0x03 / 0x7F / 0xFF purpose bytes all surface ErrFSMWireMalformed before reaching the applier. - TestRotation_AcceptsKnownPurposes: positive control for storage/raft round-trip. Verified: make test/lint green, GOOS=linux GOARCH=386 build clean.

bootjp · 2026-05-10T14:44:21Z

@claude review

PR748 round-2: codex P1 + P2 を反映 (commit 17cca53)。

P1 (orange) — Route unknown encryption opcodes through halt
Apply の dispatch を 0x03/0x04/0x05 から 0x03..0x07 に拡張。applyEncryption の default 経路で未実装 opcode (0x06/0x07) は ErrEncryptionApply で halt → HaltApply seam が setApplied を進めずに halt。これで rolling-upgrade で newer leader が 0x06 を emit → stale follower で silent skip、という §6.3 違反が消える。

Why upper bound 0x07, not 0x0F: proto3 wire-tag bytes 0x08..0x0D は field 1 with wire types varint/fixed64/length-delim/start-group/end-group/fixed32 — 例えば Request.is_txn = true は 0x08 0x01 で marshal される。0x08+ を encryption に routing すると legacy proto3 fallback (decodeLegacyRaftRequest) を壊す。0x03..0x07 は field 0 (proto3 で禁止) または reserved wire types 6/7 で、有効な proto3 marshal 出力は始まらない。0x08+ を将来 encryption に reserve するなら 2-byte sentinel ([opcode][version=0x01]) 等の dispatch 形を要再設計。

P2 (yellow) — Reject unknown rotation purpose
readRotationBody で raw purpose byte を Purpose に直接 cast していたため、0xFF などが decoder を素通り。§5.1 enum {storage=1, raft=2} を decode 時に validate、それ以外は ErrFSMWireMalformed で fail-closed。

Caller audit (意味論変更):

Apply → applyNormalCommitted: HaltApply interface 経由で halt 認識、strictly more conservative。
applyEncryption の default 経路: 既に ErrEncryptionApply wrap。今回の widening は dispatch 拡張のみで applyEncryption の契約は不変。
DecodeRotation non-test caller は kv/fsm_encryption.go applyRotation 1 件のみ、既に ErrEncryptionApply で wrap → halt。0xFF は previously silent → now halt、fail-closed 契約が強化。
既存 test (fsm_occ_test.go 等) で fsm.Apply(proto.Marshal(...)) を直接呼ぶものは first byte が field-1 tag (0x08+) または field-2 tag (0x10) で範囲外、test churn なし。

Refactor: applyReservedOpcode helper を抽出 (cyclop 制約)。HLC lease (0x02) + encryption (0x03..0x07) を 1 箇所に統合。

New tests: TestApply_ReservedRange_FutureOpcodes_Halt (0x03..0x07 を sweep して全て halt 確認)、TestApply_AboveReservedRange_FallsThroughToLegacy (byte 0x10 が halt しない事を確認)、TestRotation_RejectsUnknownPurpose (0x00/0x03/0x7F/0xFF の reject)、TestRotation_AcceptsKnownPurposes (positive control)。

make test/lint green、GOOS=linux GOARCH=386 build clean。

claude · 2026-05-10T14:44:37Z

Claude finished @bootjp's task in 2m 27s —— View job

Round 3 Review

Todo

Load task context
Read kv/fsm.go — verify opcode range widening and applyReservedOpcode helper
Read kv/fsm_encryption.go — verify applyEncryption default halt path
Read kv/fsm_encryption_test.go — verify new reserved-range and above-range tests
Read internal/encryption/fsmwire/wire.go — verify purpose validation in readRotationBody
Read internal/encryption/fsmwire/wire_test.go — verify new purpose regression tests
Post comprehensive review

Fix Verification

P1 — Unknown encryption opcodes halt (not fall through) ✅

fsmwire/wire.go:138-141 — OpEncryptionMin = 0x03, OpEncryptionMax = 0x07. The comment block at lines 112-141 correctly documents the proto3 wire-tag analysis: field 1 occupies 0x08..0x0D, so routing 0x08+ through the encryption dispatcher would break decodeLegacyRaftRequest for any payload whose first encoded field is field 1 (e.g. Request.is_txn = true → 0x08 0x01). The 0x03..0x07 ceiling is safe because proto3 disallows field 0 and marks wire types 6/7 reserved — no valid proto3 marshal output starts with them. Analysis is sound.

kv/fsm.go:139-151 — applyReservedOpcode routes data[0] >= OpEncryptionMin && data[0] <= OpEncryptionMax to applyEncryption. Clean extraction from Apply, keeps the cyclomatic budget. The HaltApply seam in applyEncryption's default case (line 74-77) fails closed on any byte 0x06 or 0x07 even with a wired applier. This is the critical invariant: a stale follower receiving a future opcode from a newer leader halts rather than silently advancing setApplied.

kv/fsm_encryption_test.go:241-252 — TestApply_ReservedRange_FutureOpcodes_Halt sweeps [OpEncryptionMin, OpEncryptionMax] (0x03..0x07) with no applier wired and asserts each byte produces ErrEncryptionApply. The loop is driven by the constants, so any future change to OpEncryptionMax automatically extends or shrinks the sweep. ✅

kv/fsm_encryption_test.go:261-278 — TestApply_AboveReservedRange_FallsThroughToLegacy sends 0x10 and asserts the response is not a haltApplyResponse. This effectively pins OpEncryptionMax < 0x10: if the constant were ever bumped to 0x10 or higher, 0x10 would route to applyEncryption → halt, failing the test. ✅

P2 — Purpose validated at decode time ✅

fsmwire/wire.go:488-495 — The purpose switch validates against {PurposeStorage=1, PurposeRaft=2} and returns ErrFSMWireMalformed for anything else. The // known, fall through case is deliberately empty rather than using a return to make the validation structure readable. Correct.

wire_test.go:239-256 — TestRotation_RejectsUnknownPurpose covers {0x00, 0x03, 0x7F, 0xFF} — the zero value, a value just above the valid enum, the mid-range, and max byte. Comprehensive.

wire_test.go:263-276 — TestRotation_AcceptsKnownPurposes is the positive control. Both PurposeStorage and PurposeRaft round-trip cleanly. ✅

Purpose byte offset in test: purposeOffset = 1 (ver) + 1 (subtag) + 4 (dek_id) = 6. Matches the EncodeRotation layout ([ver 1] [subtag 1] [dek_id 4] [purpose 1]). ✅

Non-Blocking: Test Comment Claims 0x0F, Constants Say 0x07

Two test docstrings in kv/fsm_encryption_test.go describe the encryption range as [0x03, 0x0F] but the actual constant is OpEncryptionMax = 0x07:

Line 229: "every byte in [0x03, 0x0F] must route through the encryption fail-closed halt path"
Line 259: "The contract is 'encryption owns 0x03..0x0F'; that bound must hold from both sides."

The test logic is correct — it iterates via the constants and the 0x10 boundary test works — but the comments claim a range that is 8 bytes wider than the code actually enforces. A future reader skimming the comments (without checking OpEncryptionMax) might believe 0x08..0x0F are also reserved, which is not the case (those bytes collide with proto3 field-1 wire tags and are explicitly excluded from the encryption range).

Fix this →

Verdict

Both Round 2 findings are fixed correctly. The opcode range widening to 0x03..0x07 is sound (proto3 field-tag analysis documented in full), the applyReservedOpcode helper extraction is clean, and the purpose validation at decode time correctly strengthens the §6.3 fail-closed contract. All four new tests — TestApply_ReservedRange_FutureOpcodes_Halt, TestApply_AboveReservedRange_FallsThroughToLegacy, TestRotation_RejectsUnknownPurpose, and TestRotation_AcceptsKnownPurposes — are correct and verify the right properties.

The only outstanding item is two test docstrings that say 0x0F where the code says 0x07. This is misleading but non-blocking; nothing depends on the incorrect comment values.

LGTM pending the comment fix above (or at your discretion, it's a documentation-only gap).

…ude[bot] non-blocking) Round-3 review (claude[bot] LGTM with one non-blocking nit): two test docstrings in TestApply_ReservedRange_FutureOpcodes_Halt and TestApply_AboveReservedRange_FallsThroughToLegacy still described the encryption opcode range as `[0x03, 0x0F]`, but the actual constant in fsmwire/wire.go is `OpEncryptionMax = 0x07`. The range was narrowed in commit 17cca53 to dodge proto3 wire-tag collisions for field 1 (0x08..0x0D), but those two comment blocks were not updated. Code logic is correct (the sweep iterates from OpEncryptionMin to OpEncryptionMax, so it follows the constant) — only the human-readable comments needed updating. A future reader skimming the comments without checking the constant might mistakenly believe 0x08..0x0F are also routed through encryption halt, which would mis-explain the design. Now both comments name 0x07 explicitly and cross-reference the proto3-collision rationale documented on OpEncryptionMax in fsmwire/wire.go. No code change; tests pass unchanged.

bootjp · 2026-05-10T14:50:39Z

@claude review

PR748 round-3 (claude[bot] non-blocking nit): TestApply_ReservedRange_FutureOpcodes_Halt と TestApply_AboveReservedRange_FallsThroughToLegacy の docstring に残っていた 0x0F 文言を OpEncryptionMax=0x07 に更新 (commit af944d7)。

範囲は 17cca53 で 0x07 に narrow 済 (proto3 field-1 collision 0x08..0x0D 回避)、定数 sweep の test logic は変わらず、human-readable comment のみ修正。test/lint pass。

claude · 2026-05-10T14:50:50Z

Claude finished @bootjp's task in 53s —— View job

Round 4 Review

Todo

Load task context
Read kv/fsm_encryption_test.go — verify 0x0F → 0x07 docstring fix in both test functions
Read internal/encryption/fsmwire/wire.go — confirm OpEncryptionMax = 0x07
Post final review

Fix Verification

Round 3 nit — test docstrings updated from 0x0F to 0x07 ✅

kv/fsm_encryption_test.go:228-229 (TestApply_ReservedRange_FutureOpcodes_Halt):

// every byte in [OpEncryptionMin, OpEncryptionMax] = [0x03, 0x07] must route through the encryption
// fail-closed halt path ...

Correct. The previous text said 0x0F; it now matches OpEncryptionMax = 0x07 (wire.go:140).

kv/fsm_encryption_test.go:263-265 (TestApply_AboveReservedRange_FallsThroughToLegacy):

// The contract is "encryption owns 0x03..0x07 (OpEncryptionMax)"; bytes 0x08+
// are reserved for the proto3 fallback ...

Correct. The previous text said 0x03..0x0F; it now accurately describes the actual range, and the parenthetical (OpEncryptionMax) anchors it to the constant so any future widening of the constant would make it self-updating.

Both docstrings now agree with the constant definition (wire.go:138-141):

const (
    OpEncryptionMin byte = 0x03
    OpEncryptionMax byte = 0x07
)

The additional context in TestApply_ReservedRange_FutureOpcodes_Halt (lines 231-236) correctly explains why the upper bound is 0x07 rather than 0x0F — the proto3 field-1 collision at 0x08..0x0D — which is exactly the analysis that motivated the narrowing in commit 17cca535.

No other documentation discrepancies found across the four changed packages.

Verdict

The Round 3 non-blocking nit is resolved. All three rounds of findings — OOM DoS in readRegistrationBatch, safeU32 32-bit compile error, readLenU32Bytes truncation, unknown opcodes falling through to legacy decode, unknown rotation purpose accepted silently, and now the docstring mismatch — are fully addressed.

LGTM — Stage 4 encryption dispatch seam is correct and well-documented. Ready to merge.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af944d7ad6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-10T14:56:03Z

+	need := uint64(count) * uint64(registrationSize)
+	if uint64(rem) < need {
+		return nil, errors.Wrapf(ErrFSMWireMalformed,
+			"bootstrap: batch count %d × %d > remaining %d", count, registrationSize, rem)
+	}
+	regs := make([]RegistrationPayload, count)


Validate batch count against decoded struct allocation

readRegistrationBatch only verifies count * registrationSize (14 bytes on the wire) against remaining input before doing make([]RegistrationPayload, count), but each RegistrationPayload occupies more memory in Go (24 bytes on 64-bit due to padding). A crafted bootstrap payload can therefore pass the wire-length check yet force a much larger in-memory allocation and OOM/panic during decode, turning malformed Raft input into process-killing memory pressure again. Add an additional bound using unsafe.Sizeof(RegistrationPayload{}) (or equivalent constant) before allocating the slice.

Useful? React with 👍 / 👎.

…s (PR748 codex P1 round-3) codex P1 round-3 finding: readRegistrationBatch's wire-bytes guard (`count * registrationSize <= remaining`) is necessary but NOT sufficient. Each RegistrationPayload occupies 24 bytes in memory on 64-bit (uint32 + 4B padding + uint64 + uint16 + 6B trailing padding) versus 14 bytes packed on the wire. A crafted bootstrap payload that fits the wire-bytes check could still drive a >70% larger heap allocation; for a multi-MiB raft entry that is multi-GiB of resident memory. Add a hard absolute cap on `count` (16384 = 1<<14) BEFORE the wire-bytes check. The cap bounds the post-make slice at maxBootstrapBatchCount * sizeof(RegistrationPayload) = 16384 * 24 = 384 KiB, independent of struct padding or raft entry size. 16384 is ~8x the largest realistic raft cluster (low-thousands of nodes × 2 DEK purposes), so legitimate bootstraps are uncapped in practice. Caller audit (semantic change: previously-passing-but-large counts now reject): - readRegistrationBatch's only call site is DecodeBootstrap, which propagates to kv/fsm_encryption.go's applyBootstrap. applyBootstrap already wraps decode errors with ErrEncryptionApply via Mark+Wrap, so the new ErrFSMWireMalformed return rides the existing HaltApply seam — same fail-closed shape as the round-1 huge-count fix and the P2 unknown-purpose fix. No applier-side regression because no Stage 4 applier exists yet. - Existing TestBootstrap_RoundTrip / RejectsHugeBatchCount / RejectsTrailingBytes / RejectsTruncated all continue to pass: round-trip uses count ≤ 4, the 0xFFFFFFFF count test now hits the cap check before the wire-bytes check (still surfaces ErrFSMWireMalformed — same observable contract). New regression test TestBootstrap_RejectsOverCapBatchCount writes count = maxBootstrapBatchCount + 1 = 16385 directly into the encoded prefix (without extending the trailing bytes) and asserts the decoder rejects with ErrFSMWireMalformed BEFORE the wire bounds check. The cap is an unexported defensive constant, not part of the wire contract — the test hardcodes 16385 with a cross-reference comment to wire.go's maxBootstrapBatchCount. Verified: make test/lint green, GOOS=linux GOARCH=386 build clean.

bootjp · 2026-05-10T15:03:01Z

@claude review

PR748 round-4 codex P1 反映 (commit 819cbe0): in-memory padding を考慮した batch-count cap 追加。

P1 (orange) — Validate batch count against decoded struct allocation
readRegistrationBatch の wire-bytes 検査 (count * 14 <= remaining) のみだと、RegistrationPayload が in-memory で 24 byte (uint32 + padding + uint64 + uint16 + trailing padding) であるため、>70% 大きい heap allocation を allow してしまう。multi-MiB raft entry で multi-GiB 確保される可能性。

Fix: maxBootstrapBatchCount = 1<<14 = 16384 の絶対 cap を wire-bytes check より前 に追加。post-make slice は 16384 × 24 = 384 KiB に bound (struct padding に独立)。16384 は最大級の raft クラスタ (低千ノード × storage/raft 2 DEK) の 8x 余裕。

Caller audit: readRegistrationBatch 唯一の non-test caller は DecodeBootstrap → kv/fsm_encryption.go:98 applyBootstrap、既に ErrEncryptionApply で wrap → halt。新規 reject path は既存 fail-closed seam に乗る。round-1 の huge-count fix / round-2 の P2 unknown-purpose fix と同じ contract、applier 側の意味論変更なし。

新規テスト TestBootstrap_RejectsOverCapBatchCount: count=16385 を encoded prefix に直接書き込み、wire-bytes check より前で ErrFSMWireMalformed reject を確認。maxBootstrapBatchCount は internal defensive 定数で wire contract 外、test は literal 16385 を cross-reference comment 付きで使用。

make test/lint green、GOOS=linux GOARCH=386 build clean。

claude · 2026-05-10T15:03:19Z

Claude finished @bootjp's task in 2m 1s —— View job

Round 5 Review

Todo

Load task context
Read wire.go — verify maxBootstrapBatchCount cap placement and logic
Read wire_test.go — verify TestBootstrap_RejectsOverCapBatchCount test
Verify codex P1 concern about in-memory struct size vs wire size
Post comprehensive review

Fix Verification

Codex P1 Round-4 — In-memory size cap independent of struct padding ✅

The fix takes a cleaner approach than the codex suggestion of unsafe.Sizeof(). A sizeof-based check would still allow a very large allocation on a sufficiently large raft entry (it would just compute a tighter wire-bytes threshold); the absolute cap is strictly stronger.

Ordering is correct (wire.go:374-417):

1. count, ok := r.readU32()                       // read count
2. if count > maxBootstrapBatchCount { return err } // cap BEFORE make
3. if uint64(rem) < need { return err }            // wire-bytes BEFORE make
4. regs := make([]RegistrationPayload, count)      // allocation

The cap check (line 390) fires before either the wire-bytes check or the make call. A payload naming count=16385 with sufficient trailing bytes passes the wire-bytes guard (16385 × 14 = 229390 bytes) but is rejected at the cap. This is the correct behaviour — the struct padding concern exists even when the payload has enough bytes.

Cap value is well-reasoned (wire.go:206-225): maxBootstrapBatchCount = 1<<14 = 16384. The constant comment documents that 16384 is ~8× the largest realistic deployment (low-thousand-node clusters × 2 DEK purposes = at most ~16000 entries for an 8000-node cluster). The post-cap ceiling of 16384 × 24 bytes = 384 KiB is bounded independent of future struct-padding changes, which is the key property the codex was asking for and makes this approach superior to unsafe.Sizeof().

TestBootstrap_RejectsOverCapBatchCount (wire_test.go) is correct:

countOffset = 1+4+4+0+4+4+0 = 17 matches the EncodeBootstrap layout for zero-length wrapped fields ✅
Bytes [0x00, 0x00, 0x40, 0x01] = 0x00004001 = 16385 = (1<<14) + 1 ✅
The test asserts ErrFSMWireMalformed with an undersized payload tail, so the cap must reject before the wire-bytes check would even fire — this is the critical property under test ✅
The comment cross-references the constant as a literal (16385 = maxBootstrapBatchCount + 1) without importing the unexported constant, which is the right call for an internal defensive bound ✅

Interaction with TestBootstrap_RejectsHugeBatchCount (round 1): count=0xFFFFFFFF is also > 16384, so it now hits the cap path rather than the wire-bytes path. The test still validates ErrFSMWireMalformed — the semantics are correct, the failure surface has simply moved earlier.

Verdict

All four rounds of findings are fully addressed:

Round	Finding	Status
1 / gemini+codex	`readRegistrationBatch` OOM on huge count	Fixed (wire-bytes guard)
1 / codex P2	`safeU32` GOARCH=386 compile error	Fixed (uint64 compare)
1 / gemini medium	`readLenU32Bytes` uint32 truncation	Fixed (uint64 compare)
2 / codex P1	Unknown opcodes fall through to legacy decode	Fixed (range widened to 0x03..0x07, `applyReservedOpcode` helper)
2 / codex P2	Unknown rotation purpose accepted silently	Fixed (purpose validated at decode time)
3 / non-blocking	Test docstrings claimed 0x0F, constant says 0x07	Fixed
4 / codex P1	`make([]RegistrationPayload, count)` bounded by wire bytes, not in-memory size	Fixed (absolute cap `maxBootstrapBatchCount = 16384`)

LGTM — the cap approach is strictly stronger than unsafe.Sizeof(), correctly placed, and properly regression-tested. Stage 4 dispatch seam is correct through all four review rounds.

claude[bot] Round-5 LGTM is in; the remaining mergeStateStatus=BLOCKED is two unrelated test-job flakes that already passed locally: - test (jepsen) — Maven Central checksum validation transient (com.cognitect.aws:dynamodb, slf4j-nop, slf4j-api). No source change in this PR touches Jepsen. - test (ubuntu-latest) — TestRedis_MisskeyConnectionCompatibility, a 200ms PX timing-sensitive Redis adapter test that this PR does not touch (adapter/redis_misskey_compat_test.go is unchanged). Re-runs locally with go test -race -count=1 → PASS. Empty commit to retrigger the workflows. No code change.

coderabbitai

🧹 Nitpick comments (6)

internal/raftengine/etcd/engine.go (1)

2058-2065: 💤 Low value

Log message hardcodes "encryption FSM" but the seam is generic.

The HaltApply interface is generic (any FSM response can implement it), but the log message labels the cause as "encryption FSM apply". Since Stage 4 only has the one user this is accurate today, but if a future stage adds a non-encryption HaltApply caller, the log will misattribute the cause. Consider rewording to something neutral like "FSM apply requested halt; not advancing setApplied" so the log stays accurate as the seam grows.
♻️ Suggested change
-		if herr := h.HaltApply(); herr != nil {
-			slog.Error("encryption FSM apply requested halt; not advancing setApplied",
-				slog.Uint64("entry_index", entry.Index),
-				slog.Any("err", herr))
-			return errors.Wrap(herr, "raftengine/etcd: FSM-requested apply halt")
-		}
+		if herr := h.HaltApply(); herr != nil {
+			slog.Error("FSM apply requested halt; not advancing setApplied",
+				slog.Uint64("entry_index", entry.Index),
+				slog.Any("err", herr))
+			return errors.Wrap(herr, "raftengine/etcd: FSM-requested apply halt")
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/raftengine/etcd/engine.go` around lines 2058 - 2065, The log message
hardcodes "encryption FSM" while the HaltApply check is generic; update the
slog.Error call in the HaltApply handling (the type assertion
response.(interface{ HaltApply() error }) and the subsequent call to
h.HaltApply()) to use a neutral message such as "FSM apply requested halt; not
advancing setApplied" (or similar) so it no longer incorrectly attributes the
halt to an encryption-specific FSM.

internal/encryption/registry.go (1)

89-93: 💤 Low value

Optional: prefer bytes.HasPrefix over hand-rolled loop.

The byte-by-byte prefix check could be replaced with bytes.HasPrefix(key, WriterRegistryPrefix) for idiomatic clarity. The behavior is equivalent given the prior length guard. Not a defect.

♻️ Suggested change

-	for i, b := range WriterRegistryPrefix {
-		if key[i] != b {
-			return 0, 0, errors.WithStack(ErrRegistryKeyMalformed)
-		}
-	}
+	if !bytes.HasPrefix(key, WriterRegistryPrefix) {
+		return 0, 0, errors.WithStack(ErrRegistryKeyMalformed)
+	}

(Would also need adding "bytes" to the imports.)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/encryption/registry.go` around lines 89 - 93, Replace the
hand-rolled prefix loop that compares key bytes against WriterRegistryPrefix
with bytes.HasPrefix(key, WriterRegistryPrefix) for clarity and idiomatic style
(keep the existing length guard), and remove the for-loop; also add the "bytes"
import to the file so the new call compiles. Ensure the same error return
(errors.WithStack(ErrRegistryKeyMalformed)) remains when the prefix check fails.

internal/encryption/fsmwire/wire.go (1)

96-110: 💤 Low value

Reserved sub-tag space documented but not statically enforced.

The reserved sub-tag values (0x02..0x05) are only documented in a comment; nothing prevents a future commit from silently re-using them. Consider declaring named constants for the reserved range (or an explicit _ = 0x02 block) so a future addition that collides with a documented reservation is visible at the declaration site rather than hidden in a code review.

Not a defect — fine to defer.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/encryption/fsmwire/wire.go` around lines 96 - 110, The comment
documents reserved sub-tag bytes (0x02..0x05) but the code only declares
RotateSubRotateDEK, leaving the reserved range unenforced; add explicit
declarations for the reserved bytes (e.g., named constants or blank assignments)
adjacent to RotateSubRotateDEK so any future addition that reuses 0x02..0x05
will be obvious at the declaration site—for example add constants or `_ = 0x02`
entries for the documented tags to make the reservation compile-time visible and
prevent silent reuse of 0x02..0x05.

kv/fsm_encryption_test.go (2)

285-306: ⚡ Quick win

The legacy-opcode test only exercises 0x02.

The comment here covers 0x00..0x02, but the body only probes raftEncodeHLCLease. Please sweep all three legacy opcodes so a future range-check regression on 0x00 or 0x01 is caught too.

As per coding guidelines, "Co-locate test files in the same package and use table-driven test patterns."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kv/fsm_encryption_test.go` around lines 285 - 306, Update
TestApply_LegacyOpcodesUnaffected to exercise all three legacy opcodes (0x00,
0x01, 0x02) instead of only raftEncodeHLCLease: convert the test into a small
table-driven loop that iterates over the opcode values (e.g. []byte{0x00, 0x01,
raftEncodeHLCLease}), build an appropriate payload for each (for 0x02 keep the
8-byte HLC payload; for 0x00/0x01 use minimal payloads expected by
newFSMWithFake/Apply), call f.Apply for each case, and assert after each
iteration that applier.regCalls, bootstrapCalls and rotationCalls remain zero
(using
applier.regCalls.Load()+applier.bootstrapCalls.Load()+applier.rotationCalls.Load()).
Keep the test within the same package and reuse fakeApplier/newFSMWithFake to
co-locate behavior checks.

96-149: ⚡ Quick win

Tighten the bootstrap and rotation happy-path assertions.

These two tests currently pass even if the dispatcher drops WrappedRaft, BatchRegistry, SubTag, Wrapped, or ProposerRegistration, or accidentally invokes an extra handler. The registration test already pins both payload fidelity and handler isolation; these should do the same for the other two opcodes.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kv/fsm_encryption_test.go` around lines 96 - 149, The bootstrap and rotation
tests only assert partial fields and call counts, letting missing fields or
extra handler invocations slip through; update TestApply_Bootstrap_HappyPath to
also assert applier.lastBootstrap.WrappedRaft equals want.WrappedRaft and
applier.lastBootstrap.BatchRegistry deep-equals want.BatchRegistry and that
applier.bootstrapCalls.Load() == 1 (and other handler counts remain zero if
present), and update TestApply_Rotation_HappyPath to assert
applier.lastRotation.SubTag, applier.lastRotation.Wrapped, and
applier.lastRotation.ProposerRegistration exactly match want and that
applier.rotationCalls.Load() == 1 (and ensure no other handler was invoked);
locate these checks around the existing applier.* assertions and use the same
applier, f.Apply(...) and haltApplyOf(...) setup already in the tests.

internal/raftengine/etcd/encryption_test.go (1)

437-456: ⚡ Quick win

Also assert the success-path bookkeeping.

This test only checks e.applied. A regression that forgets to publish appliedIndex or bypasses fsm.Apply would still pass here, even though the halt-path test already pins both fields.

Suggested additions

 func TestApplyCommitted_HaltApply_NilContinues(t *testing.T) {
 	t.Parallel()
 	fsm := &haltStateMachine{err: nil}
 	e := newTestEngine(fsm, nil, nil)
 	good := raftpb.Entry{
 		Type:  raftpb.EntryNormal,
 		Index: 100,
 		Data:  encodeProposalEnvelope(7, []byte("payload")),
 	}
 	if err := e.applyCommitted([]raftpb.Entry{good}); err != nil {
 		t.Fatalf("applyCommitted: %v", err)
 	}
+	if got := fsm.calls.Load(); got != 1 {
+		t.Fatalf("fsm.Apply call count = %d, want 1", got)
+	}
 	if e.applied != 100 {
 		t.Fatalf("applied = %d, want 100 (nil HaltApply must advance)", e.applied)
 	}
+	if got := e.appliedIndex.Load(); got != 100 {
+		t.Fatalf("appliedIndex = %d, want 100", got)
+	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/raftengine/etcd/encryption_test.go` around lines 437 - 456, The test
TestApplyCommitted_HaltApply_NilContinues only asserts e.applied; extend it to
also verify the success-path bookkeeping by asserting that the FSM's Apply was
invoked for the entry and that the engine published the applied index (assert
fsm.Apply observed index 100 and that e.appliedIndex == 100 or the engine's
published appliedIndex field/channel reflects 100). Locate the haltStateMachine
used as fsm and confirm it records the last applied index (or add such a small
probe on haltStateMachine if missing), and assert the engine’s published
appliedIndex (or published applied index channel) matches 100 after calling
e.applyCommitted.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/encryption/fsmwire/wire.go`:
- Around line 96-110: The comment documents reserved sub-tag bytes (0x02..0x05)
but the code only declares RotateSubRotateDEK, leaving the reserved range
unenforced; add explicit declarations for the reserved bytes (e.g., named
constants or blank assignments) adjacent to RotateSubRotateDEK so any future
addition that reuses 0x02..0x05 will be obvious at the declaration site—for
example add constants or `_ = 0x02` entries for the documented tags to make the
reservation compile-time visible and prevent silent reuse of 0x02..0x05.

In `@internal/encryption/registry.go`:
- Around line 89-93: Replace the hand-rolled prefix loop that compares key bytes
against WriterRegistryPrefix with bytes.HasPrefix(key, WriterRegistryPrefix) for
clarity and idiomatic style (keep the existing length guard), and remove the
for-loop; also add the "bytes" import to the file so the new call compiles.
Ensure the same error return (errors.WithStack(ErrRegistryKeyMalformed)) remains
when the prefix check fails.

In `@internal/raftengine/etcd/encryption_test.go`:
- Around line 437-456: The test TestApplyCommitted_HaltApply_NilContinues only
asserts e.applied; extend it to also verify the success-path bookkeeping by
asserting that the FSM's Apply was invoked for the entry and that the engine
published the applied index (assert fsm.Apply observed index 100 and that
e.appliedIndex == 100 or the engine's published appliedIndex field/channel
reflects 100). Locate the haltStateMachine used as fsm and confirm it records
the last applied index (or add such a small probe on haltStateMachine if
missing), and assert the engine’s published appliedIndex (or published applied
index channel) matches 100 after calling e.applyCommitted.

In `@internal/raftengine/etcd/engine.go`:
- Around line 2058-2065: The log message hardcodes "encryption FSM" while the
HaltApply check is generic; update the slog.Error call in the HaltApply handling
(the type assertion response.(interface{ HaltApply() error }) and the subsequent
call to h.HaltApply()) to use a neutral message such as "FSM apply requested
halt; not advancing setApplied" (or similar) so it no longer incorrectly
attributes the halt to an encryption-specific FSM.

In `@kv/fsm_encryption_test.go`:
- Around line 285-306: Update TestApply_LegacyOpcodesUnaffected to exercise all
three legacy opcodes (0x00, 0x01, 0x02) instead of only raftEncodeHLCLease:
convert the test into a small table-driven loop that iterates over the opcode
values (e.g. []byte{0x00, 0x01, raftEncodeHLCLease}), build an appropriate
payload for each (for 0x02 keep the 8-byte HLC payload; for 0x00/0x01 use
minimal payloads expected by newFSMWithFake/Apply), call f.Apply for each case,
and assert after each iteration that applier.regCalls, bootstrapCalls and
rotationCalls remain zero (using
applier.regCalls.Load()+applier.bootstrapCalls.Load()+applier.rotationCalls.Load()).
Keep the test within the same package and reuse fakeApplier/newFSMWithFake to
co-locate behavior checks.
- Around line 96-149: The bootstrap and rotation tests only assert partial
fields and call counts, letting missing fields or extra handler invocations slip
through; update TestApply_Bootstrap_HappyPath to also assert
applier.lastBootstrap.WrappedRaft equals want.WrappedRaft and
applier.lastBootstrap.BatchRegistry deep-equals want.BatchRegistry and that
applier.bootstrapCalls.Load() == 1 (and other handler counts remain zero if
present), and update TestApply_Rotation_HappyPath to assert
applier.lastRotation.SubTag, applier.lastRotation.Wrapped, and
applier.lastRotation.ProposerRegistration exactly match want and that
applier.rotationCalls.Load() == 1 (and ensure no other handler was invoked);
locate these checks around the existing applier.* assertions and use the same
applier, f.Apply(...) and haltApplyOf(...) setup already in the tests.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc5cdb6c-3977-4146-b06b-24c7ebae0aad

📥 Commits

Reviewing files that changed from the base of the PR and between 9303e52 and f6956f0.

📒 Files selected for processing (11)

internal/encryption/errors.go
internal/encryption/fsmwire/wire.go
internal/encryption/fsmwire/wire_test.go
internal/encryption/registry.go
internal/encryption/registry_test.go
internal/raftengine/etcd/encryption.go
internal/raftengine/etcd/encryption_test.go
internal/raftengine/etcd/engine.go
kv/fsm.go
kv/fsm_encryption.go
kv/fsm_encryption_test.go

gemini-code-assist Bot reviewed May 9, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 9, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 10, 2026

View reviewed changes

Base automatically changed from feat/encryption-raft-envelope to main May 10, 2026 15:53

coderabbitai Bot reviewed May 10, 2026

View reviewed changes

	if rem < 0 \|\| uint32(rem) < n { //nolint:gosec // rem >= 0 enforced by the rem < 0 branch
	if rem < 0 \|\| uint64(rem) < uint64(n) { //nolint:gosec // rem >= 0 enforced by the rem < 0 branch

Conversation

bootjp commented May 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Out of scope (deferred)

Test plan

Self-review (CLAUDE.md 5 passes)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 10, 2026

Uh oh!

claude Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Round 2 Review

Fix Verification

P1 / security-high — OOM DoS in readRegistrationBatch ✅

P2 / high — safeU32 compile error on GOARCH=386 ✅

medium — readLenU32Bytes uint32 truncation ✅

Remaining Observations (non-blocking)

Verdict

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 10, 2026

Uh oh!

claude Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Round 3 Review

Fix Verification

P1 — Unknown encryption opcodes halt (not fall through) ✅

P2 — Purpose validated at decode time ✅

Non-Blocking: Test Comment Claims 0x0F, Constants Say 0x07

Verdict

Uh oh!

bootjp commented May 10, 2026

Uh oh!

claude Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Round 4 Review

Fix Verification

bootjp commented May 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 9, 2026 •

edited

Loading

claude Bot commented May 10, 2026 •

edited

Loading

P1 / security-high — OOM DoS in `readRegistrationBatch` ✅

P2 / high — `safeU32` compile error on GOARCH=386 ✅

medium — `readLenU32Bytes` uint32 truncation ✅

claude Bot commented May 10, 2026 •

edited

Loading

claude Bot commented May 10, 2026 •

edited

Loading

claude Bot commented May 10, 2026 •

edited

Loading