Skip to content

feat(sqs): live-queue reaper enumerates partitioned keyspace (Phase 3.D PR 6b)#736

Merged
bootjp merged 2 commits intomainfrom
feat/sqs-htfifo-reaper-livequeue
May 4, 2026
Merged

feat(sqs): live-queue reaper enumerates partitioned keyspace (Phase 3.D PR 6b)#736
bootjp merged 2 commits intomainfrom
feat/sqs-htfifo-reaper-livequeue

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 4, 2026

Summary

Phase 3.D PR 6b: live-queue reaper enumerates partitioned keyspace. Follow-up to PR #735 (6a) — the tombstone-driven sweep already walks partitioned data on DeleteQueue / PurgeQueue, but the live-queue retention reaper still only saw the legacy keyspace, so:

  • retention-expired messages on partitioned queues leaked their data / vis / byage / group rows forever (reapQueue walked sqsMsgByAgePrefixAllGenerations only),
  • expired dedup records on partitioned FIFO queues leaked forever (reapExpiredDedup scanned SqsMsgDedupPrefix only — empty for partitioned queues since sqsMsgDedupKeyDispatch routes their writes under SqsPartitionedMsgDedupPrefix).

Closes the live-queue half of the Codex P2 from PR #732 round 0; PR 6a covered the tombstoned-cohort half.

What changes

  • reapQueue: legacy byage walk extracted as reapQueueLegacy (byte-identical to pre-PR-6b for non-partitioned queues). Adds reapQueuePartition step that runs once per partition for PartitionCount > 1 queues. Per-partition budget per the §6 design ("partitions × budget per cycle"); 30s tick interval comfortably absorbs.

  • reapPartitionedPage: partitioned twin of reapPage. Same live-vs-orphan classification, but parses each entry with parseSqsPartitionedMsgByAgeKey and routes the dispatch through reapOneRecordPartitioned.

  • classifyPartitionedByAgeEntry: helper extracted from reapPartitionedPage so the loop body stays under the cyclop ceiling. Returns (parsedKey, reapable bool).

  • reapExpiredDedup (signature changed): now takes *sqsQueueMeta and routes by PartitionCount. Legacy meta → reapExpiredDedupLegacy (byte-identical). Partitioned meta → reapExpiredDedupPartitioned (NEW), iterates each partition's dedup prefix under its own per-partition budget.

Caller audit

  • reapQueue — one production caller (reapAllQueues); signature unchanged. Non-partitioned queues byte-identical; partitioned get the extra per-partition pass.
  • reapExpiredDedup — signature changed to take *sqsQueueMeta; one production caller (reapAllQueues), updated. No tests called it directly.
  • New helpers (reapQueueLegacy / reapQueuePartition / reapPartitionedPage / reapExpiredDedupLegacy / reapExpiredDedupPartitioned / classifyPartitionedByAgeEntry) each have exactly one production caller in the new live-queue reap path.
  • reapOneRecordPartitioned (existing PR 6a helper): previously called from reapDeadByAgePartitionPage (tombstone path); now also from reapPartitionedPage (live-queue path). Same dispatch semantics.

Tests

  • New TestSQSServer_PartitionedFIFO_LiveQueueDedupReaperPartitions: 4-partition queue, send across 6 distinct groups, backdate every partitioned dedup record's ExpiresAtMillis, run reapAllQueues, assert every partitioned dedup row across [0, 4) is gone. Pre-PR-6b reaper would leave every row in place.

Self-review (CLAUDE.md)

  1. Data loss — Closes the live-queue dedup leak + partitioned retention-expired-message leak. Legacy queues unchanged.
  2. Concurrency / distributed failures — Reaper still runs only on the leader. Per-partition pass is sequential; per-partition budget bounds the pass. OCC semantics on each record reap unchanged.
  3. Performance — Per-tick partitioned-queue cost grows from O(1 walk) to O(partition_count walks) on byage AND dedup. Each partition bounded by sqsReaperPerQueueBudget. 30s tick interval comfortably absorbs 32-partition × per-queue budget per design.
  4. Data consistency — Live-vs-orphan classification on partitioned byage mirrors the legacy branch exactly (reapPage / reapPartitionedPage share the rules through classifyPartitionedByAgeEntry). PartitionCount immutability means the meta-driven iteration bound matches the on-disk keys.
  5. Test coverage — One new wire-level integration test for the partitioned dedup walk; the partitioned byage walk reuses parsing / dispatch helpers already covered by PR 6a's tombstone-reap integration test.

Test plan

  • make lint — 0 issues
  • Targeted reaper / retention / dedup / HTFIFO / PartitionedFIFO suites (-race, clean)
  • Wider regression on Send/Receive/Delete + CreateQueue/DeleteQueue/PurgeQueue (-race, clean)
  • CI: full Jepsen + race

Summary by CodeRabbit

  • Bug Fixes

    • Fixed deduplication record cleanup in partitioned FIFO queues to properly remove expired records across all partitions.
    • Enhanced cleanup mechanism to correctly handle partition-aware behavior and prevent memory leaks in partitioned queue scenarios.
  • Tests

    • Added comprehensive test for deduplication record cleanup across partitioned FIFO queue partitions.

….D PR 6b)

Follow-up to PR 6a (#735). The tombstone-driven reaper now sweeps
partitioned data / vis / byage / dedup / group records on
DeleteQueue / PurgeQueue, but the live-queue retention reaper
still walked only the legacy keyspace, so:

  - retention-expired messages on partitioned queues leaked their
    data / vis / byage / group rows forever (reapQueue's byage
    walk used sqsMsgByAgePrefixAllGenerations only),
  - expired dedup records on partitioned FIFO queues leaked
    forever (reapExpiredDedup's prefix scan used SqsMsgDedupPrefix
    only — empty for partitioned queues, since
    sqsMsgDedupKeyDispatch routes their writes under
    SqsPartitionedMsgDedupPrefix).

Closes the live-queue half of the Codex P2 from PR #732 round 0
("Reap partitioned dedup records to prevent key growth"); PR 6a
covered the tombstoned-cohort half.

Scope (the second slice of §11 PR 6 from the split-queue-FIFO
design doc):

  - reapQueue: legacy byage walk extracted as reapQueueLegacy,
    behaviour byte-identical to pre-PR-6b for non-partitioned
    queues. Adds a new reapQueuePartition step that runs once per
    partition for PartitionCount > 1 queues. Each partition gets
    its own per-partition budget per the §6 design ("partitions ×
    budget per cycle"); a 32-partition queue thus allows up to 32
    × sqsReaperPerQueueBudget records per tick, comfortably
    within the 30s reaper interval.

  - reapPartitionedPage: partitioned twin of reapPage. Same
    live-vs-orphan classification (parsed.SendTimestampMs >
    cutoff under the live gen, unconditional reap under older
    gens, defensive skip on parsed.Generation > currentGen) but
    parses each entry with parseSqsPartitionedMsgByAgeKey and
    routes the dispatch through reapOneRecordPartitioned.

  - classifyPartitionedByAgeEntry: helper extracted from
    reapPartitionedPage so the loop body stays under the cyclop
    ceiling. Returns (parsedKey, reapable bool).

  - reapExpiredDedup: now takes *sqsQueueMeta and routes by
    PartitionCount. Legacy meta (PartitionCount <= 1) →
    reapExpiredDedupLegacy (byte-identical to pre-PR-6b walk).
    Partitioned meta (PartitionCount > 1) →
    reapExpiredDedupPartitioned (NEW), which iterates each
    partition's partitioned dedup prefix under its own
    per-partition budget and uses the existing reapDedupPage to
    apply the value-based ExpiresAtMillis filter.

Caller audit:

  - reapQueue: one production caller (reapAllQueues line 85).
    No signature change. Behaviour for non-partitioned queues
    byte-identical; partitioned queues get the additional
    per-partition pass.
  - reapExpiredDedup: signature changed to take *sqsQueueMeta.
    One production caller (reapAllQueues line 88), updated. No
    test files invoked this helper directly. Legacy meta
    routes to the byte-identical legacy walk; partitioned routes
    to the new walk.
  - reapQueueLegacy / reapQueuePartition /
    reapExpiredDedupLegacy / reapExpiredDedupPartitioned /
    classifyPartitionedByAgeEntry: each has exactly one caller in
    the new live-queue reap path.
  - reapOneRecordPartitioned: existing helper from PR 6a.
    Previous caller was reapDeadByAgePartitionPage (tombstone
    path); now also called from reapPartitionedPage (live-queue
    path). Same dispatch semantics — synthetic meta carrying
    PartitionCount > 1 to flip the dispatch helper branch.

Tests:

  - New TestSQSServer_PartitionedFIFO_LiveQueueDedupReaperPartitions
    (wire-level): create a 4-partition FIFO queue, send across 6
    distinct groups, backdate every partitioned dedup record's
    ExpiresAtMillis, run reapAllQueues, assert every partitioned
    dedup row across [0, 4) is gone. Pre-PR-6b reaper would have
    left every row in place — the test fails on the legacy code
    path.

Self-review (CLAUDE.md):

  1. Data loss — Closes the live-queue dedup leak; closes the
     partitioned retention-expired-message leak. Legacy queues
     unchanged: reapQueue, reapExpiredDedup, byage walks, and
     dispatch helpers all keep their byte-identical pre-PR-6b
     paths for PartitionCount <= 1.
  2. Concurrency / distributed failures — Reaper still runs only
     on the leader. Each partition's pass is sequential; per-
     partition budget bounds the pass. Existing OCC dispatch
     semantics on each per-record reap unchanged.
  3. Performance — Per-tick partitioned-queue cost grows from
     O(1 walk) to O(partition_count walks) on byage AND dedup.
     Each partition bounded by sqsReaperPerQueueBudget. 30s tick
     interval comfortably absorbs 32-partition × per-queue
     budget per design.
  4. Data consistency — Live-vs-orphan classification on
     partitioned byage mirrors the legacy branch exactly
     (reapPage / reapPartitionedPage share the rules through
     classifyPartitionedByAgeEntry). PartitionCount immutability
     means the meta-driven iteration bound matches the on-disk
     keys for any cohort.
  5. Test coverage — One new wire-level integration test for
     the partitioned dedup walk; the partitioned byage walk
     reuses parsing / dispatch helpers already tested by PR 6a's
     tombstone-reap integration test.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 4, 2026

@claude review

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 15 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4ab9f63a-cbb1-49da-a43a-f3f9a5c3b093

📥 Commits

Reviewing files that changed from the base of the PR and between 80f1c86 and 996df49.

📒 Files selected for processing (2)
  • adapter/sqs_partitioned_dispatch_test.go
  • adapter/sqs_reaper.go
📝 Walkthrough

Walkthrough

The pull request extends the SQS reaper to handle partitioned queues by splitting both byage and dedup expiry reaping logic into legacy and partition-aware paths. Queue metadata is threaded through the reaper pipeline to enable partition iteration and per-partition budget management. A comprehensive integration test validates that dedup reaping correctly removes expired dedup records across all partitions.

Changes

Partition-Aware Reaper Implementation

Layer / File(s) Summary
Metadata Wiring
adapter/sqs_reaper.go (lines 85-90)
reapAllQueues now passes queue metadata into reapExpiredDedup to enable partition-aware dispatch.
Byage Reaping Refactor
adapter/sqs_reaper.go (lines 425-539)
reapQueue is restructured to run legacy byage sweep unconditionally, then conditionally invoke reapQueuePartition for each partition when meta.PartitionCount > 1. New reapQueueLegacy and reapQueuePartition functions isolate legacy vs partitioned scan logic with per-partition budget handling.
Partitioned Byage Classification
adapter/sqs_reaper.go (lines 541-591)
New reapPartitionedPage loop and classifyPartitionedByAgeEntry helper enforce partition/generation matching and cutoff gating to classify records as reapable, dispatching eligible entries via reapOneRecordPartitioned.
Dedup Expiry Refactor
adapter/sqs_reaper.go (lines 728-829)
reapExpiredDedup is updated to accept metadata and dispatch between reapExpiredDedupLegacy (legacy prefix) and reapExpiredDedupPartitioned (partitioned prefix, per-partition budget), preventing dedup leaks on partitioned queues.
Integration Test
adapter/sqs_partitioned_dispatch_test.go (lines 4, 653-754)
New TestSQSServer_PartitionedFIFO_LiveQueueDedupReaperPartitions creates a partitioned FIFO queue, generates partitioned dedup records, backdates ExpiresAtMillis via transactional writes (with bytes.Clone), runs the reaper, and asserts zero dedup rows remain across all partitions. Helper countPartitionedDedupRowsAcrossPartitions sums dedup rows by scanning each partition's dedup prefix.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 The reaper hops through partitioned lands,
Sweeping dedup dust across the strands,
Metadata whispers which way to go,
Legacy paths and partitions flow,
Expired records tumble down below! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change: adding partition-aware reaper logic for live queues. It is specific, concise, and clearly conveys the primary intent.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sqs-htfifo-reaper-livequeue

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 22 minutes and 15 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 4, 2026

Claude finished @bootjp's task in 3m 30s —— View job


Review

  • Gather context and understand request
  • Read adapter/sqs_reaper.go changes
  • Read adapter/sqs_partitioned_dispatch_test.go changes
  • Review related reaper infrastructure and key helpers
  • Analyze correctness, concurrency, edge cases, and test coverage
  • Post review feedback

The PR correctly closes the live-queue half of the partitioned-FIFO reaper gap. The architecture is coherent — reapQueuePartition + reapPartitionedPage handle the byage path, reapExpiredDedupPartitioned handles the dedup path, and both respect the per-partition budget contract. The key encoding analysis checks out against sqsPartitionedMsgByAgeKey / sqsPartitionedMsgDedupKey.

Two items need addressing before merge; one stylistic observation below them.


Issue 1 — Stale comment in reapPage (lines 622–625) [must fix]

// Live-queue retention reap currently iterates only the
// legacy byage keyspace; partitioned-byage live-queue
// retention is a follow-up to PR 6a (the tombstoned-cohort
// path is what this PR addresses).

This comment was written in PR 6a as a forward-reference to PR 6b. PR 6b is now this PR — so the claim "is a follow-up to PR 6a" is misleading: it implies the work is still outstanding when reapQueuePartition has just landed it. The comment should be trimmed to note that reapPage covers the legacy path only (the partitioned live-queue path is handled by reapPartitionedPage).

Fix this →


Issue 2 — classifyPartitionedByAgeEntry retention logic is untested [should fix]

The PR description says the partitioned byage walk is "covered by PR 6a's tombstone-reap integration test," but that test exercises reapDeadByAgePartitionPage — which reaps unconditionally (every record under a tombstoned gen is an orphan, no cutoff check). The new live-queue classification in classifyPartitionedByAgeEntry adds two distinct code paths that have no test:

  1. Retention cutoff gateparsed.SendTimestampMs > cutoff for the live gen (reap only when retention has elapsed).
  2. Future-gen guardparsed.Generation > currentGen (skip and wait for meta to catch up).

A test analogous to the new dedup test would be: send messages to a 4-partition queue, backdate their SendTimestampMs beyond the retention window (or lower MessageRetentionSeconds to 0 and verify orphan gen reap), run reapAllQueues, and assert the partitioned byage / data / vis rows are gone. The cutoff path and the orphan-gen path each deserve a sub-case, since classifyPartitionedByAgeEntry is the only place where the live-vs-orphan decision is made for partitioned queues.

Fix this →%20are%20gone.%20The%20test%20pins%20the%20classifyPartitionedByAgeEntry%20cutoff%20branch%20(SendTimestampMs%20%3C%3D%20cutoff)%20which%20has%20no%20coverage%20today.&repo=bootjp/elastickv)


Observation — Inline prefix construction duplicated in two places (minor)

Both reapQueuePartition (line 508–513) and reapExpiredDedupPartitioned (line 797–800) inline a "partition-scoped but no-gen" prefix that no existing key helper covers:

prefix = append(prefix, SqsPartititioned*Prefix...)
prefix = append(prefix, encodeSQSSegment(queueName)...)
prefix = append(prefix, sqsPartitionedQueueTerminator)
prefix = appendU32(prefix, partition)

This isn't a bug — both inlinings are correct, and the existing helpers either cover all-partitions-all-gens (sqsPartitionedMsgByAgePrefixForQueueAllPartitions) or one-partition-one-gen (sqsPartitionedMsgByAgePrefixForPartition). Extracting helpers for the missing "one-partition-all-gens" variant would make future readers less likely to add a gen accidentally, but that's a judgment call on the right trade-off between helper proliferation and readability.


Correctness notes (no action needed)

  • Key prefix semantics: Confirmed. The partition-scoped prefix for byage (SqsPartitionedMsgByAgePrefix|queue|'|'|u32(partition)) correctly underlaps every generation for that partition, matching the intent of reapPartitionedPage's per-entry gen check.
  • Dedup prefix semantics: Confirmed. Same pattern — partition-scoped prefix (SqsPartitionedMsgDedupPrefix|queue|'|'|u32(partition)) covers all gens; reapDedupPage filters by ExpiresAtMillis, not gen.
  • cutoff = 0 when MessageRetentionSeconds ≤ 0: Correct. SendTimestampMs > 0 is true for all real messages, so setting cutoff to 0 (rather than MaxInt64) is an effective no-reap sentinel for the live gen, mirroring the legacy path.
  • reapOneRecordPartitioned sentinel value (PartitionCount: 2): The wrapper correctly uses the minimum legal partitioned value — the dispatch helpers only branch on PartitionCount > 1, not on the exact count.
  • Budget accounting: processed resets per partition in both new functions — consistent with the stated per-partition budget contract.
  • Test fixture countPartitionedDedupRowsAcrossPartitions: Correctly uses gen=1 (the initial gen after installPartitionedMetaForTest) to scope the scan.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements partitioned reaping for SQS queues, updating the reaper to handle both legacy and partitioned keyspaces for message expiration and deduplication record cleanup. A new test case verifies that expired deduplication records are correctly reaped from partitioned FIFO queues. The reviewer suggested that reapExpiredDedup should perform both legacy and partitioned scans for partitioned queues to maintain consistency with reapQueue and prevent potential record leaks.

Comment thread adapter/sqs_reaper.go Outdated
Comment on lines +738 to +741
if meta != nil && meta.PartitionCount > 1 {
return s.reapExpiredDedupPartitioned(ctx, queueName, meta.PartitionCount, readTS)
}
return s.reapExpiredDedupLegacy(ctx, queueName, readTS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The reapExpiredDedup function uses an if/else logic to choose between legacy and partitioned dedup reaping. In contrast, reapQueue (line 433-451) performs both legacy and partitioned scans for partitioned queues to be defensive against leaked records. To maintain consistency and ensure no records are leaked if a queue's state were ever inconsistent, consider performing both scans here as well when meta.PartitionCount > 1.

if meta != nil && meta.PartitionCount > 1 {
	if err := s.reapExpiredDedupPartitioned(ctx, queueName, meta.PartitionCount, readTS); err != nil {
		return err
	}
}
return s.reapExpiredDedupLegacy(ctx, queueName, readTS)

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
adapter/sqs_partitioned_dispatch_test.go (1)

715-723: 💤 Low value

Consider adding ReadKeys for OCC consistency.

The backdate write lacks ReadKeys, which differs from the pattern used elsewhere in this file (e.g., lines 58-65). While this test runs in isolation and conflicts are unlikely, including the key in ReadKeys would match the established OCC transaction pattern.

Suggested fix
 req := &kv.OperationGroup[kv.OP]{
     IsTxn:   true,
     StartTS: readTS,
+    ReadKeys: [][]byte{bytes.Clone(row.Key)},
     Elems: []*kv.Elem[kv.OP]{
         {Op: kv.Put, Key: bytes.Clone(row.Key), Value: body},
     },
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/sqs_partitioned_dispatch_test.go` around lines 715 - 723, The txn
request `req` created as a *kv.OperationGroup[kv.OP]* for the backdated write is
missing the OCC `ReadKeys` field; update the request construction (the `req`
passed to `srv.coordinator.Dispatch`) to include the transaction's read key(s)
(the same key used in the Put element) in `ReadKeys` so the operation follows
the established OCC pattern used elsewhere in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@adapter/sqs_partitioned_dispatch_test.go`:
- Around line 715-723: The txn request `req` created as a
*kv.OperationGroup[kv.OP]* for the backdated write is missing the OCC `ReadKeys`
field; update the request construction (the `req` passed to
`srv.coordinator.Dispatch`) to include the transaction's read key(s) (the same
key used in the Put element) in `ReadKeys` so the operation follows the
established OCC pattern used elsewhere in this file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9072fae-a2cd-465c-8f46-dd99affc49eb

📥 Commits

Reviewing files that changed from the base of the PR and between f489669 and 80f1c86.

📒 Files selected for processing (2)
  • adapter/sqs_partitioned_dispatch_test.go
  • adapter/sqs_reaper.go

Two findings from Claude on PR #736 plus a medium-severity Gemini
suggestion in the same review pass:

1. (Claude must-fix) Stale forward-reference comment in reapPage
   was written in PR 6a pointing at PR 6b. PR 6b is now this branch,
   so the "is a follow-up to PR 6a" wording was misleading. Trimmed
   to a one-line note that reapPage covers the legacy keyspace and
   reapPartitionedPage is the partitioned twin.

2. (Claude should-fix) classifyPartitionedByAgeEntry had no direct
   coverage of its retention-cutoff branch (live gen, sendTs <=
   cutoff -> reapable) or its future-gen guard (parsed.gen >
   currentGen -> not reapable). The integration tests exercise the
   orphan-cohort and dedup-expiry paths only. Added
   TestClassifyPartitionedByAgeEntry: a table-driven unit test that
   pins every branch (within retention, past retention, exact-cutoff
   boundary, orphan generation, future generation, wrong partition,
   wrong queue prefix, legacy key) against deterministic key inputs
   built via sqsPartitionedMsgByAgeKey. No store / coordinator
   dependency, so the test is fast and immune to retention timing.

3. (Gemini medium) reapExpiredDedup used an if/else to pick between
   legacy and partitioned dedup reaping for partitioned queues,
   while reapQueue runs both for symmetry and defensive coverage of
   leaked legacy entries. Mirrored that policy: legacy scan now
   always runs (cheap on an empty prefix today), and the partitioned
   scan additionally runs for partitioned queues. Caller audit:
   reapExpiredDedup has a single caller (reapAllQueues) and the
   error contract is unchanged, so no semantic ripple.

Refs: PR #736 review thread; Claude review at run 25327783108.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 4, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 4, 2026

Claude finished @bootjp's task in 2m 58s —— View job


Review

  • Gather context
  • Read adapter/sqs_reaper.go changes
  • Read adapter/sqs_partitioned_dispatch_test.go changes
  • Analyze correctness, concurrency, and edge cases
  • Post review feedback

This is the round-2 pass. All items from the previous Claude review and the Gemini concern are resolved.

Resolved since round 1

Stale comment in reapPage — fixed. Now reads: "reapPage covers the legacy byage keyspace only; the partitioned twin is reapPartitionedPage."

classifyPartitionedByAgeEntry untested — fixed. TestClassifyPartitionedByAgeEntry covers all 8 branches: live-within-window, live-past-window, at-cutoff boundary, orphan gen, future-gen guard, wrong partition, wrong queue, legacy key.

reapExpiredDedup if/else vs dual-scan (Gemini) — fixed. Now always runs the legacy scan first, then conditionally runs the partitioned scan, matching reapQueue's defensive both-scan policy. The doc comment explains the rationale.

One remaining nit (no blocker)

ReadKeys missing in the test's backdate transaction (sqs_partitioned_dispatch_test.go lines 715–723). This is test-only code that runs without concurrency, so there is no correctness risk — CodeRabbit flagged it as a style inconsistency. Fix is optional and does not block merge.

Fix this →

Correctness notes (no action needed)

  • cutoff = 0 when MessageRetentionSeconds <= 0: The inline comment says "MaxInt64-ish" which is misleading — the value of 0 is what is correct (SendTimestampMs > 0 is true for all real messages, making them non-reapable in the live gen). Pre-existing, not introduced here.
  • now computed separately in each dedup function: Negligible clock skew between the legacy and partitioned calls. No correctness issue.
  • Budget accounting: processed correctly resets per partition in both reapQueuePartition and reapExpiredDedupPartitioned — per-partition budget contract correctly implemented.
  • break vs return nil in reapExpiredDedupPartitioned: The inner loop uses break (not return nil) so the outer partition loop continues to the next partition after a budget-hit or empty page. Correct; the legacy function's return nil is fine because there is no outer loop there.
  • Key prefix semantics: Confirmed. Partition-only prefixes (no gen) correctly underspan every generation for a given partition, matching the per-entry gen filtering in classifyPartitionedByAgeEntry and reapDedupPage.

The architecture is sound and test coverage is now complete for all new decision paths. Ready to merge pending CI.

bootjp added a commit that referenced this pull request May 4, 2026
## Summary

Phase 3.D PR 7a — adds the
`elastickv_sqs_partition_messages_total{queue, partition, action}`
Prometheus counter so dashboards and alerts can spot uneven
`MessageGroupId` distributions across partitioned-FIFO queues.
Metrics-only: the Jepsen HT-FIFO workload (PR 7b) ships separately so
the two review loops do not cross.

- `monitoring/sqs.go` (new): `SQSMetrics` with cardinality cap
(`sqsMaxTrackedQueues = 512`, overflow collapses to `_other`) mirroring
`DynamoDBMetrics`. Drops empty queue names and unknown action labels so
a future call-site bug cannot pollute the series space dashboards have
to learn about. Nil-receiver-safe so adapter call sites do not need to
nil-guard.
- `adapter/sqs.go`: `SQSPartitionObserver` interface +
`WithSQSPartitionObserver` option. Re-declared in `adapter` so it
doesn't import `monitoring` at the package boundary (matches the
DynamoDB/Redis observer pattern). Action constants
(`send`/`receive`/`delete`) re-declared on the adapter side and
validated at runtime by the monitoring side — drift between the two
surfaces as a dropped observation, not a wedge.
- `adapter/sqs_fifo.go`, `adapter/sqs_messages.go`: emit the counter on
the **partitioned** commit branch only (`PartitionCount > 1`) for send /
receive / delete. Legacy single-partition queues stay off the metric
since partition is always 0 and the cardinality cost would buy nothing.
- `monitoring/registry.go`, `main_sqs.go`, `main.go`: wire the
registry's `SQSPartitionObserver()` into `startSQSServer` so the SQS
server picks up the production observer on cluster boot. Test fixtures
and CLI tools that build `SQSServer` without a registry pass `nil` and
the metric stays at zero.

## Tests

`monitoring/sqs_test.go` (new, 6 cases):

- `TestSQSMetrics_ObservePartitionMessage_IncrementsByLabelTriple` — pin
the `(queue, partition, action)` counter contract.
- `TestSQSMetrics_ObservePartitionMessage_DropsInvalidAction` — pin the
typo guard against future drift between adapter and monitoring
constants.
- `TestSQSMetrics_ObservePartitionMessage_DropsEmptyQueue` — pin that an
empty queue name does not collapse with valid observations onto a shared
series.
- `TestSQSMetrics_NilReceiverIsSafe` — pin the nil-receiver
short-circuit the adapter relies on.
- `TestSQSMetrics_QueueLabelOverflow` — pin the cap-and-collapse so a
misbehaving caller cannot exhaust the Prometheus series budget.
- `TestSQSMetrics_RegistryWiring` — pin that the public `Registry`
exposes the metric under the documented name.

## Self-review (5 lenses)

1. **Data loss** — N/A; metrics-only, no storage / Raft / FSM touch.
2. **Concurrency** — counter increments are atomic via Prometheus; the
`trackedQueues` map is only consulted from the dispatch-success path
under the SQS server's existing concurrency model. No new locks.
3. **Performance** — one map lookup + one `CounterVec` lookup per
partitioned send/receive/delete on the success branch. Legacy queues
skip the call entirely. Cardinality bounded at 512 queue × 32 partition
(`htfifoMaxPartitions`) × 3 action ≈ 49k series worst case; in practice
a 32-partition queue yields 96 series, so the budget is plenty for the
SLO panels.
4. **Data consistency** — the metric is observed AFTER OCC dispatch
succeeds, so the counter reflects committed state. Receive/delete
branches that return on retryable errors deliberately do not increment
(the retry path will observe on the eventual success).
5. **Test coverage** — 6 unit tests in `monitoring/`, plus the
adapter-side nil-observer path is exercised by all existing
partitioned-FIFO tests in `adapter/sqs_partitioned_dispatch_test.go`
(they pass `nil` observer through the test fixture).

## Test plan

- [x] `go test -race -count=1 ./monitoring/...`
- [x] `go test -race -count=1 -run 'TestSQS' ./adapter/...`
- [x] `go test -race -count=1 ./...` (full suite)
- [x] `golangci-lint --config=.golangci.yaml run ./...` (full repo)
- [ ] Jepsen HT-FIFO workload — deferred to PR 7b

## Refs

- `docs/design/2026_05_01_partial_split_queue_fifo.md` §11 PR 7
- Builds on PR 5b-3 (#734) capability gate, PR 6a (#735) tombstone
reaper, PR 6b (#736 in flight) live-queue reaper.
@bootjp bootjp merged commit 675adc3 into main May 4, 2026
9 checks passed
@bootjp bootjp deleted the feat/sqs-htfifo-reaper-livequeue branch May 4, 2026 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant