Support per-backend role and destination in replication engine #2743

Open
maeldonn wants to merge 3 commits into development/9.5 from improvement/BB-762/crr-multi

Conversation

@maeldonn

Contributor

Read destination bucket and role from each backends[] entry instead of the shared top-level fields, so a single source object can be replicated to multiple CRR destinations with their own role. Legacy entries without per-backend fields keep working via top-level fallback in ObjectQueueEntry's site-aware getters; MongoQueueProcessor's oplog path now matches every applicable rule and dedups backends per the design's (site, destination, role) rule.

Issue: BB-762
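
As a rough illustration of the (site, destination, role) dedup rule mentioned in the description, here is a minimal sketch. The helper name `dedupBackends` and the sample ARNs are hypothetical and are not taken from the PR; the real logic lives in MongoQueueProcessor and may differ.

```javascript
// Hypothetical helper: collapse backends that share the same
// (site, destination, role) triple, keeping the first occurrence.
function dedupBackends(backends) {
    const seen = new Set();
    return backends.filter(b => {
        // Serialising the triple as a JSON array gives an unambiguous key.
        const key = JSON.stringify([b.site, b.destination, b.role]);
        if (seen.has(key)) {
            return false;
        }
        seen.add(key);
        return true;
    });
}

// Sample data (made up): same site, two distinct destinations, one repeat.
const backends = [
    { site: 'siteA', destination: 'arn:aws:s3:::bucket-a', role: 'arn:aws:iam::111:role/dst' },
    { site: 'siteA', destination: 'arn:aws:s3:::bucket-b', role: 'arn:aws:iam::222:role/dst' },
    { site: 'siteA', destination: 'arn:aws:s3:::bucket-a', role: 'arn:aws:iam::111:role/dst' },
];
const unique = dedupBackends(backends);
console.log(unique.length); // 2
```

Keying on the full triple rather than on site alone is what lets two destinations behind the same site survive as separate backends.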

@bert-e

bert-e commented May 27, 2026

Contributor

Hello maeldonn,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e

bert-e commented May 27, 2026

Contributor

Incorrect fix version

The Fix Version/s in issue BB-762 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.4.1

  • 9.5.0

Please check the Fix Version/s of BB-762, or the target
branch of this pull request.

@codecov

codecov Bot commented May 27, 2026


Codecov Report

❌ Patch coverage is 68.37607% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.27%. Comparing base (410a972) to head (c7ab73a).

Files with missing lines Patch % Lines
...sions/replication/queueProcessor/QueueProcessor.js 48.64% 19 Missing ⚠️
...sions/replication/tasks/UpdateReplicationStatus.js 58.82% 7 Missing ⚠️
...xtensions/replication/tasks/MultipleBackendTask.js 33.33% 4 Missing ⚠️
lib/models/ObjectQueueEntry.js 80.95% 4 Missing ⚠️
extensions/replication/tasks/ReplicateObject.js 90.47% 2 Missing ⚠️
extensions/mongoProcessor/MongoQueueProcessor.js 90.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (68.37%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Files with missing lines Coverage Δ
lib/api/BackbeatAPI.js 90.10% <100.00%> (ø)
lib/models/QueueEntry.js 73.52% <100.00%> (ø)
extensions/mongoProcessor/MongoQueueProcessor.js 66.37% <90.00%> (-3.04%) ⬇️
extensions/replication/tasks/ReplicateObject.js 88.50% <90.47%> (+0.04%) ⬆️
...xtensions/replication/tasks/MultipleBackendTask.js 48.18% <33.33%> (ø)
lib/models/ObjectQueueEntry.js 87.95% <80.95%> (-3.36%) ⬇️
...sions/replication/tasks/UpdateReplicationStatus.js 72.95% <58.82%> (-2.30%) ⬇️
...sions/replication/queueProcessor/QueueProcessor.js 70.55% <48.64%> (-2.10%) ⬇️

... and 2 files with indirect coverage changes

Components Coverage Δ
Bucket Notification 80.22% <ø> (ø)
Core Library 78.56% <84.61%> (-0.58%) ⬇️
Ingestion 70.04% <90.00%> (-0.59%) ⬇️
Lifecycle 76.74% <ø> (ø)
Oplog Populator 82.00% <ø> (ø)
Replication 56.15% <60.49%> (-0.19%) ⬇️
Bucket Scanner 85.76% <ø> (ø)
@@                 Coverage Diff                 @@
##           development/9.5    #2743      +/-   ##
===================================================
- Coverage            72.62%   72.27%   -0.36%     
===================================================
  Files                  202      202              
  Lines                13744    13764      +20     
===================================================
- Hits                  9982     9948      -34     
- Misses                3752     3806      +54     
  Partials                10       10              
Flag Coverage Δ
api:retry 9.18% <16.23%> (+0.03%) ⬆️
api:routes 8.95% <0.00%> (-0.02%) ⬇️
bucket-scanner 85.76% <ø> (ø)
ft_test:queuepopulator 9.12% <0.00%> (-1.26%) ⬇️
ingestion 12.37% <13.67%> (-0.19%) ⬇️
lifecycle 18.88% <0.00%> (-0.02%) ⬇️
notification 1.02% <0.00%> (-0.01%) ⬇️
oplogPopulator 0.14% <0.00%> (-0.01%) ⬇️
replication 18.78% <56.41%> (+0.13%) ⬆️


Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude

claude Bot commented May 27, 2026

  • ReplicateObject.js:310 — Rule matching via Rules.find() does not consider priority, while MongoQueueProcessor sorts by priority descending. If multiple rules target the same site with different Destination.Account values, the derived expectedDestRole may not match the entry's role, causing a spurious BadRole error.
  • tests/unit/lib/models/ObjectQueueEntry.spec.js — File shows as a binary diff in this PR, which typically indicates a BOM or encoding issue. Verify the file is clean UTF-8.

    Review by Claude Code
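
To make the priority concern concrete, here is a small sketch of how a plain `find()` and a priority-sorted lookup can select different rules. The rule objects are hypothetical (the `site` field and account values are invented for illustration) and the matching predicate is an assumption, not the code in ReplicateObject.js.

```javascript
// Hypothetical rules: both match the same key and site, but carry
// different Destination.Account values.
const rules = [
    { ID: 'low', Priority: 1, Prefix: '', site: 'siteA', Destination: { Account: '111' } },
    { ID: 'high', Priority: 2, Prefix: '', site: 'siteA', Destination: { Account: '222' } },
];

// Assumed matching predicate: same site and key under the rule prefix.
const matches = (rule, key, site) =>
    rule.site === site && key.startsWith(rule.Prefix);

// Declaration order: the first matching rule wins.
const byOrder = rules.find(r => matches(r, 'photos/cat.jpg', 'siteA'));

// Priority descending: the highest-priority matching rule wins.
const byPriority = [...rules]
    .sort((a, b) => b.Priority - a.Priority)
    .find(r => matches(r, 'photos/cat.jpg', 'siteA'));

console.log(byOrder.ID, byPriority.ID); // low high
```

If one side derives the expected role from `byOrder` and the other stamps the entry from `byPriority`, the two accounts disagree, which is the mismatch described above.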

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 36b459a to 98c878f Compare May 28, 2026 15:17
@claude

claude Bot commented May 28, 2026

  • Overall: well-structured refactor — per-backend roles/destinations are correctly propagated through MongoQueueProcessor, QueueProcessor, and the replication task chain. Legacy fallbacks tested. No critical issues found.
    - extensions/replication/tasks/ReplicateObject.js:270 — Minor: roles.length < 1 is unreachable because String.split(',') always returns an array with at least one element. Could simplify to roles.length > 2.
    - extensions/replication/tasks/MultipleBackendTask.js:59 — Note: entry.getReplicationRoles() is called without this.site, while the parent ReplicateObject._setupRolesOnce was updated to pass this.site. This appears intentional since cloud backends don't carry per-backend roles, but worth a comment or assertion to make the design choice explicit.

    LGTM

    Review by Claude Code
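
The point about `roles.length < 1` can be checked directly: `String.prototype.split` with a non-empty separator returns at least one element, even for an empty string. The helper `countRoles` below is only an illustrative name.

```javascript
// Count comma-separated roles the same way a split(',') based check would.
function countRoles(roleString) {
    return roleString.split(',').length;
}

console.log(countRoles(''));            // 1, the array is ['']
console.log(countRoles('roleA'));       // 1
console.log(countRoles('roleA,roleB')); // 2
```

Note that an empty string still yields one (empty) role, so a length check alone does not reject it.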

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@maeldonn maeldonn requested review from a team, SylvainSenechal and delthas May 29, 2026 08:57
@maeldonn maeldonn marked this pull request as ready for review May 29, 2026 08:57
Comment thread package.json Outdated
@claude

claude Bot commented May 29, 2026

  • package.json:57 — arsenal pinned to raw commit SHA instead of a tag; should use the resolved version tag (8.4.2) for consistency with other git-based deps

    The rest of the PR looks solid: per-backend role/destination plumbing is consistent across MongoQueueProcessor, QueueProcessor, ReplicateObject, and MultipleBackendTask. Legacy fallback paths are well-tested. The deletion of getLocationsFromStorageClass is safe — no remaining callers in production code.

    Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from d5a834d to 83c75ef Compare June 1, 2026 11:52
Comment thread package.json Outdated
@claude

claude Bot commented Jun 1, 2026

  • package.json:57 — Arsenal is pinned to a commit hash (39c1a64...) instead of a tag. The yarn.lock resolves to version 8.4.3, so please pin to the tag for consistency with other git-based deps.
  • extensions/replication/tasks/MultipleBackendTask.js:59 — _setupRolesOnce calls getReplicationRoles() without this.site, unlike the parent ReplicateObject._setupRolesOnce which was updated to pass it. This works today because cloud backends don't carry per-backend roles, but consider passing this.site for forward-compatibility with multi-destination CRR on cloud targets.

    Review by Claude Code

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread package.json Outdated
@claude

claude Bot commented Jun 1, 2026

  • extensions/replication/tasks/ReplicateObject.js:298 — matchingRule lookup uses bare rule.Prefix but the replicationEnabled check above uses rule.Filter?.Prefix ?? rule.Prefix ?? ''. V2 rules with Filter.Prefix will never match, silently falling back to the legacy destination role instead of the per-account substituted role.
  • package.json:57 — Arsenal pinned to a commit hash instead of a tag (yarn.lock resolves to 8.4.3).

    Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 3426739 to 4791b8b Compare June 4, 2026 14:26
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread package.json Outdated
@claude

claude Bot commented Jun 4, 2026

  • extensions/replication/tasks/ReplicateObject.js:298 — Bug: matchingRule lookup uses rule.Prefix directly but should use rule.Filter?.Prefix ?? rule.Prefix ?? '' like the replicationEnabled check above it. V2 rules with Filter.Prefix will never match, causing the Account-based role resolution to silently fall through to the legacy comparison and reject valid entries with BadRole.
  • package.json:57 — Arsenal pinned to commit hash instead of a tag (resolved version 8.4.3).

    Review by Claude Code
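
The prefix mismatch can be reproduced in isolation. The sketch below assumes the lookup is a `startsWith` test on the object key, which the review does not confirm; `rulePrefix` is a hypothetical helper wrapping the expression quoted in the comment.

```javascript
// Resolve a rule's prefix for both rule formats:
// V2 nests it under Filter, V1 keeps it at the top level.
const rulePrefix = rule => rule.Filter?.Prefix ?? rule.Prefix ?? '';

const v1Rule = { ID: 'v1', Prefix: 'logs/' };
const v2Rule = { ID: 'v2', Filter: { Prefix: 'logs/' } };
const key = 'logs/2026/app.log';

// Bare rule.Prefix is undefined on a V2 rule. startsWith(undefined)
// coerces the argument to the string "undefined", so it never matches.
const bare = [v2Rule].find(r => key.startsWith(r.Prefix));

// The resolved prefix matches both formats.
const resolved = [v2Rule].find(r => key.startsWith(rulePrefix(r)));

console.log(bare);        // undefined
console.log(resolved.ID); // v2
```

Sharing one prefix resolver between the replicationEnabled check and the matchingRule lookup would keep the two from drifting apart again.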

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 4791b8b to 14ef8f1 Compare June 4, 2026 17:13
Comment thread extensions/replication/queueProcessor/QueueProcessor.js Outdated
Comment thread package.json Outdated
@claude

claude Bot commented Jun 4, 2026


PR Review Summary

  • QueueProcessor.js:912 — Bug: this.replicationBackends is undefined (never set on the instance). Cloud backends will always use ReplicateObject instead of MultipleBackendTask. Should use the module-level replicationBackends constant imported at line 42.
  • package.json:57 — Arsenal pinned to a bare commit hash instead of a version tag, which violates project conventions for git-based deps.

Review by Claude Code
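
A minimal sketch of the failure mode described for QueueProcessor.js:912, under the assumption that the lookup tolerates a missing list rather than throwing. The class, method names, and backend values are all hypothetical placeholders.

```javascript
// Module-level constant; these values are placeholders, not Backbeat's list.
const replicationBackends = ['aws_s3', 'azure', 'gcp'];

class SketchProcessor {
    // Reads an instance property that no constructor ever assigns,
    // so the list is always empty and the cloud branch is never taken.
    pickTaskViaInstance(storageType) {
        const list = this.replicationBackends || [];
        return list.includes(storageType) ? 'MultipleBackendTask' : 'ReplicateObject';
    }

    // Reads the module-level constant instead.
    pickTaskViaModule(storageType) {
        return replicationBackends.includes(storageType)
            ? 'MultipleBackendTask'
            : 'ReplicateObject';
    }
}

const processor = new SketchProcessor();
console.log(processor.pickTaskViaInstance('aws_s3')); // ReplicateObject
console.log(processor.pickTaskViaModule('aws_s3'));   // MultipleBackendTask
```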

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 14ef8f1 to cfdd8fb Compare June 5, 2026 15:23
Comment thread package.json Outdated
@eve-ci-cd eve-ci-cd Bot temporarily deployed to zenko/improvement/ZENKO-5253/multi-crr@2.15 June 9, 2026 09:52 Destroyed
@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 00094e7 to 0b2138b Compare June 10, 2026 16:23
Comment thread package.json Outdated
Comment thread extensions/replication/queueProcessor/QueueProcessor.js Outdated
@claude

claude Bot commented Jun 10, 2026


PR Review Summary

  • package.json:57 — Arsenal dependency pinned to a bare commit hash (724d062...) instead of a version tag. Git-based deps should pin to a tag per project conventions.
  • QueueProcessor.js:908-909 — Inconsistent indentation (14 spaces instead of 12) inside the pendingBackends.length === 0 block.

Observations (non-blocking):

  • The top-level destination, storageClass, storageType, and dataStoreVersionId fields are no longer set in the replication info by MongoQueueProcessor._setEntryReplicationInfo. This is intentional (these are now per-backend), but worth documenting as a breaking change to the replication info format in case other services or tooling read these top-level fields.
  • The processReplicationEntry async migration with callbackify looks correct — error propagation, skip semantics, and sequential backend processing are all handled properly.
  • The clone() method doesn't preserve destination and role (same as site), but all callers correctly re-set them after cloning. Consistent pattern.
  • Good test coverage for the new multi-destination disambiguation (ObjectQueueEntry.spec.js, QueueProcessor.spec.js, MongoQueueProcessor.spec.js).

Review by Claude Code

Comment thread extensions/replication/tasks/ReplicateObject.js
Comment thread extensions/replication/tasks/ReplicateObject.js
@maeldonn maeldonn requested a review from SylvainSenechal June 11, 2026 13:00
Comment thread package.json Outdated
Comment thread lib/models/ObjectQueueEntry.js
Comment thread extensions/mongoProcessor/MongoQueueProcessor.js
@claude

claude Bot commented Jun 11, 2026

  • package.json:57 — Arsenal pinned to commit hash instead of a tag. Git-based deps should be pinned to tags per project guidelines.
  • lib/models/ObjectQueueEntry.js:284-288 — Breaking change: toKafkaEntry now serialises destination and role into the Kafka message payload. Rolling upgrade ordering must be documented.
  • extensions/mongoProcessor/MongoQueueProcessor.js:345-351 — Breaking change: setReplicationInfo no longer sets destination, storageClass, or storageType at the top level. Old readers will break on entries written by new code.

    Review by Claude Code

@delthas delthas left a comment

Contributor

LGTM on the surface, but def need someone with more hands-on exp with CRR to review in depth.

Also, I found it hard to review (+2600, -600 across ~20 files), and the commit split did not help much. I wonder whether future MRs could have smaller, atomic commits. Here there's one big commit, one test commit, and then 2 review fixup commits. I personally find fixup commits make it even harder to review, because we have to note "oh, there's an issue here" and then later see "oh, it's fixed in that upcoming commit". But we agreed in the retro that we want no squashes, so OK.

Comment thread lib/models/ObjectQueueEntry.js
Comment thread tests/unit/lib/models/ObjectQueueEntry.spec.js
assert.strictEqual(entry.getReplicationTargetBucket({ site: 'siteB' }), 'bucket-b');
});

it('falls back to top-level role string when backend has none', () => {

@SylvainSenechal SylvainSenechal Jun 11, 2026

Contributor

It feels a bit weird to say "falls back to top-level" when _makeEntryWithBackends is called with no topLevel parameter. Part of it may be me not understanding the test. I guess the test is fine and it's just the naming or the structure.

Contributor Author

"Top-level" refers to replicationInfo.destination / .role (what per-backend entries override). The helper was hiding those behind defaults. I've hoisted them into TOP_LEVEL_DEST / TOP_LEVEL_ROLE constants so the fallback assertions reference them directly. Kept the test names since the wording matches the actual replicationInfo shape.

@SylvainSenechal SylvainSenechal left a comment

Contributor

Not much to comment. It's really nice that you were able to remove callbacks and turn some functions asynchronous.
This is hard to review, but I think it's OK. I left 1 or 2 comments in Arsenal as I was reviewing here, but they are not very important either.
You already know this, but we will have to test it properly. What worries me the most is not people setting up brand-new replication from this new branch, but breaking existing replication flows.

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from ec8baf8 to 5df0134 Compare June 15, 2026 14:10
@claude

claude Bot commented Jun 15, 2026


LGTM — well-structured multi-destination support with clean backward-compatible design.

Highlights:

  • The async/await migration of processReplicationEntry with callbackify bridge is correct
  • Sequential per-backend dispatch with first-error-rethrow is sound: the PENDING filter on retry ensures only unfinished backends are re-attempted
  • clone() now properly copies workflowId/workflowVersion/nodeId/uniqueId/ignore, fixing a pre-existing data loss on clone
  • ObjectQueueEntry constructor opts-object refactor is a clean API improvement
  • Backward compatibility is maintained: per-backend getters fall back to top-level fields when backend has no destination/role, and Kafka deserialization handles old entries without destination/role gracefully
  • Test coverage is thorough across ObjectQueueEntry, QueueProcessor fan-out, ReplicateObject role validation, MultipleBackendTask per-backend routing, MongoQueueProcessor, and UpdateReplicationStatus multi-destination

No issues found.

Review by Claude Code
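
The backward-compatibility bullet above can be pictured with a small sketch of a site-aware getter that prefers per-backend fields and falls back to the top-level ones. The function names and the sample `replicationInfo` are hypothetical; the actual getters in ObjectQueueEntry may be shaped differently.

```javascript
// Hypothetical replication info: siteA is a legacy backend without
// per-backend fields, siteB carries its own destination and role.
const replicationInfo = {
    destination: 'arn:aws:s3:::top-level-bucket',
    role: 'arn:aws:iam::000:role/src,arn:aws:iam::000:role/dst',
    backends: [
        { site: 'siteA', status: 'PENDING' },
        {
            site: 'siteB',
            status: 'PENDING',
            destination: 'arn:aws:s3:::bucket-b',
            role: 'arn:aws:iam::000:role/src,arn:aws:iam::222:role/dst',
        },
    ],
};

// Prefer the backend's own value, else fall back to the top-level field.
function getDestinationFor(info, site) {
    const backend = info.backends.find(b => b.site === site);
    return backend?.destination ?? info.destination;
}

function getRoleFor(info, site) {
    const backend = info.backends.find(b => b.site === site);
    return backend?.role ?? info.role;
}

console.log(getDestinationFor(replicationInfo, 'siteA')); // arn:aws:s3:::top-level-bucket
console.log(getDestinationFor(replicationInfo, 'siteB')); // arn:aws:s3:::bucket-b
```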

maeldonn added 3 commits June 15, 2026 18:23
Read destination bucket and role from each backends[] entry instead of
the shared top-level fields, so a single source object can be replicated
to multiple CRR destinations with their own role. Legacy entries
without per-backend fields keep working via top-level fallback in
ObjectQueueEntry's site-aware getters; MongoQueueProcessor's oplog path
now matches every applicable rule and dedups backends per the design's
(site, destination, role) rule.

Issue: BB-762
Add unit tests covering pre-existing branches in code paths that
BB-762 restructured

Issue: BB-762
@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 5df0134 to c7ab73a Compare June 15, 2026 16:23
@maeldonn maeldonn changed the base branch from development/9.4 to development/9.5 June 15, 2026 16:23
@bert-e

bert-e commented Jun 15, 2026

Contributor

Incorrect fix version

The Fix Version/s in issue BB-762 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.5.0

Please check the Fix Version/s of BB-762, or the target
branch of this pull request.

Comment thread lib/api/BackbeatAPI.js
-     new ObjectQueueEntry(bucket, objectKey, mdObj);
- queueEntry.setSite(entry.getSite());
+ const queueEntry = new ObjectQueueEntry(bucket, objectKey, mdObj)
+     .setReplicationBackend({ site: entry.getSite() });

With multi-destination support, two backends can share the same site but differ in destination and role. Here only site is stamped on the refreshed entry, so getReplicationBackend() returns { site, destination: undefined, role: undefined }.

Downstream, _pushToCRRRetryKafkaTopics calls toRetryEntry(backend) which uses _findBackend(backend) — with destination and role both undefined, it will match the first backend for this site, potentially retrying the wrong one in a multi-destination setup.

Consider propagating destination and role from the API request (or from the entry's metadata backends) so the retry path can disambiguate.

— Claude Code
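
To illustrate the ambiguity, here is a sketch of a matcher in which undefined query fields act as wildcards. That wildcard behaviour is an assumption inferred from the comment above, and `findBackend` is a stand-in, not the real `_findBackend`.

```javascript
// Hypothetical matcher: a query field left undefined matches anything.
function findBackend(backends, query) {
    return backends.find(b =>
        b.site === query.site
        && (query.destination === undefined || b.destination === query.destination)
        && (query.role === undefined || b.role === query.role));
}

// Two backends share a site but differ in destination and role.
const sameSiteBackends = [
    { site: 'siteA', destination: 'arn:aws:s3:::bucket-a', role: 'roleA' },
    { site: 'siteA', destination: 'arn:aws:s3:::bucket-b', role: 'roleB' },
];

// Site-only query: always resolves to the first backend for that site.
const siteOnly = findBackend(sameSiteBackends, { site: 'siteA' });

// Full key: resolves to the intended backend.
const fullKey = findBackend(sameSiteBackends, {
    site: 'siteA',
    destination: 'arn:aws:s3:::bucket-b',
    role: 'roleB',
});

console.log(siteOnly.destination); // arn:aws:s3:::bucket-a
console.log(fullKey.destination);  // arn:aws:s3:::bucket-b
```

A retry aimed at bucket-b but issued with a site-only key would land on bucket-a in this sketch, which is the risk raised for the API retry path.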

if (roles.length !== 2) {
log.error('expecting two roles separated by a ' +
'comma in bucket replication configuration',
if (roles.length > 2) {

The check was relaxed from roles.length !== 2 to roles.length > 2, which now allows single-role bucket configs (roles.length === 1). When no matching rule with Account is found (line 314), the fallback sets expectedDestRole = roles[1] which would be undefined for a single-role config. Since the entry always has exactly two roles (validated at line 236), entryRoles[1] would be defined, so undefined !== entryRoles[1] would always evaluate to true and reject with BadRole.

If single-role bucket configurations are intended to be supported (with per-rule Account-based role derivation), this fallback path should handle the roles.length === 1 case explicitly — e.g. skip the literal comparison when roles[1] is undefined and matchingRule wasn't found, or error clearly instead of falling through to a misleading BadRole.

— Claude Code
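
The single-role case can be walked through with a short sketch. The function names and return strings other than BadRole are hypothetical, and the comparison is reduced to the fallback branch described above.

```javascript
// Fallback comparison as described: no matching rule with Account,
// so the expected destination role is taken from roles[1].
function checkDestRoleFallback(bucketRoles, entryRoles) {
    if (bucketRoles.length > 2) {
        return 'TooManyRoles'; // hypothetical error name
    }
    const expectedDestRole = bucketRoles[1]; // undefined for one role
    return expectedDestRole !== entryRoles[1] ? 'BadRole' : 'ok';
}

// One possible guard: report the missing role explicitly.
function checkDestRoleGuarded(bucketRoles, entryRoles) {
    if (bucketRoles.length > 2) {
        return 'TooManyRoles';
    }
    if (bucketRoles[1] === undefined) {
        return 'MissingDestinationRole'; // hypothetical error name
    }
    return bucketRoles[1] !== entryRoles[1] ? 'BadRole' : 'ok';
}

const singleRole = 'arn:aws:iam::111:role/src'.split(',');
const entryRoles = 'arn:aws:iam::111:role/src,arn:aws:iam::222:role/dst'.split(',');

console.log(checkDestRoleFallback(singleRole, entryRoles)); // BadRole
console.log(checkDestRoleGuarded(singleRole, entryRoles));  // MissingDestinationRole
```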

status: 'PENDING',
backends,
content,
role: ReplicationConfiguration.resolveSourceRole(bucketRepInfo.role),

setReplicationInfo no longer sets top-level destination, storageClass, or storageType — those now live per-backend. Since replicationInfo is part of the persisted object metadata (written to MongoDB), any external consumers of this metadata (cloudserver S3 HEAD responses, other services) that read these top-level fields will see defaults/undefined after this change.

Worth confirming that the arsenal ObjectMD upgrade (8.4.7) and cloudserver handle the missing top-level fields gracefully, and that rolling upgrades between the old and new metadata format won't cause issues.

— Claude Code

@claude

claude Bot commented Jun 15, 2026


PR Review Summary

Well-structured PR that adds per-backend role and destination support for multi-destination CRR. The core refactor — moving from site-string routing to {site, destination, role} backend keys — is applied consistently across the replication engine (QueueProcessor, ReplicateObject, MultipleBackendTask, UpdateReplicationStatus, MongoQueueProcessor, BackbeatAPI). Test coverage is thorough, including multi-destination disambiguation, legacy fallback, and edge cases.

Findings:

  • lib/api/BackbeatAPI.js:434 — API retry path stamps only site on the refreshed entry (no destination/role). For multi-destination setups, toRetryEntry will match the first backend for that site, potentially retrying the wrong one.
  • extensions/replication/tasks/ReplicateObject.js:271 — The roles validation relaxed from !== 2 to > 2 allows single-role bucket configs, but the fallback roles[1] at line 315 would be undefined, always triggering BadRole for single-role configs without per-rule Account fields.
  • extensions/mongoProcessor/MongoQueueProcessor.js:349 — setReplicationInfo no longer populates top-level destination, storageClass, storageType. This is a metadata format change that may affect external consumers or rolling upgrades.

Positive notes:

  • The processReplicationEntry callback-to-async migration with callbackify wrapping is clean and improves error handling (synchronous exceptions are now properly captured via the async wrapper).
  • The sequential backend processing loop with first-error capture is a sound design — it ensures all backends are attempted and only PENDING ones are re-attempted on retry.
  • The _findBackend disambiguation logic handles both legacy site-only lookups and full {site, destination, role} matches, maintaining backward compatibility.
  • New unit tests for MongoQueueProcessor, QueueProcessor fan-out, ReplicateObject role validation, MultipleBackendTask per-backend routing, and UpdateReplicationStatus multi-destination scenarios are comprehensive.

Review by Claude Code
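
The sequential loop with first-error capture and the `callbackify` bridge noted above can be sketched as follows. `replicateTo` and `processEntry` are hypothetical stand-ins for the real task dispatch; only `util.callbackify` is an actual Node.js API.

```javascript
const { callbackify } = require('util');

// Hypothetical per-backend worker: fails for siteB only.
async function replicateTo(backend) {
    if (backend.site === 'siteB') {
        throw new Error(`replication to ${backend.site} failed`);
    }
    backend.status = 'COMPLETED';
}

// Attempt every PENDING backend in order, remember the first error,
// and rethrow it only after all backends have been tried.
async function processEntry(entry) {
    let firstError = null;
    const pending = entry.backends.filter(b => b.status === 'PENDING');
    for (const backend of pending) {
        try {
            await replicateTo(backend);
        } catch (err) {
            firstError = firstError || err;
        }
    }
    if (firstError) {
        throw firstError;
    }
}

// Bridge for callers that still expect a Node-style callback.
const processEntryCb = callbackify(processEntry);

const entry = {
    backends: [
        { site: 'siteA', status: 'PENDING' },
        { site: 'siteB', status: 'PENDING' },
        { site: 'siteC', status: 'PENDING' },
    ],
};

processEntryCb(entry, err => {
    console.log(err.message); // replication to siteB failed
    console.log(entry.backends.map(b => b.status)); // [ 'COMPLETED', 'PENDING', 'COMPLETED' ]
});
```

Because completed backends leave the PENDING state, a retry of the same entry would re-attempt only siteB in this sketch.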

@bert-e

bert-e commented Jun 15, 2026

Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

4 participants