
fix(consensus/XDPoS): fix results ordering and false unknown ancestor error in VerifyHeaders, XFN-12 #2139

Merged
AnilChinchawale merged 3 commits into XinFinOrg:dev-upgrade from gzliudan:fix-verify-order on Mar 23, 2026

Conversation

gzliudan (Collaborator) commented Mar 6, 2026

Proposed changes

Fixes:

  • Audit issue XFN-12: Broken Result Ordering in VerifyHeaders
  • False unknown ancestor errors during sync

Problems:

  • When a VerifyHeaders batch crosses the v1/v2 boundary, EngineV1 and EngineV2 can write to the same results channel concurrently. The emitted order then depends on goroutine scheduling instead of a deterministic order expected by callers.
  • In mixed v1/v2 VerifyHeaders batches, the first v2 header can fail with unknown ancestor when its v1 parent exists only in the same input batch (not yet persisted), or when hash-based parent lookup is masked.
  • Batch header verification could return false unknown-ancestor errors when parent headers were present in the same in-memory batch but not yet persisted. This could propagate to downloader sync as invalid-chain failures and trigger peer drops.

Solution:

  • Split v1 and v2 verification outputs into separate buffered channels, then forward them to the public results channel in a fixed sequence (all v1 results first, then all v2 results). Keep the single-engine fast paths unchanged.
  • Wrap mixed-path verification with verifyChainReader. The wrapper resolves GetHeader, GetHeaderByHash, GetHeaderByNumber, and GetBlock from in-batch headers first, then falls back to the underlying chain reader.
  • Use verifyChainReader for all XDPoS VerifyHeaders paths (v1-only, v2-only, and mixed) so in-batch ancestors are visible consistently.
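The first solution bullet can be illustrated with a minimal Go sketch. It is not the code from this PR: `verifyMixed`, `verifyFn`, `collect`, and the stub checks are hypothetical names, headers are reduced to block numbers, and the abort channel of the real `VerifyHeaders` is left out. It only shows the forwarding idea: each engine writes to its own buffered channel, and one forwarder drains the v1 channel completely before touching the v2 channel.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadV2 is a placeholder failure used only by this sketch.
var errBadV2 = errors.New("v2 check failed")

// verifyFn stands in for a per-engine header check (hypothetical signature).
type verifyFn func(number uint64) error

// verifyMixed runs both checks concurrently but emits results in a fixed
// sequence: every v1 result first, then every v2 result.
func verifyMixed(v1, v2 []uint64, checkV1, checkV2 verifyFn) <-chan error {
	// Buffers sized to each sub-batch so a worker never blocks on the forwarder.
	v1Results := make(chan error, len(v1))
	v2Results := make(chan error, len(v2))
	results := make(chan error, len(v1)+len(v2))

	run := func(nums []uint64, check verifyFn, out chan<- error) {
		defer close(out)
		for _, n := range nums {
			out <- check(n)
		}
	}
	go run(v1, checkV1, v1Results)
	go run(v2, checkV2, v2Results)

	go func() {
		defer close(results)
		for err := range v1Results { // drain v1 until its worker closes it
			results <- err
		}
		for err := range v2Results { // only then forward v2
			results <- err
		}
	}()
	return results
}

// collect reads a results channel until it is closed.
func collect(ch <-chan error) []error {
	var out []error
	for err := range ch {
		out = append(out, err)
	}
	return out
}

func passAll(uint64) error { return nil }

func failAt4(n uint64) error {
	if n == 4 {
		return errBadV2
	}
	return nil
}

func main() {
	got := collect(verifyMixed([]uint64{1, 2, 3}, []uint64{4, 5}, passAll, failAt4))
	for i, err := range got {
		fmt.Printf("result %d: %v\n", i, err)
	}
}
```

Because the forwarder is the only writer to the public channel, the emitted order no longer depends on which worker goroutine is scheduled first.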

Types of changes

What types of changes does your code introduce to XDC network?
Mark the boxes that apply

  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Changes that don't change source code or tests
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • revert: Revert something
  • style: Changes that do not affect the meaning of the code
  • test: Adding missing tests or correcting existing tests

Impacted Components

Which parts of the codebase does this PR touch?
Mark the boxes that apply

  • Consensus
  • Account
  • Network
  • Geth
  • Smart Contract
  • External components
  • Not sure (Please specify below)

Checklist

Mark the boxes once you have confirmed the actions below (or provide reasons for not doing so):

  • This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
  • Tested on a private network from the genesis block and monitored the chain operating correctly for multiple epochs.
  • Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
  • Tested the backwards compatibility.
  • Tested with XDC nodes running this version co-exist with those running the previous version.
  • Relevant documentation has been updated as part of this PR
  • N/A

Copilot AI review requested due to automatic review settings March 6, 2026 11:08

coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Copilot AI left a comment

Pull request overview

This PR fixes a result-ordering bug in XDPoS.VerifyHeaders where splitting headers into v1/v2 buckets and running two concurrent goroutines caused nondeterministic result-to-header mapping around the v1→v2 switch boundary. This led to misleading "BAD BLOCK" log messages (e.g., a v1-style error attributed to a v2-height header) and sync failures (issue #2138).

Changes:

  • consensus/XDPoS/XDPoS.go: Replaced the two-bucket/two-goroutine VerifyHeaders implementation with a single goroutine that iterates headers in input order, dispatching each to the appropriate engine version.
  • consensus/tests/engine_v2_tests/adaptor_test.go: Added TestAdaptorVerifyHeadersKeepsInputOrderAcrossConsensusSwitch to assert that results arrive in the same order as the input slice across the consensus switch.
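The single-goroutine approach described in this review overview can be sketched as follows. This is an illustration, not the reviewed code: `header`, `engine`, `taggedEngine`, and `verifyInOrder` are hypothetical names, and treating blocks at or below the switch block as v1 is an assumption made for the example.

```go
package main

import "fmt"

// header carries only what this sketch needs.
type header struct{ Number uint64 }

// engine is a hypothetical stand-in for one consensus engine version.
type engine interface {
	VerifyHeader(h header) error
}

// taggedEngine is a stub that reports which engine handled a header.
type taggedEngine struct{ name string }

func (e taggedEngine) VerifyHeader(h header) error {
	return fmt.Errorf("%s checked block %d", e.name, h.Number)
}

// verifyInOrder walks the headers in input order inside one goroutine and
// dispatches each to the engine that owns its block number, so result i
// always belongs to headers[i].
func verifyInOrder(headers []header, switchBlock uint64, v1, v2 engine) <-chan error {
	results := make(chan error, len(headers))
	go func() {
		defer close(results)
		for _, h := range headers {
			if h.Number <= switchBlock { // assumption: v1 owns blocks up to the switch
				results <- v1.VerifyHeader(h)
			} else {
				results <- v2.VerifyHeader(h)
			}
		}
	}()
	return results
}

func main() {
	headers := []header{{899}, {900}, {901}, {902}}
	for err := range verifyInOrder(headers, 900, taggedEngine{"v1"}, taggedEngine{"v2"}) {
		fmt.Println(err)
	}
	// Prints:
	// v1 checked block 899
	// v1 checked block 900
	// v2 checked block 901
	// v2 checked block 902
}
```

The trade-off is that verification across the boundary becomes sequential, which is simpler to reason about than two concurrent writers but gives up parallelism between the engines.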

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| consensus/XDPoS/XDPoS.go | Rewrites VerifyHeaders to a single sequential goroutine preserving input order |
| consensus/tests/engine_v2_tests/adaptor_test.go | New regression test for result ordering across the v1/v2 switch |

gzliudan force-pushed the fix-verify-order branch 4 times, most recently from 16837f0 to 41a76ec on March 7, 2026 13:43
gzliudan changed the title from "fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" to "[WIP] fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" on Mar 9, 2026
gzliudan changed the title from "[WIP] fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" to "[WIP] fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" on Mar 9, 2026
gzliudan changed the title from "[WIP] fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" to "fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" on Mar 9, 2026
gzliudan force-pushed the fix-verify-order branch 2 times, most recently from a14f15c to eea3308 on March 10, 2026 07:01
gzliudan added the "WIP" (work in process) label on Mar 11, 2026
gzliudan force-pushed the fix-verify-order branch 3 times, most recently from b543a79 to 776237d on March 13, 2026 00:16
gzliudan changed the title from "fix(core,consensus/XDPoS): split header verification by consensus version, close XFN-12" to "fix(consensus/XDPoS): resolve mixed v1/v2 VerifyHeaders ancestor lookup, fix XFN-12" on Mar 16, 2026
gzliudan changed the title from "fix(consensus/XDPoS): resolve mixed v1/v2 VerifyHeaders ancestor lookup, fix XFN-12" to "fix(consensus/XDPoS): fix mixed v1/v2 VerifyHeaders ancestor lookup and result ordering, fix XFN-12" on Mar 16, 2026
gzliudan requested a review from Copilot on March 16, 2026 09:32

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

gzliudan changed the title from "fix(consensus/XDPoS): fix mixed v1/v2 VerifyHeaders ancestor lookup and result ordering, fix XFN-12" to "fix(consensus/XDPoS): fix unknown ancestor error and mixed v1/v2 VerifyHeaders result ordering, XFN-12" on Mar 17, 2026
gzliudan changed the title from "fix(consensus/XDPoS): fix unknown ancestor error and mixed v1/v2 VerifyHeaders result ordering, XFN-12" to "fix(consensus/XDPoS): fix unknown ancestor error and VerifyHeaders result race, XFN-12" on Mar 17, 2026
gzliudan force-pushed the fix-verify-order branch 4 times, most recently from 19e32b4 to 831dc3c on March 17, 2026 06:55
gzliudan force-pushed the fix-verify-order branch 6 times, most recently from 83741ef to 987b1cf on March 21, 2026 09:22
…XFN-12

Problem:
When a VerifyHeaders batch crosses the v1/v2 boundary, EngineV1 and EngineV2 can write to the same results channel concurrently. The emitted order then depends on goroutine scheduling instead of a deterministic order expected by callers.

Solution:
Split v1 and v2 verification outputs into separate buffered channels, then forward them to the public results channel in a fixed sequence (all v1 results first, then all v2 results). Keep the single-engine fast paths unchanged.

Impact:
Mixed-version batches now produce stable and predictable result ordering, removing race-driven ordering nondeterminism. Behavior and performance characteristics for pure v1 or pure v2 batches remain unchanged.

Validation:
Added mixed-boundary regression tests, which pass, verifying that result ordering is deterministic and independent of goroutine scheduling.

Problem:
In mixed v1/v2 VerifyHeaders batches, the first v2 header can fail with unknown ancestor when its v1 parent exists only in the same input batch (not yet persisted), or when hash-based parent lookup is masked.

Solution:
Wrap mixed-path verification with verifyChainReader. The wrapper resolves GetHeader, GetHeaderByHash, GetHeaderByNumber, and GetBlock from in-batch headers first, then falls back to the underlying chain reader.

Impact:
Mixed-version verification now has deterministic ancestor visibility across the v1->v2 boundary, eliminating false unknown ancestor failures. Pure v1 and pure v2 verification paths remain unchanged.

Validation:
Added and passed regression coverage for batch shadowing, in-batch parent resolution, nil-chain safety, and mixed VerifyHeaders result flow.
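The batch-first lookup described in this commit message can be sketched in Go as below. This is a simplified illustration and not the PR's `verifyChainReader`: the `Hash`, `Header`, `ChainReader`, `batchReader`, and `memChain` types are hypothetical, the header hash is stored as a field instead of being computed, and `GetBlock` is omitted.

```go
package main

import "fmt"

type Hash [32]byte

type Header struct {
	Number     uint64
	Hash       Hash // stored directly here; a real header computes its hash
	ParentHash Hash
}

// ChainReader is a trimmed, hypothetical version of the chain lookup interface.
type ChainReader interface {
	GetHeader(hash Hash, number uint64) *Header
	GetHeaderByHash(hash Hash) *Header
	GetHeaderByNumber(number uint64) *Header
}

// batchReader answers lookups from the in-memory batch first and falls back
// to the wrapped chain, which may be nil.
type batchReader struct {
	chain    ChainReader
	byHash   map[Hash]*Header
	byNumber map[uint64]*Header
}

func newBatchReader(chain ChainReader, batch []*Header) *batchReader {
	r := &batchReader{chain: chain, byHash: map[Hash]*Header{}, byNumber: map[uint64]*Header{}}
	for _, h := range batch {
		r.byHash[h.Hash] = h
		r.byNumber[h.Number] = h
	}
	return r
}

func (r *batchReader) GetHeader(hash Hash, number uint64) *Header {
	if h, ok := r.byHash[hash]; ok && h.Number == number {
		return h
	}
	if r.chain == nil { // nil-chain safety: nothing to fall back to
		return nil
	}
	return r.chain.GetHeader(hash, number)
}

func (r *batchReader) GetHeaderByHash(hash Hash) *Header {
	if h, ok := r.byHash[hash]; ok {
		return h
	}
	if r.chain == nil {
		return nil
	}
	return r.chain.GetHeaderByHash(hash)
}

func (r *batchReader) GetHeaderByNumber(number uint64) *Header {
	if h, ok := r.byNumber[number]; ok {
		return h
	}
	if r.chain == nil {
		return nil
	}
	return r.chain.GetHeaderByNumber(number)
}

// memChain is a toy stand-in for headers already persisted to the database.
type memChain map[uint64]*Header

func (c memChain) GetHeader(hash Hash, number uint64) *Header {
	if h := c[number]; h != nil && h.Hash == hash {
		return h
	}
	return nil
}

func (c memChain) GetHeaderByHash(hash Hash) *Header {
	for _, h := range c {
		if h.Hash == hash {
			return h
		}
	}
	return nil
}

func (c memChain) GetHeaderByNumber(number uint64) *Header { return c[number] }

func main() {
	parent := &Header{Number: 10, Hash: Hash{0x0a}}
	child := &Header{Number: 11, Hash: Hash{0x0b}, ParentHash: parent.Hash}
	stored := memChain{9: {Number: 9, Hash: Hash{0x09}}}

	r := newBatchReader(stored, []*Header{parent, child})
	fmt.Println(r.GetHeader(child.ParentHash, 10) == parent)          // parent found in the batch
	fmt.Println(r.GetHeaderByNumber(9) != nil)                        // falls back to the stored chain
	fmt.Println(newBatchReader(nil, nil).GetHeaderByNumber(9) == nil) // nil chain is safe
}
```

With a reader like this, the parent of the first v2 header is visible even though it has not been written to the database, which is the situation that produced the false unknown ancestor error.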
…tch sync

Problem:
Batch header verification could return false unknown-ancestor errors when parent headers were present in the same in-memory batch but not yet persisted. This could propagate to downloader sync as invalid-chain failures and trigger peer drops.

Solution:
- Use verifyChainReader for all XDPoS VerifyHeaders paths (v1-only, v2-only, and mixed) so in-batch ancestors are visible consistently.
- Add/extend regression coverage in engine_v2 tests for pure-v2 epoch-switch batch verification where parent headers are not yet written to DB.
- Add downloader black-box regression tests for both LightSync and FastSync to verify:
  - ancestor error path is classified as errInvalidChain and drops the peer
  - control path succeeds and keeps the peer.
- Rename paired downloader tests to short, symmetric names.

Validation:
- go test ./consensus/tests/engine_v2_tests -run 'TestShouldVerifyPureV2EpochSwitchHeadersEvenIfParentNotYetWrittenIntoDB' -count=1
- go test ./eth/downloader -run 'TestSyncBatchAncestorErrDropPeer|TestSyncBatchNoAncestorErrKeepPeer|TestBlockHeaderAttackerDropping64' -count=1
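The failure path these downloader tests exercise can be pictured with a small Go sketch. It is an assumption-laden illustration, not the downloader's code: `importBatch`, `shouldDropPeer`, and the stub verifiers are hypothetical, and the error wrapping with `%w` is chosen for the example rather than taken from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// Sentinel errors named after the ones mentioned above; the messages are illustrative.
	errUnknownAncestor = errors.New("unknown ancestor")
	errInvalidChain    = errors.New("retrieved hash chain is invalid")
)

// importBatch classifies any header verification failure as an invalid chain.
func importBatch(verify func() error) error {
	if err := verify(); err != nil {
		return fmt.Errorf("%w: %v", errInvalidChain, err)
	}
	return nil
}

// shouldDropPeer reports whether a sync error blames the remote peer.
func shouldDropPeer(err error) bool {
	return errors.Is(err, errInvalidChain)
}

func failVerify() error { return errUnknownAncestor }

func passVerify() error { return nil }

func main() {
	fmt.Println(shouldDropPeer(importBatch(failVerify))) // true: ancestor error drops the peer
	fmt.Println(shouldDropPeer(importBatch(passVerify))) // false: control path keeps the peer
}
```

This is why a false unknown ancestor error is costly: once it is classified as an invalid chain, an honest peer is dropped for a fault that originated in local verification.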
gzliudan changed the title from "fix(consensus/XDPoS): fix unknown ancestor error and VerifyHeaders result race, XFN-12" to "fix(consensus/XDPoS): fix VerifyHeaders result race and false unknown ancestor error, XFN-12" on Mar 23, 2026
gzliudan changed the title from "fix(consensus/XDPoS): fix VerifyHeaders result race and false unknown ancestor error, XFN-12" to "fix(consensus/XDPoS): fix result channel race and false unknown ancestor error in VerifyHeaders, XFN-12" on Mar 23, 2026
gzliudan changed the title from "fix(consensus/XDPoS): fix result channel race and false unknown ancestor error in VerifyHeaders, XFN-12" to "fix(consensus/XDPoS): fix results ordering and false unknown ancestor error in VerifyHeaders, XFN-12" on Mar 23, 2026
AnilChinchawale merged commit 7aa089e into XinFinOrg:dev-upgrade on Mar 23, 2026
13 checks passed
gzliudan deleted the fix-verify-order branch on March 23, 2026 17:04

6 participants