fix(consensus/XDPoS): stabilize batch header ancestor resolution #2187

Closed
gzliudan wants to merge 2 commits into XinFinOrg:dev-upgrade from gzliudan:fix-unknown-ancestor

Conversation

Collaborator

@gzliudan gzliudan commented Mar 16, 2026

Proposed changes

Fix the "Error: unknown ancestor" failure during sync.

Add a batch-aware verifyChainReader for XDPoS VerifyHeaders so deep consensus lookups can read in-flight headers before they are persisted to disk. This prevents transient unknown-ancestor failures during sync, including v2 paths that query headers/blocks by hash and number (for example HookPenalty and QC processing).

Also add unit tests that verify batch shadowing for GetHeader/GetHeaderByHash/GetHeaderByNumber/GetBlock and fallback-to-underlying-reader behavior.
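
The batch-shadowing idea described above can be illustrated with a minimal sketch. It is not the PR's verifyChainReader: the Header type, the HeaderReader interface, diskReader and shadowReader are simplified, hypothetical stand-ins, and hashes are plain strings. The sketch only shows the lookup order: in-flight batch first, underlying reader second.

```go
package main

// Header is a simplified stand-in for the real header type (assumption).
type Header struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// HeaderReader is a hypothetical, reduced chain-reader interface.
type HeaderReader interface {
	GetHeader(hash string, number uint64) *Header
	GetHeaderByHash(hash string) *Header
	GetHeaderByNumber(number uint64) *Header
}

// diskReader models headers that are already persisted.
type diskReader struct{ headers []*Header }

func (d *diskReader) GetHeaderByHash(hash string) *Header {
	for _, h := range d.headers {
		if h.Hash == hash {
			return h
		}
	}
	return nil
}

func (d *diskReader) GetHeaderByNumber(number uint64) *Header {
	for _, h := range d.headers {
		if h.Number == number {
			return h
		}
	}
	return nil
}

func (d *diskReader) GetHeader(hash string, number uint64) *Header {
	if h := d.GetHeaderByHash(hash); h != nil && h.Number == number {
		return h
	}
	return nil
}

// shadowReader answers lookups from the in-flight batch first and falls
// back to the underlying reader for everything else.
type shadowReader struct {
	under    HeaderReader
	byHash   map[string]*Header
	byNumber map[uint64]*Header
}

func newShadowReader(under HeaderReader, batch []*Header) *shadowReader {
	r := &shadowReader{
		under:    under,
		byHash:   make(map[string]*Header, len(batch)),
		byNumber: make(map[uint64]*Header, len(batch)),
	}
	for _, h := range batch {
		r.byHash[h.Hash] = h
		r.byNumber[h.Number] = h
	}
	return r
}

func (r *shadowReader) GetHeaderByHash(hash string) *Header {
	if h, ok := r.byHash[hash]; ok {
		return h
	}
	return r.under.GetHeaderByHash(hash)
}

func (r *shadowReader) GetHeaderByNumber(number uint64) *Header {
	if h, ok := r.byNumber[number]; ok {
		return h
	}
	return r.under.GetHeaderByNumber(number)
}

func (r *shadowReader) GetHeader(hash string, number uint64) *Header {
	if h, ok := r.byHash[hash]; ok && h.Number == number {
		return h
	}
	return r.under.GetHeader(hash, number)
}
```

With a reader built as newShadowReader(disk, batch), a lookup for a header that is in the batch but not yet on disk returns the batch header instead of nil, which is the situation that otherwise surfaces as "parentHeader is nil" or "unknown ancestor".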

Error message:

WARN [03-14|23:15:18.025] [VerifyHeaders] Fail to verify header    fullVerify=true blockNum=57,372,740 blockHash=ed239f..aaa7a7 error="parentHeader is nil"
INFO [03-14|23:15:19.117] Imported new chain segment               blocks=25  txs=158   mgas=180.259  elapsed=9.049s      mgasps=19.919   number=57,372,608 hash=51904b..8f9795 age=2y3mo4w   dirty=0.00B
INFO [03-14|23:15:27.469] Imported new chain segment               blocks=38  txs=164   mgas=201.322  elapsed=8.352s      mgasps=24.104   number=57,372,646 hash=bbae68..589809 age=2y3mo4w   dirty=0.00B
INFO [03-14|23:15:35.680] Imported new chain segment               blocks=32  txs=132   mgas=189.217  elapsed=8.210s      mgasps=23.046   number=57,372,678 hash=3e890a..adb56b age=2y3mo4w   dirty=0.00B
INFO [03-14|23:15:43.849] Imported new chain segment               blocks=37  txs=154   mgas=202.454  elapsed=8.169s      mgasps=24.782   number=57,372,715 hash=2a0849..ece177 age=2y3mo4w   dirty=0.00B
ERROR[03-14|23:15:47.688] [FindParentBlockToAssign] Can not find parent block from highestQC proposedBlockInfo x.highestQuorumCert.ProposedBlockInfo.Hash=7c02dc..d6327c x.highestQuorumCert.ProposedBlockInfo.Number=57,372,749
ERROR[03-14|23:15:47.688] [processQC] Block not found using the QC quorumCert.ProposedBlockInfo.Hash=7c02dc..d6327c incomingQuorumCert.ProposedBlockInfo.Number=57,372,749
ERROR[03-14|23:15:47.688] [ProposedBlockHandler] Fail to processQC "QC proposed blockInfo round number"=545409 "QC proposed blockInfo hash"=7c02dc..d6327c
INFO [03-14|23:15:47.688] [downloader] handle proposed block has error err="block not found, number: 57372749, hash: 0x7c02dca86a22ec4ec035181e48f6caa3d2c19b427f2944b378d3bea1a8d6327c" "block hash"=04dc92..b743c0 number=57,372,750
WARN [03-14|23:15:47.695] [VerifyHeaders] Fail to verify header    fullVerify=true blockNum=57,372,751 blockHash=abf618..2e866d error="unknown ancestor"
ERROR[03-14|23:15:47.706] 
########## BAD BLOCK #########
Number: 57372751
Hash: 0xabf6185b4fa86289dfa942beda325103f4d61cd1853b7bc561c93486e22e866d
Round: 545411
Error: unknown ancestor
Chain configuration:
  - ChainID:                     51      
  - Homestead:                   1       
  - DAO Fork:                    <nil>
  - DAO Support:                 false   
  - Tangerine Whistle (EIP 150): 2       
  - Spurious Dragon (EIP 155):   3       
  - Byzantium:                   4       
  - Constantinople:              <nil>
  - Petersburg:                  <nil>
  - Istanbul:                    <nil>
  - TIP2019Block:                1       
  - TIPSigning:                  3000000 
  - TIPRandomize:                3464000 
  - TIPIncreaseMasternodes:      5000000 
  - DenylistHFNumber:            23779191
  - TIPNoHalvingMNReward:        23779191
  - TIPXDCX:                     23779191
  - TIPXDCXLending:              23779191
  - TIPXDCXCancellationFee:      23779191
  - TIPTRC21Fee:                 23779191
  - Berlin:                      61290000
  - London:                      61290000
  - Merge:                       61290000
  - Shanghai:                    61290000
  - BlockNumberGas50x:           56828700
  - TIPXDCXMinerDisable:         61290000
  - TIPXDCXReceiverDisable:      66825000
  - Eip1559:                     71550000
  - Cancun:                      71551800
  - Prague:                      9223372036854775807
  - Osaka:                       9223372036854775807
  - DynamicGasLimitBlock:        9223372036854775807
  - TIPUpgradeReward:            9223372036854775807
  - TipUpgradePenalty:           9223372036854775807
  - TIPEpochHalving:             9223372036854775807
  - Engine:                      XDPoS
    - Period: 2
    - Epoch: 900
    - Reward: 5000
    - RewardCheckpoint: 900
    - Gap: 450
    - FoundationWalletAddr: xdc746249C61f5832C5eEd53172776b460491bDcd5C
    - SkipV1Validation: false
    - V2:
      - SwitchEpoch: 63143
      - SwitchBlock: 56828700
      - CurrentConfig:
        - MaxMasternodes: 15
        - SwitchRound: 0
        - MinePeriod: 2
        - TimeoutSyncThreshold: 3
        - TimeoutPeriod: 60
        - CertThreshold: 0.45
        - MasternodeReward: 0
        - ProtectorReward: 0
        - ObserverReward: 0
        - MinimumMinerBlockPerEpoch: 0
        - LimitPenaltyEpoch: 0
        - MinimumSigningTx: 0
        - ExpTimeoutBase: 1
        - ExpTimeoutMaxExponent: 0
Receipts: 
##############################

WARN [03-14|23:15:47.712] Synchronisation failed, dropping peer    peer=4464397694e0c8b1 err="retrieved hash chain is invalid: unknown ancestor"

Types of changes

What types of changes does your code introduce to XDC network?
Mark the boxes that apply

  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Changes that don't change source code or tests
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • revert: Revert something
  • style: Changes that do not affect the meaning of the code
  • test: Adding missing tests or correcting existing tests

Impacted Components

Which parts of the codebase does this PR touch?
Mark the boxes that apply

  • Consensus
  • Account
  • Network
  • Geth
  • Smart Contract
  • External components
  • Not sure (Please specify below)

Checklist

Mark the boxes once you have confirmed the actions below (or provide reasons for not doing so):

  • This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
  • Tested on a private network from the genesis block and monitored the chain operating correctly for multiple epochs.
  • Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
  • Tested the backwards compatibility.
  • Tested with XDC nodes running this version co-exist with those running the previous version.
  • Relevant documentation has been updated as part of this PR
  • N/A

…sion, close XFN-12

Problem

- Mixed v1/v2 header batches around the switch boundary caused ambiguous verification flow and error attribution.
- Early failures could leave unnecessary verification work running without explicit batch-level stop coverage.

Fix

- Split header verification into contiguous consensus-version batches in core import paths.
- Keep strict mixed-batch rejection in XDPoS adaptor as a defensive guard (ErrMixedConsensusBatch).
- Restore explicit abort propagation in blockchain batch verification loop.
- Add empty-chain guard in HeaderChain.ValidateHeaderChain.
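
As an illustration of the batch split listed above, here is a minimal sketch. It is not the PR's splitHeadersByConsensusVersion: the header type, the isV2 rule (block numbers above a switch block are treated as v2) and the function name splitByConsensusVersion are assumptions made for this example. Only the start/end index-range shape mirrors the snippets quoted in the review.

```go
package main

// header is a simplified stand-in; only the block number matters here.
type header struct{ Number uint64 }

// batchRange is a half-open index range [start, end) into the input slice.
type batchRange struct{ start, end int }

// isV2 is a hypothetical version rule: blocks after switchBlock use v2.
func isV2(number, switchBlock uint64) bool { return number > switchBlock }

// splitByConsensusVersion groups adjacent headers that share a consensus
// version. An empty input yields no batches.
func splitByConsensusVersion(headers []header, switchBlock uint64) []batchRange {
	if len(headers) == 0 {
		return nil
	}
	var out []batchRange
	start := 0
	for i := 1; i < len(headers); i++ {
		prev := isV2(headers[i-1].Number, switchBlock)
		cur := isV2(headers[i].Number, switchBlock)
		if cur != prev {
			// Version changed: close the current batch and open a new one.
			out = append(out, batchRange{start, i})
			start = i
		}
	}
	return append(out, batchRange{start, len(headers)})
}
```

Each returned range can then be handed to the engine on its own, so a single verification call never sees both v1 and v2 headers.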

Tests

- Add consensus batch split unit tests for zero/single/multi/mixed v1-v2 scenarios.
- Add adaptor test for mixed-batch rejection.
- Add abort regression tests to ensure second batch does not start and first-batch trailing results are not emitted after abort.
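
The abort behavior these tests target can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: verifyFunc, verifyInBatches and fakeVerify are hypothetical names, items are plain integers, and a negative value stands in for an invalid header. The sketch shows that a failure in one batch closes that batch's abort channel and prevents later batches from starting.

```go
package main

import "errors"

// batchRange is a half-open index range [start, end).
type batchRange struct{ start, end int }

// verifyFunc mimics the shape of a batch verifier: it returns a channel
// used to abort the batch and a channel yielding one result per item.
type verifyFunc func(items []int) (chan<- struct{}, <-chan error)

var errBad = errors.New("bad item")

// verifyInBatches verifies batch by batch and stops at the first error.
// It returns the index of the failing item, or (0, nil) on success.
func verifyInBatches(items []int, batches []batchRange, verify verifyFunc) (int, error) {
	for _, b := range batches {
		abort, results := verify(items[b.start:b.end])
		for i := b.start; i < b.end; i++ {
			if err := <-results; err != nil {
				close(abort) // stop the remaining work of this batch
				return i, err
			}
		}
		close(abort) // batch finished; release the verifier goroutine
	}
	return 0, nil
}

// fakeVerify counts how many batches were started and rejects negative items.
func fakeVerify(started *int) verifyFunc {
	return func(items []int) (chan<- struct{}, <-chan error) {
		*started++
		abort := make(chan struct{})
		results := make(chan error, len(items))
		go func() {
			for _, it := range items {
				var err error
				if it < 0 {
					err = errBad
				}
				select {
				case <-abort:
					return
				case results <- err:
				}
			}
		}()
		return abort, results
	}
}
```

Calling verifyInBatches with a failing item in the first batch returns that item's index and leaves the started counter at one, which is the "second batch does not start" property described in the test list.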
Add a batch-aware verifyChainReader for XDPoS VerifyHeaders so deep consensus lookups can read in-flight headers before they are persisted to disk. This prevents transient unknown-ancestor failures during sync, including v2 paths that query headers/blocks by hash and number (for example HookPenalty and QC processing).

Also add unit tests that verify batch shadowing for GetHeader/GetHeaderByHash/GetHeaderByNumber/GetBlock and fallback-to-underlying-reader behavior.
Copilot AI review requested due to automatic review settings March 16, 2026 04:24
coderabbitai bot commented Mar 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Copilot AI left a comment

Pull request overview

This PR aims to eliminate transient unknown ancestor / missing-parent failures during XDPoS batch header verification by making consensus lookups batch-aware (so in-flight headers can be resolved before being persisted), and by adjusting verification to operate on consensus-version-consistent batches.

Changes:

  • Add an XDPoS verifyChainReader that shadows ChainReader lookups with headers/blocks from the current verify batch.
  • Split header verification into contiguous consensus-version batches in HeaderChain.ValidateHeaderChain and BlockChain.insertChain.
  • Add unit tests for batch splitting and for the new chain-reader shadowing + abort behavior across batches.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

File                                                   Description
core/headerchain.go                                    Validate headers in per-consensus batches and handle empty input safely.
core/blockchain.go                                     Wrap header verification to run per-consensus batches during block import.
core/consensus_batch.go                                Add helper to split contiguous headers by consensus version.
core/consensus_batch_test.go                           Unit tests for consensus batch splitting.
core/headerchain_validation_test.go                    Regression test: empty header chain validation should succeed.
core/blockchain_abort_test.go                          Tests for abort propagation and ensuring later batches don’t start after early failure.
consensus/XDPoS/verify_chain_reader.go                 New batch-shadowing ChainReader for deterministic in-flight ancestor lookups.
consensus/XDPoS/XDPoS.go                               Use batch-shadow reader and introduce mixed-batch handling behavior.
consensus/XDPoS/XDPoS_test.go                          Tests for mixed-batch behavior and chain-reader shadow/fallback behavior.
consensus/tests/engine_v2_tests/verify_header_test.go  Update tests to respect per-consensus batching expectations.

Comment on lines +253 to 275
for _, batch := range splitHeadersByConsensusVersion(hc.config, chain) {
	abort, results := hc.engine.VerifyHeaders(hc, chain[batch.start:batch.end], seals[batch.start:batch.end])
	for i := batch.start; i < batch.end; i++ {
		header := chain[i]
		// If the chain is terminating, stop processing blocks
		if hc.procInterrupt() {
			close(abort)
			log.Debug("Premature abort during headers verification")
			return 0, errors.New("aborted")
		}
		// If the header is a banned one, straight out abort
		if BadHashes[header.Hash()] {
			close(abort)
			return i, ErrDenylistedHash
		}
		// Otherwise wait for headers checks and ensure they pass
		if err := <-results; err != nil {
			close(abort)
			return i, err
		}
	}
	close(abort)
}

Comment on lines +1530 to +1534
results := make(chan error, len(headers))
go func() {
	for _, batch := range splitHeadersByConsensusVersion(bc.chainConfig, headers) {
		batchAbort, batchResults := bc.engine.VerifyHeaders(bc, headers[batch.start:batch.end], seals[batch.start:batch.end])
		stopped := false

Comment on lines +241 to +254
	x.EngineV1.VerifyHeaders(verifyReader, v1headers, v1fullVerifies, abort, results)
case v1Count == 0 && v2Count != 0:
	x.EngineV2.VerifyHeaders(verifyReader, v2headers, v2fullVerifies, abort, results)
case v1Count != 0 && v2Count != 0:
	go func() {
		for range headers {
			select {
			case <-abort:
				return
			case results <- ErrMixedConsensusBatch:
			}
		}
	}()
	return abort, results

@gzliudan gzliudan added the WIP work in process label Mar 16, 2026
@gzliudan
Collaborator Author

fixed by #2139

@gzliudan gzliudan closed this Mar 18, 2026
@gzliudan gzliudan deleted the fix-unknown-ancestor branch March 18, 2026 05:31