[ISSUE #10494] Fix flaky HATest.testSemiSyncReplica#10495

Open
RongtongJin wants to merge 1 commit into develop from codex/fix-ha-test-semi-sync-flakiness

RongtongJin (Contributor) commented Jun 13, 2026

Summary

  • Wait for the master-side HA connection to observe the slave's initial ack offset before running HATest semi-sync message writes.
  • Add a condition-based readiness helper that checks the slave client is in TRANSFER and the master connection ack offset covers the slave max physical offset.
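The readiness check described in the summary can be sketched as a small polling helper. This is a minimal illustration, not the code in this PR: the class `HaReadiness`, the methods `isHaReady` and `awaitCondition`, and the use of a plain string for the client state are all hypothetical stand-ins, and "covers" is read here as "greater than or equal to".

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical helper; names and signatures are illustrative, not RocketMQ APIs.
final class HaReadiness {
    private HaReadiness() { }

    // Ready only when the slave client is in TRANSFER and the ack offset seen by
    // the master is at least the slave's max physical offset (assumed meaning of "covers").
    static boolean isHaReady(String slaveClientState, long masterAckOffset, long slaveMaxPhyOffset) {
        return "TRANSFER".equals(slaveClientState) && masterAckOffset >= slaveMaxPhyOffset;
    }

    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition held, false on timeout or interruption.
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // -1 models "master has not yet seen the slave's initial offset report".
        AtomicLong masterAckOffset = new AtomicLong(-1L);
        long slaveMaxPhyOffset = 0L;

        // Stand-in for the slave's initial report arriving on another thread.
        new Thread(() -> masterAckOffset.set(0L)).start();

        boolean ready = awaitCondition(
            () -> isHaReady("TRANSFER", masterAckOffset.get(), slaveMaxPhyOffset),
            5_000, 10);
        System.out.println("ready = " + ready);
    }
}
```

Polling a condition with a bounded timeout, rather than sleeping for a fixed interval, lets a test proceed as soon as both sides agree and still fail promptly if they never do.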

Root Cause

HATest previously waited only for the slave-side HA client to enter TRANSFER. That state can be reached before the master-side HAConnection receives the slave's initial offset report, leaving slaveAckOffset at -1. The first semi-sync write can race that initial report and return FLUSH_SLAVE_TIMEOUT instead of PUT_OK on slower CI machines.
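To make the timing argument concrete, here is a toy model of the race using logical timestamps instead of real sockets. It assumes, purely for illustration, that the master cannot confirm a write until it has seen the slave's initial report; the class `SemiSyncRaceSketch`, the method `semiSyncPut`, and all millisecond values are hypothetical and are not taken from the RocketMQ code base.

```java
// Toy timing model of the race; not RocketMQ code. All numbers are made up.
final class SemiSyncRaceSketch {
    enum PutStatus { PUT_OK, FLUSH_SLAVE_TIMEOUT }

    private SemiSyncRaceSketch() { }

    // writeAtMs:         when the test issues the semi-sync write
    // initialReportAtMs: when the master observes the slave's initial offset report
    // replicationMs:     time to replicate and ack once both sides are ready
    // timeoutMs:         how long the write waits for the slave ack
    static PutStatus semiSyncPut(long writeAtMs, long initialReportAtMs,
                                 long replicationMs, long timeoutMs) {
        // Modelling assumption: replication of this write cannot start before
        // the master has seen the initial report.
        long replicationStartsAtMs = Math.max(writeAtMs, initialReportAtMs);
        long ackedAtMs = replicationStartsAtMs + replicationMs;
        long waitedMs = ackedAtMs - writeAtMs;
        return waitedMs <= timeoutMs ? PutStatus.PUT_OK : PutStatus.FLUSH_SLAVE_TIMEOUT;
    }

    public static void main(String[] args) {
        // Old behaviour: the write is issued as soon as the slave is in TRANSFER (t=0),
        // but the master only sees the initial report at t=40 on a slow machine.
        System.out.println(semiSyncPut(0, 40, 5, 30));

        // New behaviour: the test first waits until the master has seen the report (t=40),
        // so the write only pays the replication latency.
        System.out.println(semiSyncPut(40, 40, 5, 30));
    }
}
```

In this model the write's timeout budget is consumed by the delay before the initial report whenever the write is issued too early, which is the failure mode the readiness wait is meant to remove.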

Impact

This stabilizes HATest.testSemiSyncReplica without changing production HA behavior.

Fixes #10494

Validation

/tmp/codex-maven/apache-maven-3.9.9/bin/mvn -pl store -am -Dtest=HATest#testSemiSyncReplica -DskipITs -DfailIfNoTests=false test
/tmp/codex-maven/apache-maven-3.9.9/bin/mvn -pl store -am -Dtest=HATest -DskipITs -DfailIfNoTests=false test

Full HATest result: Tests run: 4, Failures: 0, Errors: 0, Skipped: 1.

Stress check:

for i in $(seq 1 100); do
  /tmp/codex-maven/apache-maven-3.9.9/bin/mvn -q -pl store -am \
    -Dtest=HATest#testSemiSyncReplica \
    -DskipITs -DfailIfNoTests=false \
    -Dcheckstyle.skip=true -Dspotbugs.skip=true -Djacoco.skip=true test
done

Result: 100 consecutive HATest#testSemiSyncReplica runs passed.

RongtongJin changed the title from "[codex] Fix flaky HATest semi-sync replication" to "Fix flaky HATest.testSemiSyncReplica" on Jun 13, 2026
RongtongJin changed the title from "Fix flaky HATest.testSemiSyncReplica" to "[ISSUE #10494] Fix flaky HATest.testSemiSyncReplica" on Jun 13, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.05%. Comparing base (86d1df4) to head (3b41597).
⚠️ Report is 11 commits behind head on develop.

Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10495      +/-   ##
=============================================
- Coverage      48.08%   48.05%   -0.03%     
- Complexity     13326    13332       +6     
=============================================
  Files           1377     1377              
  Lines         100644   100707      +63     
  Branches       12995    13010      +15     
=============================================
+ Hits           48393    48394       +1     
- Misses         46329    46368      +39     
- Partials        5922     5945      +23     

@RongtongJin RongtongJin marked this pull request as ready for review June 13, 2026 03:17

Development

Successfully merging this pull request may close these issues.

HATest.testSemiSyncReplica can fail before HA slave ack is visible on master

2 participants