[BP-2.3][FLINK-37663][connector-base] Fix lost wakeup in FutureCompletingBlockingQueue#28467
Open
MartijnVisser wants to merge 2 commits into
Open
[BP-2.3][FLINK-37663][connector-base] Fix lost wakeup in FutureCompletingBlockingQueue#28467MartijnVisser wants to merge 2 commits into
MartijnVisser wants to merge 2 commits into
Conversation
…ockingQueue lost wakeup A freed slot signals a putter that wakeUpPuttingThread() already released instead of the genuinely waiting putter; the test reproduces the stall and fails without the fix. Its @VisibleForTesting accessor is recorded in the architecture violation store, like the existing take(). Generated-by: Claude Code (Claude Opus 4.8)
… wakeup Remove a blocked putter's condition from the notFull queue when its await() returns, so signalNextPutter() no longer signals a putter that has stopped waiting after a graceful wakeUpPuttingThread() or a spurious wakeup. Generated-by: Claude Code (Claude Opus 4.8)
Collaborator
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What is the purpose of the change
Backport of #28463 to
release-2.3.FutureCompletingBlockingQueue(the split-fetcher to source-reader handover) has a lost-wakeup bug. A putter that blocks input()registers itsConditionin the internalnotFullqueue, but that condition is not always removed when the putter stops waiting:wakeUpPuttingThread(int)(and returnsfalsewithout enqueuing), andawait()returns spuriously (permitted by theConditioncontract) and it re-blocks, re-adding its condition.The leftover (stale or duplicated) entry makes a later
signalNextPutter(), fired when a consumer frees a slot, signal a thread that is no longer waiting. A genuinely waiting putter is then never woken, so a split fetcher can stall indefinitely.Brief change log
FutureCompletingBlockingQueue#waitOnPut: remove the putter's condition fromnotFullonceawait()returns (in afinally), restoring the invariant thatnotFullonly holds conditions of currently waiting putters.@VisibleForTestingaccessor (getNumberOfQueuedPutters()) so the regression test can deterministically sequence the two contending putters. Like the existing@VisibleForTesting take()in the same class, this trips the blanket connector "depend only on public API" ArchUnit rule, so it is recorded in theflink-architecture-tests-productionviolation store alongside the existing entry.FutureCompletingBlockingQueueTest#testWakeUpDoesNotStrandAnotherPutter, a deterministic regression test that reproduces the stranded-putter stall and fails on the unfixed code.Verifying this change
This change added tests and can be verified as follows:
FutureCompletingBlockingQueueTest#testWakeUpDoesNotStrandAnotherPutterblocks two putters, wakes the first gracefully, then frees one slot and asserts the second (still-waiting) putter is signalled. It fails on the code before the fix (the still-waiting putter is stranded, the latch times out) and passes after the fix.Does this pull request potentially affect one of the following parts:
@Public(Evolving): no (FutureCompletingBlockingQueueis@Internal)Documentation
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Claude Opus 4.8)