Core: Fix background thread leak in ScanTaskIterable#16768
Open
sejal-gupta-ksolves wants to merge 1 commit into
Fixes an issue where background PlanTaskWorker threads remain blocked indefinitely in offerWithTimeout() after a query is cancelled or abandoned early. The cause was that the outer ScanTaskIterable.close() method was a no-op.
singhpk234 reviewed Jun 11, 2026
Comment on lines +110 to +121
public void close() throws IOException {
  if (shutdown.compareAndSet(false, true)) {
    LOG.info(
        "ScanTaskIterable is closing. Clearing {} queued tasks, {} plan tasks, and {} initial file scan tasks.",
        taskQueue.size(),
        planTasks.size(),
        initialFileScanTasks.size());
    taskQueue.clear();
    planTasks.clear();
    initialFileScanTasks.clear();
  }
}
Contributor
why not call ScanTasksIterator.close() instead?
Comment on lines +111 to +120
  if (shutdown.compareAndSet(false, true)) {
    LOG.info(
        "ScanTaskIterable is closing. Clearing {} queued tasks, {} plan tasks, and {} initial file scan tasks.",
        taskQueue.size(),
        planTasks.size(),
        initialFileScanTasks.size());
    taskQueue.clear();
    planTasks.clear();
    initialFileScanTasks.clear();
  }
Contributor
we should make a helper function and reuse it in both places where close is called?
singhpk234 reviewed Jun 11, 2026
}

@Override
public void close() throws IOException {}
Contributor
we can register the iterable in the closeable group and then close it as well
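A rough sketch of what registering the iterable with a closeable group could look like. Every class here (SketchCloseableGroup, SketchScanIterable, CloseableGroupSketch) is a hypothetical stand-in written for illustration, not Iceberg's own CloseableGroup or ScanTaskIterable, and the close ordering shown is a choice of this sketch only.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a closeable group; NOT Iceberg's CloseableGroup.
class SketchCloseableGroup implements Closeable {
  private final Deque<Closeable> closeables = new ArrayDeque<>();

  void addCloseable(Closeable closeable) {
    // push() adds at the head, so resources are closed in reverse
    // registration order (an assumption made by this sketch).
    closeables.push(closeable);
  }

  @Override
  public void close() throws IOException {
    IOException first = null;
    while (!closeables.isEmpty()) {
      try {
        closeables.pop().close();
      } catch (IOException e) {
        // Keep closing the remaining resources; report the first failure.
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}

// Hypothetical stand-in for the iterable: close() only flips a shutdown flag.
class SketchScanIterable implements Closeable {
  final AtomicBoolean shutdown = new AtomicBoolean(false);

  @Override
  public void close() {
    shutdown.set(true);
  }
}

class CloseableGroupSketch {
  // Returns true if closing the group also closed the registered iterable.
  static boolean closingGroupClosesIterable() {
    SketchCloseableGroup group = new SketchCloseableGroup();
    SketchScanIterable iterable = new SketchScanIterable();
    group.addCloseable(iterable);
    try {
      group.close();
    } catch (IOException e) {
      return false;
    }
    return iterable.shutdown.get();
  }
}
```

With this shape, whoever owns the group does not need to know about the iterable's worker threads; closing the group is enough to propagate the shutdown signal.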
Closes: #16758
Problem
When downstream query engines (such as StarRocks, Trino, or Spark) cancel or abort a REST table scan early due to client disconnects, timeouts, or query limits, they trigger the cleanup sequence on the outer execution container.
In Apache Iceberg, ScanTaskIterable.close() was implemented as an empty no-op method. Because this outer close() call failed to cascade the shutdown signal to the underlying data structures:
- The shutdown atomic flag remained false.
- PlanTaskWorker threads continued running indefinitely.
- Once taskQueue reached its 1000-item capacity limit, all active worker threads became permanently blocked inside offerWithTimeout(), leading to thread pool exhaustion on the engine coordinator side.
Solution
- Implemented ScanTaskIterable.close() using shutdown.compareAndSet(false, true).
- Cleared the taskQueue, planTasks, and initialFileScanTasks lists upon termination. This lets background threads stuck in an offer wait cycle unblock immediately, observe the flipped shutdown state, and exit gracefully.
- Refactored the ScanTasksIterator.close() block to eliminate code duplication by delegating its cleanup to ScanTaskIterable.this.close(). This gives every entry point the same thread termination path.
- Added TestScanTaskIterableLeak under the org.apache.iceberg.rest test package, which checks that the number of active planning threads drops back to 0 after premature termination.
Verification Testing
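The kind of check described above can be illustrated with a small, self-contained sketch. This is not the PR's TestScanTaskIterableLeak and uses no Iceberg classes: SketchIterable, its two-slot queue, the 50 ms offer timeout, and the single worker thread are all assumptions made for illustration. It shows a producer that fills a bounded queue, stays parked in a timed offer loop, and exits once close() flips the shutdown flag.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class LeakCheckSketch {

  // Hypothetical stand-in for the iterable; NOT the Iceberg class.
  static final class SketchIterable implements AutoCloseable {
    private final BlockingQueue<Integer> taskQueue = new ArrayBlockingQueue<>(2);
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private final Thread worker;

    SketchIterable() {
      worker = new Thread(this::produce, "sketch-plan-worker");
      worker.setDaemon(true);
      worker.start();
    }

    // Producer loop: the timed offer lets the worker re-check the shutdown
    // flag periodically instead of blocking forever on a full queue.
    private void produce() {
      int next = 0;
      try {
        while (!shutdown.get()) {
          if (taskQueue.offer(next, 50, TimeUnit.MILLISECONDS)) {
            next++;
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    boolean workerAlive() {
      return worker.isAlive();
    }

    void awaitWorkerExit(long millis) throws InterruptedException {
      worker.join(millis);
    }

    @Override
    public void close() {
      // compareAndSet makes close() idempotent: only the first call clears.
      if (shutdown.compareAndSet(false, true)) {
        taskQueue.clear();
      }
    }
  }

  // Returns true if the worker was running before close() and gone after it.
  static boolean workerExitsAfterClose() {
    SketchIterable iterable = new SketchIterable();
    try {
      Thread.sleep(200); // let the worker fill the queue and start waiting
      boolean aliveBefore = iterable.workerAlive();
      iterable.close();
      iterable.close(); // a second call must be harmless
      iterable.awaitWorkerExit(2000);
      return aliveBefore && !iterable.workerAlive();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("worker exited after close: " + workerExitsAfterClose());
  }
}
```

If close() were left as a no-op in this sketch, the shutdown flag would never flip and the worker would keep retrying the offer on the full queue, which mirrors the leak described in the Problem section.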