Fix/issue 20779 subtract overflow#20799
Open
KARTIK64-rgb wants to merge 6 commits intoapache:mainfrom
Open
Conversation
jonathanc-n
approved these changes
Mar 8, 2026
Contributor
jonathanc-n
left a comment
There was a problem hiding this comment.
Sorry didn't catch this would happen in my PR. Thanks for the fix!
Contributor
|
cc @gabotechs |
jonathanc-n
reviewed
Mar 8, 2026
| Ok(()) | ||
| } | ||
| } | ||
| #[test] |
Contributor
There was a problem hiding this comment.
I created a test for this for sqllogictests @KARTIK64-rgb, can add:
statement ok
CREATE TABLE t1(a INT, b INT) AS VALUES
(NULL, 1), (NULL, 2), (NULL, 3), (NULL, 4), (NULL, 5);
statement ok
CREATE TABLE t2(c INT) AS VALUES (1), (2);
# This query panicked before the fix: the ORDER BY forces a SortExec,
# the LIMIT gets pushed into SortExec.fetch, and the HashJoinExec
# calls partition_statistics() on the SortExec child during execution.
query II
SELECT sub.a, sub.b FROM (
SELECT * FROM t1 ORDER BY b LIMIT 1
) sub
JOIN t2 ON sub.a = t2.c;
----
statement ok
DROP TABLE t1;
statement ok
DROP TABLE t2;
i verified it reproduces the bug
gabotechs
approved these changes
Mar 9, 2026
Contributor
There was a problem hiding this comment.
Looks good, thanks @KARTIK64-rgb for quick fix and @jonathanc-n for the review!
As soon as the CI issues are addressed, this is good to merge.
Comment on lines
+725
to
+726
| let null_count = *stats.null_count.get_value().unwrap_or(&0); | ||
| let non_null_count = count.checked_sub(null_count).unwrap_or(0); |
Contributor
There was a problem hiding this comment.
👍 nice, even if this is a good safeguard, the fact that this can even happen makes me think that there is some further work to be done in the stats propagation mechanism.
Ideally, this would not even be possible by construction, but that's a topic for another PR.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #20779.
Rationale for this change
In
max_distinct_count(insidedatafusion/physical-plan/src/joins/utils.rs), thePrecision::Exactbranch computes the number of non-null rows by doing:Before #20228 this subtraction was always safe because
num_rowswas never smallerthan
null_count. But #20228 addedfetch(limit push-down) support toHashJoinExec, and when a limit is applied,partition_statistics()capsnum_rowstoExact(fetch_value)without also capping the per-columnnull_count. This meansnull_countcan legally exceednum_rows, causing apanic with "attempt to subtract with overflow".
What changes are included in this PR?
Bug fix in
max_distinct_count(utils.rs~line 725): replaced the baresubtraction with a saturating subtraction so that when
null_countexceedsnum_rowsthe result is clamped to0instead of panicking.Regression test added at the bottom of the
mod testsblock in the samefile. The test deliberately constructs a scenario where
null_count (5) > num_rows (2)and asserts thatmax_distinct_countreturnsExact(0)withoutpanicking.
Are these changes tested?
Yes. A new unit test
test_max_distinct_count_no_overflow_when_null_count_exceeds_num_rowsis addeddirectly in
datafusion/physical-plan/src/joins/utils.rs. It covers the exactedge-case from the bug report (null_count > num_rows after a fetch/limit
push-down) and would have caught the panic before the fix.
Are there any user-facing changes?
No user-facing or API changes. This is a purely internal arithmetic fix in the
statistics estimation logic. Queries that previously panicked when a limit was
pushed down into a
HashJoinExecwill now complete successfully.