
Fix/issue 20779 subtract overflow #20799

Open
KARTIK64-rgb wants to merge 6 commits into apache:main from KARTIK64-rgb:fix/issue-20779-subtract-overflow

@KARTIK64-rgb

Which issue does this PR close?

Closes #20779.

Rationale for this change

In max_distinct_count (inside datafusion/physical-plan/src/joins/utils.rs), the
Precision::Exact branch computes the number of non-null rows by doing:

let count = count - stats.null_count.get_value().unwrap_or(&0);

Before #20228 this subtraction was always safe because num_rows was never smaller
than null_count. But #20228 added fetch (limit push-down) support to
HashJoinExec, and when a limit is applied, partition_statistics() caps
num_rows to Exact(fetch_value) without also capping the per-column
null_count. This means null_count can legally exceed num_rows, causing a
panic with "attempt to subtract with overflow".
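To make the failure mode concrete, here is a minimal sketch of the arithmetic only. The helper name `non_null_rows_checked` and the values are illustrative assumptions, not code from DataFusion; `checked_sub` is used so the underflow can be observed without triggering the panic itself.

```rust
/// Hypothetical helper (not part of DataFusion): returns `None` when
/// `null_count` exceeds `num_rows`, which is exactly the case where a bare
/// `num_rows - null_count` on `usize` would overflow.
fn non_null_rows_checked(num_rows: usize, null_count: usize) -> Option<usize> {
    num_rows.checked_sub(null_count)
}

fn main() {
    // Five input rows, all NULL in the join column. A limit of 2 caps
    // num_rows but leaves the per-column null_count untouched.
    let null_count: usize = 5;
    let num_rows_after_fetch: usize = 2;

    // The subtraction has no valid usize result.
    assert_eq!(non_null_rows_checked(num_rows_after_fetch, null_count), None);

    // Without the limit the statistics are consistent and the result is valid.
    assert_eq!(non_null_rows_checked(5, null_count), Some(0));

    // With overflow checks disabled, a bare `-` would wrap around instead of
    // panicking, yielding a huge bogus row count.
    assert_eq!(num_rows_after_fetch.wrapping_sub(null_count), usize::MAX - 2);
}
```

The panic message quoted above comes from Rust's overflow checks, which are enabled by default in debug builds; the wrapped value in the last assertion shows why silently continuing would not be acceptable either.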

What changes are included in this PR?

  • Bug fix in max_distinct_count (utils.rs ~line 725): replaced the bare
    subtraction with a saturating subtraction so that when null_count exceeds
    num_rows the result is clamped to 0 instead of panicking.

    // Before
    let count = count - stats.null_count.get_value().unwrap_or(&0);
    
    // After
    let count = count.saturating_sub(*stats.null_count.get_value().unwrap_or(&0));

  • Regression test added at the bottom of the mod tests block in the same
    file. The test deliberately constructs a scenario where null_count (5) >
    num_rows (2) and asserts that max_distinct_count returns Exact(0) without
    panicking.
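The shape of that regression check can be sketched in isolation. Everything below is a simplified stand-in: the `Precision` enum is a cut-down local type rather than DataFusion's `Precision<T>`, and `non_null_count` is a hypothetical function that models only the subtraction step, not the full `max_distinct_count` logic.

```rust
// Simplified stand-in for DataFusion's statistics precision type (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical model of the fixed step: for an exact row count, subtract
/// the known null count with saturation; otherwise pass the input through.
fn non_null_count(num_rows: Precision, null_count: Precision) -> Precision {
    match num_rows {
        Precision::Exact(count) => {
            // An absent null count is treated as zero, mirroring unwrap_or(&0).
            let nulls = match null_count {
                Precision::Exact(n) | Precision::Inexact(n) => n,
                Precision::Absent => 0,
            };
            Precision::Exact(count.saturating_sub(nulls))
        }
        other => other,
    }
}

fn main() {
    // null_count (5) > num_rows (2): clamps to zero instead of panicking.
    assert_eq!(
        non_null_count(Precision::Exact(2), Precision::Exact(5)),
        Precision::Exact(0)
    );
    // Consistent statistics are unaffected by the change.
    assert_eq!(
        non_null_count(Precision::Exact(5), Precision::Exact(2)),
        Precision::Exact(3)
    );
}
```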

Are these changes tested?

Yes. A new unit test
test_max_distinct_count_no_overflow_when_null_count_exceeds_num_rows is added
directly in datafusion/physical-plan/src/joins/utils.rs. It covers the exact
edge-case from the bug report (null_count > num_rows after a fetch/limit
push-down) and would have caught the panic before the fix.

Are there any user-facing changes?

No user-facing or API changes. This is a purely internal arithmetic fix in the
statistics estimation logic. Queries that previously panicked when a limit was
pushed down into a HashJoinExec will now complete successfully.

@github-actions bot added the substrait and physical-plan labels on Mar 8, 2026, then removed the substrait label the same day.

@jonathanc-n jonathanc-n left a comment

Sorry didn't catch this would happen in my PR. Thanks for the fix!

@jonathanc-n

cc @gabotechs

Ok(())
}
}
#[test]

@jonathanc-n jonathanc-n Mar 8, 2026

I wrote a sqllogictest for this, @KARTIK64-rgb. You can add:

statement ok
CREATE TABLE t1(a INT, b INT) AS VALUES 
  (NULL, 1), (NULL, 2), (NULL, 3), (NULL, 4), (NULL, 5);

statement ok
CREATE TABLE t2(c INT) AS VALUES (1), (2);

# This query panicked before the fix: the ORDER BY forces a SortExec,
# the LIMIT gets pushed into SortExec.fetch, and the HashJoinExec
# calls partition_statistics() on the SortExec child during execution.
query II
SELECT sub.a, sub.b FROM (
  SELECT * FROM t1 ORDER BY b LIMIT 1
) sub 
JOIN t2 ON sub.a = t2.c;
----

statement ok
DROP TABLE t1;

statement ok
DROP TABLE t2;

I verified that it reproduces the bug.

@github-actions bot added the sqllogictest label on Mar 9, 2026.

@gabotechs gabotechs left a comment

Looks good, thanks @KARTIK64-rgb for the quick fix and @jonathanc-n for the review!

As soon as the CI issues are addressed, this is good to merge.

Comment on lines +725 to +726
let null_count = *stats.null_count.get_value().unwrap_or(&0);
let non_null_count = count.checked_sub(null_count).unwrap_or(0);
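The PR description shows `saturating_sub`, while the lines under review use `checked_sub(..).unwrap_or(0)`. For unsigned integers the two forms produce the same result; the sketch below checks that on a small grid of values. The function names `via_saturating` and `via_checked` are illustrative only.

```rust
/// Form shown in the PR description.
fn via_saturating(count: usize, null_count: usize) -> usize {
    count.saturating_sub(null_count)
}

/// Form shown in the reviewed lines.
fn via_checked(count: usize, null_count: usize) -> usize {
    count.checked_sub(null_count).unwrap_or(0)
}

fn main() {
    // Compare both forms over a small grid, including cases where
    // null_count is larger than count.
    for count in 0..8usize {
        for null_count in 0..8usize {
            assert_eq!(
                via_saturating(count, null_count),
                via_checked(count, null_count)
            );
        }
    }
    // Boundary values behave identically as well.
    assert_eq!(via_saturating(0, usize::MAX), via_checked(0, usize::MAX));
    assert_eq!(via_saturating(usize::MAX, 0), via_checked(usize::MAX, 0));
}
```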

👍 Nice. This is a good safeguard, but the fact that this can happen at all makes me think there is further work to be done in the stats propagation mechanism.

Ideally, this would not even be possible by construction, but that's a topic for another PR.
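One possible reading of "not possible by construction" is to clamp the per-column null counts whenever the row count is capped. The sketch below uses hypothetical `TableStats` and `ColumnStats` structs and a hypothetical `with_fetch` function; none of these are DataFusion's actual types or API, and this is not something the PR implements.

```rust
// Hypothetical, simplified statistics types (assumptions for illustration).
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_count: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct TableStats {
    num_rows: usize,
    columns: Vec<ColumnStats>,
}

/// Caps num_rows to the fetch value and clamps every column's null_count to
/// the new row count, so null_count <= num_rows holds after the call.
fn with_fetch(stats: &TableStats, fetch: usize) -> TableStats {
    let num_rows = stats.num_rows.min(fetch);
    let columns = stats
        .columns
        .iter()
        .map(|c| ColumnStats {
            null_count: c.null_count.min(num_rows),
        })
        .collect();
    TableStats { num_rows, columns }
}

fn main() {
    let input = TableStats {
        num_rows: 5,
        columns: vec![ColumnStats { null_count: 5 }],
    };
    let capped = with_fetch(&input, 2);
    assert_eq!(capped.num_rows, 2);
    // The invariant now holds, so a plain subtraction could not underflow.
    assert!(capped.columns.iter().all(|c| c.null_count <= capped.num_rows));
}
```

A clamped null count is only an upper bound, since the limit may keep non-null rows, so a real implementation would probably also need to mark it as inexact.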

Labels

physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Subtraction overflow in max_distinct_count when hash join has a pushed-down limit
