Skip to content

fix: Track metrics in hash joins with empty build sides#20810

Merged
2010YOUY01 merged 2 commits intoapache:mainfrom
nuno-faria:empty_build_side_output_rows
Mar 13, 2026
Merged

fix: Track metrics in hash joins with empty build sides#20810
2010YOUY01 merged 2 commits intoapache:mainfrom
nuno-faria:empty_build_side_output_rows

Conversation

@nuno-faria
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Show the correct metrics when executing hash joins.

What changes are included in this PR?

  • The fast path in process_probe_batch now stores the output batch in output_buffer instead of returning it, since the metrics are only updated with batches stored there.
  • Added unit test.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Mar 8, 2026
Copy link
Contributor

@jonathanc-n jonathanc-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nuno-faria!

self.state = HashJoinStreamState::FetchProbeBatch;

return Ok(StatefulStreamResult::Ready(Some(result)));
return Ok(StatefulStreamResult::Continue);
Copy link
Contributor

@jonathanc-n jonathanc-n Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can slowly transition everything to slowly not need to variants of StatefulStreamResult, maybe an issue can be raised for this. It goes hand in hand with using the limited batch coalescer

Copy link
Contributor

@jonathanc-n jonathanc-n Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dandandan We can replicate the pattern you added with LimitedBatchCoalescer and updating metrics in poll_next_impl throughout the codebase

@jonathanc-n
Copy link
Contributor

cc @Dandandan for review

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

@@ -655,10 +655,10 @@ impl HashJoinStream {
self.join_type,
)?;
timer.done();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but I noticed we don’t need to explicitly end the timer here since it’s an RAII variable. I’ll open a separate PR after this is merged.

Co-authored-by: Yongting You <2010youy01@gmail.com>
@2010YOUY01 2010YOUY01 added this pull request to the merge queue Mar 13, 2026
Merged via the queue into apache:main with commit 422b545 Mar 13, 2026
34 checks passed
@nuno-faria nuno-faria deleted the empty_build_side_output_rows branch March 13, 2026 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hash join with empty build side displays wrong number of output rows

3 participants