perf: emit WindowAggExec output in batch_size chunks #23206
Draft
Dandandan wants to merge 1 commit into
`WindowAggExec` (the non-streaming window operator used when a frame ends in UNBOUNDED FOLLOWING, etc.) buffers the entire input, computes all window columns, and then emits the result as a single `RecordBatch` sized to the whole input. This forces every downstream operator that doesn't internally coalesce (sort ingest, joins, the client, ...) to hold one batch covering all rows at once, unlike `AggregateExec` and `BoundedWindowAggExec`, which honor `batch_size`.
This change emits the computed result in `batch_size`-row slices across polls. Slicing is zero-copy (`RecordBatch::slice` adjusts offset/length over shared buffers), so it adds no per-row work and no extra copy; it only bounds the batch each downstream operator must hold. The window computation itself is unchanged, so this does not reduce `WindowAggExec`'s own peak memory (it still buffers all input); that is a separate, larger concern. `batch_size` is read from the session config in `execute` and clamped to at least 1.
A unit test asserts that a 10-row result is emitted as 4/4/2-row chunks with `batch_size = 4` and that the running-count column is unaffected; the `window` sqllogictest suite passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Which issue does this PR close?
Rationale for this change
`WindowAggExec` (the non-streaming window operator, used when a frame ends in `UNBOUNDED FOLLOWING`, a UDWF lacks bounded execution, etc.) buffers the entire input, computes all window columns, and then emits the result as one `RecordBatch` sized to the whole input. That forces every downstream operator which doesn't internally coalesce (sort ingest, joins, the client, …) to hold a single batch covering all rows at once, unlike `AggregateExec` (`row_hash.rs`) and `BoundedWindowAggExec`, which both honor `batch_size`.
What changes are included in this PR?
`WindowAggStream` now stores the fully-computed result and emits it in `batch_size`-row slices across polls. Slicing is zero-copy (`RecordBatch::slice` adjusts offset/length over shared buffers), so this adds no per-row work and no extra copy; it only bounds the batch each downstream operator must hold. `batch_size` is read from the session config in `execute` (before `context` is moved into the child) and clamped to at least 1.
Scope note: the window computation itself is unchanged, so this does not reduce `WindowAggExec`'s own peak memory (it still buffers all input + the concatenated copy). That is a separate, larger concern; this PR only stops forcing a mega-batch onto downstream consumers.
Are these changes tested?
Yes:
- A unit test asserts that a 10-row result is emitted as 4/4/2-row chunks with `batch_size = 4`, and that the running-count window column is unaffected by chunking.
- The `window` sqllogictest suite passes (all 6 files). Plan-shape tests are unaffected (only output batching changes, not the plan).
Are there any user-facing changes?
No behavior change beyond output batch sizing (results and ordering are identical); `WindowAggExec` now honors the configured `batch_size` like other operators.
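For reviewers who want the chunking arithmetic spelled out, here is a minimal sketch of how the emit offsets could advance across polls. It is not the code in this PR: `ChunkCursor` and `chunk_ranges` are hypothetical names, and the sketch only models the `(offset, length)` pairs that would be handed to `RecordBatch::slice`. It uses only the standard library, not Arrow.

```rust
/// Hypothetical stand-in for the stream's emit state. Each call to `next`
/// models one poll and yields the `(offset, length)` pair that would be
/// passed to `RecordBatch::slice` on the fully computed result.
struct ChunkCursor {
    total_rows: usize,
    batch_size: usize,
    offset: usize,
}

impl ChunkCursor {
    fn new(total_rows: usize, batch_size: usize) -> Self {
        // Clamp to at least 1 so a zero batch size cannot stall the cursor.
        Self {
            total_rows,
            batch_size: batch_size.max(1),
            offset: 0,
        }
    }
}

impl Iterator for ChunkCursor {
    type Item = (usize, usize);

    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.total_rows {
            return None; // every row has already been emitted
        }
        // The last chunk may be shorter than `batch_size`.
        let len = self.batch_size.min(self.total_rows - self.offset);
        let chunk = (self.offset, len);
        self.offset += len;
        Some(chunk)
    }
}

/// Collects all `(offset, length)` pairs for a result of `total_rows` rows.
fn chunk_ranges(total_rows: usize, batch_size: usize) -> Vec<(usize, usize)> {
    ChunkCursor::new(total_rows, batch_size).collect()
}
```

With 10 rows and `batch_size = 4`, `chunk_ranges(10, 4)` yields `(0, 4)`, `(4, 4)`, `(8, 2)`, matching the 4/4/2 split the unit test checks. Because the sketch only moves an offset over an already computed result, the per-poll cost is independent of the chunk length. That is the same property the PR relies on from zero-copy slicing.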