Skip to content

feat: memory accounting for WindowAggExec#23207

Draft
Dandandan wants to merge 1 commit into
apache:mainfrom
Dandandan:perf/window-agg-exec-memory-accounting
Draft

feat: memory accounting for WindowAggExec#23207
Dandandan wants to merge 1 commit into
apache:mainfrom
Dandandan:perf/window-agg-exec-memory-accounting

Conversation

@Dandandan

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

WindowAggExec buffers the entire input into WindowAggStream.batches and then concat_batches it into one contiguous copy (a transient ~2x peak) before computing — with no consultation of the task memory pool (there is currently no MemoryReservation/MemoryConsumer anywhere in the window operators). On a large unbounded-window input this can be OOM-killed by the OS rather than failing gracefully, and the usage is invisible to pool accounting / EXPLAIN ANALYZE.

What changes are included in this PR?

Registers a MemoryConsumer for the stream (mirroring ExternalSorter) and reserves:

  • each buffered input batch as it arrives, via a RecordBatchMemoryCounter so buffers shared across batches (e.g. zero-copy slices from an upstream Sort/Repartition) are counted once, not once per batch; and
  • the freshly concatenated batch in compute_aggregates, reflecting the transient doubling during the final compute.

When the pool limit is exceeded, the operator returns ResourcesExhausted attributed to its consumer, matching ExternalSorter/AggregateExec.

Scope note: this is accounting + backpressure only. It does not by itself reduce peak memory or add spilling (that would be follow-up work); it makes the existing usage visible and fail gracefully.

Are these changes tested?

Yes:

  • A new unit test asserts the operator errors with ResourcesExhausted (naming its WindowAggStream consumer) under a tiny (100-byte) memory pool.
  • The full window sqllogictest suite passes under the default pool.

Are there any user-facing changes?

The default memory pool is unbounded, so default behavior is unchanged. Queries running against a bounded pool that previously over-committed may now fail deterministically with ResourcesExhausted instead of being OOM-killed — the intended behavior.

`WindowAggExec` buffers the entire input into `WindowAggStream.batches`
and then `concat_batches` it into one contiguous copy (a transient ~2x
peak) before computing, with no consultation of the task memory pool.
On a large unbounded-window input this can OOM-kill the process instead
of failing gracefully, and its usage is invisible to pool accounting.

This registers a `MemoryConsumer` for the stream and reserves:
- each buffered input batch as it arrives, using a
  `RecordBatchMemoryCounter` so buffers shared across batches (e.g.
  zero-copy slices from an upstream Sort/Repartition) are counted once
  rather than once per batch; and
- the freshly concatenated batch in `compute_aggregates`, reflecting the
  transient doubling during the final compute.

When the pool limit is exceeded the operator now returns
`ResourcesExhausted` attributed to its consumer, matching
`ExternalSorter`/`AggregateExec`. The default pool is unbounded, so
default behavior is unchanged. This is accounting/backpressure only — it
does not by itself reduce peak memory or add spilling.

A unit test asserts the operator errors with `ResourcesExhausted` (naming
its consumer) under a tiny memory pool; the `window` sqllogictest suite
passes under the default pool.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant