Skip to content

Draining read, take 2#6265

Open
jasnell wants to merge 10 commits intomainfrom
jasnell/fixup-drainingread
Open

Draining read, take 2#6265
jasnell wants to merge 10 commits intomainfrom
jasnell/fixup-drainingread

Conversation

@jasnell
Copy link
Collaborator

@jasnell jasnell commented Mar 5, 2026

Fixes up multiple issues with the draining read implementation, and more importantly, places the change behind an autogate.

  1. Fixes UAF caused by an errant state change
  2. Fixes an unbounded read in certain cases
  3. Fixes handling of closed and canceled states

Along the way,

  1. Fixes a bug in the @all-autogates variant that caused it to be ineffective for any test using TestFixture
  2. Add an opencode skill for more effectively running bazel tests (was super necessary in this case)
  3. Refined the investigate command/skill a bit to help keep it focused.

@jasnell jasnell requested review from a team as code owners March 5, 2026 23:06
Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR fixes multiple safety issues (UAF, unbounded reads, closed/canceled state handling) in the draining read path for standard streams, places the new pumpTo implementation behind an autogate (ENABLE_DRAINING_READ_ON_STANDARD_STREAMS), and fixes a real bug where @all-autogates test variants were ineffective when TestFixture called initAutogate({}) early. Excellent test coverage with regression tests for each bug class.

Issues

  1. (Medium) Duplicated code between ValueQueue and ByteQueue drainingRead()

    The post-pump-loop logic (lines 253-303 for ValueQueue, lines 723-770 for ByteQueue) is nearly identical across the two implementations: !impl.state.isActive() check, impl.queue == kj::none check, isClosing finalization, and the normal data-return path. This is a maintenance hazard — if a bug is found in one path, it must be fixed in both. Not a blocker but worth noting as a follow-up to extract the shared post-pump logic into a common helper.

  2. (Low) benchChunkCounterStatic is a file-scope mutable global

    In bench-pumpto.c++, benchChunkCounterStatic (line ~133 of the new file) is a file-scoped static size_t that gets reset in createConfiguredStream() and mutated via raw pointer in pull callbacks. Google Benchmark can run iterations in parallel in some configurations. This is unlikely to cause issues with the current --benchmark_min_time defaults, but a per-invocation kj::heap<size_t>() or a member of a benchmark fixture would be more robust. Not blocking since the benchmark is tags = ["manual"].

@ask-bonk

This comment was marked as duplicate.

@ask-bonk ask-bonk bot requested a review from a team as a code owner March 5, 2026 23:22
@github-actions

This comment was marked as outdated.

@jasnell

This comment was marked as outdated.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 5, 2026

I have read the CLA Document and I hereby sign the CLA

@ask-bonk

This comment was marked as outdated.

ask-bonk bot added a commit that referenced this pull request Mar 5, 2026
Co-authored-by: jasnell <jasnell@users.noreply.github.com>
github-actions bot added a commit that referenced this pull request Mar 5, 2026
@jasnell jasnell force-pushed the jasnell/fixup-drainingread branch from fa9c248 to e41bcda Compare March 5, 2026 23:32
jasnell added 10 commits March 6, 2026 08:06
…uction

The pumpToImpl coroutine uses DrainingReader which calls
`consumer->drainingRead()``. That call may trigger
`onConsumerWantsData` -> `forcePull` -> `pull` callback -> synchronous
close/error -> `deferTransitionTo<Closed>`. Previously, both
`ValueReadable::drainingRead()` and the caller
(`ReadableStreamJsController::drainingRead`) each called
`beginOperation()`/`endOperation()` independently. The inner
`endOperation()` in ValueReadable/ByteReadable fired the deferred state
transition before the caller's `wrapDrainingRead` could set up `.then()`
callbacks on the returned promise. Since those callbacks capture `this`
(the Consumer), the transition destroyed the Consumer out from under
them triggering the UAF.

The fix removes `beginOperation()`/`endOperation()` from
`ValueReadable::drainingRead()` and `ByteReadable::drainingRead()`, and
moves the single `beginOperation()` call to before
`consumer->drainingRead()` at each call site in
`ReadableStreamJsController::drainingRead`. The matching `endOperation()`
remains in the `.then()`/`.catch()` callbacks of `wrapDrainingRead`, ensuring
the deferred state change only fires after the Consumer's
this-capturing callbacks have already run. `js.tryCatch` wraps each call
site for exception safety.

The tests do not perfectly catch the UAF even with ASAN because the
conditions are extremely timing-sensitive and I haven't yet found a
way to reproduce the exact timing reliably in workerd.
@jasnell jasnell force-pushed the jasnell/fixup-drainingread branch from e41bcda to cc5c650 Compare March 6, 2026 16:06

# Benchmark for PumpToReader (ReadableStream::pumpTo path in standard.c++).
# Run before and after DrainingReader adoption to measure improvement.
# bazel run --config=opt //src/workerd/tests:bench-pumpto
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just bench pumpto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants