Do not require mut in memory reservation methods #19759
Conversation
alamb left a comment:
Looks good to me -- thank you @gabotechs
```diff
 /// pool, returning the number of bytes freed.
-pub fn free(&mut self) -> usize {
-    let size = self.size;
+pub fn free(&self) -> usize {
```
Is this technically a breaking API change? I thought about it and, from what I can tell, the answer is no: to call this API in DataFusion 52 the caller needs a `mut`, and in 53 they would not (but could still call it with `mut` even though that is no longer needed).
That's right, this is not a breaking API change.
One thing that could happen is that people have a clippy lint that fires on unused `mut`s. In that case they will start seeing new clippy warnings in their own code with DataFusion 53.
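For illustration, here is a minimal sketch of the caller code in question, assuming the `datafusion_execution::memory_pool` API; the function and consumer names are made up:

```rust
use std::sync::Arc;

use datafusion_execution::memory_pool::{MemoryConsumer, MemoryPool};

fn reserve_some_memory(pool: &Arc<dyn MemoryPool>) -> datafusion_common::Result<()> {
    // In DataFusion 52 `try_grow`/`free` take `&mut self`, so this binding
    // must be mutable. Once they take `&self`, the `mut` is never exercised
    // and the `unused_mut` lint (surfaced by rustc and by clippy runs)
    // starts flagging it.
    let mut reservation = MemoryConsumer::new("my_operator").register(pool);
    reservation.try_grow(1024)?;
    reservation.free();
    Ok(())
}
```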
```diff
-pub fn free(&mut self) -> usize {
-    let size = self.size;
+pub fn free(&self) -> usize {
+    let size = self.size.load(atomic::Ordering::Relaxed);
```
I would expect the reservations to be consistent (`Ordering::SeqCst`); otherwise I would worry that we run the risk of not seeing other changes.
However, `Relaxed` seems to be used in the `MemoryPool`s themselves, so this is consistent.
Yeah, the accounting that actually matters is done in the `MemoryPool` implementations, so using something stronger here is not really going to yield any improvement.
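To make the reasoning concrete, here is a simplified sketch (invented type names, not the actual DataFusion implementation) of the pattern being discussed: the reservation's own byte counter is local bookkeeping and can use `Relaxed`, while the pool it delegates to does the accounting that must stay consistent across threads:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified stand-in for a reservation that tracks its own size.
struct SketchReservation {
    /// Bytes this reservation currently holds (local bookkeeping only).
    size: AtomicUsize,
}

impl SketchReservation {
    /// Frees the whole reservation, returning the number of bytes freed.
    /// `swap` atomically takes the current value and resets it to zero,
    /// so two racing `free` calls cannot both report the same bytes.
    fn free(&self) -> usize {
        let freed = self.size.swap(0, Ordering::Relaxed);
        // In the real code the freed bytes would also be returned to the
        // pool (e.g. via a shrink call), which is where the accounting
        // that must be consistent across threads actually happens.
        freed
    }

    /// Reports the locally tracked size; no other memory accesses need to
    /// be ordered relative to this load, so `Relaxed` is sufficient.
    fn size(&self) -> usize {
        self.size.load(Ordering::Relaxed)
    }
}
```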
Running some benchmarks just in case
We should be good
run benchmark tpch

🤖: Benchmark completed
Thanks @gabotechs
Which issue does this PR close?
Rationale for this change
Prerequisite for the following PRs:
Even if the API on the `MemoryPool` does not require `&mut self` for growing/shrinking the reserved size, the API in `MemoryReservation` does, making simple implementations unrepresentable without synchronization primitives. For example, the following would require a `Mutex` for concurrent access to the `MemoryReservation` from different threads, even though the `MemoryPool` doesn't:
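(The snippet below is a hedged sketch of the kind of code referred to, not the exact example from the PR body; it assumes the `datafusion_execution::memory_pool` types and a `GreedyMemoryPool`, and the consumer name, sizes, and thread count are illustrative.)

```rust
use std::sync::{Arc, Mutex};

use datafusion_execution::memory_pool::{
    GreedyMemoryPool, MemoryConsumer, MemoryPool, MemoryReservation,
};

fn grow_and_shrink_from_many_threads() {
    // The pool itself is already safe to share across threads...
    let pool: Arc<dyn MemoryPool> = Arc::new(GreedyMemoryPool::new(1024 * 1024));
    // ...but because `try_grow`/`shrink` take `&mut self`, the shared
    // reservation has to be wrapped in a `Mutex` anyway.
    let reservation: Arc<Mutex<MemoryReservation>> =
        Arc::new(Mutex::new(MemoryConsumer::new("shared").register(&pool)));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let reservation = Arc::clone(&reservation);
            std::thread::spawn(move || {
                // Every accounting call is serialized through the lock,
                // even though the pool could handle it concurrently.
                reservation.lock().unwrap().try_grow(1024).unwrap();
                reservation.lock().unwrap().shrink(1024);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```

With the `&self` signatures introduced here, the `Mutex` and the serialization it imposes are no longer needed (see the sketch after the user-facing changes section).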
What changes are included in this PR?
Make the methods in `MemoryReservation` take `&self` instead of `&mut self`, allowing concurrent shrinks/grows on the same reservation from different tasks.
Are these changes tested?
yes, by current tests
Are there any user-facing changes?
Users can now safely call methods of `MemoryReservation` from different tasks without synchronization primitives.
This is a backwards-compatible API change: it will work out of the box for current users, although, depending on their clippy configuration, they might see some new warnings about unused `mut`s in their codebase.
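As a hedged companion to the earlier sketch, the same sharing pattern with the `&self` signatures from this PR could look roughly like this (same assumptions about the `datafusion_execution::memory_pool` types; names and sizes are illustrative):

```rust
use std::sync::Arc;

use datafusion_execution::memory_pool::{GreedyMemoryPool, MemoryConsumer, MemoryPool};

fn grow_and_shrink_without_a_mutex() {
    let pool: Arc<dyn MemoryPool> = Arc::new(GreedyMemoryPool::new(1024 * 1024));
    // An `Arc` is enough to share the reservation between threads/tasks.
    let reservation = Arc::new(MemoryConsumer::new("shared").register(&pool));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let reservation = Arc::clone(&reservation);
            std::thread::spawn(move || {
                // `try_grow`/`shrink` now take `&self`; the pool does the
                // thread-safe accounting internally, so no lock is needed.
                reservation.try_grow(1024).unwrap();
                reservation.shrink(1024);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```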