Add heapsort fallback in select_nth_unstable#106997
Merged
bors merged 1 commit intorust-lang:masterfrom Jan 18, 2023
Merged
Conversation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Addresses #102451 and #106933.
slice::select_nth_unstableuses a quick select implementation based on the same pattern defeating quicksort algorithm thatslice::sort_unstableuses.slice::sort_unstableuses a recursion limit and falls back to heapsort if there were too many bad pivot choices, to ensure O(n log n) worst case running time (known as introsort). However,slice::select_nth_unstabledoes not have such a fallback strategy, which leads to it having a worst case running time of O(n²) instead. #102451 links to a playground which generates pathological inputs that show this quadratic behavior. On my machine, a randomly generated slice of length1 << 19takes ~200µs to calculate its median, whereas a pathological input of the same length takes over 2.5s. This PR adds an iteration limit toselect_nth_unstable, falling back to heapsort, which ensures an O(n log n) worst case running time (introselect). With this change, there was no noticable slowdown for the random input, but the same pathological input now takes only ~1.2ms. In the future it might be worth implementing something like Median of Medians or Fast Deterministic Selection instead, which guarantee O(n) running time for all possible inputs. I've left this as aFIXMEfor now and only implemented the heapsort fallback to minimize the needed code changes.I still think we should clarify in the
select_nth_unstabledocs that the worst case running time isn't currently O(n) (the original reason that #102451 was opened), but I think it's a lot better to be able to guarantee O(n log n) instead of O(n²) for the worst case.