Refactor Float.round/2 algorithm for better performance #15329
josevalim merged 7 commits into elixir-lang:main
Conversation
---
In

```elixir
<<result::float>> = <<sign::1, exp + 1023::11, mantissa - @power_of_2_to_52::52>>
```

if `mantissa - @power_of_2_to_52 == 2^52`, the value does not fit in a 52-bit field. These tests should catch it:

```elixir
assert Float.round(:math.pow(2, 50), 1) === :math.pow(2, 50)
assert Float.ceil(:math.pow(2, 50), 1) === :math.pow(2, 50)
assert Float.floor(:math.pow(2, 50), 1) === :math.pow(2, 50)
```

I believe you need to handle the carry/upper-bound normalization before packing:

```elixir
{mantissa, exp} =
  if mantissa == @power_of_2_to_52 <<< 1 do
    {@power_of_2_to_52, exp + 1}
  else
    {mantissa, exp}
  end
```

But please double check with the Go implementation too.
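The truncation behind this bug can be seen directly in Elixir's bitstring construction, which silently keeps only the low bits of an oversized field rather than raising. A minimal sketch, independent of the PR's code:

```elixir
import Bitwise

# 2^52 needs 53 bits; packing it into a 52-bit field silently drops the top bit.
overflowed = 1 <<< 52
<<truncated::52>> = <<overflowed::52>>
truncated  # => 0, so the packed float would be silently wrong
```

This is why the carry has to be normalized before packing: once the value reaches the bitstring field, the overflow is unrecoverable.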
---
Or, in general, test these: make sure to remove other assertions that already check powers of two, as this will effectively verify all of them.
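A sketch of what such a generalized power-of-two test could look like (the ranges are illustrative choices of mine, not taken from the PR):

```elixir
# Powers of two are exactly representable as doubles, so rounding at any
# supported precision (0..15) must return them unchanged for round, ceil,
# and floor alike. A failed pin match raises, like a failed assertion.
for exp <- 0..60, precision <- 0..15 do
  f = :math.pow(2, exp)
  ^f = Float.round(f, precision)
  ^f = Float.ceil(f, precision)
  ^f = Float.floor(f, precision)
end
```

In the actual test suite these would be `assert ... === ...` calls inside a test module; the comprehension above fails in the same exact-equality way.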
---
I have dropped some comments. If you used AI to implement this, please disclose it! And, in such cases, it may be useful to ask different models to review the code, both with and without the reference Go implementation. :)
Commits:
- … of Cox, but only a refactor of the existing exact rational scaling approach
- Fix bug when mantissa is exactly `2^53`
- Fix `Float.round/2` slow-path mantissa overflow when `shift_adjust < 0` and the integer quotient lands in `[2^53, 2^54)`
---
@josevalim thank you for your comments and review ❤️ And my apologies: I read the relevant parts of the paper and worked with Opus to get a high-level understanding of how it works, but then prompted Opus 4.7 to do the translation from paper to code, with the Go implementation as reference, because that work exceeded my abilities. I focused on verifying the implementation through the existing tests and expanded them to catch new (and previously uncovered) edge cases, identified by both me and Opus. I also focused on benchmarking different variants of this implementation and alternatives.

I'd like to close this PR for now. I do not feel good proposing a new algorithm if I myself don't understand it. I already had very mixed feelings about this before I opened the PR.

For documentation purposes, I updated the PR with your suggestions and fixed one bug that GPT-5.5 identified. I also updated the function comment extensively to reflect that this is in fact not a direct implementation of Cox 2026, but merely a refactor of the existing exact rational scaling approach based on David M. Gay. I hope that this PR can serve as inspiration for someone smarter than me, who actually understands this stuff, to improve the rounding algorithm in the future 🙏
PR title changed from "Float.round/2 algorithm with Cox 2026" to "Float.round/2 algorithm for better performance"
---
Hi @PJUllrich, I have reviewed the pull request and it looks good to me overall, so if you don't mind, you can reopen it and I will be responsible for merging it.
---
I don't mind as long as you don't blame me at the next ElixirConf for breaking Elixir 1.20 :D
---
💚 💙 💜 💛 ❤️
I would like to tentatively put forward this PR for review. I am by no means an expert on floating-point arithmetic and this contribution is significantly outside my expertise. My hope is that this PR can open the discussion about how to replace the inefficient `Float.round/2` algorithm with something more performant. Feel free to reject if useless.

Summary
Refactors the bignum-heavy `Float.round/2` (and the shared internal `round/3` used by `Float.floor/2` and `Float.ceil/2`).

Result: ~6-7× faster across the precision range, with zero behavior change, including the documented tie cases like `Float.round(5.5675, 3) == 5.567`.
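For context on that tie case: the decimal literal `5.5675` is not exactly representable as a double, so the stored value lies slightly below it and correctly rounds down at three digits. A quick check (my own illustration, not PR code):

```elixir
# The documented behavior: the nearest double to 5.5675 is slightly
# below 5.5675, so rounding to 3 decimal places goes down.
Float.round(5.5675, 3)  # => 5.567

# Printing with more digits reveals the stored value:
:erlang.float_to_binary(5.5675, decimals: 17)
```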
The previous implementation explicitly noted the trade-off:
It computed `m * power_of_5(count)`, where `count` could grow to ~104, producing 250-bit bignums for a single division. The cost grew with both float magnitude and target precision.

The current code contains a lot of comments to make the review easier. I can remove those before the final merge.
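To make the bignum cost concrete, here is a rough illustration of my own (not the library code) of why the power-of-5 factor dominates: rounding to decimal digits means scaling by `10^precision = 2^precision * 5^precision`, and the power-of-2 part is a cheap shift, while the power-of-5 part is exact only as a big integer.

```elixir
# With count ~104, the power-of-5 factor alone is already a ~242-bit
# integer, on the order of the bignum sizes described above.
count = 104
bits = Integer.pow(5, count) |> Integer.digits(2) |> length()
bits  # => 242
```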
Benchmark results
11 different float workloads, measured with Benchee on an Apple M2. Reported times are the median μs of one iteration over all 11 workloads (per-call median ≈ value / 11).
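As an aside on why exactness matters for correctness here: the obvious native-float shortcut is fast but wrong on ties. A minimal sketch of that naive approach, using the `:erlang.round(f * pow) / pow` formulation:

```elixir
# Naive native rounding: scale up, round to integer, scale back down.
# The multiplication itself rounds, so ties can be bumped the wrong way.
naive_round = fn f, precision ->
  pow = :math.pow(10, precision)
  :erlang.round(f * pow) / pow
end

naive_round.(5.5675, 3)  # => 5.568, double-rounding bumps the tie upward
Float.round(5.5675, 3)   # => 5.567, the documented correct result
```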
- `current` is the bignum-based implementation we're replacing
- `new` is this PR
- `native_double` is included as a "what does this look like in pure native float arithmetic" reference (`:erlang.round(f * pow) / pow`). It's 2-4× faster than `new`, but incorrect on tie inputs: `5.5675 * 1000` rounds to `5567.5` in IEEE due to double-rounding (a multiplication round plus an integer round), which then bumps to `5568`. This is why we can't simply use the obvious approach.

Floats benchmarked