-
Notifications
You must be signed in to change notification settings - Fork 29
Better explanation for Barrett division in decompose #762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0b47c08 to
9931203
Compare
|
I did consider addressing #654 also in this PR, but I haven't come up with a great idea for that yet... |
0be3a84 to
326d731
Compare
| * Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact | ||
| * for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲ | ||
| * 1 / 4092. | ||
| * 1 / 4092, so (for example) f1' = B / 2 = 2046 is mapped to | ||
| * | ||
| * round(2046 * 1025 / 2^22) = round(2046 * (1 / 4092 - epsilon)) | ||
| * = round(1 / 2 - epsilon') = 0, | ||
| * | ||
| * where epsilon = 1 / 4092 - 1025 / 2^22 and epsilon' = 2046 * eps are both | ||
| * tiny but positive numbers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanno-becker May I have your review on this explanation? If this looks good, I plan to also apply this to the aarch64 implementation and resolve #654. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jammychiou1 Apologies for the slow reply.
I think if we want to fix #654 we should provide a general explanation.
Here's an attempt:
The approximation error for `f1' / B ≈ f1' * 1025 / 2^22` is `f1' * (1025/2^22 - 1/B)`.
For `eps := 1025/2^22 - 1/B` we have `eps = 1/4290772992 ~ 2^{-31.99} < 2^{-31}`.
Hence `|f1'| * eps < 2^{-15}`. Given `B = 4092 ~ 2^12`, we thus have `|f1'| * eps < 1/B`.
On the other hand, `1/B` is the spacing between the integral multiples of `1/B`, which
includes all rounding boundaries `n + 1/2` (since `B` is even). Hence, if `f1' / B` is not
of the form `n + 1/2`, then moving from `f1' / B` to `f1' * 1025 / 2^22` does not cross
a rounding boundary, and hence `round(f1' / B) = round(f1' * 1025 / 2^22)`, and it doesn't
matter on either side which version of rounding one uses.
If `f1' / B` _is_ of the form `n + 1/2`, then `f1' * 1025 / 2^22` is slightly below it
(and _not_ a multiple of `n + 1/2`), hence `round-(f1' / B) = round(f1' * 1025 / 2^22)`;
where the round-down on the LHS is essential, and on the RHS the type of rounding
again does not matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanno-becker I'm very sorry for the long delay. I wanted to work on this, but I had to prioritize other tasks in the last few weeks...
Thank you for the detailed suggestion. This explanation is very clear. I created a new patch that incorporates the explanation with some modifications: I reworded it slightly according to the context and added some details that are unobvious to me at first. Please take a look at your convenience. Thank you very much and happy holidays!
| * 1 / 1488. | ||
| * 1 / 1488, so (for example) f1' = B / 2 = 744 is mapped to | ||
| * | ||
| * round(744 * 11275 / 2^24) = round(744 * (1 / 1488 - epsilon)) | ||
| * = round(1 / 2 - epsilon') = 0, | ||
| * | ||
| * where epsilon = 1 / 1488 - 11275 / 2^24 and epsilon' = 744 * eps are both | ||
| * tiny but positive numbers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
326d731 to
8a3644e
Compare
mkannwischer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jammychiou1, please adjust the commit message so that it does not include [DRAFT] and mark this PR ready for review once you think it is ready. That way we can merge it once @hanno-becker finds the time to look at it.
- The floor() in floor((f + 127) >> 7) was somewhat unecessary as the usual semantic for the right-shift operator (>>) has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency. - The bound of f1'' is correct but the proof was misleading. The new proof should be clearer. Signed-off-by: jammychiou1 <[email protected]>
Based on Hanno Becker's proposal, the new explanation explains how round-(f1' / B) can be replaced with rounding-mulhi, regardless of the type of rounding used in the mulhi. In addition, by bounding the approximation error to be strictly less than 1 / B, the exactness of the Barrett division is also justified. To avoid excessive repetition, we prove the GAMMA2 = (Q-1)/88 case in the C implementation, remark how the same proof can be adapted to the GAMMA2 = (Q-1)/32 case, and finally refer to them when explaining the AVX2 implementation. Signed-off-by: jammychiou1 <[email protected]>
Adapt the new explanation to the NEON implementation. Signed-off-by: jammychiou1 <[email protected]>
8a3644e to
f520f98
Compare
|
Thanks for the idea! I went ahead and copied/adapted the new explanation to the C and NEON implementation. The C and AVX2 ones use the same denominators |
Along the way, we also fix some minor details in comments for AVX2 decompose:
floor()infloor((f + 127) >> 7)was somewhat unnecessary as the usual semantic for the right-shift operator>>has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by2^7for better consistency.f1''is correct but the proof was misleading. The new proof should be clearer.