Better explanation for Barrett division in decompose #762

jammychiou1 · 2025-12-01T03:31:41Z

Resolves Come up with a better explanation for round(-half-up) vs round-half-down in decompose #654

Along the way, we also fix some minor details in comments for AVX2 decompose:

The floor() in floor((f + 127) >> 7) was somewhat unnecessary as the usual semantic for the right-shift operator >> has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency.
The bound of f1'' is correct but the proof was misleading. The new proof should be clearer.

jammychiou1 · 2025-12-01T03:40:26Z

I did consider addressing #654 also in this PR, but I haven't come up with a great idea for that yet...

jammychiou1 · 2025-12-01T07:32:42Z

dev/x86_64/src/poly_decompose_32_avx2.c

     * Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact
     * for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲
-     * 1 / 4092.
+     * 1 / 4092, so (for example) f1' = B / 2 = 2046 is mapped to
+     *
+     *   round(2046 * 1025 / 2^22) = round(2046 * (1 / 4092 - epsilon))
+     *                             = round(1 / 2 - epsilon') = 0,
+     *
+     * where epsilon = 1 / 4092 - 1025 / 2^22 and epsilon' = 2046 * eps are both
+     * tiny but positive numbers.


@hanno-becker May I have your review on this explanation? If this looks good, I plan to also apply this to the aarch64 implementation and resolve #654. Thank you!

@jammychiou1 Apologies for the slow reply.

I think if we want to fix #654 we should provide a general explanation.

Here's an attempt:

The approximation error for `f1' / B ≈ f1' * 1025 / 2^22` is `f1' * (1025/2^22 - 1/B)`. For `eps := 1025/2^22 - 1/B` we have `eps = 1/4290772992 ~ 2^{-31.99} < 2^{-31}`. Hence `|f1'| * eps < 2^{-15}`. Given `B = 4092 ~ 2^12`, we thus have `|f1'| * eps < 1/B`. On the other hand, `1/B` is the spacing between the integral multiples of `1/B`, which includes all rounding boundaries `n + 1/2` (since `B` is even). Hence, if `f1' / B` is not of the form `n + 1/2`, then moving from `f1' / B` to `f1' * 1025 / 2^22` does not cross a rounding boundary, and hence `round(f1' / B) = round(f1' * 1025 / 2^22)`, and it doesn't matter on either side which version of rounding one uses. If `f1' / B` _is_ of the form `n + 1/2`, then `f1' * 1025 / 2^22` is slightly below it (and _not_ a multiple of `n + 1/2`), hence `round-(f1' / B) = round(f1' * 1025 / 2^22)`; where the round-down on the LHS is essential, and on the RHS the type of rounding again does not matter.

@hanno-becker I'm very sorry for the long delay. I wanted to work on this, but I had to prioritize other tasks in the last few weeks...

Thank you for the detailed suggestion. This explanation is very clear. I created a new patch that incorporates the explanation with some modifications: I reworded it slightly according to the context and added some details that are unobvious to me at first. Please take a look at your convenience. Thank you very much and happy holidays!

jammychiou1 · 2025-12-01T07:33:05Z

dev/x86_64/src/poly_decompose_88_avx2.c

-     * 1 / 1488.
+     * 1 / 1488, so (for example) f1' = B / 2 = 744 is mapped to
+     *
+     *   round(744 * 11275 / 2^24) = round(744 * (1 / 1488 - epsilon))
+     *                             = round(1 / 2 - epsilon') = 0,
+     *
+     * where epsilon = 1 / 1488 - 11275 / 2^24 and epsilon' = 744 * eps are both
+     * tiny but positive numbers.


mkannwischer

@jammychiou1, please adjust the commit message so that it does not include [DRAFT] and mark this PR ready for review once you think it is ready. That way we can merge it once @hanno-becker finds the time to look at it.

- The floor() in floor((f + 127) >> 7) was somewhat unecessary as the usual semantic for the right-shift operator (>>) has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency. - The bound of f1'' is correct but the proof was misleading. The new proof should be clearer. Signed-off-by: jammychiou1 <[email protected]>

Based on Hanno Becker's proposal, the new explanation explains how round-(f1' / B) can be replaced with rounding-mulhi, regardless of the type of rounding used in the mulhi. In addition, by bounding the approximation error to be strictly less than 1 / B, the exactness of the Barrett division is also justified. To avoid excessive repetition, we prove the GAMMA2 = (Q-1)/88 case in the C implementation, remark how the same proof can be adapted to the GAMMA2 = (Q-1)/32 case, and finally refer to them when explaining the AVX2 implementation. Signed-off-by: jammychiou1 <[email protected]>

Adapt the new explanation to the NEON implementation. Signed-off-by: jammychiou1 <[email protected]>

jammychiou1 · 2025-12-29T04:01:48Z

Thanks for the idea! I went ahead and copied/adapted the new explanation to the C and NEON implementation.

The C and AVX2 ones use the same denominators B, so I actually moved the explanation to the C implementations and refer to them in the AVX2 ones.

jammychiou1 force-pushed the decompose-explanation-fix branch from 0b47c08 to 9931203 Compare December 1, 2025 03:34

jammychiou1 force-pushed the decompose-explanation-fix branch from 0be3a84 to 326d731 Compare December 1, 2025 07:21

jammychiou1 commented Dec 1, 2025

View reviewed changes

jammychiou1 force-pushed the decompose-explanation-fix branch from 326d731 to 8a3644e Compare December 26, 2025 09:47

mkannwischer requested changes Dec 27, 2025

View reviewed changes

jammychiou1 added 3 commits December 29, 2025 10:59

Better explanation for Barrett division in decompose (NEON)

f520f98

Adapt the new explanation to the NEON implementation. Signed-off-by: jammychiou1 <[email protected]>

jammychiou1 force-pushed the decompose-explanation-fix branch from 8a3644e to f520f98 Compare December 29, 2025 03:44

jammychiou1 changed the title ~~Fix some minor details in comments for AVX2 decompose~~ Better explanation for Barrett division in decompose Dec 29, 2025

jammychiou1 marked this pull request as ready for review December 29, 2025 04:09

jammychiou1 requested a review from a team as a code owner December 29, 2025 04:10

mkannwischer assigned hanno-becker Dec 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Better explanation for Barrett division in decompose #762

Better explanation for Barrett division in decompose #762

Uh oh!

jammychiou1 commented Dec 1, 2025 •

edited

Loading

Uh oh!

jammychiou1 commented Dec 1, 2025

Uh oh!

jammychiou1 Dec 1, 2025

Uh oh!

hanno-becker Dec 4, 2025

Uh oh!

jammychiou1 Dec 26, 2025

Uh oh!

jammychiou1 Dec 1, 2025

Uh oh!

mkannwischer left a comment

Uh oh!

jammychiou1 commented Dec 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Better explanation for Barrett division in decompose #762

Are you sure you want to change the base?

Better explanation for Barrett division in decompose #762

Uh oh!

Conversation

jammychiou1 commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jammychiou1 commented Dec 1, 2025

Uh oh!

jammychiou1 Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

hanno-becker Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

jammychiou1 Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

jammychiou1 Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

mkannwischer left a comment

Choose a reason for hiding this comment

Uh oh!

jammychiou1 commented Dec 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jammychiou1 commented Dec 1, 2025 •

edited

Loading