Skip to content

Conversation

@jammychiou1
Copy link
Contributor

@jammychiou1 jammychiou1 commented Dec 1, 2025

Along the way, we also fix some minor details in comments for AVX2 decompose:

  • The floor() in floor((f + 127) >> 7) was somewhat unnecessary as the usual semantic for the right-shift operator >> has integer output anyway. Seeing as the right-shift operator is not used in other explanation comments, we decided to rewrite it as division by 2^7 for better consistency.
  • The bound of f1'' is correct but the proof was misleading. The new proof should be clearer.

@jammychiou1 jammychiou1 force-pushed the decompose-explanation-fix branch from 0b47c08 to 9931203 Compare December 1, 2025 03:34
@jammychiou1
Copy link
Contributor Author

I did consider addressing #654 also in this PR, but I haven't come up with a great idea for that yet...

@jammychiou1 jammychiou1 force-pushed the decompose-explanation-fix branch from 0be3a84 to 326d731 Compare December 1, 2025 07:21
Comment on lines 76 to 85
* Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact
* for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲
* 1 / 4092.
* 1 / 4092, so (for example) f1' = B / 2 = 2046 is mapped to
*
* round(2046 * 1025 / 2^22) = round(2046 * (1 / 4092 - epsilon))
* = round(1 / 2 - epsilon') = 0,
*
* where epsilon = 1 / 4092 - 1025 / 2^22 and epsilon' = 2046 * eps are both
* tiny but positive numbers.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanno-becker May I have your review on this explanation? If this looks good, I plan to also apply this to the aarch64 implementation and resolve #654. Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jammychiou1 Apologies for the slow reply.

I think if we want to fix #654 we should provide a general explanation.

Here's an attempt:

The approximation error for `f1' / B ≈ f1' * 1025 / 2^22` is `f1' * (1025/2^22 - 1/B)`. 
For `eps := 1025/2^22 - 1/B` we have `eps = 1/4290772992 ~ 2^{-31.99} < 2^{-31}`. 
Hence `|f1'| * eps < 2^{-15}`. Given `B = 4092 ~ 2^12`, we thus have `|f1'| * eps < 1/B`. 
On the other hand, `1/B` is the spacing between the integral multiples of `1/B`, which
includes all rounding boundaries `n + 1/2` (since `B` is even). Hence, if `f1' / B` is not 
of the form `n + 1/2`, then moving from `f1' / B` to `f1' * 1025 / 2^22` does not cross 
a rounding boundary, and hence `round(f1' / B) = round(f1' * 1025 / 2^22)`, and it doesn't
matter on either side which version of rounding one uses. 
If `f1' / B` _is_ of the form `n + 1/2`, then `f1' * 1025 / 2^22` is slightly below it 
(and _not_ a multiple of `n + 1/2`), hence `round-(f1' / B) = round(f1' * 1025 / 2^22)`; 
where the round-down on the LHS is essential, and on the RHS the type of rounding 
again does not matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanno-becker I'm very sorry for the long delay. I wanted to work on this, but I had to prioritize other tasks in the last few weeks...

Thank you for the detailed suggestion. This explanation is very clear. I created a new patch that incorporates the explanation with some modifications: I reworded it slightly according to the context and added some details that are unobvious to me at first. Please take a look at your convenience. Thank you very much and happy holidays!

Comment on lines 79 to 86
* 1 / 1488.
* 1 / 1488, so (for example) f1' = B / 2 = 744 is mapped to
*
* round(744 * 11275 / 2^24) = round(744 * (1 / 1488 - epsilon))
* = round(1 / 2 - epsilon') = 0,
*
* where epsilon = 1 / 1488 - 11275 / 2^24 and epsilon' = 744 * eps are both
* tiny but positive numbers.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@jammychiou1 jammychiou1 force-pushed the decompose-explanation-fix branch from 326d731 to 8a3644e Compare December 26, 2025 09:47
Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jammychiou1, please adjust the commit message so that it does not include [DRAFT] and mark this PR ready for review once you think it is ready. That way we can merge it once @hanno-becker finds the time to look at it.

- The floor() in floor((f + 127) >> 7) was somewhat unecessary as
  the usual semantic for the right-shift operator (>>) has integer
  output anyway. Seeing as the right-shift operator is not used in other
  explanation comments, we decided to rewrite it as division by 2^7 for
  better consistency.
- The bound of f1'' is correct but the proof was misleading. The new
  proof should be clearer.

Signed-off-by: jammychiou1 <[email protected]>
Based on Hanno Becker's proposal, the new explanation explains how
round-(f1' / B) can be replaced with rounding-mulhi, regardless of the
type of rounding used in the mulhi. In addition, by bounding the
approximation error to be strictly less than 1 / B, the exactness of the
Barrett division is also justified.

To avoid excessive repetition, we prove the GAMMA2 = (Q-1)/88 case in
the C implementation, remark how the same proof can be adapted to the
GAMMA2 = (Q-1)/32 case, and finally refer to them when explaining the
AVX2 implementation.

Signed-off-by: jammychiou1 <[email protected]>
Adapt the new explanation to the NEON implementation.

Signed-off-by: jammychiou1 <[email protected]>
@jammychiou1 jammychiou1 force-pushed the decompose-explanation-fix branch from 8a3644e to f520f98 Compare December 29, 2025 03:44
@jammychiou1
Copy link
Contributor Author

Thanks for the idea! I went ahead and copied/adapted the new explanation to the C and NEON implementation.

The C and AVX2 ones use the same denominators B, so I actually moved the explanation to the C implementations and refer to them in the AVX2 ones.

@jammychiou1 jammychiou1 changed the title Fix some minor details in comments for AVX2 decompose Better explanation for Barrett division in decompose Dec 29, 2025
@jammychiou1 jammychiou1 marked this pull request as ready for review December 29, 2025 04:09
@jammychiou1 jammychiou1 requested a review from a team as a code owner December 29, 2025 04:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Come up with a better explanation for round(-half-up) vs round-half-down in decompose

4 participants