Description
Dear Authors,
I recently read your paper on MultipoleAttention and noticed an overlap with RetroInfer (https://arxiv.org/pdf/2505.02922). Specifically, the two core techniques introduced in MultipoleAttention, (1) Multipole Approximation and (2) Block-wise k-means, appear to be identical to the (1) Accuracy-bounded Attention Estimation and (2) Segmented k-means presented in RetroInfer. The primary distinction seems to be that MultipoleAttention is tailored to a GPU-only setting, whereas RetroInfer targets a GPU-CPU setting.
Considering that RetroInfer was released earlier, it would be appropriate for MultipoleAttention to cite RetroInfer. Additionally, I would appreciate it if you could clarify any further differences between MultipoleAttention and RetroInfer. Thank you.