fix: pre-download sage_attention kernel before applying backend, remove pinned fa3 kernel version by Marius-Graml · Pull Request #578 · PrunaAI/pruna

Marius-Graml · 2026-03-16T10:44:53Z

Description

The sageattn algorithm currently has a bug. Diffusers exposes two set_attention_backend methods: one at the model level and one at the submodule level. The submodule-level method does not trigger the kernel download, so kernel_fn stays None and calling it raises a TypeError. This PR adds an explicit _maybe_download_kernel_for_backend call before the backend is applied.
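
To illustrate the failure mode and the fix, here is a minimal, self-contained sketch. It does not use diffusers: the registry, the `Submodule` class, and the `maybe_download_kernel` and `apply_backend` helpers are hypothetical stand-ins, and the "download" is simulated with a placeholder function.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical kernel registry (an assumption for illustration, not diffusers code).
# The kernel function stays None until it has been "downloaded".
_KERNELS: Dict[str, Optional[Callable[[int], int]]] = {"sage_hub": None}


def maybe_download_kernel(backend: str) -> None:
    """Populate the kernel for `backend` if it is not loaded yet (simulated)."""
    if _KERNELS.get(backend) is None:
        _KERNELS[backend] = lambda x: x * 2  # placeholder for the real kernel


class Submodule:
    """Stand-in for a submodule whose backend setter skips the download."""

    def __init__(self) -> None:
        self.backend: Optional[str] = None

    def set_attention_backend(self, backend: str) -> None:
        # Only records the backend name; no kernel download happens here.
        self.backend = backend

    def forward(self, x: int) -> int:
        kernel_fn = _KERNELS[self.backend]
        # Raises TypeError ('NoneType' object is not callable) if not downloaded.
        return kernel_fn(x)


def apply_backend(modules: List[Submodule], backend: str) -> None:
    """Pre-download the kernel once, then set the backend on each submodule."""
    maybe_download_kernel(backend)
    for module in modules:
        module.set_attention_backend(backend)
```

Calling `set_attention_backend` directly on a submodule and then `forward` fails in this sketch, while going through `apply_backend` first makes the same call succeed.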
In addition, the pinned version of the fa3 kernel is removed so that fa3 now works with torch 2.10. Note that, depending on the torch and CUDA versions, some kernel builds return (out, lse) while others return only out, so the registered torch-op function must handle both cases.
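
One way to handle both return shapes is to normalize the kernel result before returning it. The sketch below is an assumption about how this could look, not the code in this PR: `unwrap_attention_output` and the two fake kernels are hypothetical names, and plain Python lists stand in for tensors.

```python
from typing import Any


def unwrap_attention_output(result: Any) -> Any:
    """Return only `out`, whether the kernel gave `out` or `(out, lse)`."""
    if isinstance(result, tuple):
        # Builds that return (out, lse): keep the attention output, drop lse.
        return result[0]
    # Builds that return just out.
    return result


def fake_kernel_with_lse(q: list) -> tuple:
    """Hypothetical kernel build that returns (out, lse)."""
    return (q, [0.0])


def fake_kernel_without_lse(q: list) -> list:
    """Hypothetical kernel build that returns only out."""
    return q
```

With this wrapper, the caller sees the same output regardless of which build is installed, for example `unwrap_attention_output(fake_kernel_with_lse(q))` and `unwrap_attention_output(fake_kernel_without_lse(q))` both yield `q`.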

Related Issue

/

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Ran in a notebook.

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Additional Notes

/

Bug fix: pre-download sage_attention kernel before applying backend

2aa0b5d

The submodule-level set_attention_backend in diffusers does not trigger the kernel download, leaving kernel_fn as None and causing a TypeError. This adds an explicit _maybe_download_kernel_for_backend call.

Marius-Graml changed the title ~~Bug fix: pre-download sage_attention kernel before applying backend~~ fix: pre-download sage_attention kernel before applying backend Mar 16, 2026

Marius-Graml requested a review from johannaSommer March 16, 2026 12:53

Remove pinned fa3 kernel version.

b5272b1

Marius-Graml changed the title ~~fix: pre-download sage_attention kernel before applying backend~~ fix: pre-download sage_attention kernel before applying backend, remove pinned fa3 kernel version Mar 18, 2026

Marius-Graml requested a review from begumcig March 18, 2026 13:51

Marius-Graml wants to merge 2 commits into main from fix/SageAttn
