Update THD sink attention logic for cudnn >=9.18.0 #2561
cuichenx wants to merge 3 commits into NVIDIA:main
Conversation
THD Sink attention is supported in 9.18.0
Signed-off-by: Chen Cui <chcui@nvidia.com>
for more information, see https://pre-commit.ci
Greptile Summary

This PR updates the attention backend selection logic to enable FusedAttention for the THD (variable-sequence-length, token-packed) qkv format with non-vanilla softmax types when using cuDNN 9.18.0 or later. Previously, FusedAttention was unconditionally disabled for the THD format with non-vanilla softmax. The change adds a version check so that FusedAttention is disabled only for cuDNN versions below 9.18.0, allowing newer cuDNN versions to use the newly supported sink attention feature in THD format.

The change is minimal and focused: it wraps the FusedAttention-disabling logic in a version check, while keeping UnfusedDotProductAttention disabled for all versions, since that backend does not support this feature.

Confidence Score: 5/5
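The gating described above can be sketched as follows. This is a hedged, minimal sketch of the selection logic, not the actual TransformerEngine code: the function name, the tuple-based `cudnn_version`, and the `"sink"` softmax-type value are assumptions for illustration.

```python
# Minimal sketch of the version-gated backend filtering described in the
# summary. Names and signatures are hypothetical, not TransformerEngine's API.
def select_backends(qkv_format, softmax_type, cudnn_version):
    """Return (use_fused_attention, use_unfused_attention) after filtering."""
    use_fused_attention = True
    use_unfused_attention = True
    if softmax_type != "vanilla" and qkv_format == "thd":
        # UnfusedDotProductAttention stays disabled for all cuDNN versions.
        use_unfused_attention = False
        # NEW in this PR: only disable FusedAttention on cuDNN < 9.18.0;
        # newer cuDNN supports sink attention in THD format.
        if cudnn_version < (9, 18, 0):
            use_fused_attention = False
    return use_fused_attention, use_unfused_attention
```

Python's lexicographic tuple comparison makes `(9, 17, 0) < (9, 18, 0)` behave like a proper version check here.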
Sequence Diagram

```mermaid
sequenceDiagram
    participant get_attention_backend as get_attention_backend()
    participant version_check as cudnn_version check
    participant fused_attn as FusedAttention
    get_attention_backend->>version_check: Check if softmax_type != "vanilla"
    version_check-->>get_attention_backend: True
    get_attention_backend->>version_check: Check if qkv_format == "thd"
    version_check-->>get_attention_backend: True
    rect rgb(200, 220, 255)
        Note over version_check: NEW: Version gate added
        get_attention_backend->>version_check: Check if cudnn_version < (9, 18, 0)
    end
    alt cuDNN < 9.18.0
        version_check-->>fused_attn: Disable FusedAttention (legacy behavior)
    else cuDNN >= 9.18.0
        version_check-->>fused_attn: Allow FusedAttention (new feature support)
    end
```
Greptile found no issues!
Could you please add a THD test here: https://github.com/cuichenx/TransformerEngine/blob/442699c714c3e25d1797712319e32f4d569a98e5/tests/pytorch/attention/test_attention.py#L418 Thanks!
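The requested THD test would need to respect the same version gate the PR introduces, skipping on older cuDNN. Below is a hypothetical sketch of that skip logic; the helper names, the `CUDNN_VERSION` constant, and the test shape are assumptions and do not reflect the actual code in `test_attention.py`.

```python
# Hypothetical sketch of the skip logic a new THD sink-attention test would
# need; names are illustrative, not taken from test_attention.py.
CUDNN_VERSION = (9, 18, 0)  # assumption: queried from the installed cuDNN

def thd_sink_supported(cudnn_version):
    # Mirrors the PR's gate: sink attention in THD format needs cuDNN >= 9.18.0.
    return tuple(cudnn_version) >= (9, 18, 0)

def run_sink_attention_case(qkv_format, cudnn_version=CUDNN_VERSION):
    """Return "skipped" when the gate rejects the case, else "ran"."""
    if qkv_format == "thd" and not thd_sink_supported(cudnn_version):
        return "skipped"  # a real pytest test would call pytest.skip(...) here
    # ... a real test would run the fused-attention forward/backward check here ...
    return "ran"
```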
Accidentally closed this after renaming the branch. Opened new PR here: #2568
THD Sink attention is supported in 9.18.0
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: