fix(patch): preserve native matmul backend on torch 2.9 (#66)
Conversation
Keep torch.backends.cuda.matmul as the native PyTorch backend object while forwarding its attribute access to MUSA matmul semantics on MUSA platforms.
@popsiclexu Could you please take a look? Thanks!
```python
try:
    _ = cuda_matmul.fp32_precision
    has_native_fp32_precision = True
except AttributeError:
    has_native_fp32_precision = False
```
src/torchada/_patch.py:1182: the probe cuda_matmul.fp32_precision may raise AssertionError (not just AttributeError) for unknown attributes (as noted in the tests), which would make _patch_backends_cuda() crash during import on some versions; consider catching AssertionError here as well.
Severity: high
```python
import torchada

if not torchada.is_musa_platform() or not hasattr(torch.backends, "musa"):
```
tests/test_cuda_patching.py:1251: this skip guard checks hasattr(torch.backends, "musa") but the test immediately dereferences torch.backends.musa.matmul; if musa exists without matmul, this will error instead of skipping.
Severity: medium
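A stricter guard along the lines of this comment can be sketched with a hypothetical helper (`should_skip` and the stub namespaces are illustrative names, not the test's actual code): check that the `musa` namespace both exists and exposes `matmul` before the test dereferences it.

```python
# Hedged sketch of a safer skip guard: also verify that the musa
# namespace exposes `matmul` before the test dereferences it.
from types import SimpleNamespace

def should_skip(is_musa_platform, backends):
    musa = getattr(backends, "musa", None)
    return (not is_musa_platform
            or musa is None
            or not hasattr(musa, "matmul"))

# musa exists but has no `matmul`: the guard skips instead of letting
# the test fail later with an AttributeError.
backends = SimpleNamespace(musa=SimpleNamespace())
print(should_skip(True, backends))  # True
```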
Could you run the full test suite on Torch 2.7?
```shell
# pytest -v ./tests/
==================== 310 passed, 30 skipped in 1.25s ====================
# pip list | grep torch
torch        2.7.1
torch_musa   2.7.1
torchada     0.1.22          /ws
torchaudio   2.7.1a0+95c61b4
torchvision  0.22.1+6b25dcc
root@xiaodongye-s80:/ws#
```
Summary
Fix a PyTorch 2.9 + `torch.compile` compatibility issue on MUSA. `torchada==0.1.55` replaces PyTorch's native CUDA matmul backend object with the MUSA matmul backend object. That makes CUDA-style matmul settings point to MUSA, but it also changes the object seen by PyTorch internals.

In PyTorch 2.9, `torch.compile` / Inductor expects `torch.backends.cuda.matmul` to remain PyTorch's native backend object and reads native attributes from it. After the replacement, `torch.backends.cuda.matmul` becomes a `torch_musa.core.musa.muBLASModule`, which does not expose all PyTorch 2.9 native matmul attributes. This can break SGLang service startup when `torch.compile` / Inductor is enabled.

Reproduced issue
With `torchada==0.1.55`, a minimal Inductor compile fails with:

Relevant traceback:
Root cause
Inductor expects the `fp32_precision` attribute to exist on PyTorch's native `torch.backends.cuda.matmul` object. But after the replacement, `torch.backends.cuda.matmul` points to the MUSA matmul backend object, whose type is `torch_musa.core.musa.muBLASModule`, and that object does not provide the PyTorch 2.9 native matmul backend API.
Fix
Keep the native PyTorch `torch.backends.cuda.matmul` object, and patch attribute access on MUSA platforms so CUDA-style code still gets MUSA matmul semantics where appropriate.

After this change, `torch.backends.cuda.matmul` remains PyTorch's native backend object, but CUDA-style settings still forward to MUSA: setting matmul options through `torch.backends.cuda.matmul` affects the corresponding `torch.backends.musa.matmul` settings on MUSA. At the same time, PyTorch 2.9 / Inductor can still access native attributes such as `fp32_precision`.
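The forwarding idea can be sketched in isolation. This is a hypothetical illustration, not the actual torchada implementation: `NativeMatmul`, `MusaMatmul`, and `FORWARDED` are stand-in names, and the real patch operates on the PyTorch backend classes. The point is that the native object's identity is preserved while a chosen set of CUDA-style settings forwards to the MUSA backend.

```python
# Hedged sketch: keep the native backend object in place and patch its
# class so that forwarded settings go to the MUSA backend, while
# native-only attributes (e.g. fp32_precision) keep native behavior.
class NativeMatmul:               # stands in for the native cuBLAS module
    fp32_precision = "tf32"

class MusaMatmul:                 # stands in for torch.backends.musa.matmul
    allow_tf32 = False

FORWARDED = {"allow_tf32"}        # assumed set of forwarded settings
native, musa = NativeMatmul(), MusaMatmul()

def _getattr(self, name):         # only runs when normal lookup fails
    if name in FORWARDED:
        return getattr(musa, name)
    raise AttributeError(name)

def _setattr(self, name, value):
    if name in FORWARDED:
        setattr(musa, name, value)  # mirror CUDA-style writes onto MUSA
    else:
        object.__setattr__(self, name, value)

NativeMatmul.__getattr__ = _getattr
NativeMatmul.__setattr__ = _setattr

native.allow_tf32 = True          # forwards to the MUSA backend
print(musa.allow_tf32)            # True
print(native.fp32_precision)      # "tf32": native attribute still visible
```

The object bound to `native` never changes identity, so code that cached a reference to it (as Inductor does with `torch.backends.cuda.matmul`) keeps working.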
Validation
Minimal `torch.compile` reproduction

After installing the fixed editable torchada, a minimal Inductor compile now succeeds:
Targeted backend patch tests
Full torchada test suite
SGLang startup validation
The SGLang service starts successfully after the fix: