[slimtensor] integration into backend #16565

Gasoonjia · 2026-01-13T19:07:36Z

This diff makes the CUDA backend actually use SlimTensor.
It:
1. updates the CUDA backend to create a SlimTensor from the given ETensor
2. removes the duplicate ETensor-driven shim layers under cuda_backend
3. updates the CMake logic in both the CUDA backend and the AOTI backend

Perf stays the same as before:

<img width="3092" height="1902" alt="image" src="https://github.com/user-attachments/assets/6061576b-0d4b-4b20-ac8d-5f45493737d8" />

Note that we still keep two sets of common shims: one is ETensor-based and used by the Metal backend, the other is SlimTensor-based and used by the CUDA backend. This avoids impacting Metal backend work.
Once the Metal backend finishes its migration, we should delete the duplicate common shims and keep only the SlimTensor-based set.
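
To illustrate what "create a SlimTensor from the given ETensor" could look like at the backend boundary, here is a minimal sketch. It is not the ExecuTorch implementation: `ToyETensor`, `ToySlimTensor`, and `wrap_etensor` are hypothetical stand-ins invented for this example, and the sketch assumes a contiguous float buffer. The idea shown is that the slim tensor takes a non-owning view of the memory the incoming tensor already points to, so no copy happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ETensor: a pointer plus a shape.
struct ToyETensor {
  float* data;
  std::vector<int64_t> sizes;
};

// Hypothetical stand-in for a SlimTensor that does NOT own its memory.
class ToySlimTensor {
 public:
  // Wrap existing memory without copying or taking ownership.
  static ToySlimTensor from_blob(float* data, std::vector<int64_t> sizes) {
    return ToySlimTensor(data, std::move(sizes));
  }

  float* data() const { return data_; }
  const std::vector<int64_t>& sizes() const { return sizes_; }
  const std::vector<int64_t>& strides() const { return strides_; }

  // Total number of elements (1 for a zero-dimensional tensor).
  int64_t numel() const {
    int64_t n = 1;
    for (int64_t s : sizes_) n *= s;
    return n;
  }

 private:
  ToySlimTensor(float* data, std::vector<int64_t> sizes)
      : data_(data), sizes_(std::move(sizes)), strides_(sizes_.size()) {
    // Assumption: contiguous row-major layout, so the last dim has stride 1.
    int64_t stride = 1;
    for (std::size_t i = sizes_.size(); i-- > 0;) {
      strides_[i] = stride;
      stride *= sizes_[i];
    }
  }

  float* data_;                   // borrowed, never freed by this class
  std::vector<int64_t> sizes_;
  std::vector<int64_t> strides_;
};

// Hypothetical conversion helper used where the backend receives its inputs.
inline ToySlimTensor wrap_etensor(const ToyETensor& et) {
  return ToySlimTensor::from_blob(et.data, et.sizes);
}
```

Because the view only borrows the buffer, the wrapped tensor must not outlive the memory behind the original tensor. Writes through the view are visible through the original pointer.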

Stack from ghstack (oldest at bottom):

-> [slimtensor] integration into backend #16565

Differential Revision: D90606409


pytorch-bot · 2026-01-13T19:07:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16565


❌ 17 New Failures, 2 Unrelated Failures

As of commit 7d6a571 with merge base a093fe4:

NEW FAILURES - The following jobs have failed:

cuda-perf / benchmark-cuda (openai/whisper-large-v3-turbo, non-quantized, openai_whisper-large-v3-turbo, 50) / linux-job (gh)
RuntimeError: Command docker exec -t aaee31ae535126db5566e14b40d93dd8070aad078483083f70d244b9bf8c29eb /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-large-v3-turbo, quantized-int4-tile-packed, openai_whisper-large-v... / linux-job (gh)
RuntimeError: Command docker exec -t a9c6bb8a9c05095dd93501f1a0ed2798d7dcf38e2cba0be48408224768b60be1 /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-large-v3-turbo, quantized-int4-weight-only, openai_whisper-large-v... / linux-job (gh)
RuntimeError: Command docker exec -t 23f3d5a0a5b9058e8ca0804f74098b0a9839029bd3b82f38b2881a7aaf7c5325 /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-medium, non-quantized, openai_whisper-medium, 50) / linux-job (gh)
RuntimeError: Command docker exec -t 91d91185b4016301660a3227044805f020bb7847e1cf1ee6630b15649a5f29b3 /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-medium, quantized-int4-tile-packed, openai_whisper-medium, 50) / linux-job (gh)
RuntimeError: Command docker exec -t 5ad53ec8fa745f1453c2b67fd1724c6d8727d27c8f8a2dd59066b09207c4bccc /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-medium, quantized-int4-weight-only, openai_whisper-medium, 50) / linux-job (gh)
RuntimeError: Command docker exec -t adced3124be522e582bedf6d15684b4adb08033bf15acd9d250479c30d17d363 /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-small, non-quantized, openai_whisper-small, 50) / linux-job (gh)
RuntimeError: Command docker exec -t 11f2c328ad3ce422e83378a0072967bfd75f3512934da5257f91fc38c842fbfd /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-small, quantized-int4-tile-packed, openai_whisper-small, 50) / linux-job (gh)
RuntimeError: Command docker exec -t 83d207ec1da3dcad22028eead2d0c3fafa81afca2756809b7372893b902f3714 /exec failed with exit code 1
cuda-perf / benchmark-cuda (openai/whisper-small, quantized-int4-weight-only, openai_whisper-small, 50) / linux-job (gh)
RuntimeError: Command docker exec -t 07c674864a847fa0e99963bf6ddfe88f80f2c1d614b9fad65f1c7b7044b22f87 /exec failed with exit code 1
Test CUDA Builds / test-model-cuda-e2e (mistralai, Voxtral-Mini-3B-2507, quantized-int4-tile-packed) / linux-job (gh)
RuntimeError: Command docker exec -t 56352a7fe410eea399dd72ec0bcd1da3f195d02b84079eda095c610241ae7a07 /exec failed with exit code 1
Test CUDA Builds / test-model-cuda-e2e (openai, whisper-large-v3-turbo, non-quantized) / linux-job (gh)
RuntimeError: Command docker exec -t fb34b18f6b8a4f90a389f6008708da3da78d546dd289f4d062931394656faf74 /exec failed with exit code 1
Test CUDA Builds / test-model-cuda-e2e (openai, whisper-large-v3-turbo, quantized-int4-tile-packed) / linux-job (gh)
RuntimeError: Command docker exec -t 95956462471a93e245253f6c30a1ba7b9174a4df411cc970e150fcde5aaa1693 /exec failed with exit code 1
Test CUDA Builds / test-model-cuda-e2e (openai, whisper-large-v3-turbo, quantized-int4-weight-only) / linux-job (gh)
RuntimeError: Command docker exec -t b788461ef63a32390a26d59d0a5c382193ccd1c2b0578174d06630752d1aec34 /exec failed with exit code 1
Test CUDA Builds / test-model-cuda-e2e (openai, whisper-small, non-quantized) / linux-job (gh)
RuntimeError: Command docker exec -t ffb312c13dd24442907d06924abde9e3640af173339887a7f4364586d0d73e6b /exec failed with exit code 1
Test CUDA Builds / test-model-cuda-e2e (openai, whisper-small, quantized-int4-tile-packed) / linux-job (gh)
RuntimeError: Command docker exec -t 7d2d52fa599e4697c75c0898383d2c43fb77bbd410c91409d394044d083e2148 /exec failed with exit code 1
Test CUDA Builds / test-model-cuda-e2e (openai, whisper-small, quantized-int4-weight-only) / linux-job (gh)
RuntimeError: Command docker exec -t ca85530c4d645b2455461daa2956049042840c45f774bd071d40d1931fcf0fb2 /exec failed with exit code 1
Test CUDA Builds / unittest-cuda / linux-job (gh)
RuntimeError: Command docker exec -t f145b4a39dae586d0e70255ee33b4396644075aa092481d436beae24cf06be27 /exec failed with exit code 2

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Test Metal Backend / test-model-metal-e2e (openai, whisper-large-v3-turbo, non-quantized) / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
Test Metal Backend / test-model-metal-e2e (openai, whisper-small, non-quantized) / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.


Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #16565
* #16551
* #16469
* #16457
* #16455
* #16454
* #16453
* #16452
* #16451
* #16450
* #16449
* #16448
* #16447
* #16446
* __->__ #16724

Copy CUDAGuard and CUDAStreamGuard from cuda/runtime/ to aoti/slim/cuda/ to meet the SlimTensor requirement while getting rid of a potential circular dependency:
- cuda_backend/main_functionalities -> aoti/slimtensor -> cuda_backend/cuda_guard

This change:
- copies guard.h, guard.cpp and the test files from backend/cuda_backend to backend/aoti/slim/cuda/

Differential Revision: [D91056808](https://our.internmc.facebook.com/intern/diff/D91056808/)
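
For readers unfamiliar with device guards, the pattern being copied is RAII: remember the current device on construction, switch to the requested one, and restore the original on destruction. The sketch below shows only that pattern. It is not the code from guard.h: `ToyDeviceGuard` and the `toy_runtime` namespace are hypothetical, and a plain static variable stands in for the CUDA runtime so the example builds without CUDA. A real guard would typically go through calls such as `cudaGetDevice` and `cudaSetDevice` instead.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's notion of "current device".
namespace toy_runtime {
inline int& current_device() {
  static int device = 0;
  return device;
}
inline void set_device(int d) { current_device() = d; }
inline int get_device() { return current_device(); }
}  // namespace toy_runtime

// RAII guard: switches device in the constructor, restores it in the destructor.
class ToyDeviceGuard {
 public:
  explicit ToyDeviceGuard(int device) : original_(toy_runtime::get_device()) {
    if (device != original_) {
      toy_runtime::set_device(device);
    }
  }

  ~ToyDeviceGuard() { toy_runtime::set_device(original_); }

  // Copying a guard would restore the device twice, so forbid it.
  ToyDeviceGuard(const ToyDeviceGuard&) = delete;
  ToyDeviceGuard& operator=(const ToyDeviceGuard&) = delete;

 private:
  int original_;  // device that was current before the guard was created
};
```

A guard like this is self-contained, which is why it can live next to the tensor code without pulling in the rest of the backend.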

…v2 (#16446)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #16565
* #16551
* #16469
* #16457
* #16455
* #16454
* #16453
* #16452
* #16451
* #16450
* #16449
* #16448
* #16447
* __->__ #16446
* #16724

Add SlimTensor-based implementations of AOTI shim functions for tensor creation:

1. `aoti_torch_create_tensor_from_blob_v2()` - Creates a non-owning SlimTensor that wraps existing memory using the `from_blob()` factory

Both functions support CPU and CUDA devices and handle all 7 SlimTensor dtypes.

Also add `memory_slim.h` and `memory_slim.cpp` with SlimTensor-based shim implementations, so that work on the new API does not impact the current pipeline. `memory_slim.{h/cpp}` will replace the current `memory.{h/cpp}` once everything has been set up.

Differential Revision: [D90126247](https://our.internmc.facebook.com/intern/diff/D90126247/)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #16565
* #16551
* #16469
* #16457
* #16455
* #16454
* #16453
* #16452
* #16451
* #16450
* #16449
* #16448
* __->__ #16447
* #16446
* #16724

Add SlimTensor-based implementations of AOTI shim functions for tensor creation:

`aoti_torch_create_tensor_from_blob_v2()` - Creates a non-owning SlimTensor that wraps existing memory using the `from_blob()` factory

Both functions support CPU and CUDA devices and handle all 7 SlimTensor dtypes.

Changes:
- Add `memory_slim.h` and `memory_slim.cpp` with SlimTensor-based shim implementations
- Add `runtime_shims_slim` library target to TARGETS with `CUDA_AVAILABLE=1` preprocessor flag
- Add `cuda_shim_slim_cpp_unittest()` function for SlimTensor test targets

Differential Revision: [D90126244](https://our.internmc.facebook.com/intern/diff/D90126244/)

larryliu0820

Review automatically exported from Phabricator review in Meta.


[slimtensor] integration into backend

0e0d0a0


Gasoonjia requested review from kirklandsign and larryliu0820 as code owners January 13, 2026 19:07

This was referenced Jan 13, 2026

[slimtensor] Add factory functions for creating empty CPU tensors #16398

Merged

[slimtensor] Add all required dtype support (Int8/16/32/64, Bool, BFloat16) #16399

Merged

Gasoonjia mentioned this pull request Jan 13, 2026

[slimtensor] Add CUDA DeviceType and extend Device class #16437

Merged

meta-cla bot added the CLA Signed label Jan 13, 2026

Gasoonjia added a commit that referenced this pull request Jan 13, 2026

[slimtensor] integration into backend

c7ebdda

Differential Revision: [D90606409](https://our.internmc.facebook.com/intern/diff/D90606409/) ghstack-source-id: 333239044 Pull Request resolved: #16565

Update on "[slimtensor] integration into backend"

86d7e43


Update on "[slimtensor] integration into backend"

8c32492


Gasoonjia added 4 commits January 26, 2026 11:24

parakeet works

c225e32

parakeet works

030a931

whisper works

4fa4dfc

parakeet works - 2

5e9f654

Gasoonjia temporarily deployed to upload-benchmark-results January 27, 2026 08:01 — with GitHub Actions Inactive

Gasoonjia temporarily deployed to upload-benchmark-results January 27, 2026 17:18 — with GitHub Actions Inactive

Gasoonjia added 3 commits January 27, 2026 10:40

remove nonnecessary debug info

512a3e4

polish cuda backend.cpp comment

18afded

Update on "[slimtensor] integration into backend"

75287c4


Update on "[slimtensor] integration into backend"

ff05337


larryliu0820 approved these changes Jan 28, 2026


Gasoonjia temporarily deployed to upload-benchmark-results January 29, 2026 09:48 — with GitHub Actions Inactive

Gasoonjia mentioned this pull request Jan 29, 2026

use slimtensor in cuda backend as internal tensor representation #16280

Open
