
feat: add dynamic shapes kernel specialization strategy for TRT-RTX#4184

Merged
lanluo-nvidia merged 2 commits into pytorch:main from tp5uiuc:feat/trtrtx-dynamic-shapes-strategy
Apr 21, 2026

Conversation

Contributor

@tp5uiuc tp5uiuc commented Apr 12, 2026

Description

Expose IRuntimeConfig.setDynamicShapesKernelSpecializationStrategy() through the Torch-TensorRT Python API for TensorRT-RTX builds.

Users can now control how shape-specialized kernels are compiled at runtime for dynamic shapes via the new dynamic_shapes_kernel_specialization_strategy compilation setting:

  • "lazy" (default): Compile shape-specialized kernels in the background and use fallback kernels until they are ready
  • "eager": Compile shape-specialized kernels immediately (blocking)
  • "none": Always use fallback kernels; never specialize
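The three strategies can be illustrated with a small self-contained Python model. This is purely illustrative and does not use TensorRT: the `KernelCache` class, the sleep-based "compilation", and the kernel names below are all stand-ins for what the RTX runtime does internally.

```python
import threading
import time


class KernelCache:
    """Toy model of runtime kernel specialization for dynamic shapes.

    Illustrative stand-in, not TensorRT-RTX internals:
    - "eager": compile the shape-specialized kernel now (blocking)
    - "lazy":  start compilation in the background; serve the generic
               fallback kernel until the specialized one is ready
    - "none":  always serve the fallback kernel
    """

    def __init__(self, strategy="lazy"):
        assert strategy in {"lazy", "eager", "none"}
        self.strategy = strategy
        self.specialized = {}  # shape -> compiled kernel name
        self._lock = threading.Lock()

    def _compile(self, shape):
        time.sleep(0.05)  # pretend specialization takes time
        with self._lock:
            self.specialized[shape] = f"specialized{shape}"

    def kernel_for(self, shape):
        with self._lock:
            if shape in self.specialized:
                return self.specialized[shape]
        if self.strategy == "none":
            return "fallback"
        if self.strategy == "eager":
            self._compile(shape)  # blocks the caller
            return self.specialized[shape]
        # lazy: compile in the background, use fallback for now
        threading.Thread(target=self._compile, args=(shape,), daemon=True).start()
        return "fallback"


eager = KernelCache("eager")
print(eager.kernel_for((8, 3)))  # specialized immediately (blocking)

lazy = KernelCache("lazy")
print(lazy.kernel_for((8, 3)))   # fallback while compilation runs
time.sleep(0.2)
print(lazy.kernel_for((8, 3)))   # specialized kernel once ready
```

The trade-off this models: "eager" pays compilation latency on first use, "lazy" trades a few fallback-speed runs for a non-blocking first call, and "none" avoids specialization entirely.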

Depends on: #4180 (runtime cache API — provides the IRuntimeConfig infrastructure)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@meta-cla meta-cla Bot added the cla signed label Apr 12, 2026
@github-actions github-actions Bot added the documentation, component: tests, component: conversion, component: core, component: build system, component: api [Python], component: runtime, and component: dynamo labels Apr 12, 2026
@github-actions github-actions Bot requested a review from cehongwang April 12, 2026 20:48
Review comment thread on tests/py/dynamo/runtime/test_001_dynamic_shapes_kernel_strategy.py
@github-actions github-actions Bot requested a review from zewenli98 April 14, 2026 17:44
@tp5uiuc tp5uiuc force-pushed the feat/trtrtx-dynamic-shapes-strategy branch from c222c72 to 385eec6 on April 15, 2026 18:54
tp5uiuc and others added 2 commits April 20, 2026 08:58
Expose IRuntimeConfig.setDynamicShapesKernelSpecializationStrategy()
through the Torch-TensorRT Python API. Users can now control how
shape-specialized kernels are compiled at runtime for dynamic shapes
on TensorRT-RTX via the new `dynamic_shapes_kernel_specialization_strategy`
compilation setting ("lazy", "eager", or "none").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review feedback: compile with torchtrt.Input min/opt/max
ranges so dynamic shapes are actually exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tp5uiuc tp5uiuc force-pushed the feat/trtrtx-dynamic-shapes-strategy branch from 385eec6 to d7619ca on April 20, 2026 15:58
@tp5uiuc tp5uiuc marked this pull request as ready for review April 20, 2026 16:09
Collaborator

@lanluo-nvidia lanluo-nvidia left a comment


lgtm, one minor comment.

hardware_compatible (bool): Build the TensorRT engines compatible with GPU architectures other than that of the GPU on which the engine was built (currently works for NVIDIA Ampere and newer)
timing_cache_path (str): Path to the timing cache if it exists (or) where it will be saved after compilation. Not used for TensorRT-RTX.
runtime_cache_path (str): Path to the runtime cache for TensorRT-RTX JIT compilation results. Not used for standard TensorRT.
dynamic_shapes_kernel_specialization_strategy (str): Strategy for dynamic shape kernel specialization at runtime (TensorRT-RTX only). Options: "lazy", "eager", "none". Default: "lazy".
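A usage sketch of the documented settings (hedged: the min/opt/max `Input` ranges follow the standard Torch-TensorRT API, the new keyword argument comes from this PR and requires a TensorRT-RTX build, and the snippet guards on library/GPU availability so it degrades gracefully):

```python
# Values accepted by the new setting, per the docstring above.
VALID_STRATEGIES = {"lazy", "eager", "none"}
strategy = "eager"
assert strategy in VALID_STRATEGIES

try:
    import torch
    import torch_tensorrt as torchtrt
    available = torch.cuda.is_available()
except ImportError:
    available = False

if available:
    model = torch.nn.Conv2d(3, 16, 3).eval().cuda()
    # min/opt/max ranges so dynamic shapes are actually exercised
    inputs = [
        torchtrt.Input(
            min_shape=(1, 3, 224, 224),
            opt_shape=(8, 3, 224, 224),
            max_shape=(32, 3, 224, 224),
        )
    ]
    try:
        trt_model = torchtrt.compile(
            model,
            inputs=inputs,
            dynamic_shapes_kernel_specialization_strategy=strategy,  # TRT-RTX only
        )
    except Exception as exc:  # e.g. not a TensorRT-RTX build
        print(f"compilation skipped: {exc}")
else:
    print("torch_tensorrt / CUDA not available; skipping compilation")
```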
Collaborator


Can we add a warning or check for the case where a user configures dynamic_shapes_kernel_specialization_strategy on standard TensorRT?

Contributor Author


This is a good suggestion Lan. I have a follow-up task to emit user warnings for:

  1. timing cache used in TRT-RTX
  2. runtime cache used in standard TRT
  3. dynamic shape strategy used in standard TRT
  4. cudagraphs flag used in standard TRT

so that it's easier to review the change in behavior. I will add the warnings in that follow-up.
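A minimal sketch of the kind of check being discussed (hypothetical helper and setting tables, not the actual Torch-TensorRT code; it only illustrates warning when a setting is configured on the backend that ignores it):

```python
import warnings

# Hypothetical tables of settings that only take effect on one backend.
RTX_ONLY_SETTINGS = {
    "runtime_cache_path",
    "dynamic_shapes_kernel_specialization_strategy",
}
STANDARD_TRT_ONLY_SETTINGS = {"timing_cache_path"}


def warn_on_backend_mismatch(settings: dict, is_rtx: bool) -> list:
    """Return names of configured settings the active backend ignores,
    emitting a UserWarning for each (illustrative, not the real code)."""
    ignored = []
    wrong = STANDARD_TRT_ONLY_SETTINGS if is_rtx else RTX_ONLY_SETTINGS
    backend = "TensorRT-RTX" if is_rtx else "standard TensorRT"
    for name in sorted(wrong):
        if settings.get(name) is not None:
            warnings.warn(
                f"{name} is configured but has no effect on {backend}",
                UserWarning,
            )
            ignored.append(name)
    return ignored


# User sets an RTX-only strategy while targeting standard TensorRT:
ignored = warn_on_backend_mismatch(
    {"dynamic_shapes_kernel_specialization_strategy": "eager"}, is_rtx=False
)
print(ignored)  # ['dynamic_shapes_kernel_specialization_strategy']
```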

@lanluo-nvidia lanluo-nvidia merged commit 8903707 into pytorch:main Apr 21, 2026
84 checks passed

Labels

backend: TensorRT-RTX, cla signed, component: api [Python], component: build system, component: conversion, component: core, component: dynamo, component: runtime, component: tests, documentation

3 participants