
feat: add dynamic shapes kernel specialization strategy for TRT-RTX#4184

Merged
lanluo-nvidia merged 2 commits into pytorch:main from tp5uiuc:feat/trtrtx-dynamic-shapes-strategy
Apr 21, 2026

Conversation

Contributor

@tp5uiuc tp5uiuc commented Apr 12, 2026

Description

Expose IRuntimeConfig.setDynamicShapesKernelSpecializationStrategy() through the Torch-TensorRT Python API for TensorRT-RTX builds.

Users can now control how shape-specialized kernels are compiled at runtime for dynamic shapes via the new dynamic_shapes_kernel_specialization_strategy compilation setting:

  • "lazy" (default): Compile shape-specialized kernels in the background and use fallback kernels until they are ready
  • "eager": Compile shape-specialized kernels immediately (blocking)
  • "none": Always use fallback kernels; never specialize
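The three strategies can be illustrated with a small self-contained Python model. This is purely illustrative and does not use TensorRT: the `KernelCache` class, the sleep-based "compilation", and the kernel names below are all stand-ins for what the RTX runtime does internally.

```python
import threading
import time


class KernelCache:
    """Toy model of runtime kernel specialization for dynamic shapes.

    Illustrative stand-in, not TensorRT-RTX internals:
    - "eager": compile the shape-specialized kernel now (blocking)
    - "lazy":  start compilation in the background; serve the generic
               fallback kernel until the specialized one is ready
    - "none":  always serve the fallback kernel
    """

    def __init__(self, strategy="lazy"):
        assert strategy in {"lazy", "eager", "none"}
        self.strategy = strategy
        self.specialized = {}  # shape -> compiled kernel name
        self._lock = threading.Lock()

    def _compile(self, shape):
        time.sleep(0.05)  # pretend specialization takes time
        with self._lock:
            self.specialized[shape] = f"specialized{shape}"

    def kernel_for(self, shape):
        with self._lock:
            if shape in self.specialized:
                return self.specialized[shape]
        if self.strategy == "none":
            return "fallback"
        if self.strategy == "eager":
            self._compile(shape)  # blocks the caller
            return self.specialized[shape]
        # lazy: compile in the background, use fallback for now
        threading.Thread(target=self._compile, args=(shape,), daemon=True).start()
        return "fallback"


eager = KernelCache("eager")
print(eager.kernel_for((8, 3)))  # specialized immediately (blocking)

lazy = KernelCache("lazy")
print(lazy.kernel_for((8, 3)))   # fallback while compilation runs
time.sleep(0.2)
print(lazy.kernel_for((8, 3)))   # specialized kernel once ready
```

The trade-off this models: "eager" pays compilation latency on first use, "lazy" trades a few fallback-speed runs for a non-blocking first call, and "none" avoids specialization entirely.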

Depends on: #4180 (runtime cache API — provides the IRuntimeConfig infrastructure)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@meta-cla meta-cla Bot added the cla signed label Apr 12, 2026
@github-actions github-actions Bot added the documentation, component: tests, component: conversion, component: core, component: build system, component: api [Python], component: runtime, and component: dynamo labels Apr 12, 2026
@github-actions github-actions Bot requested a review from cehongwang April 12, 2026 20:48
Review comment thread on tests/py/dynamo/runtime/test_001_dynamic_shapes_kernel_strategy.py
@github-actions github-actions Bot requested a review from zewenli98 April 14, 2026 17:44
@tp5uiuc tp5uiuc force-pushed the feat/trtrtx-dynamic-shapes-strategy branch from c222c72 to 385eec6 on April 15, 2026 18:54
tp5uiuc and others added 2 commits April 20, 2026 08:58
Expose IRuntimeConfig.setDynamicShapesKernelSpecializationStrategy()
through the Torch-TensorRT Python API. Users can now control how
shape-specialized kernels are compiled at runtime for dynamic shapes
on TensorRT-RTX via the new `dynamic_shapes_kernel_specialization_strategy`
compilation setting ("lazy", "eager", or "none").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review feedback: compile with torchtrt.Input min/opt/max
ranges so dynamic shapes are actually exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tp5uiuc tp5uiuc force-pushed the feat/trtrtx-dynamic-shapes-strategy branch from 385eec6 to d7619ca on April 20, 2026 15:58
@tp5uiuc tp5uiuc marked this pull request as ready for review April 20, 2026 16:09
Collaborator

@lanluo-nvidia lanluo-nvidia left a comment


lgtm, one minor comment.

hardware_compatible (bool): Build the TensorRT engines compatible with GPU architectures other than that of the GPU on which the engine was built (currently works for NVIDIA Ampere and newer)
timing_cache_path (str): Path to the timing cache if it exists (or) where it will be saved after compilation. Not used for TensorRT-RTX.
runtime_cache_path (str): Path to the runtime cache for TensorRT-RTX JIT compilation results. Not used for standard TensorRT.
dynamic_shapes_kernel_specialization_strategy (str): Strategy for dynamic shape kernel specialization at runtime (TensorRT-RTX only). Options: "lazy", "eager", "none". Default: "lazy".
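A usage sketch of the documented settings (hedged: the min/opt/max `Input` ranges follow the standard Torch-TensorRT API, the new keyword argument comes from this PR and requires a TensorRT-RTX build, and the snippet guards on library/GPU availability so it degrades gracefully):

```python
# Values accepted by the new setting, per the docstring above.
VALID_STRATEGIES = {"lazy", "eager", "none"}
strategy = "eager"
assert strategy in VALID_STRATEGIES

try:
    import torch
    import torch_tensorrt as torchtrt
    available = torch.cuda.is_available()
except ImportError:
    available = False

if available:
    model = torch.nn.Conv2d(3, 16, 3).eval().cuda()
    # min/opt/max ranges so dynamic shapes are actually exercised
    inputs = [
        torchtrt.Input(
            min_shape=(1, 3, 224, 224),
            opt_shape=(8, 3, 224, 224),
            max_shape=(32, 3, 224, 224),
        )
    ]
    try:
        trt_model = torchtrt.compile(
            model,
            inputs=inputs,
            dynamic_shapes_kernel_specialization_strategy=strategy,  # TRT-RTX only
        )
    except Exception as exc:  # e.g. not a TensorRT-RTX build
        print(f"compilation skipped: {exc}")
else:
    print("torch_tensorrt / CUDA not available; skipping compilation")
```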
Collaborator


Can we add a warning or check for the case where a user configures dynamic_shapes_kernel_specialization_strategy on standard TensorRT?

Contributor Author


This is a good suggestion Lan. I have a follow-up task to emit user warnings for:

  1. timing cache used in TRT-RTX
  2. runtime cache used in standard TRT
  3. dynamic shape strategy used in standard TRT
  4. cudagraphs flag used in standard TRT

so that it's easier to review the change in behavior. I will add the warnings in that follow-up.
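A minimal sketch of the kind of check being discussed (hypothetical helper and setting tables, not the actual Torch-TensorRT code; it only illustrates warning when a setting is configured on the backend that ignores it):

```python
import warnings

# Hypothetical tables of settings that only take effect on one backend.
RTX_ONLY_SETTINGS = {
    "runtime_cache_path",
    "dynamic_shapes_kernel_specialization_strategy",
}
STANDARD_TRT_ONLY_SETTINGS = {"timing_cache_path"}


def warn_on_backend_mismatch(settings: dict, is_rtx: bool) -> list:
    """Return names of configured settings the active backend ignores,
    emitting a UserWarning for each (illustrative, not the real code)."""
    ignored = []
    wrong = STANDARD_TRT_ONLY_SETTINGS if is_rtx else RTX_ONLY_SETTINGS
    backend = "TensorRT-RTX" if is_rtx else "standard TensorRT"
    for name in sorted(wrong):
        if settings.get(name) is not None:
            warnings.warn(
                f"{name} is configured but has no effect on {backend}",
                UserWarning,
            )
            ignored.append(name)
    return ignored


# User sets an RTX-only strategy while targeting standard TensorRT:
ignored = warn_on_backend_mismatch(
    {"dynamic_shapes_kernel_specialization_strategy": "eager"}, is_rtx=False
)
print(ignored)  # ['dynamic_shapes_kernel_specialization_strategy']
```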

@lanluo-nvidia lanluo-nvidia merged commit 8903707 into pytorch:main Apr 21, 2026
84 checks passed

Labels

backend: TensorRT-RTX, cla signed, component: api [Python], component: build system, component: conversion, component: core, component: dynamo, component: runtime, component: tests, documentation

3 participants