test: add KV cache subgraph tests with dynamic shapes and generation …#4191

Open
yizhuoz004 wants to merge 1 commit into pytorch:main from yizhuoz004:hlo-kv-cache-tests
@yizhuoz004
Contributor

Description

Add tests/py/dynamo/hlo/test_kv_cache.py covering five KV cache patterns common in LLM inference:

  • DynamicCache: growing cache via torch.cat
  • StaticCache: fixed-size cache with index_copy_ writes
  • StaticScatterCache: scatter-based position-indexed writes
  • SlidingWindowCache: fixed-window rolling cache via cat+slice
  • RoPEDynamicCache: dynamic cache with rotary position embeddings
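
The first four update patterns can be sketched in plain PyTorch as follows. This is an illustrative sketch, not the code from test_kv_cache.py: the function names, the `[batch, heads, seq, head_dim]` layout, and the single-token step size are all assumptions made for the example.

```python
import torch


def dynamic_update(cache, new):
    # Growing cache: append the new step along the sequence axis (dim 2).
    return torch.cat([cache, new], dim=2)


def static_update(cache, new, pos):
    # Fixed-size cache: write the new step in place at position `pos`.
    # `pos` is a 1-D LongTensor whose length equals new.shape[2].
    cache.index_copy_(2, pos, new)
    return cache


def scatter_update(cache, new, pos):
    # Same write as static_update, expressed with scatter. The index tensor
    # must have the same shape as the source, so broadcast `pos` over it.
    idx = pos.view(1, 1, -1, 1).expand_as(new)
    return cache.scatter(2, idx, new)


def sliding_update(cache, new, window):
    # Rolling cache: append, then keep only the most recent `window` steps.
    return torch.cat([cache, new], dim=2)[:, :, -window:, :]
```

Under these assumptions, `static_update` and `scatter_update` produce the same tensor for the same `pos`, while `dynamic_update` changes the sequence length on every step and `sliding_update` holds it constant once the window is full.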

Each test class exports its module with torch.export using dynamic shape dims, compiles it with torch_tensorrt.dynamo.compile, runs a multi-step generation loop (8–20 steps), and validates the TRT output against a PyTorch reference across 6 configurations that vary batch size, head count, hidden dim, and precision (fp16/fp32).

Fixes # (issue)

Type of change

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

…loops

Add tests/py/dynamo/hlo/test_kv_cache.py covering five KV cache patterns
common in LLM inference:

- DynamicCache: growing cache via torch.cat
- StaticCache: fixed-size cache with index_copy_ writes
- StaticScatterCache: scatter-based position-indexed writes
- SlidingWindowCache: fixed-window rolling cache via cat+slice
- RoPEDynamicCache: dynamic cache with rotary position embeddings

Each test class uses torch.export with dynamic shape dims and
torch_tensorrt.dynamo.compile, runs a multi-step generation loop
(8–20 steps), and validates TRT output against a PyTorch reference
across 6 configurations (batch size, head count, hidden dim, fp16/fp32).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@meta-cla
meta-cla bot commented Apr 15, 2026

Hi @yizhuoz004!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@github-actions github-actions bot added the component: tests Issues re: Tests label Apr 15, 2026
@narendasan narendasan requested a review from zewenli98 April 16, 2026 20:47
Labels

component: tests Issues re: Tests
