NVIDIA / TransformerEngine Public

Fork 653
Star 3.2k

Pull requests: NVIDIA/TransformerEngine

Labels 69 Milestones 0

126 Open 1,918 Closed

Pull requests list

[Common] Persistent Grouped NVFP4 quantization kernel

#2743 opened Mar 6, 2026 by Oleg-Goncharov

8 of 13 tasks

Add guard at lowest JAX version that still supports triton kernel calling

#2741 opened Mar 6, 2026 by tdophung

6 of 13 tasks

[JAX] CGEMM + FP8

#2740 opened Mar 5, 2026 by phu0ngng • Draft

13 tasks

[JAX] GEMM tex and FFI cleanup

#2739 opened Mar 5, 2026 by phu0ngng

6 of 13 tasks

[Common] Persistent Grouped MXFP8 quantization kernel

Labels: enhancement (New feature or request), MoE

#2738 opened Mar 5, 2026 by Oleg-Goncharov

9 of 13 tasks

Feat/cp nvshmem enhanced

Labels: community-contribution (PRs from external contributor outside the core maintainers, representing community-driven work.)

#2737 opened Mar 5, 2026 by Knight-of-Thunder

13 tasks

[PyTorch debug] Fix issue with tp_group=None

#2733 opened Mar 4, 2026 by pggPL

8 of 13 tasks

Feature/unswizzle

Labels: community-contribution (PRs from external contributor outside the core maintainers, representing community-driven work.)

#2732 opened Mar 4, 2026 by int-smart

9 of 13 tasks

fix: scope get_full_cu_seqlens cache key by device and inference mode

#2728 opened Mar 3, 2026 by DmCarpe93

8 of 13 tasks

[CI] Refactor CI build on GitHub

#2723 opened Mar 2, 2026 by ptrendx • Draft

1 of 13 tasks

[Common, pyTorch] Grouped MXFP8 dequantize support

#2722 opened Mar 2, 2026 by ptrendx • Draft

1 of 13 tasks

Fix for async dcp checkpointing with Float8Tensors

#2721 opened Mar 2, 2026 by pstjohn • Draft

Add MXFP8 attention

#2719 opened Mar 1, 2026 by cyanguwa • Draft

13 tasks

Hongbinl/offload activation cuda graph mxfp8 offload fix

#2716 opened Feb 27, 2026 by lhb8125 • Draft

13 tasks

Add DCP compatibility for FSDP2-TP sharding in TransformerEngine.

#2713 opened Feb 26, 2026 by cspades

3 of 13 tasks

Enable dequantization from MXFP8 tensor with only columnwise data

#2712 opened Feb 26, 2026 by ptrendx

13 tasks

[Common][PyTorch] Add z_loss_weight and log_sum_exp output to parallel_cross_entropy

#2707 opened Feb 26, 2026 by bassoy • Draft

8 tasks done

[Draft] Newton-Schulz via cuSOLVERMp

#2706 opened Feb 25, 2026 by vcherepanov-nv

6 of 13 tasks

[All] Added better error messages

#2705 opened Feb 25, 2026 by ptrendx

Fix Flash Attention 3 API compatibility for window size parameters

Labels: 2.14.0

#2704 opened Feb 25, 2026 by jhvmhg

3 of 13 tasks

[Draft][PyTorch] torch.compile support for TE Linear

#2701 opened Feb 24, 2026 by pggPL • Draft

13 tasks

[PyTorch] Zero-initialize learnable softmax_offset in DotProductAttention

#2694 opened Feb 20, 2026 by fjosw

7 of 13 tasks

Enable sm120 support for fused attn if cuDNN is 9.18.1+

#2693 opened Feb 20, 2026 by KshitijLakhani • Draft

13 tasks

[JAX] Fix get_seqlens_and_offsets() to accept vmapped seg ids and non vmapped seg offsets

Labels: 2.14.0

#2692 opened Feb 19, 2026 by KshitijLakhani

7 of 13 tasks

[PyTorch] Error out if constructing LayerNormLinear with row tensor parallelism

Labels: bug (Something isn't working)

#2688 opened Feb 17, 2026 by timmoon10

6 of 13 tasks
