See: https://github.com/TiledTensor/TiledCUDA/pull/154#discussion_r1832035639