v0.8.0

@danieldk danieldk released this 26 Nov 13:59

New features

Support Metal 4 on macOS

As of this release, kernel-builder builds Metal kernels with Metal 4. Both the SDK and macOS must be version 26 or later. For more information on how to set up a development environment, see our Metal docs.

Experimental support for Python dependencies

This version adds support for Python dependencies in kernels. So far, we mostly considered kernels to be either pure PyTorch + Triton or compiled CUDA/ROCm/XPU with a small Torch wrapper. This assumption made kernels easy to deploy everywhere, since they had no external dependencies. However, DSLs for writing kernels, such as the CUTLASS DSL, are becoming increasingly popular.

To accommodate such DSLs without reintroducing the deployment problems that dependencies cause, we allow a small, curated set of dependencies. Currently, the only allowed dependencies are einops and nvidia-cutlass-dsl. Dependencies can be added using the new python-depends option in the general section of build.toml:

[general]
name = "my-kernel"
# ...
python-depends = ["nvidia-cutlass-dsl"]

The kernels package also validates these dependencies when it downloads a kernel that uses them.

build-and-upload

The new build-and-upload command builds and uploads a kernel in one step. If the kernel is not in kernels-community, you can specify the upload location in general.hub:

[general.hub]
repo-id = "my-org/my-kernel"

Flattened build directories

Thus far, kernels were stored in build/<variant>/<module_name>. This version of kernel-builder changes this to build/<variant>. This solves the issue where a kernel cannot be loaded when module_name does not match the repository name (e.g. after a rename). For the next few releases, kernel-builder will put a compatibility module at build/<variant>/<module_name> so that a kernel can still be loaded with an older version of kernels.
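The layout change can be sketched as a lookup that prefers the flattened directory and falls back to the legacy one. This is not how kernels actually locates a build: the helper `find_kernel_dir`, the variant name `example-variant`, and the use of `__init__.py` as the marker for a loadable module are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def find_kernel_dir(repo_root: Path, variant: str, module_name: str) -> Path:
    """Locate a kernel's Python module for a build variant (illustrative)."""
    flat = repo_root / "build" / variant  # layout since v0.8.0
    legacy = flat / module_name           # layout before v0.8.0
    # Assumption: an __init__.py marks the directory holding the module.
    if (flat / "__init__.py").is_file():
        return flat
    if (legacy / "__init__.py").is_file():
        return legacy
    raise FileNotFoundError(f"no kernel found for variant {variant!r}")


# Build a throwaway repository with the flattened layout and resolve it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    variant_dir = root / "build" / "example-variant"
    variant_dir.mkdir(parents=True)
    (variant_dir / "__init__.py").touch()
    found = find_kernel_dir(root, "example-variant", "my_kernel")
    print(found.relative_to(root).as_posix())
```

Because the flattened path no longer contains module_name, the lookup succeeds regardless of what the repository is called, which is the rename problem the new layout addresses.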

What's Changed

  • misc(builder): enable detection of ARM64 arch on Windows and turn on correct VS / CMake environments by @mfuntowicz in #272
  • fix(windows): always define _WIN32 preprocessor macro to prevent PyTorch compiling unsupported code by @mfuntowicz in #275
  • bug(windows): fix invalid generated build name by @mfuntowicz in #274
  • feat(windows): allow detecting Python executable by @mfuntowicz in #276
  • Missing Windows knobs to make it compatible with kernels by @mfuntowicz in #277
  • do not use ONEDNN_XPU_INCLUDE_DIR since it's only needed for torch2_7. by @sywangyi in #273
  • Add build-and-upload command by @danieldk in #278
  • Remove examples/activation by @danieldk in #261
  • fix(build2cmake): ignore untracked files when looking for modified files to suffix with _dirty by @mfuntowicz in #280
  • feat(windows): do not include cxx11 ABI flag when generating names by @mfuntowicz in #281
  • Remove duplicate build variant name code by @danieldk in #285
  • Add support for building CPU-only kernels by @danieldk in #284
  • Remove Python bytecode after checks by @danieldk in #286
  • Use correct Python interpreter for metallib_to_header by @danieldk in #288
  • Fix metal kernels support by @MekkCyber in #287
  • Switch to binary Torch wheels by @danieldk in #289
  • Update to macOS SDK 26 and Metal 4 by @danieldk in #290
  • Add doc on the required environment for Metal by @danieldk in #292
  • Also remove bytecode from universal builds by @danieldk in #294
  • Include CPU kernels in CI builds by @danieldk in #296
  • Flatten build variants to build/<variant> by @danieldk in #293
  • Allow dashes in kernel names by @danieldk in #297
  • extensionName -> moduleName by @danieldk in #298
  • feat: support metal cpp by @drbh in #295
  • Add support for (limited) Python dependencies: nvidia-cutlass-dsl and einops by @danieldk in #302
  • Copy over Torch from hf-nix and fix the AArch64 build by @danieldk in #304
  • Remove dependency on hf-nix by @danieldk in #305
  • fix(windows): force USE_CUDA/USE_ROCM definitions to ensure PyTorch guards are not bypassed by @mfuntowicz in #303
  • Fix typos by @omahs in #309
  • Extend cutlass to bmg by @sywangyi in #307
  • Update tracing-subscriber to solve dependabot issue by @danieldk in #310
  • gen-flake-outputs: add backendBundle output by @danieldk in #312
  • Add a Discord link by @danieldk in #313
  • Set version to 0.8.0-dev0 by @danieldk in #315

Full Changelog: v0.7.0...v0.8.0