@@ -758,6 +758,29 @@ entry:
 
 Note: Kernel naming is not fully stable for now.
 
+##### Kernel Fusion Support
+
+The [experimental kernel fusion
+extension](../extensions/experimental/sycl_ext_codeplay_kernel_fusion.asciidoc)
+also supports the CUDA backend. However, as neither CUBIN nor PTX is a suitable
+input format for the [kernel fusion JIT compiler](KernelFusionJIT.md), a
+suitable IR has to be added as an additional device binary.
+
+Therefore, to perform kernel fusion for the CUDA backend, the user needs to
+specify the additional flag `-fsycl-embed-ir` during compilation, which adds
+LLVM IR as an additional device binary. When `-fsycl-embed-ir` is specified,
+the LLVM IR produced by Clang for the CUDA backend device compilation is added
+to the fat binary file. To this end, the file table resulting from
+`sycl-post-link` is additionally passed to the `clang-offload-wrapper`,
+creating a wrapper object with target `llvm_nvptx64`.
+
+This device binary in LLVM IR format can be retrieved by the SYCL runtime and
+used by the kernel fusion JIT compiler. The resulting fused kernel is compiled
+to PTX assembly by the kernel fusion JIT compiler at runtime.
+
+Note that the device binary in LLVM IR does not replace the device binary in
+CUBIN/PTX format, but is embedded in addition to it.
+
 ### Integration with SPIR-V format
 
 This section explains how to generate SPIR-V specific types and operations from