Skip to content

Implement some of MSVC's intrinsics, for ImportC.#16372

Draft
just-harry wants to merge 84 commits into
dlang:masterfrom
just-harry:msvc-intrinsics
Draft

Implement some of MSVC's intrinsics, for ImportC.#16372
just-harry wants to merge 84 commits into
dlang:masterfrom
just-harry:msvc-intrinsics

Conversation

@just-harry

@just-harry just-harry commented Apr 13, 2024

Copy link
Copy Markdown
Contributor

This PR implements all but three of the MSVC intrinsics listed here: https://web.archive.org/web/20240412171516/https://learn.microsoft.com/en-ie/cpp/intrinsics/alphabetical-listing-of-intrinsic-functions?view=msvc-170, and a handful of undocumented intrinsics.

The three unimplemented intrinsics are _AddressOfReturnAddress, __getcallerseflags, and _ReturnAddress, because they need compiler support for their implementation.

The implemented intrinsics are listed in this expando.
  • __assume
  • __umulh
  • __mulh
  • _umul128
  • _mul128
  • __emul
  • __emulu
  • _div128
  • _udiv128
  • _div64
  • _udiv64
  • __cpuid
  • __cpuidex
  • _cvt_ftoi_fast
  • _cvt_ftoll_fast
  • _cvt_ftoui_fast
  • _cvt_ftoull_fast
  • _cvt_dtoi_fast
  • _cvt_dtoll_fast
  • _cvt_dtoui_fast
  • _cvt_dtoull_fast
  • _cvt_ftoi_sat
  • _cvt_ftoll_sat
  • _cvt_ftoui_sat
  • _cvt_ftoull_sat
  • _cvt_dtoi_sat
  • _cvt_dtoll_sat
  • _cvt_dtoui_sat
  • _cvt_dtoull_sat
  • _cvt_ftoi_sent
  • _cvt_ftoll_sent
  • _cvt_ftoui_sent
  • _cvt_ftoull_sent
  • _cvt_dtoi_sent
  • _cvt_dtoui_sent
  • _cvt_dtoll_sent
  • _cvt_dtoull_sent
  • __readgsbyte
  • __readgsword
  • __readgsdword
  • __readgsqword
  • __writegsbyte
  • __writegsword
  • __writegsdword
  • __writegsqword
  • __addgsbyte
  • __addgsword
  • __addgsdword
  • __addgsqword
  • __incgsbyte
  • __incgsword
  • __incgsdword
  • __incgsqword
  • __readfsbyte
  • __readfsword
  • __readfsdword
  • __readfsqword
  • __writefsbyte
  • __writefsword
  • __writefsdword
  • __writefsqword
  • __addfsbyte
  • __addfsword
  • __addfsdword
  • __incfsbyte
  • __incfsword
  • __incfsdword
  • __debugbreak
  • __fastfail
  • __faststorefence
  • _disable
  • _enable
  • _interlockedadd [Undocumented]
  • _interlockedadd64 [Undocumented]
  • _InterlockedAdd
  • _InterlockedAdd_acq
  • _InterlockedAdd_rel
  • _InterlockedAdd_nf
  • _InterlockedAdd64
  • _InterlockedAdd64_acq
  • _InterlockedAdd64_rel
  • _InterlockedAdd64_nf
  • _InterlockedAddLargeStatistic
  • _InterlockedAnd
  • _InterlockedAnd8
  • _InterlockedAnd16
  • _interlockedand64 [Undocumented]
  • _InterlockedAnd_acq
  • _InterlockedAnd_rel
  • _InterlockedAnd_nf
  • _InterlockedAnd8_acq
  • _InterlockedAnd8_rel
  • _InterlockedAnd8_nf
  • _InterlockedAnd16_acq
  • _InterlockedAnd16_rel
  • _InterlockedAnd16_nf
  • _InterlockedAnd64_acq
  • _InterlockedAnd64_rel
  • _InterlockedAnd64_nf
  • _InterlockedAnd64
  • _InterlockedAnd_np
  • _InterlockedAnd8_np
  • _InterlockedAnd16_np
  • _InterlockedAnd64_np
  • _InterlockedAnd64_HLEAcquire
  • _InterlockedAnd64_HLERelease
  • _InterlockedAnd_HLEAcquire
  • _InterlockedAnd_HLERelease
  • _interlockedbittestandreset
  • _interlockedbittestandreset64
  • _interlockedbittestandreset_HLEAcquire
  • _interlockedbittestandreset_HLERelease
  • _interlockedbittestandreset64_HLEAcquire
  • _interlockedbittestandreset64_HLERelease
  • _interlockedbittestandreset_acq
  • _interlockedbittestandreset_rel
  • _interlockedbittestandreset_nf
  • _interlockedbittestandreset64_acq
  • _interlockedbittestandreset64_rel
  • _interlockedbittestandreset64_nf
  • _interlockedbittestandset
  • _interlockedbittestandset64
  • _interlockedbittestandset_HLEAcquire
  • _interlockedbittestandset_HLERelease
  • _interlockedbittestandset64_HLEAcquire
  • _interlockedbittestandset64_HLERelease
  • _interlockedbittestandset_acq
  • _interlockedbittestandset_rel
  • _interlockedbittestandset_nf
  • _interlockedbittestandset64_acq
  • _interlockedbittestandset64_rel
  • _interlockedbittestandset64_nf
  • _InterlockedCompareExchange
  • _InterlockedCompareExchange8
  • _InterlockedCompareExchange16
  • _InterlockedCompareExchange64
  • _InterlockedCompareExchange_HLEAcquire
  • _InterlockedCompareExchange_HLERelease
  • _InterlockedCompareExchange64_HLEAcquire
  • _InterlockedCompareExchange64_HLERelease
  • _InterlockedCompareExchange_np
  • _InterlockedCompareExchange16_np
  • _InterlockedCompareExchange64_np
  • _InterlockedCompareExchange_acq
  • _InterlockedCompareExchange_rel
  • _InterlockedCompareExchange_nf
  • _InterlockedCompareExchange8_acq
  • _InterlockedCompareExchange8_rel
  • _InterlockedCompareExchange8_nf
  • _InterlockedCompareExchange16_acq
  • _InterlockedCompareExchange16_rel
  • _InterlockedCompareExchange16_nf
  • _InterlockedCompareExchange64_acq
  • _InterlockedCompareExchange64_rel
  • _InterlockedCompareExchange64_nf
  • _InterlockedCompareExchange128
  • _InterlockedCompareExchange128_np
  • _InterlockedCompareExchange128_acq
  • _InterlockedCompareExchange128_rel
  • _InterlockedCompareExchange128_nf
  • _InterlockedCompareExchangePointer
  • _InterlockedCompareExchangePointer_HLEAcquire
  • _InterlockedCompareExchangePointer_HLERelease
  • _InterlockedCompareExchangePointer_np
  • _InterlockedCompareExchangePointer_acq
  • _InterlockedCompareExchangePointer_rel
  • _InterlockedCompareExchangePointer_nf
  • _InterlockedDecrement
  • _InterlockedDecrement16
  • _interlockeddecrement64
  • _InterlockedDecrement64
  • _InterlockedDecrement_acq
  • _InterlockedDecrement_rel
  • _InterlockedDecrement_nf
  • _InterlockedDecrement16_acq
  • _InterlockedDecrement16_rel
  • _InterlockedDecrement16_nf
  • _InterlockedDecrement64_acq
  • _InterlockedDecrement64_rel
  • _InterlockedDecrement64_nf
  • _InterlockedExchange
  • _InterlockedExchange8
  • _InterlockedExchange16
  • _interlockedexchange64
  • _InterlockedExchange64
  • _InterlockedExchange_HLEAcquire
  • _InterlockedExchange_HLERelease
  • _InterlockedExchange64_HLEAcquire
  • _InterlockedExchange64_HLERelease
  • _InterlockedExchange_acq
  • _InterlockedExchange_rel
  • _InterlockedExchange_nf
  • _InterlockedExchange8_acq
  • _InterlockedExchange8_rel
  • _InterlockedExchange8_nf
  • _InterlockedExchange16_acq
  • _InterlockedExchange16_rel
  • _InterlockedExchange16_nf
  • _InterlockedExchange64_acq
  • _InterlockedExchange64_rel
  • _InterlockedExchange64_nf
  • _InterlockedExchangeAdd
  • _InterlockedExchangeAdd8
  • _InterlockedExchangeAdd16
  • _interlockedexchangeadd64 [Undocumented]
  • _InterlockedExchangeAdd64
  • _InterlockedExchangeAdd_HLEAcquire
  • _InterlockedExchangeAdd_HLERelease
  • _InterlockedExchangeAdd64_HLEAcquire
  • _InterlockedExchangeAdd64_HLERelease
  • _InterlockedExchangeAdd_acq
  • _InterlockedExchangeAdd_rel
  • _InterlockedExchangeAdd_nf
  • _InterlockedExchangeAdd8_acq
  • _InterlockedExchangeAdd8_rel
  • _InterlockedExchangeAdd8_nf
  • _InterlockedExchangeAdd16_acq
  • _InterlockedExchangeAdd16_rel
  • _InterlockedExchangeAdd16_nf
  • _InterlockedExchangeAdd64_acq
  • _InterlockedExchangeAdd64_rel
  • _InterlockedExchangeAdd64_nf
  • _InterlockedExchangePointer
  • _InterlockedExchangePointer_HLEAcquire
  • _InterlockedExchangePointer_HLERelease
  • _InterlockedExchangePointer_acq
  • _InterlockedExchangePointer_rel
  • _InterlockedExchangePointer_nf
  • _InterlockedIncrement
  • _InterlockedIncrement16
  • _interlockedincrement64 [Undocumented]
  • _InterlockedIncrement64
  • _InterlockedIncrement_acq
  • _InterlockedIncrement_rel
  • _InterlockedIncrement_nf
  • _InterlockedIncrement16_acq
  • _InterlockedIncrement16_rel
  • _InterlockedIncrement16_nf
  • _InterlockedIncrement64_acq
  • _InterlockedIncrement64_rel
  • _InterlockedIncrement64_nf
  • _InterlockedOr
  • _InterlockedOr8
  • _InterlockedOr16
  • _interlockedor64 [Undocumented]
  • _InterlockedOr_acq
  • _InterlockedOr_rel
  • _InterlockedOr_nf
  • _InterlockedOr8_acq
  • _InterlockedOr8_rel
  • _InterlockedOr8_nf
  • _InterlockedOr16_acq
  • _InterlockedOr16_rel
  • _InterlockedOr16_nf
  • _InterlockedOr64_acq
  • _InterlockedOr64_rel
  • _InterlockedOr64_nf
  • _InterlockedOr64
  • _InterlockedOr_np
  • _InterlockedOr8_np
  • _InterlockedOr16_np
  • _InterlockedOr64_np
  • _InterlockedOr64_HLEAcquire
  • _InterlockedOr64_HLERelease
  • _InterlockedOr_HLEAcquire
  • _InterlockedOr_HLERelease
  • _InterlockedXor
  • _InterlockedXor8
  • _InterlockedXor16
  • _interlockedxor64 [Undocumented]
  • _InterlockedXor_acq
  • _InterlockedXor_rel
  • _InterlockedXor_nf
  • _InterlockedXor8_acq
  • _InterlockedXor8_rel
  • _InterlockedXor8_nf
  • _InterlockedXor16_acq
  • _InterlockedXor16_rel
  • _InterlockedXor16_nf
  • _InterlockedXor64_acq
  • _InterlockedXor64_rel
  • _InterlockedXor64_nf
  • _InterlockedXor64
  • _InterlockedXor_np
  • _InterlockedXor8_np
  • _InterlockedXor16_np
  • _InterlockedXor64_np
  • _InterlockedXor64_HLEAcquire
  • _InterlockedXor64_HLERelease
  • _InterlockedXor_HLEAcquire
  • _InterlockedXor_HLERelease
  • __inbyte
  • __inword
  • __indword
  • __outbyte
  • __outword
  • __outdword
  • __inbytestring
  • __inwordstring
  • __indwordstring
  • __outbytestring
  • __outwordstring
  • __outdwordstring
  • __int2c
  • __invlpg
  • __lidt
  • __ll_lshift
  • __ll_rshift
  • __ull_rshift
  • __lzcnt16
  • __lzcnt
  • __lzcnt64
  • _mm_cvtsi64x_ss
  • _mm_cvtss_si64x
  • _mm_cvttss_si64x
  • _mm_extract_si64
  • _mm_extracti_si64
  • _mm_insert_si64
  • _mm_inserti_si64
  • _mm_stream_sd
  • _mm_stream_ss
  • _mm_stream_si64x
  • __movsb
  • __movsw
  • __movsd
  • __movsq
  • __noop
  • __nop
  • __popcnt16
  • __popcnt
  • __popcnt64
  • __rdtsc
  • __rdtscp
  • __readcr0
  • __readcr2
  • __readcr3
  • __readcr4
  • __readcr8
  • __readdr
  • __readeflags
  • __readmsr
  • __readpmc
  • __segmentlimit
  • __shiftleft128
  • __shiftright128
  • __sidt
  • __stosb
  • __stosw
  • __stosd
  • __stosq
  • __svm_clgi
  • __svm_invlpga
  • __svm_skinit
  • __svm_stgi
  • __svm_vmload
  • __svm_vmrun
  • __svm_vmsave
  • __ud2
  • __vmx_off
  • __vmx_on
  • __vmx_vmclear
  • __vmx_vmlaunch
  • __vmx_vmptrld
  • __vmx_vmptrst
  • __vmx_vmread
  • __vmx_vmresume
  • __vmx_vmwrite
  • __wbinvd
  • __writecr0
  • __writecr2
  • __writecr3
  • __writecr4
  • __writecr8
  • __writedr
  • __writeeflags
  • __writemsr
  • _ReadBarrier
  • _WriteBarrier
  • _ReadWriteBarrier
  • _BitScanForward
  • _BitScanReverse
  • _BitScanForward64
  • _BitScanReverse64
  • _bittest
  • _bittestandcomplement
  • _bittestandreset
  • _bittestandset
  • _bittest64
  • _bittestandcomplement64
  • _bittestandreset64
  • _bittestandset64
  • _byteswap_uint64
  • _byteswap_ulong
  • _byteswap_ushort
  • _lrotr
  • _lrotl
  • _rotr
  • _rotl
  • _rotr64
  • _rotl64
  • _rotr16
  • _rotl16
  • _rotr8
  • _rotl8

These implementations aim to be as compatible with the MSVC intrinsics as is possible—adhering to Hyrum's Law.

Separate implementations are provided for DMD, LDC, and GDC, and for x86, x86-64, AArch64, and ARM.
Almost all the intrinsics are implemented in D, except for __assume which is a C macro.

Every intrinsic, where possible, has a CTFE-compatible code-path.
It all compiles with or without DIP1000 being enabled.
Care has been taken to ensure that none of the implementations rely on DRuntime, so that these work in BetterC.


Regarding the _cvt_ family of functions: by default MSVC will generate code that uses SSE2 instructions, even for 32-bit targets, which means that for 32-bit targets the _cvt_ functions will use SSE2.
This is contrary to DMD's usual behaviour of using x87 for 32-bit Windows.
For their reimplementations, I've used SSE2 anyway for 32-bit targets for DMD, as doing otherwise would constitute a change in behaviour, as x87 FP-exceptions are different from SIMD FP-exceptions (and, I think the SSE2 and x87 versions return different results).
The oldest of the _cvt_ functions was introduced in the May of 2021, so I think it's almost certain that any code using them will be targeting at-least SSE2 anyway.

Additionally, I've written a program that tests that the _cvt_ implementations return identical results to the MSVC implementations for all float values, and for ~402,653,184 double values (except for _cvt_ftoi_fast and _cvt_ftoi_sent on 32-bit targets, as they cause an internal compiler error in MSVC); it also tests that ctfeX86RoundLongToFloat and ctfeX86RoundFloatToLong produce the same results as the hardware.
It relies on Phobos and MSVC, so I don't really know what to do with it other than link to it here: https://github.com/just-harry/float-fuzzing-for-msvc-intrinsics


A few of the intrinsics can be used only in kernel-mode, so unittests have been omitted for them, as I don't think we have any infrastructure in place for testing in kernel-mode.

Some of the intrinsics terminate the program, so their unittests have been wrapped in a version (none), others rely on specific compiler optimisations, so they too have been wrapped in a version (none).


I've split the intrinsics up into a few dozen commits to try and alleviate the whole 13,000-lines-of-codes-all-at-once thing.
(I staged them after-the-fact, so a few braces may be out-of-place in the intermediate commits.)


The intrinsics have been placed in their own files, separate from the existing ImportC builtin files, to try and avoid crowding the builtin files.

Currently, the ImportC builtins are imported conditionally, based on some heuristics.
One of those heuristics is if any identifier beginning with two underscores is used – I've changed that one to instead trigger on a single leading underscore as many of MSVC's intrinsics begin with only one underscore.


One notable header that can be successfully included by ImportC, with these intrinsics implemented, is windows.h.


P.S. If this is outside the purview of ImportC: that's fine, I'll just publish this as a library instead.

Comment thread changelog/druntime.importc-initial-msvc-intrinsics.dd Outdated
Comment thread druntime/src/__importc_msvc_builtins.d
*remainder = cast(int) (dividend % divisor);
return cast(int) (dividend / divisor);
}
else

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to provide an asm implementation here? ditto throughout

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For _div64 specifically, yes, as compilers will generate the 64-bit encoding of div, which won't cause a #DE error on overflow of the 32-bit quotient, so a change of behaviour.

For everything else, generally yeah, either because it can't be represented without asm, or to ensure that the generated code behaves identically to MSVC's code, or for performance where it's relevant (like 128-bit multiplication).

{
if (__ctfe)
{
/* This is an amalgamation of core.int128.divmod and core.int128.neg. */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not simply forward to that implementation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would rely on linking with DRuntime, which would rule out BetterC compatibility.

@just-harry just-harry marked this pull request as ready for review April 13, 2024 02:18
@just-harry just-harry requested a review from ibuclaw as a code owner April 13, 2024 02:18
Comment thread compiler/src/dmd/cparse.d Outdated
Comment on lines +128 to +129
auto s = new AST.Import(Loc.initial, null, Id.importc_builtins, null, false);
wrap.push(s);
wrap.push(new AST.Import(Loc.initial, null, Id.importc_builtins, null, false));
wrap.push(new AST.Import(Loc.initial, null, Id.importc_msvc_builtins, null, false));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could mscv builtins be publicly imported by importc builtins instead? That would remove the need to alter the compiler.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, they can be; good shout. (Though, the __importc_msvc_builtins.d file had to be renamed to __builtins_msvc.d, to match the module name.)

@just-harry just-harry marked this pull request as draft April 15, 2024 18:30
@just-harry

Copy link
Copy Markdown
Contributor Author

I've flipped this back to a draft PR as currently the functions in the __builtins_msvc module can be used only when the -i switch is supplied on the command-line.
I think it might be that semantic3 isn't getting run on that module's functions when -i isn't supplied—but I'm unsure.

@@ -0,0 +1,4 @@
ImportC, when targeting the Microsoft C runtime, supports a subset of the intrinsics recognised by the MSVC compiler.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation in the opening comment in the PR is far superior to this readme. Merge it in here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've copied the right part—better now?

Comment thread druntime/src/importc_msvc_builtins_c.h Outdated

static inline unsigned char __readgsbyte(unsigned int Offset)
{
return *(unsigned char __seg_gs *) (_import_c_msvc_ptr_int) Offset;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImportC does not recognize __seg_gs or __seg_fs, so I don't see how this can work

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that GDC delegated to GCC for C files, but I was mistaken. I've replaced this with inline asm in D, as is already done for DMD.

Comment thread compiler/src/dmd/cparse.d Outdated
if (idx.length > 2 && idx[0] == '_' && idx[1] == '_') // leading double underscore
importBuiltins = true; // probably one of those compiler extensions
if (idx.length > 1 && idx[0] == '_') // leading underscore
importBuiltins = true; // maybe one of those compiler extensions

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about nearly always importing 10,000 lines of code for Windows compilations.

@just-harry just-harry Apr 17, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm, it's not great. How about what I've pushed now?
We have a lazily-initialised StringTable of all the MSVC intrinsic names that we have implemented.
If we're targeting the Microsoft C runtime, and we come across an identifier with a leading underscore, then we check if that identifier is a known MSVC intrinsic, and if it is we import the MSVC builtins and we stop checking future identifiers.

That way, the cost of the import is paid only when one of the intrinsics is actually going to be used.

The downside there is we have to remember to update that StringTable if more MSVC intrinsics are implemented, and we do pay the minor cost of hashing and looking up identifiers with leading underscores.

@just-harry

Copy link
Copy Markdown
Contributor Author

This is still a draft as I'm unable to get the compiler to actually generate code for the __builtins_msvc module without its path being supplied on the command-line.


Presently, neither the __builtins module nor the __builtins_msvc module make their way into the array of modules passed into generateCodeAndWrite, which is fine for __builtins, but not for __builtins_msvc because of how the standard Windows headers use the builtins, and because DMD doesn't inline some of the asm-using functions.

This is effectively how winnt.h declares UnsignedMultiply128:

unsigned long long UnsignedMultiply128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct);

#define UnsignedMultiply128 _umul128

unsigned long long UnsignedMultiply128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct);

which will expand to unsigned long long _umul128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct);, causing the linker to expect code for _umul128.

But even if the Windows headers didn't do that, we still need code to be generated for _umul128, because DMD won't inline _umul128's call to multiplyWithDoubleWidthProduct and the linker will expect code for it.

(_umul128 isn't the only function affected.)


Right now I'm stuck, because I've yet to figure out a good way to get the __builtins_msvc module into the modules array passed into generateCodeAndWrite after it's conditionally imported if any MSVC intrinsics are detected.

The best I've come up with thus far is very hacky, and would require the same changes to be made in LDC and GDC (unless I'm mistaken), which is: in Module.parseModule, if the module identifier is __builtins_msvc, set the global includeImports to true, and if it was previously false set includeByDefault to false, then add __builtins_msvc to the compiledImports array.

In short, my question is: is there an existing way to make it so the __builtins_msvc module winds up getting passed to generateCodeAndWrite after it gets imported by CParser.parseModule?
And, if not, is there an acceptably non-hacky way to thread it through the compiler without requiring changes to LDC and GDC?

Any suggestions would be appreciated.

@WalterBright

Copy link
Copy Markdown
Member

I suggest having the user do the #include line as the first line in his C code. Or have the user pass it in with the dmd switch that passes switches to the C preprocessor. Will that work?

@just-harry

just-harry commented Apr 23, 2024

Copy link
Copy Markdown
Contributor Author

@WalterBright,

I suggest having the user do the #include line as the first line in his C code. Or have the user pass it in with the dmd switch that passes switches to the C preprocessor. Will that work?

If that works for you, it works for me.
I don't see an issue with:

#include <importc_msvc_builtins.h>
#include <windows.h>

I've added some code to supply <path>/importc.h/../include to the preprocessor as an include path, so that the user doesn't have to hunt down the path themselves. (de63140)


Though, the issue of these builtins only working when the -i switch is used still remains.

Is there an existing way to force codegen for an imported module, when the -i switch isn't used?

@just-harry

Copy link
Copy Markdown
Contributor Author

Okay; a separate PR adding a mechanism that this PR can use to force codegen for the __builtins_msvc module: #16443

Via something like this in cparse.d:

         if (token.value == TOK._import) // import declaration extension
         {
             auto a = parseImport();
             if (a && a.length)
/*+*/        {
/*+*/            auto imp = (*a)[0].isImport();
/*+*/            imp.forceCodegen = imp.id == Id.builtins_msvc;
                 symbols.append(a);
/*+*/        }
             return;
         }

Comment on lines +20 to +22
#elif defined(__GNUC__)
#define __assume(expression) do {if (!(expression)) {__builtin_unreachable();}} while (0)
#else

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case this is meant to be public, FYI I think importC would discard this macro, whereas the other definitions (clang) are simple enough to be exposed as templates.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also __assume is supposed to be an optimiser hint, not an assert, which is what this implementation does.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unreachable is a compile-time barrier, not an assert)

I'll take your word for the clang and !clang!gnu paths.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unreachable is a compile-time barrier, not an assert)

I know that. assert is (often) defined as do { if (!cond) { /* print stuff*/ ; __builtin_unreachable();}} while (0). This is almost identical (modulo the print stuff).

Assume should be #define assume(x) x, i.e. a no-op.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm two-years late in responding to this, but __assume should not be a no-op when mimicking the behaviour of MSVC.

MSVC makes __assume assume two roles: one role is as an optimisation hint; the other role—the more important role—is to denote unreachable code when the expression is statically-known to be false or zero.

To demonstrate, here is a Compiler Explorer session that shows __assume(0) preventing MSVC/clang/gcc from emitting a warning due to a lack of a return-value in a non-void function.

@ibuclaw

ibuclaw commented May 5, 2024

Copy link
Copy Markdown
Member

Though, the issue of these builtins only working when the -i switch is used still remains.

Is there an existing way to force codegen for an imported module, when the -i switch isn't used?

Did you try converting these functions to templates?

private void llvm_arm_hint(int) @safe pure nothrow @nogc;
}
}
else version (GNU)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to consider both ldc and gdc here, it's really appreciated

I was just wondering - GCC doesn't support MSVC. So are these GNU version blocks dead on Mingw targets?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to consider both ldc and gdc here, it's really appreciated

:)

So are these GNU version blocks dead on Mingw targets?

I think so. Unless GDC defines CRuntime_Microsoft when targeting MinGW?
I don't really know the story as far as GDC's support of Windows goes. So, I just implemented the GDC versions to save some other poor sod the hassle of doing it later it on.

I don't think there's anything preventing this implementation from working when targeting MinGW or Cygwin, if that's wanted?

@just-harry

Copy link
Copy Markdown
Contributor Author

Though, the issue of these builtins only working when the -i switch is used still remains.
Is there an existing way to force codegen for an imported module, when the -i switch isn't used?

Did you try converting these functions to templates?

I did—but no dice, unfortunately, even with all the affected functions being templates, the simplest possible usage still fails to link:

// windows_h_c.c:
#include <importc_msvc_builtins.h>
#include <windows.h>

// windows_h.d:
pragma(lib, "MinCore");
import windows_h_c;
windows_h.obj : error LNK2019: unresolved external symbol _InterlockedExchangeAdd referenced in function _InlineInterlockedAdd
windows_h.obj : error LNK2019: unresolved external symbol _InterlockedExchangeAdd64 referenced in function _InlineInterlockedAdd64
windows_h.obj : error LNK2019: unresolved external symbol _mul128 referenced in function MultiplyExtract128
windows_h.obj : error LNK2019: unresolved external symbol __shiftright128 referenced in function MultiplyExtract128
windows_h.obj : error LNK2019: unresolved external symbol _umul128 referenced in function UnsignedMultiplyExtract128
windows_h.obj : error LNK2019: unresolved external symbol _ReadWriteBarrier referenced in function BarrierAfterRead
windows_h.obj : error LNK2019: unresolved external symbol __stosb referenced in function RtlSecureZeroMemory
windows_h.obj : error LNK2019: unresolved external symbol __readgsqword referenced in function NtCurrentTeb
windows_h.exe : fatal error LNK1120: 8 unresolved externals

@WalterBright

Copy link
Copy Markdown
Member

Just have the user add the #include line.

@just-harry

Copy link
Copy Markdown
Contributor Author

Just have the user add the #include line.

Perhaps I'm misunderstanding you, but is that not what the current implementation is doing?
In the example here: #16372 (comment)

There, the user #includes <importc_msvc_builtins.h> before <windows.h>, which works perfectly well.
importc_msvc_builtins.h then does an __import __builtins_msvc;
The import of __builtins_msvc succeeds because druntime/import is in the compiler's import path.
But because the path to __builtins_msvc.d wasn't explicitly provided on the command-line, codegen doesn't take place for the __builtins_msvc module.
Which ultimately causes the linker to fail because the way the standard Windows headers uses some of these intrinsics causes the linker to expect a definition for them.

@just-harry

Copy link
Copy Markdown
Contributor Author
While revisiting this PR in preparation for #16464 I found a few more intrinsics defined via MSVC's `#pragma intrinsic` directive, and so I implemented the ones that I could; those are listed in this expando:
  • __addx18byte
  • __addx18dword
  • __addx18qword
  • __addx18word
  • __incx18byte
  • __incx18dword
  • __incx18qword
  • __incx18word
  • __readx18byte
  • __readx18dword
  • __readx18qword
  • __readx18word
  • __writex18byte
  • __writex18dword
  • __writex18qword
  • __writex18word
  • __iso_volatile_load16
  • __iso_volatile_load32
  • __iso_volatile_load64
  • __iso_volatile_load8
  • __iso_volatile_store16
  • __iso_volatile_store32
  • __iso_volatile_store64
  • __iso_volatile_store8
  • __load_acquire16
  • __load_acquire32
  • __load_acquire64
  • __load_acquire8
  • __ldar16
  • __ldar32
  • __ldar64
  • __ldar8
  • __stlr16
  • __stlr32
  • __stlr64
  • __stlr8
  • __prefetch
  • __prefetchw
  • _m_prefetchw
  • _mm_prefetch
  • _mm_clflush
  • _mm_getcsr
  • _mm_setcsr
  • _mm_lfence
  • _mm_mfence
  • _mm_sfence

There are a number of AArch64/ARM32 intrinsics which remain unimplemented for one of two reasons.
The first being that they require compiler support, those being: [AArch64: __getCallerReg; __getCallerRegFp; __setCallerReg; __setCallerRegFp].
The second being that ImportC is unable to supply compile-time arguments to __imported D function templates: [AArch64: __break; __prefetch2; __getReg; __getRegFp; __setReg; __setRegFp]; [ARM32: __emit; _MoveFromCoprocessor; _MoveToCoprocessor]; [AArch64/ARM32: _ReadStatusReg; _WriteStatusReg].

(As an aside, it would be really nice if ImportC had something like __call_d(functionName)(templateArgs...)(runtimeArgs...) which would call an __imported D function with the usual IFTI rules, and something like __instantiate(templateName)(templateArgs...) for types.)

Additionally, I stripped out all usage of the shared type-qualifier for the various interlocked/atomic MSVC intrinsics because it didn't really work for C callers.

A small handful of unittests, which rely on DRuntime, have been disabled when compiling under D_BetterC.

Lastly, the changelog entry has been amended so that it shows how to use MSVC intrinsics in ImportC, and to point out that they enable #include <windows.h> to just work™.

@just-harry

Copy link
Copy Markdown
Contributor Author

To summarise the PR body and the last thirty-odd comments:

This PR implements a ton of the MSVC compiler's intrinsic functions.
This is desirable because it greatly increases the likelihood of ImportC being able to successfully import and use C libraries that target Windows, especially ones focussed on achieving high performance (i.e. the ones that are worth using :)

Notably, this PR makes it possible for ImportC to successfully include the headers of the Windows SDK—which is unspeakably useful for Windows application and driver development.
This addresses a common stumbling-block with ImportC: 1, 2, 3, 4, 5.

MSVC intrinsics can be used in ImportC by including the importc_msvc_builtins.h header.
The reason why these intrinsics aren't available automatically (like the intrinsics in DRuntime's __builtins.di file) is to avoid the cost of unconditionally importing over 10,000 lines of code for most every Windows compilation (when using ImportC).

However, the Windows SDK's headers implement MSVC's intrinsics by first declaring them as external body-less functions, and then applying the #pragma intrinsic directive to undo that.
As ImportC does not understand #pragma intrinsic, when the Windows SDK's headers are included the linker ends up expecting external definitions for the MSVC intrinsics: which means that codegen has to take place unconditionally for all 451 intrinsics in the __builtins_msvc module, and to make matters worse, the intrinsics can't be inlined.

As such, this PR is currently blocked. (Codegen cannot take place unconditionally unless the -i switch is passed to the compiler, or if the path of __builtins_msvc.d is supplied to the compiler, both of which are unreasonable requirements.)

If #16464 is merged, #pragma function_decl(ignore, ...) can be used in importc_msvc_builtins.h to the same effect as #pragma intrinsic, thus side-stepping our codegen woes and allowing MSVC intrinsics to be inlined.

The full list of implemented intrinsics can be found in this expando:
  • __assume
  • __umulh
  • __mulh
  • _umul128
  • _mul128
  • __emul
  • __emulu
  • _div128
  • _udiv128
  • _div64
  • _udiv64
  • __cpuid
  • __cpuidex
  • _cvt_ftoi_fast
  • _cvt_ftoll_fast
  • _cvt_ftoui_fast
  • _cvt_ftoull_fast
  • _cvt_dtoi_fast
  • _cvt_dtoll_fast
  • _cvt_dtoui_fast
  • _cvt_dtoull_fast
  • _cvt_ftoi_sat
  • _cvt_ftoll_sat
  • _cvt_ftoui_sat
  • _cvt_ftoull_sat
  • _cvt_dtoi_sat
  • _cvt_dtoll_sat
  • _cvt_dtoui_sat
  • _cvt_dtoull_sat
  • _cvt_ftoi_sent
  • _cvt_ftoll_sent
  • _cvt_ftoui_sent
  • _cvt_ftoull_sent
  • _cvt_dtoi_sent
  • _cvt_dtoui_sent
  • _cvt_dtoll_sent
  • _cvt_dtoull_sent
  • __readgsbyte
  • __readgsword
  • __readgsdword
  • __readgsqword
  • __writegsbyte
  • __writegsword
  • __writegsdword
  • __writegsqword
  • __addgsbyte
  • __addgsword
  • __addgsdword
  • __addgsqword
  • __incgsbyte
  • __incgsword
  • __incgsdword
  • __incgsqword
  • __readfsbyte
  • __readfsword
  • __readfsdword
  • __readfsqword
  • __writefsbyte
  • __writefsword
  • __writefsdword
  • __writefsqword
  • __addfsbyte
  • __addfsword
  • __addfsdword
  • __incfsbyte
  • __incfsword
  • __incfsdword
  • __readx18byte
  • __readx18dword
  • __readx18qword
  • __readx18word
  • __writex18byte
  • __writex18dword
  • __writex18qword
  • __writex18word
  • __addx18byte
  • __addx18dword
  • __addx18qword
  • __addx18word
  • __incx18byte
  • __incx18dword
  • __incx18qword
  • __incx18word
  • __debugbreak
  • __fastfail
  • __faststorefence
  • _disable
  • _enable
  • _interlockedadd [Undocumented]
  • _interlockedadd64 [Undocumented]
  • _InterlockedAdd
  • _InterlockedAdd_acq
  • _InterlockedAdd_rel
  • _InterlockedAdd_nf
  • _InterlockedAdd64
  • _InterlockedAdd64_acq
  • _InterlockedAdd64_rel
  • _InterlockedAdd64_nf
  • _InterlockedAddLargeStatistic
  • _InterlockedAnd
  • _InterlockedAnd8
  • _InterlockedAnd16
  • _interlockedand64 [Undocumented]
  • _InterlockedAnd_acq
  • _InterlockedAnd_rel
  • _InterlockedAnd_nf
  • _InterlockedAnd8_acq
  • _InterlockedAnd8_rel
  • _InterlockedAnd8_nf
  • _InterlockedAnd16_acq
  • _InterlockedAnd16_rel
  • _InterlockedAnd16_nf
  • _InterlockedAnd64_acq
  • _InterlockedAnd64_rel
  • _InterlockedAnd64_nf
  • _InterlockedAnd64
  • _InterlockedAnd_np
  • _InterlockedAnd8_np
  • _InterlockedAnd16_np
  • _InterlockedAnd64_np
  • _InterlockedAnd64_HLEAcquire
  • _InterlockedAnd64_HLERelease
  • _InterlockedAnd_HLEAcquire
  • _InterlockedAnd_HLERelease
  • _interlockedbittestandreset
  • _interlockedbittestandreset64
  • _interlockedbittestandreset_HLEAcquire
  • _interlockedbittestandreset_HLERelease
  • _interlockedbittestandreset64_HLEAcquire
  • _interlockedbittestandreset64_HLERelease
  • _interlockedbittestandreset_acq
  • _interlockedbittestandreset_rel
  • _interlockedbittestandreset_nf
  • _interlockedbittestandreset64_acq
  • _interlockedbittestandreset64_rel
  • _interlockedbittestandreset64_nf
  • _interlockedbittestandset
  • _interlockedbittestandset64
  • _interlockedbittestandset_HLEAcquire
  • _interlockedbittestandset_HLERelease
  • _interlockedbittestandset64_HLEAcquire
  • _interlockedbittestandset64_HLERelease
  • _interlockedbittestandset_acq
  • _interlockedbittestandset_rel
  • _interlockedbittestandset_nf
  • _interlockedbittestandset64_acq
  • _interlockedbittestandset64_rel
  • _interlockedbittestandset64_nf
  • _InterlockedCompareExchange
  • _InterlockedCompareExchange8
  • _InterlockedCompareExchange16
  • _InterlockedCompareExchange64
  • _InterlockedCompareExchange_HLEAcquire
  • _InterlockedCompareExchange_HLERelease
  • _InterlockedCompareExchange64_HLEAcquire
  • _InterlockedCompareExchange64_HLERelease
  • _InterlockedCompareExchange_np
  • _InterlockedCompareExchange16_np
  • _InterlockedCompareExchange64_np
  • _InterlockedCompareExchange_acq
  • _InterlockedCompareExchange_rel
  • _InterlockedCompareExchange_nf
  • _InterlockedCompareExchange8_acq
  • _InterlockedCompareExchange8_rel
  • _InterlockedCompareExchange8_nf
  • _InterlockedCompareExchange16_acq
  • _InterlockedCompareExchange16_rel
  • _InterlockedCompareExchange16_nf
  • _InterlockedCompareExchange64_acq
  • _InterlockedCompareExchange64_rel
  • _InterlockedCompareExchange64_nf
  • _InterlockedCompareExchange128
  • _InterlockedCompareExchange128_np
  • _InterlockedCompareExchange128_acq
  • _InterlockedCompareExchange128_rel
  • _InterlockedCompareExchange128_nf
  • _InterlockedCompareExchangePointer
  • _InterlockedCompareExchangePointer_HLEAcquire
  • _InterlockedCompareExchangePointer_HLERelease
  • _InterlockedCompareExchangePointer_np
  • _InterlockedCompareExchangePointer_acq
  • _InterlockedCompareExchangePointer_rel
  • _InterlockedCompareExchangePointer_nf
  • _InterlockedDecrement
  • _InterlockedDecrement16
  • _interlockeddecrement64
  • _InterlockedDecrement64
  • _InterlockedDecrement_acq
  • _InterlockedDecrement_rel
  • _InterlockedDecrement_nf
  • _InterlockedDecrement16_acq
  • _InterlockedDecrement16_rel
  • _InterlockedDecrement16_nf
  • _InterlockedDecrement64_acq
  • _InterlockedDecrement64_rel
  • _InterlockedDecrement64_nf
  • _InterlockedExchange
  • _InterlockedExchange8
  • _InterlockedExchange16
  • _interlockedexchange64
  • _InterlockedExchange64
  • _InterlockedExchange_HLEAcquire
  • _InterlockedExchange_HLERelease
  • _InterlockedExchange64_HLEAcquire
  • _InterlockedExchange64_HLERelease
  • _InterlockedExchange_acq
  • _InterlockedExchange_rel
  • _InterlockedExchange_nf
  • _InterlockedExchange8_acq
  • _InterlockedExchange8_rel
  • _InterlockedExchange8_nf
  • _InterlockedExchange16_acq
  • _InterlockedExchange16_rel
  • _InterlockedExchange16_nf
  • _InterlockedExchange64_acq
  • _InterlockedExchange64_rel
  • _InterlockedExchange64_nf
  • _InterlockedExchangeAdd
  • _InterlockedExchangeAdd8
  • _InterlockedExchangeAdd16
  • _interlockedexchangeadd64 [Undocumented]
  • _InterlockedExchangeAdd64
  • _InterlockedExchangeAdd_HLEAcquire
  • _InterlockedExchangeAdd_HLERelease
  • _InterlockedExchangeAdd64_HLEAcquire
  • _InterlockedExchangeAdd64_HLERelease
  • _InterlockedExchangeAdd_acq
  • _InterlockedExchangeAdd_rel
  • _InterlockedExchangeAdd_nf
  • _InterlockedExchangeAdd8_acq
  • _InterlockedExchangeAdd8_rel
  • _InterlockedExchangeAdd8_nf
  • _InterlockedExchangeAdd16_acq
  • _InterlockedExchangeAdd16_rel
  • _InterlockedExchangeAdd16_nf
  • _InterlockedExchangeAdd64_acq
  • _InterlockedExchangeAdd64_rel
  • _InterlockedExchangeAdd64_nf
  • _InterlockedExchangePointer
  • _InterlockedExchangePointer_HLEAcquire
  • _InterlockedExchangePointer_HLERelease
  • _InterlockedExchangePointer_acq
  • _InterlockedExchangePointer_rel
  • _InterlockedExchangePointer_nf
  • _InterlockedIncrement
  • _InterlockedIncrement16
  • _interlockedincrement64 [Undocumented]
  • _InterlockedIncrement64
  • _InterlockedIncrement_acq
  • _InterlockedIncrement_rel
  • _InterlockedIncrement_nf
  • _InterlockedIncrement16_acq
  • _InterlockedIncrement16_rel
  • _InterlockedIncrement16_nf
  • _InterlockedIncrement64_acq
  • _InterlockedIncrement64_rel
  • _InterlockedIncrement64_nf
  • _InterlockedOr
  • _InterlockedOr8
  • _InterlockedOr16
  • _interlockedor64 [Undocumented]
  • _InterlockedOr_acq
  • _InterlockedOr_rel
  • _InterlockedOr_nf
  • _InterlockedOr8_acq
  • _InterlockedOr8_rel
  • _InterlockedOr8_nf
  • _InterlockedOr16_acq
  • _InterlockedOr16_rel
  • _InterlockedOr16_nf
  • _InterlockedOr64_acq
  • _InterlockedOr64_rel
  • _InterlockedOr64_nf
  • _InterlockedOr64
  • _InterlockedOr_np
  • _InterlockedOr8_np
  • _InterlockedOr16_np
  • _InterlockedOr64_np
  • _InterlockedOr64_HLEAcquire
  • _InterlockedOr64_HLERelease
  • _InterlockedOr_HLEAcquire
  • _InterlockedOr_HLERelease
  • _InterlockedXor
  • _InterlockedXor8
  • _InterlockedXor16
  • _interlockedxor64 [Undocumented]
  • _InterlockedXor_acq
  • _InterlockedXor_rel
  • _InterlockedXor_nf
  • _InterlockedXor8_acq
  • _InterlockedXor8_rel
  • _InterlockedXor8_nf
  • _InterlockedXor16_acq
  • _InterlockedXor16_rel
  • _InterlockedXor16_nf
  • _InterlockedXor64_acq
  • _InterlockedXor64_rel
  • _InterlockedXor64_nf
  • _InterlockedXor64
  • _InterlockedXor_np
  • _InterlockedXor8_np
  • _InterlockedXor16_np
  • _InterlockedXor64_np
  • _InterlockedXor64_HLEAcquire
  • _InterlockedXor64_HLERelease
  • _InterlockedXor_HLEAcquire
  • _InterlockedXor_HLERelease
  • __inbyte
  • __inword
  • __indword
  • __outbyte
  • __outword
  • __outdword
  • __inbytestring
  • __inwordstring
  • __indwordstring
  • __outbytestring
  • __outwordstring
  • __outdwordstring
  • __int2c
  • __invlpg
  • __lidt
  • __ll_lshift
  • __ll_rshift
  • __ull_rshift
  • __lzcnt16
  • __lzcnt
  • __lzcnt64
  • _mm_cvtsi64x_ss
  • _mm_cvtss_si64x
  • _mm_cvttss_si64x
  • _mm_extract_si64
  • _mm_extracti_si64
  • _mm_insert_si64
  • _mm_inserti_si64
  • _mm_pause
  • _mm_stream_sd
  • _mm_stream_ss
  • _mm_stream_si64x
  • __movsb
  • __movsw
  • __movsd
  • __movsq
  • __noop
  • __nop
  • __popcnt16
  • __popcnt
  • __popcnt64
  • __rdtsc
  • __rdtscp
  • __readcr0
  • __readcr2
  • __readcr3
  • __readcr4
  • __readcr8
  • __readdr
  • __readeflags
  • __readmsr
  • __readpmc
  • __segmentlimit
  • __shiftleft128
  • __shiftright128
  • __sidt
  • __stosb
  • __stosw
  • __stosd
  • __stosq
  • __svm_clgi
  • __svm_invlpga
  • __svm_skinit
  • __svm_stgi
  • __svm_vmload
  • __svm_vmrun
  • __svm_vmsave
  • __ud2
  • __vmx_off
  • __vmx_on
  • __vmx_vmclear
  • __vmx_vmlaunch
  • __vmx_vmptrld
  • __vmx_vmptrst
  • __vmx_vmread
  • __vmx_vmresume
  • __vmx_vmwrite
  • __wbinvd
  • __writecr0
  • __writecr2
  • __writecr3
  • __writecr4
  • __writecr8
  • __writedr
  • __writeeflags
  • __writemsr
  • _ReadBarrier
  • _WriteBarrier
  • _ReadWriteBarrier
  • _BitScanForward
  • _BitScanReverse
  • _BitScanForward64
  • _BitScanReverse64
  • _bittest
  • _bittestandcomplement
  • _bittestandreset
  • _bittestandset
  • _bittest64
  • _bittestandcomplement64
  • _bittestandreset64
  • _bittestandset64
  • _byteswap_uint64
  • _byteswap_ulong
  • _byteswap_ushort
  • _lrotr
  • _lrotl
  • _rotr
  • _rotl
  • _rotr64
  • _rotl64
  • _rotr16
  • _rotl16
  • _rotr8
  • _rotl8
  • __yield
  • __prefetch
  • _m_prefetchw
  • _mm_prefetch
  • _mm_clflush
  • _mm_getcsr
  • _mm_setcsr
  • _mm_lfence
  • _mm_mfence
  • _mm_sfence
  • __iso_volatile_load8
  • __iso_volatile_load16
  • __iso_volatile_load32
  • __iso_volatile_load64
  • __iso_volatile_store8
  • __iso_volatile_store16
  • __iso_volatile_store64
  • __iso_volatile_store32
  • __ldar8
  • __ldar16
  • __ldar32
  • __ldar64
  • __stlr8
  • __stlr16
  • __stlr32
  • __stlr64
  • __load_acquire8
  • __load_acquire16
  • __load_acquire32
  • __load_acquire64

@just-harry

Copy link
Copy Markdown
Contributor Author

And one last thing.
I understand that by contributing these changes to DMD I assign the copyright for the code to the D Language Foundation, who then license the code under the Boost Software Licence.

I would like to request that the code I authored, in the druntime/src/__builtins_msvc.d file, be additionally/alternatively licensed under the Zero Clause BSD Licence, please.
That is to say, that code would be available for use under either the Boost Software Licence OR the Zero Clause BSD Licence.

The Boost Software Licence has only one condition, which is that copies of the source code must retain the copyright notice.
The Zero Clause BSD Licence has no conditions, so it imposes no additional restrictions.

Would this be acceptable, @WalterBright?
Thank you, in advance, for your consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants