Skip to content

Conversation

@jargh
Copy link
Contributor

@jargh jargh commented Nov 12, 2025

Issue #, if available:

Description of changes:

This supplements the existing memory operation cosimulation, which has harnesses to handle general relative addresses. We now also cosimulate "simplified" memory operations, which are basically the memory iforms extracted from the code with all relative addresses forced to be [rsp+32]. Collectively, this gives broader coverage of memory operations for additional sanity checking. We now select the kind of simulation with probability 1/3 each

  • Register-to-register operations
  • Memory operations with general relative address harness
  • Simplified memory operations extracted from the code

A side-effect of the new process of generation of the x86 instruction list from the code rather than the proofs is that we get a more comprehensive list of register-to-register instructions than before; for example MULX was formerly completely missed (see commit messages for more details).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jargh added 5 commits November 8, 2025 10:39
This makes it easier to set up cosimulation of memory operations
like VMOVDQA that depend on this level of alignment. This wasn't
necessary before because 16-alignment is already the default on
many platforms.
The previous version extracted the instructions from the formal
versions of the code inside the proof files. This has a few flaws,
notably the fact that sometimes line breaks separate the actual list
from the commented instruction, something that for example meant
that the important MULX instruction was *never* getting cosimulated.

The new version extracts directly from the source code by mangling +
assembly + objdump. This results in a strictly larger list of
instructions (2740 in place of 1342 currently). As well as the basic
"iclasses" list it also sets up a separate list "iclasses_simplemem"
of memory instructions, corresponding to memory instructions in the
code but with all relative addresses forced to be [rsp+32]. This
allows for at least a limited cosimulation of all the basic iforms
that occur in the code, except for control-flow and stack modification
(jcc, jmp, call, ret, push, pop).
This supplements the existing memory operation cosimulation, which
has harnesses to handle general relative addresses. We now also
cosimulate "simplified" memory operations, which are basically
the memory iforms extracted from the code with all relative
addresses forced to be [rsp+32]. Collectively, this gives broader
coverage of memory operations for additional sanity checking. We
now select the kind of simulation with probability 1/3 each
when running the simulator:

 - Register-to-register operations
 - Memory operations with general relative address harness
 - Simplified memory operations extracted from the code
This avoids pollution of the simulator output by linker warnings.
This is a bit over-conservative since it's really only needed for
VMOVDQA at the moment, but it keeps things simple.
Copy link
Collaborator

@aqjune-aws aqjune-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question about the ratio 1:1:1 between register-register ops, memory ops and simple memory ops - IIUC there are more register-register ops than memory + simple memory operations. Would the 1:1:1 ratio make simulator miss more register operations..? I thought it is fine to slightly give more odds to register-register ops testing.


for i in [a-oq-z]*/*.S p[235]*/*.S
do
egrep -v '\.quad|\.word' $i | gcc -E -I ../include -xassembler-with-cpp -DWINDOWS_ABI=0 - >/tmp/source_nodata.S
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should -DWINDOWS_ABI=0 be 1? I checked the last /tmp/source_nodata.S (which was p521/p521_jscalarmul_alt.S on my machine) and could confirm that it was not using Windows ABI because it did not have Lp521_jscalarmul_alt_standard.

Copy link
Contributor Author

@jargh jargh Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that in principle the Windows form might use slightly more instructions (if the save/restore prologue does something different). So yes, I intended to write 1 :-)

$OBJDUMP -M intel --no-addresses -d /tmp/memory.o | grep '^\s' | sed -E -e 's/^( |\t)*/ /' | sed -E -e 's/( |\t)( |\t).*//' >/tmp/memory_codings
sed -E -e 's/([0-9a-f][0-9a-f])/0x\1;/g' /tmp/memory_codings | sed -e 's/^ /\[/' | sed -e 's/;$/];/' >/tmp/memory_paste1
grep -v '.intel_syntax' /tmp/memory_instructions | sed -E -e 's/(^.*$)/\(\* \1 \*\)/' >/tmp/memory_paste2
paste -d ' ' /tmp/memory_paste1 /tmp/memory_paste2 >> "$outfile"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't aware of this nice paste command..!

];;

let simplemem_iclasses =
map (fun l -> (fun () -> [l]),true) simple_memory_iclasses;;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that for simple_memory_iclasses we don't need to make deferred functions because the stack offset is always fixed to 32. :) In the case of mem_iclasses, having the entries as deferred function was useful because it could make the stack offset randomly assigned for each invocation of cosimulate_instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, there is no value in this indirect appraoch here since the instruction is fixed.

@jargh
Copy link
Contributor Author

jargh commented Nov 13, 2025

Yes, perhaps the 1:1:1 ratio gives too much weight to the memory operations? It would be very reasonable to tweak these values.

let decoded, result = run_random_simplememopsimulation() in
decoded,result,false

let time_limit_sec = 2400.0;;
Copy link
Collaborator

@pennyannn pennyannn Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the function run_random_simulations to report number of simplemem instructions tested and its run time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, that's a good idea

[
(*** these were added manually for extra checks ***)

[0x66; 0x0f; 0x3a; 0x22; 0x7c; 0x24; 0x07; 0x04];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some comments for which instruction is being tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was a somewhat ad-hoc list arising from some instructions I was reviewing for another PR. This should be properly commented.

if add_assum
then `aligned 16 (stackpointer:int64):bool`,`additional_assumptions:bool`
if add_assum = 32
then `aligned 32 stackpointer /\ aligned 16 (stackpointer:int64):bool`,`additional_assumptions:bool`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are both aligned 32 stackpointer and aligned 16 stackpointer needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly not, actually. I was thinking it's better to be safe and add more but since the instruction harnesses will be exercising at most one alignment-relevant instruction, it is probably wasteful.


let run_random_simulation() =
if Random.int 100 < 90 then
let rn = Random.int 99 in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have some question about this fraction. I think once we report the number of instructions tested for this new iclass, then we can decide if it is a reasonable fraction or if we should tune it down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the number of register and simple memory instructions in the generated x86-insns.ml file, it's almost a factor of 10 difference: 2740 register instructions, 275 memory instructions. So that might suggest something like an 8:1:1 balance as fair?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me. We used to have 9:1, and 8:1:1 isn't too far from that.

jargh and others added 5 commits November 14, 2025 11:43
Be explicit that a couple of the new test cases are nonstandard
encodings, i.e. not what assemblers typically produce but still
valid in hardware. Also add one such more register case for MOVQ.
@jargh
Copy link
Contributor Author

jargh commented Nov 18, 2025

I believe the further little commits above address all the issues raised :-)

@jargh jargh merged commit e335aa8 into awslabs:main Nov 19, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants