-
Notifications
You must be signed in to change notification settings - Fork 38
Cosimulate simplified versions of x86 memory operations in the code #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This makes it easier to set up cosimulation of memory operations like VMOVDQA that depend on this level of alignment. This wasn't necessary before because 16-alignment is already the default on many platforms.
The previous version extracted the instructions from the formal versions of the code inside the proof files. This has a few flaws, notably the fact that sometimes line breaks separate the actual list from the commented instruction, something that for example meant that the important MULX instruction was *never* getting cosimulated. The new version extracts directly from the source code by mangling + assembly + objdump. This results in a strictly larger list of instructions (2740 in place of 1342 currently). As well as the basic "iclasses" list it also sets up a separate list "iclasses_simplemem" of memory instructions, corresponding to memory instructions in the code but with all relative addresses forced to be [rsp+32]. This allows for at least a limited cosimulation of all the basic iforms that occur in the code, except for control-flow and stack modification (jcc, jmp, call, ret, push, pop).
This supplements the existing memory operation cosimulation, which has harnesses to handle general relative addresses. We now also cosimulate "simplified" memory operations, which are basically the memory iforms extracted from the code with all relative addresses forced to be [rsp+32]. Collectively, this gives broader coverage of memory operations for additional sanity checking. We now select the kind of simulation with probability 1/3 each when running the simulator: - Register-to-register operations - Memory operations with general relative address harness - Simplified memory operations extracted from the code
This avoids pollution of the simulator output by linker warnings.
This is a bit over-conservative since it's really only needed for VMOVDQA at the moment, but it keeps things simple.
aqjune-aws
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about the ratio 1:1:1 between register-register ops, memory ops and simple memory ops - IIUC there are more register-register ops than memory + simple memory operations. Would the 1:1:1 ratio make simulator miss more register operations..? I thought it is fine to slightly give more odds to register-register ops testing.
tools/list-x86-insns.sh
Outdated
|
|
||
| for i in [a-oq-z]*/*.S p[235]*/*.S | ||
| do | ||
| egrep -v '\.quad|\.word' $i | gcc -E -I ../include -xassembler-with-cpp -DWINDOWS_ABI=0 - >/tmp/source_nodata.S |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should -DWINDOWS_ABI=0 be 1? I checked the last /tmp/source_nodata.S (which was p521/p521_jscalarmul_alt.S on my machine) and could confirm that it was not using Windows ABI because it did not have Lp521_jscalarmul_alt_standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that in principle the Windows form might use slightly more instructions (if the save/restore prologue does something different). So yes, I intended to write 1 :-)
| $OBJDUMP -M intel --no-addresses -d /tmp/memory.o | grep '^\s' | sed -E -e 's/^( |\t)*/ /' | sed -E -e 's/( |\t)( |\t).*//' >/tmp/memory_codings | ||
| sed -E -e 's/([0-9a-f][0-9a-f])/0x\1;/g' /tmp/memory_codings | sed -e 's/^ /\[/' | sed -e 's/;$/];/' >/tmp/memory_paste1 | ||
| grep -v '.intel_syntax' /tmp/memory_instructions | sed -E -e 's/(^.*$)/\(\* \1 \*\)/' >/tmp/memory_paste2 | ||
| paste -d ' ' /tmp/memory_paste1 /tmp/memory_paste2 >> "$outfile" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't aware of this nice paste command..!
x86/proofs/simulator.ml
Outdated
| ];; | ||
|
|
||
| let simplemem_iclasses = | ||
| map (fun l -> (fun () -> [l]),true) simple_memory_iclasses;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for simple_memory_iclasses we don't need to make deferred functions because the stack offset is always fixed to 32. :) In the case of mem_iclasses, having the entries as deferred function was useful because it could make the stack offset randomly assigned for each invocation of cosimulate_instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, there is no value in this indirect appraoch here since the instruction is fixed.
|
Yes, perhaps the 1:1:1 ratio gives too much weight to the memory operations? It would be very reasonable to tweak these values. |
| let decoded, result = run_random_simplememopsimulation() in | ||
| decoded,result,false | ||
|
|
||
| let time_limit_sec = 2400.0;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the function run_random_simulations to report number of simplemem instructions tested and its run time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed, that's a good idea
x86/proofs/simulator.ml
Outdated
| [ | ||
| (*** these were added manually for extra checks ***) | ||
|
|
||
| [0x66; 0x0f; 0x3a; 0x22; 0x7c; 0x24; 0x07; 0x04]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some comments for which instruction is being tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was a somewhat ad-hoc list arising from some instructions I was reviewing for another PR. This should be properly commented.
| if add_assum | ||
| then `aligned 16 (stackpointer:int64):bool`,`additional_assumptions:bool` | ||
| if add_assum = 32 | ||
| then `aligned 32 stackpointer /\ aligned 16 (stackpointer:int64):bool`,`additional_assumptions:bool` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are both aligned 32 stackpointer and aligned 16 stackpointer needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly not, actually. I was thinking it's better to be safe and add more but since the instruction harnesses will be exercising at most one alignment-relevant instruction, it is probably wasteful.
x86/proofs/simulator.ml
Outdated
|
|
||
| let run_random_simulation() = | ||
| if Random.int 100 < 90 then | ||
| let rn = Random.int 99 in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also have some question about this fraction. I think once we report the number of instructions tested for this new iclass, then we can decide if it is a reasonable fraction or if we should tune it down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the number of register and simple memory instructions in the generated x86-insns.ml file, it's almost a factor of 10 difference: 2740 register instructions, 275 memory instructions. So that might suggest something like an 8:1:1 balance as fair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable to me. We used to have 9:1, and 8:1:1 isn't too far from that.
These points were raised in reviews by @aqjune-aws and @pennyannn.
Be explicit that a couple of the new test cases are nonstandard encodings, i.e. not what assemblers typically produce but still valid in hardware. Also add one such more register case for MOVQ.
|
I believe the further little commits above address all the issues raised :-) |
Issue #, if available:
Description of changes:
This supplements the existing memory operation cosimulation, which has harnesses to handle general relative addresses. We now also cosimulate "simplified" memory operations, which are basically the memory iforms extracted from the code with all relative addresses forced to be [rsp+32]. Collectively, this gives broader coverage of memory operations for additional sanity checking. We now select the kind of simulation with probability 1/3 each
A side-effect of the new process of generation of the x86 instruction list from the code rather than the proofs is that we get a more comprehensive list of register-to-register instructions than before; for example MULX was formerly completely missed (see commit messages for more details).
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.