Fix 16-bit MOVSX MOVZX MOVSXD semantics #778
Draft
mrexodia wants to merge 36 commits into
Progress report:
- Added missing 16-bit operand-size MOVZX/MOVSX selectors for GPRv_GPR16 and MEMw forms.
- Corrected MOVSXD 16-bit operand-size semantics to write a 16-bit destination instead of incorrectly zero-extending through a 32-bit destination.
- Corrected non-REX.W 32-bit MOVSXD forms to write 32-bit destinations instead of sign-extending into 64-bit registers.

Verification from remill-tester release build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/movzx.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/movsx.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/movsxd.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- movzx.txt: 75022 passed, 0 failed, 0 skipped.
- movsx.txt: 113322 passed, 0 failed, 0 skipped.
- movsxd.txt: 54256 passed, 0 failed, 0 skipped.

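The destination-width distinction in the report above can be sketched as follows. This is a minimal illustration, not Remill's code: `WriteGpr16`, `WriteGpr32`, `Movsxd16`, and `Movsxd32` are hypothetical helper names, and a full 64-bit register is modeled as a plain `uint64_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not Remill APIs) modeling x86-64 GPR write widths.

// A 16-bit write replaces bits 15:0 and preserves bits 63:16.
constexpr uint64_t WriteGpr16(uint64_t old_reg, uint16_t value) {
  return (old_reg & ~uint64_t{0xFFFF}) | value;
}

// A 32-bit write replaces bits 31:0 and zeroes bits 63:32.
constexpr uint64_t WriteGpr32(uint64_t /*old_reg*/, uint32_t value) {
  return value;
}

// MOVSXD with a 16-bit operand size: a plain 16-bit move into the destination.
constexpr uint64_t Movsxd16(uint64_t old_reg, uint16_t src) {
  return WriteGpr16(old_reg, src);
}

// MOVSXD without REX.W: a 32-bit write, so no sign extension reaches bit 63.
constexpr uint64_t Movsxd32(uint64_t old_reg, uint32_t src) {
  return WriteGpr32(old_reg, src);
}
```

With these assumptions, `Movsxd32` applied to `0x80000000` yields `0x0000000080000000`, whereas sign-extending into the 64-bit register would have produced `0xFFFFFFFF80000000`.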
Progress report:
- Added BMI2 SHRX and SARX semantic implementations mirroring SHLX count handling.
- Counts are taken from the low byte of the third operand and masked by operand width (0x1f for 32-bit, 0x3f for 64-bit).
- Added XED instruction selectors for 32-bit/64-bit register and memory forms, including VEX VGPR aliases.
- These instructions do not modify flags.

Verification from remill-tester release build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/sarx.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/shrx.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/shlx.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sarx.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/shrx.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/shlx.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- sarx.txt limited: 100 passed, 0 failed, 0 skipped.
- shrx.txt limited: 100 passed, 0 failed, 0 skipped.
- shlx.txt limited: 100 passed, 0 failed, 0 skipped.
- sarx.txt full: 770320 passed, 0 failed, 0 skipped.
- shrx.txt full: 738888 passed, 0 failed, 0 skipped.
- shlx.txt full: 746440 passed, 0 failed, 0 skipped.

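The count handling described above can be illustrated with a small standalone sketch. The function names are hypothetical and the code is not taken from the PR; it only shows the low-byte extraction and width masking, with the arithmetic shift written so it does not depend on signed right-shift behavior.

```cpp
#include <cassert>
#include <cstdint>

// SHRX r32: logical right shift; count = low byte of operand 3, masked to 5 bits.
constexpr uint32_t Shrx32(uint32_t src, uint64_t count_reg) {
  const unsigned count = static_cast<uint8_t>(count_reg) & 0x1F;
  return src >> count;
}

// SHRX r64: same idea, masked to 6 bits.
constexpr uint64_t Shrx64(uint64_t src, uint64_t count_reg) {
  const unsigned count = static_cast<uint8_t>(count_reg) & 0x3F;
  return src >> count;
}

// SARX r32: arithmetic right shift built from a logical shift plus sign fill.
constexpr uint32_t Sarx32(uint32_t src, uint64_t count_reg) {
  const unsigned count = static_cast<uint8_t>(count_reg) & 0x1F;
  const uint32_t logical = src >> count;
  // Bits vacated at the top are filled with ones when the sign bit is set.
  const uint32_t fill = (src & 0x80000000u) ? ~(~uint32_t{0} >> count) : 0u;
  return logical | fill;
}
```

A count register of 33 behaves like a count of 1 for the 32-bit forms, and no flag state appears anywhere in the sketch because these instructions leave flags untouched.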
Progress report:
- Added BLSI, BLSR, and BLSMSK semantic implementations.
- Implemented documented BMI1 flag behavior: CF/ZF/SF/OF are written, OF is cleared, and AF/PF are undefined.
- Added 32-bit/64-bit register and memory instruction selectors, including VEX VGPR aliases.

Verification from remill-tester release build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/blsi.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blsr.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blsmsk.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blsi.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blsr.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blsmsk.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- blsi.txt limited: 100 passed, 0 failed, 0 skipped.
- blsr.txt limited: 100 passed, 0 failed, 0 skipped.
- blsmsk.txt limited: 100 passed, 0 failed, 0 skipped.
- blsi.txt full: 49464 passed, 0 failed, 0 skipped.
- blsr.txt full: 47976 passed, 0 failed, 0 skipped.
- blsmsk.txt full: 43848 passed, 0 failed, 0 skipped.

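For reference, the three BMI1 operations and their documented flag results can be sketched as below. The `BmiResult` struct and function names are illustrative assumptions, and only the 32-bit width is shown; AF and PF are omitted because they are undefined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result bundle: value plus the four flags these instructions write.
struct BmiResult {
  uint32_t value;
  bool cf, zf, sf, of;
};

// BLSI: isolate the lowest set bit. CF is set when the source is non-zero.
constexpr BmiResult Blsi32(uint32_t src) {
  const uint32_t res = src & (0u - src);
  return {res, src != 0, res == 0, (res >> 31) != 0, false};
}

// BLSR: clear the lowest set bit. CF is set when the source is zero.
constexpr BmiResult Blsr32(uint32_t src) {
  const uint32_t res = src & (src - 1u);
  return {res, src == 0, res == 0, (res >> 31) != 0, false};
}

// BLSMSK: mask up to and including the lowest set bit. The result always has
// bit 0 set, so ZF is always clear; CF is set when the source is zero.
constexpr BmiResult Blsmsk32(uint32_t src) {
  const uint32_t res = src ^ (src - 1u);
  return {res, src == 0, false, (res >> 31) != 0, false};
}
```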
Progress report:
- Updated SSE DIVPS/DIVPD/DIVSS/DIVSD semantics to return x86 indefinite quiet NaNs for invalid 0/0 and infinity/infinity divisions.
- Added helpers for 32-bit and 64-bit x86 indefinite NaN bit patterns (0xFFC00000 and 0xFFF8000000000000).
- Replaced direct host FDiv vector paths with lane-wise x86 divide handling so packed and scalar forms match hardware NaN sign behavior.

Failure triage before fix:
- divpd/divps/divsd/divss failed immediately with byte-register mismatches where hardware expected negative quiet NaN and Remill produced positive quiet NaN.
- Limited 20-row samples showed divpd 11 failures, divps 20 failures, divsd 9 failures, divss 12 failures.

Verification from remill-tester release build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/divpd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/divps.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/divsd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/divss.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/divpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/divps.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/divsd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/divss.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- all four 20-row samples: 20 passed, 0 failed, 0 skipped.
- divpd.txt full: 62880 passed, 0 failed, 0 skipped.
- divps.txt full: 62824 passed, 0 failed, 0 skipped.
- divsd.txt full: 61944 passed, 0 failed, 0 skipped.
- divss.txt full: 59912 passed, 0 failed, 0 skipped.

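A rough sketch of the lane-wise divide described above, for single precision only. It assumes an IEEE 754 host and covers just the two invalid cases named in the report (0/0 and infinity/infinity); NaN-operand propagation is left to the host here, which the real semantics may handle differently. `DivLane32`, `BitsOf`, and `FloatFrom` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// x86 indefinite quiet NaN for binary32: sign bit and quiet bit set.
constexpr uint32_t kIndefiniteNaN32 = 0xFFC00000u;

inline uint32_t BitsOf(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

inline float FloatFrom(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// One DIVSS/DIVPS lane: force the negative quiet NaN for invalid divisions,
// otherwise defer to the host divide.
inline float DivLane32(float a, float b) {
  const bool zero_by_zero = (a == 0.0f) && (b == 0.0f);
  const bool inf_by_inf = std::isinf(a) && std::isinf(b);
  if (zero_by_zero || inf_by_inf) {
    return FloatFrom(kIndefiniteNaN32);
  }
  return a / b;
}

// Packed form: apply the lane helper to each of the four lanes.
inline void Divps(const float (&a)[4], const float (&b)[4], float (&out)[4]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = DivLane32(a[i], b[i]);
  }
}
```

The point of going lane by lane is that the sign of the NaN produced by a host vector divide is not guaranteed to match the pattern the corpus expects.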
Progress report:
- Added Remill selectors and semantics for SSE2 MINPD/MAXPD packed double operations.
- Added Remill selectors and lane-wise semantics for SSE SQRTPS packed single square roots using the existing x86 SquareRoot32 helper.
- These instructions previously lifted as unsupported in remill-tester and blocked full corpus coverage for minpd/maxpd/sqrtps.

Implementation notes:
- MINPD/MAXPD mirror the existing scalar MINSD/MAXSD and packed MINPS/MAXPS behavior: select source 2 for unordered inputs and signed-zero ties, otherwise select the lesser/greater lane.
- SQRTPS applies the same x86 square-root special-case handling already used by SQRTSS to each 32-bit lane.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/maxpd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/minpd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sqrtps.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/maxpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/minpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sqrtps.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- limited samples: maxpd/minpd/sqrtps each passed 20, failed 0, skipped 0.
- maxpd.txt full: 61568 passed, 0 failed, 0 skipped.
- minpd.txt full: 61768 passed, 0 failed, 0 skipped.
- sqrtps.txt full: 64168 passed, 0 failed, 0 skipped.

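The selection rule in the implementation notes above (source 2 wins for unordered inputs and signed-zero ties) falls out of a single ordered comparison. A minimal sketch with hypothetical names, assuming IEEE 754 host comparisons:

```cpp
#include <cassert>
#include <cmath>

// One MINPD/MINSD lane: src1 is returned only when it is strictly less.
// NaN in either operand, or +0.0 vs -0.0, makes the comparison false,
// so src2 is returned in exactly the cases the report lists.
inline double MinLane(double src1, double src2) {
  return (src1 < src2) ? src1 : src2;
}

// One MAXPD/MAXSD lane, symmetric to MinLane.
inline double MaxLane(double src1, double src2) {
  return (src1 > src2) ? src1 : src2;
}

// Packed form over two double lanes.
inline void Minpd(const double (&src1)[2], const double (&src2)[2],
                  double (&out)[2]) {
  for (int i = 0; i < 2; ++i) {
    out[i] = MinLane(src1[i], src2[i]);
  }
}
```

This differs from `std::fmin`/`std::fmax`, which prefer the non-NaN operand regardless of position, so those library functions would not reproduce the x86 results.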
Progress report:
- Added MOVMSKPD selector and semantics to extract the sign bits of the two packed double lanes into a zero-extended 32-bit general-purpose register.
- Reused the existing MOVMSKPS style and unsigned vector reads so NaN payloads and floating classifications are irrelevant; only raw sign bits are observed.
- This resolves the previous remill-tester lift skip for movmskpd.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/movmskpd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/movmskpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/movmskps.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- movmskpd limited sample: 20 passed, 0 failed, 0 skipped.
- movmskpd.txt full: 25600 passed, 0 failed, 0 skipped.
- movmskps.txt full regression check: 26368 passed, 0 failed, 0 skipped.

Progress report:
- Added Remill selectors and semantics for SSE3 HSUBPD and HSUBPS.
- Implemented horizontal subtraction alongside existing HADDPD/HADDPS packing logic, preserving the 128-bit lane behavior and AVX-shaped code paths for future builds.
- This resolves the previous remill-tester unsupported lift skips for hsubpd and hsubps.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/hsubpd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/hsubps.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/hsubpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/hsubps.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/haddpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/haddps.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- hsubpd limited sample: 20 passed, 0 failed, 0 skipped.
- hsubps limited sample: 20 passed, 0 failed, 0 skipped.
- hsubpd.txt full: 63992 passed, 0 failed, 0 skipped.
- hsubps.txt full: 63408 passed, 0 failed, 0 skipped.
- haddpd.txt regression/full coverage: 62816 passed, 0 failed, 0 skipped.
- haddps.txt regression/full coverage: 63064 passed, 0 failed, 0 skipped.

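The 128-bit lane packing for horizontal subtraction can be sketched as below. `Vec2d`, `Vec4f`, and the function names are assumptions made for this illustration; index 0 is the least-significant lane.

```cpp
#include <array>
#include <cassert>

using Vec2d = std::array<double, 2>;
using Vec4f = std::array<float, 4>;

// HSUBPD: the low result lane comes from src1, the high lane from src2.
inline Vec2d Hsubpd(const Vec2d &src1, const Vec2d &src2) {
  return Vec2d{{src1[0] - src1[1], src2[0] - src2[1]}};
}

// HSUBPS: the two low result lanes come from src1 pairs, the two high from src2.
inline Vec4f Hsubps(const Vec4f &src1, const Vec4f &src2) {
  return Vec4f{{src1[0] - src1[1], src1[2] - src1[3],
                src2[0] - src2[1], src2[2] - src2[3]}};
}
```

Within each pair the even-indexed element is the minuend, which is the only difference from the HADD packing.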
Progress report:
- Added Remill selectors and semantics for SSE4.1 BLENDPD, BLENDPS, BLENDVPD, and BLENDVPS.
- Immediate blends select packed lanes from source 1 or source 2 according to the immediate mask.
- Variable blends read the legacy implicit XMM0 mask before writing the destination and select lanes according to each element sign bit.
- This resolves remill-tester unsupported lift skips for the four blend files.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/blendpd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blendps.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blendvpd.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blendvps.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blendpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blendps.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blendvpd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/blendvps.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- all four 20-row samples: 20 passed, 0 failed, 0 skipped.
- blendpd.txt full: 529968 passed, 0 failed, 0 skipped.
- blendps.txt full: 537288 passed, 0 failed, 0 skipped.
- blendvpd.txt full: 61346 passed, 0 failed, 0 skipped.
- blendvps.txt full: 60230 passed, 0 failed, 0 skipped.

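The two selection rules above can be sketched on raw 32-bit lanes. The names are hypothetical, and the variable form takes the XMM0 mask by value to mirror the "read the mask before writing the destination" ordering mentioned in the report.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4u = std::array<uint32_t, 4>;  // four raw 32-bit lanes, lane 0 lowest

// BLENDPS: bit i of imm8 selects src2 for lane i, otherwise src1.
inline Vec4u Blendps(const Vec4u &src1, const Vec4u &src2, uint8_t imm8) {
  Vec4u out{};
  for (unsigned i = 0; i < 4; ++i) {
    out[i] = ((imm8 >> i) & 1u) ? src2[i] : src1[i];
  }
  return out;
}

// BLENDVPS: the sign bit of each mask lane selects src2, otherwise src1.
// The mask is a copy, so it stays valid even if XMM0 is also the destination.
inline Vec4u Blendvps(const Vec4u &src1, const Vec4u &src2, Vec4u xmm0_mask) {
  Vec4u out{};
  for (unsigned i = 0; i < 4; ++i) {
    out[i] = (xmm0_mask[i] >> 31) ? src2[i] : src1[i];
  }
  return out;
}
```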
Progress report:
- Added EXTRACTPS selectors by reusing the existing PEXTRD dword extraction semantics.
- Added INSERTPS semantics for packed single lanes, including source-lane selection, destination-lane insertion, and immediate-controlled zeroing.
- This resolves remill-tester unsupported lift skips for extractps and insertps.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/extractps.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/insertps.txt --execute --limit-states 20 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/extractps.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/insertps.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- extractps limited sample: 20 passed, 0 failed, 0 skipped.
- insertps limited sample: 20 passed, 0 failed, 0 skipped.
- extractps.txt full: 124160 passed, 0 failed, 0 skipped.
- insertps.txt full: 405160 passed, 0 failed, 0 skipped.

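The three immediate fields that INSERTPS decodes can be shown with a short sketch of the register form. Names are hypothetical; for the memory form the source is a single dword and the source-lane field does not apply, which this sketch does not model.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4u = std::array<uint32_t, 4>;  // four raw 32-bit lanes, lane 0 lowest

// INSERTPS xmm, xmm, imm8 (register form):
//   imm8[7:6] = source lane, imm8[5:4] = destination lane, imm8[3:0] = zero mask.
inline Vec4u Insertps(const Vec4u &dst, const Vec4u &src, uint8_t imm8) {
  const unsigned src_lane = (imm8 >> 6) & 0x3;
  const unsigned dst_lane = (imm8 >> 4) & 0x3;
  const unsigned zero_mask = imm8 & 0xF;

  Vec4u out = dst;
  out[dst_lane] = src[src_lane];  // insertion happens first
  for (unsigned i = 0; i < 4; ++i) {
    if ((zero_mask >> i) & 1u) {
      out[i] = 0;  // zeroing is applied last and can clear the inserted lane
    }
  }
  return out;
}
```

For example, `imm8 = 0x98` copies source lane 2 into destination lane 1 and then zeroes lane 3.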
Progress report:
- Fixed legacy SSE CMPSS/CMPSD/CMPPS/CMPPD semantics to decode only imm8[2:0] predicates.
- Kept AVX VCMP selector paths wired to the full 5-bit predicate table for future AVX semantics builds.
- This resolves cmppd/cmpps mismatches where legacy CMPPD with imm8=0x0f was incorrectly interpreted as TRUE_UQ instead of ORD_Q.

Failure triage before fix:
- ./build-release/remill-tester 3975WX/cmppd.txt --execute --stop-on-first-fail failed at opcode 660FC2F80F (cmppd xmm7, xmm0, 0x0F).
- Expected one packed double lane to be false after hardware masked the predicate to imm8[2:0]; Remill returned true for both lanes using the AVX 5-bit predicate interpretation.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/cmppd.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/cmpps.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/cmppd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/cmpps.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/comisd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/comiss.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/ucomisd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/ucomiss.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- cmppd limited sample: 100 passed, 0 failed, 0 skipped.
- cmpps limited sample: 100 passed, 0 failed, 0 skipped.
- cmppd.txt full: 523512 passed, 0 failed, 0 skipped.
- cmpps.txt full: 497792 passed, 0 failed, 0 skipped.
- comisd/comiss/ucomisd/ucomiss full: each 2272 passed, 0 failed, 0 skipped.

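The predicate masking that fixes the `imm8=0x0f` case can be sketched per lane as follows. The function names are illustrative, and only the eight legacy predicates are modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Evaluate a legacy SSE compare predicate; only imm8[2:0] is decoded.
inline bool EvalLegacyPredicate(double a, double b, uint8_t imm8) {
  const bool unordered = std::isnan(a) || std::isnan(b);
  switch (imm8 & 0x7) {
    case 0: return a == b;      // EQ
    case 1: return a < b;       // LT
    case 2: return a <= b;      // LE
    case 3: return unordered;   // UNORD
    case 4: return !(a == b);   // NEQ (true when unordered)
    case 5: return !(a < b);    // NLT (true when unordered)
    case 6: return !(a <= b);   // NLE (true when unordered)
    default: return !unordered; // ORD
  }
}

// One CMPPD/CMPSD lane: all ones when the predicate holds, else zero.
inline uint64_t CmpLane64(double a, double b, uint8_t imm8) {
  return EvalLegacyPredicate(a, b, imm8) ? ~uint64_t{0} : uint64_t{0};
}
```

With `imm8 = 0x0F` the masked predicate is 7 (ORD), so a lane containing a NaN compares false, matching the hardware result described in the triage above.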
Progress report:
- Added x86 invalid-result handling for SSE floating-point to signed-integer conversions.
- CVTPD2DQ/CVTPS2DQ/CVTTPD2DQ/CVTTPS2DQ now return 0x80000000 for NaN, infinity, and out-of-range 32-bit integer results.
- CVTSS2SI/CVTSD2SI/CVTTSS2SI/CVTTSD2SI now return the x86 indefinite signed integer for invalid 32-bit or 64-bit scalar conversions.
- This fixes the cvtpd2dq mismatch where invalid conversion lanes returned 0 instead of 0x80000000.

Failure triage before fix:
- ./build-release/remill-tester 3975WX/cvtpd2dq.txt --execute --stop-on-first-fail failed at state 23 for cvtpd2dq xmm15, xmm15.
- Hardware expected a low dword indefinite integer bit pattern (0x80000000) for an invalid conversion; Remill produced 0.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- 100-row samples: cvtpd2dq, cvtps2dq, cvttpd2dq, cvttps2dq, cvtsd2si, cvtss2si, cvttsd2si, cvttss2si.
- Full-file runs: cvtdq2pd, cvtdq2ps, cvtpd2dq, cvtps2dq, cvttpd2dq, cvttps2dq, cvtsd2si, cvtss2si, cvttsd2si, cvttss2si, cvtpd2ps, cvtps2pd, cvtsd2ss, cvtsi2sd, cvtsi2ss, cvtss2sd.

Results:
- self-test: ok.
- all listed 100-row samples passed with 0 failed and 0 skipped.
- cvtdq2pd.txt: 51024 passed, 0 failed, 0 skipped.
- cvtdq2ps.txt: 62512 passed, 0 failed, 0 skipped.
- cvtpd2dq.txt: 48360 passed, 0 failed, 0 skipped.
- cvtps2dq.txt: 42984 passed, 0 failed, 0 skipped.
- cvttpd2dq.txt: 48144 passed, 0 failed, 0 skipped.
- cvttps2dq.txt: 40856 passed, 0 failed, 0 skipped.
- cvtsd2si.txt: 45312 passed, 0 failed, 0 skipped.
- cvtss2si.txt: 34048 passed, 0 failed, 0 skipped.
- cvttsd2si.txt: 45568 passed, 0 failed, 0 skipped.
- cvttss2si.txt: 32512 passed, 0 failed, 0 skipped.
- cvtpd2ps.txt: 42888 passed, 0 failed, 0 skipped.
- cvtps2pd.txt: 47928 passed, 0 failed, 0 skipped.
- cvtsd2ss.txt: 58360 passed, 0 failed, 0 skipped.
- cvtsi2sd.txt: 112128 passed, 0 failed, 0 skipped.
- cvtsi2ss.txt: 116480 passed, 0 failed, 0 skipped.
- cvtss2sd.txt: 53552 passed, 0 failed, 0 skipped.

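The invalid-result rule for the conversion report above can be sketched for the truncating double-to-int32 case. `Cvttsd2si32` is a hypothetical name; the rounding variants would additionally need to honor the MXCSR rounding mode, which is not modeled here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// x86 "integer indefinite" for 32-bit results: 0x80000000.
constexpr int32_t kIndefiniteInt32 = INT32_MIN;

// Truncating double -> int32 conversion with x86 invalid-result handling.
// NaN, infinities, and values whose truncation does not fit in int32 all
// produce the indefinite value instead of invoking undefined behavior.
inline int32_t Cvttsd2si32(double src) {
  if (std::isnan(src) || src <= -2147483649.0 || src >= 2147483648.0) {
    return kIndefiniteInt32;
  }
  return static_cast<int32_t>(src);  // in range, so the cast is well defined
}
```

Doing the range check before the cast matters in C++: converting an out-of-range floating-point value to an integer type is undefined, which is one plausible way a lifted conversion ends up returning 0 rather than 0x80000000.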
Progress report:
- Added Remill selectors and semantics for SSE4.1 ROUNDPD, ROUNDPS, ROUNDSD, and ROUNDSS.
- Implemented immediate-controlled nearest-even, floor, ceil, and truncation modes; MXCSR-selected mode follows the tester/default MXCSR nearest-even configuration.
- Scalar forms preserve the upper destination lanes from the first operand while replacing only the low element.
- This resolves unsupported lift skips for all four round corpus files.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- 100-row samples for roundpd, roundps, roundsd, and roundss.
- Full-file runs for roundpd, roundps, roundsd, and roundss.

Results:
- self-test: ok.
- all four 100-row samples passed with 0 failed and 0 skipped.
- roundpd.txt full: 454512 passed, 0 failed, 0 skipped.
- roundps.txt full: 479064 passed, 0 failed, 0 skipped.
- roundsd.txt full: 506000 passed, 0 failed, 0 skipped.
- roundss.txt full: 513920 passed, 0 failed, 0 skipped.

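A minimal sketch of the immediate decoding for the ROUND family, under two stated assumptions: the host floating-point environment is in its default round-to-nearest mode (so `std::nearbyint` rounds ties to even), and the MXCSR-selected path is treated as nearest-even as in the report. The names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using Vec2d = std::array<double, 2>;

// One ROUNDPD/ROUNDSD lane. imm8[2] selects the MXCSR rounding mode (assumed
// nearest-even here); otherwise imm8[1:0] picks the mode directly.
inline double RoundLane(double src, uint8_t imm8) {
  if (imm8 & 0x4) {
    return std::nearbyint(src);  // assumption: MXCSR.RC is nearest-even
  }
  switch (imm8 & 0x3) {
    case 0: return std::nearbyint(src);  // nearest, ties to even
    case 1: return std::floor(src);      // toward negative infinity
    case 2: return std::ceil(src);       // toward positive infinity
    default: return std::trunc(src);     // toward zero
  }
}

// ROUNDSD: only the low lane is replaced; the upper lane comes from src1.
inline Vec2d Roundsd(const Vec2d &src1, const Vec2d &src2, uint8_t imm8) {
  return Vec2d{{RoundLane(src2[0], imm8), src1[1]}};
}
```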
Progress report:
- Added Remill selectors and semantics for SSE4.1 DPPD and DPPS.
- Implemented immediate-controlled multiply masks, horizontal pairwise sums, and destination broadcast/zero masks for packed double and packed single dot products.
- This resolves unsupported lift skips for dppd and dpps.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/dppd.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/dpps.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/dppd.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/dpps.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- dppd 100-row sample: 100 passed, 0 failed, 0 skipped.
- dpps 100-row sample: 100 passed, 0 failed, 0 skipped.
- dppd.txt full: 353856 passed, 0 failed, 0 skipped.
- dpps.txt full: 350928 passed, 0 failed, 0 skipped.

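The multiply mask and broadcast mask for the two-lane DPPD case can be sketched as below. `Dppd` and `Vec2d` are illustrative names; DPPS follows the same pattern with four lanes, `imm8[7:4]` as the multiply mask, and `imm8[3:0]` as the broadcast mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec2d = std::array<double, 2>;

// DPPD: imm8[5:4] selects which lane products participate in the sum,
// imm8[1:0] selects which destination lanes receive the sum (others get 0.0).
inline Vec2d Dppd(const Vec2d &src1, const Vec2d &src2, uint8_t imm8) {
  const double p0 = (imm8 & 0x10) ? src1[0] * src2[0] : 0.0;
  const double p1 = (imm8 & 0x20) ? src1[1] * src2[1] : 0.0;
  const double sum = p0 + p1;
  return Vec2d{{(imm8 & 0x1) ? sum : 0.0, (imm8 & 0x2) ? sum : 0.0}};
}
```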
Progress report:
- Corrected MULX writes so the first explicit destination receives the high product half and the second receives the low product half.
- Wrote the low destination before the high destination so aliasing operands match observed x86 hardware behaviour.
- This resolves the mulx semantic mismatch found in the 3975WX corpus.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/mulx.txt --execute --limit-states 100 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/mulx.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- mulx 100-row sample: 100 passed, 0 failed, 0 skipped.
- mulx.txt full: 1520573 passed, 0 failed, 0 skipped.

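The write ordering described above can be demonstrated with a toy register file. Everything here is an assumption made for illustration: `Regs` is a four-entry array rather than real machine state, and the implicit EDX multiplicand is passed explicitly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Regs = std::array<uint32_t, 4>;  // toy register file, indices are register ids

// 32-bit MULX: product = EDX * src. The first explicit destination gets the
// high half, the second gets the low half. Writing low first means that when
// both destinations name the same register, the high half is what remains.
inline Regs Mulx32(Regs regs, unsigned hi_dst, unsigned lo_dst,
                   uint32_t edx, uint32_t src) {
  const uint64_t product = uint64_t{edx} * src;
  regs[lo_dst] = static_cast<uint32_t>(product);
  regs[hi_dst] = static_cast<uint32_t>(product >> 32);
  return regs;
}
```

For `0xFFFFFFFF * 0xFFFFFFFF` the product is `0xFFFFFFFE00000001`, so with aliased destinations the register ends up holding `0xFFFFFFFE`.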
Progress report:
- Reworked SHLD/SHRD to form explicit double-width concatenations instead of treating 16-bit counts greater than the operand width as undefined.
- Matched observed x86Tester hardware behaviour for 16-bit masked counts 17..31 by rotating through the source operand and clearing CF in that wide-count path.
- Preserved no-op count behaviour and existing defined count flag handling.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/shld.txt --execute --limit-states 2000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/shrd.txt --execute --limit-states 2000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/shld.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/shrd.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- shld 2000-row sample: 2000 passed, 0 failed, 0 skipped.
- shrd 2000-row sample: 2000 passed, 0 failed, 0 skipped.
- shld.txt full: 585136 passed, 0 failed, 0 skipped.
- shrd.txt full: 580799 passed, 0 failed, 0 skipped.

Progress report:
- Added Remill selectors and semantics for ADCX and ADOX 32-bit and 64-bit register forms.
- ADCX now consumes and updates CF as an unsigned carry chain without modifying the other arithmetic status flags.
- ADOX now consumes and updates OF as an unsigned carry chain without modifying the other flags.
- This resolves unsupported lift skips for the adcx and adox corpora.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/adcx.txt --execute --limit-states 200 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/adox.txt --execute --limit-states 200 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/adcx.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/adox.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- adcx 200-row sample: 200 passed, 0 failed, 0 skipped.
- adox 200-row sample: 200 passed, 0 failed, 0 skipped.
- adcx.txt full: 49216 passed, 0 failed, 0 skipped.
- adox.txt full: 49552 passed, 0 failed, 0 skipped.

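The two independent carry chains can be sketched with a small flag struct. `Flags`, `AddResult`, `Adcx32`, and `Adox32` are hypothetical names; only four flags are modeled, which is enough to show that each instruction touches exactly one of them.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
  bool cf, of, zf, sf;
};

struct AddResult {
  uint32_t value;
  Flags flags;
};

// ADCX: dst + src + CF; the unsigned carry-out goes to CF, nothing else changes.
inline AddResult Adcx32(uint32_t dst, uint32_t src, Flags flags) {
  const uint64_t sum = uint64_t{dst} + src + (flags.cf ? 1u : 0u);
  flags.cf = (sum >> 32) != 0;
  return {static_cast<uint32_t>(sum), flags};
}

// ADOX: dst + src + OF; the unsigned carry-out goes to OF, nothing else changes.
inline AddResult Adox32(uint32_t dst, uint32_t src, Flags flags) {
  const uint64_t sum = uint64_t{dst} + src + (flags.of ? 1u : 0u);
  flags.of = (sum >> 32) != 0;
  return {static_cast<uint32_t>(sum), flags};
}
```

Note that in ADOX the OF bit is used purely as a second carry bit, not as a signed-overflow indicator.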
Progress report:
- Added Remill selectors and semantics for SSE4.1 MPSADBW register and memory forms.
- Implemented immediate-controlled source windows and eight 4-byte unsigned sum-of-absolute-difference results.
- This resolves unsupported lift skips for the mpsadbw corpus.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/mpsadbw.txt --execute --limit-states 200 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/mpsadbw.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- mpsadbw 200-row sample: 200 passed, 0 failed, 0 skipped.
- mpsadbw.txt full: 470232 passed, 0 failed, 0 skipped.

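The windowing described above can be sketched as a double loop over byte arrays. The type aliases and function name are assumptions for this illustration; byte 0 is the least-significant byte of the XMM register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;
using Words8 = std::array<uint16_t, 8>;

// MPSADBW (128-bit): imm8[1:0] picks a 4-byte block of src, imm8[2] picks the
// starting byte offset (0 or 4) in dst. Result word i is the sum of absolute
// differences between dst[off+i .. off+i+3] and the chosen src block.
inline Words8 Mpsadbw(const Bytes16 &dst, const Bytes16 &src, uint8_t imm8) {
  const unsigned src_off = (imm8 & 0x3u) * 4;
  const unsigned dst_off = ((imm8 >> 2) & 0x1u) * 4;
  Words8 out{};
  for (unsigned i = 0; i < 8; ++i) {
    unsigned sad = 0;
    for (unsigned j = 0; j < 4; ++j) {
      const int diff = static_cast<int>(dst[dst_off + i + j]) -
                       static_cast<int>(src[src_off + j]);
      sad += static_cast<unsigned>(diff < 0 ? -diff : diff);
    }
    out[i] = static_cast<uint16_t>(sad);  // at most 4 * 255, fits in 16 bits
  }
  return out;
}
```

The highest destination byte read is index 14 (offset 4, window 7, element 3), so the sliding window never leaves the 16-byte operand.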
Progress report:
- Added SSE4A INSERTQ register-control and immediate-control forms.
- Implemented low-64 bit-field insertion with length-zero-as-64 handling and architectural upper-64 zeroing.
- Added selectors for INSERTQ_XMMq_XMMdq and INSERTQ_XMMq_XMMq_IMMb_IMMb.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/insertq.txt --execute --limit-states 1000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/insertq.txt --execute --opcode F20F79EC --stop-on-first-fail
- ./build-release/remill-tester 3975WX/insertq.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- insertq 1000-row sample: 1000 passed, 0 failed, 0 skipped.
- F20F79EC register-control sample: 189 passed, 0 failed, 0 skipped.
- insertq.txt full: 3741176 passed, 0 failed, 0 skipped.

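The low-64 bit-field insertion with the length-zero-as-64 rule might look like the sketch below. It takes the length and index as already-decoded 6-bit fields and makes no claim about combinations where index plus length exceeds 64; bits that would land beyond bit 63 are simply shifted out here, which may not match what the PR does for those rows. `InsertqLow64` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Insert the low `length` bits of src into dst starting at bit `index`.
// A length field of 0 is treated as a 64-bit field.
inline uint64_t InsertqLow64(uint64_t dst, uint64_t src,
                             unsigned length, unsigned index) {
  length &= 0x3F;
  index &= 0x3F;
  const unsigned width = (length == 0) ? 64u : length;
  // Avoid shifting a 64-bit value by 64, which is undefined in C++.
  const uint64_t field =
      (width == 64) ? ~uint64_t{0} : ((uint64_t{1} << width) - 1);
  const uint64_t mask = field << index;
  return (dst & ~mask) | ((src << index) & mask);
}
```

The upper 64 bits of the destination are not modeled; the report states they are zeroed.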
Progress report:
- Added Remill semantics for AESENC, AESENCLAST, AESDEC, AESDECLAST, AESIMC, and AESKEYGENASSIST register forms.
- Implemented AES S-box/inverse S-box, ShiftRows/InvShiftRows, MixColumns/InvMixColumns, AddRoundKey, and AES key-generation assist byte transforms.
- This resolves AES-NI unsupported lift skips for the 3975WX AES corpus.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- per-file 200-row samples for aesenc, aesenclast, aesdec, aesdeclast, aesimc, and aeskeygenassist with --stop-on-first-fail
- ./build-release/remill-tester 3975WX/aesenc.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/aesenclast.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/aesdec.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/aesdeclast.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/aesimc.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/aeskeygenassist.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- 200-row samples: each file passed 200 rows, 0 failed, 0 skipped.
- aesenc.txt: 64664 passed, 0 failed, 0 skipped.
- aesenclast.txt: 64632 passed, 0 failed, 0 skipped.
- aesdec.txt: 64488 passed, 0 failed, 0 skipped.
- aesdeclast.txt: 64552 passed, 0 failed, 0 skipped.
- aesimc.txt: 64320 passed, 0 failed, 0 skipped.
- aeskeygenassist.txt: 580168 passed, 0 failed, 0 skipped.

Progress report:
- Added SHA extension semantics for SHA1MSG1 register and memory forms.
- Implemented the two-operand message-schedule XOR transform using the destination as the first source and the explicit operand as the second source.
- This resolves unsupported lift skips for sha1msg1 corpus rows.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/sha1msg1.txt --execute --limit-states 5000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha1msg1.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- sha1msg1 5000-row sample: 5000 passed, 0 failed, 0 skipped.
- sha1msg1.txt full: 63904 passed, 0 failed, 0 skipped.

Progress report:
- Added SHA extension semantics for SHA1MSG2 register and memory forms.
- Implemented the rotate-left message-schedule transform with the destination as the first source and explicit operand as the second source.
- This resolves unsupported lift skips for sha1msg2 corpus rows.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/sha1msg2.txt --execute --limit-states 5000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha1msg2.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- sha1msg2 5000-row sample: 5000 passed, 0 failed, 0 skipped.
- sha1msg2.txt full: 64432 passed, 0 failed, 0 skipped.

Progress report:
- Added SHA extension semantics for SHA1NEXTE register and memory forms.
- Implemented the next-E transform by copying the explicit source vector and adding ROR32(dest_low_dword, 2) into the low dword.
- This resolves unsupported lift skips for sha1nexte corpus rows.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/sha1nexte.txt --execute --limit-states 5000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha1nexte.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- sha1nexte 5000-row sample: 5000 passed, 0 failed, 0 skipped.
- sha1nexte.txt full: 60704 passed, 0 failed, 0 skipped.

Progress report:
- Added SHA extension semantics for SHA1RNDS4 register and memory forms.
- Implemented the SHA-1 four-round transform, including imm8-selected CH/PARITY/MAJ functions and constants.
- Modeled the implicit E progression after the first source-supplied E/message contribution.
- This resolves unsupported lift skips for sha1rnds4 corpus rows.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/sha1rnds4.txt --execute --limit-states 5000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha1rnds4.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- sha1rnds4 5000-row sample: 5000 passed, 0 failed, 0 skipped.
- sha1rnds4.txt full: 580736 passed, 0 failed, 0 skipped.

Progress report:
- Added SHA extension semantics for SHA256MSG1 and SHA256MSG2 register and memory forms.
- Implemented SHA-256 small sigma0/sigma1 helpers and the chained SHA256MSG2 schedule update.
- This resolves unsupported lift skips for sha256msg1 and sha256msg2 corpus rows.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/sha256msg1.txt --execute --limit-states 5000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha256msg2.txt --execute --limit-states 5000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha256msg1.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha256msg2.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- sha256msg1 5000-row sample: 5000 passed, 0 failed, 0 skipped.
- sha256msg2 5000-row sample: 5000 passed, 0 failed, 0 skipped.
- sha256msg1.txt full: 64752 passed, 0 failed, 0 skipped.
- sha256msg2.txt full: 64560 passed, 0 failed, 0 skipped.

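The small sigma helpers are the standard SHA-256 message-schedule functions, and SHA256MSG1 applies sigma0 across adjacent dwords. A sketch under the assumption that lane 0 is the least-significant dword; the helper names are hypothetical and SHA256MSG2 is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4u = std::array<uint32_t, 4>;  // lane 0 is the least-significant dword

// Rotate right; callers only pass n in 1..31, so neither shift is by 32.
constexpr uint32_t Ror32(uint32_t x, unsigned n) {
  return (x >> n) | (x << (32 - n));
}

// SHA-256 small sigma functions from the message schedule.
constexpr uint32_t SmallSigma0(uint32_t x) {
  return Ror32(x, 7) ^ Ror32(x, 18) ^ (x >> 3);
}
constexpr uint32_t SmallSigma1(uint32_t x) {
  return Ror32(x, 17) ^ Ror32(x, 19) ^ (x >> 10);
}

// SHA256MSG1: each result dword is W[i] + sigma0(W[i+1]), where W0..W3 come
// from src1 and W4 is the lowest dword of src2. Addition wraps modulo 2^32.
inline Vec4u Sha256msg1(const Vec4u &src1, const Vec4u &src2) {
  return Vec4u{{src1[0] + SmallSigma0(src1[1]),
                src1[1] + SmallSigma0(src1[2]),
                src1[2] + SmallSigma0(src1[3]),
                src1[3] + SmallSigma0(src2[0])}};
}
```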
Progress report:
- Added SHA extension semantics for SHA256RNDS2 register and memory forms.
- Implemented the two-round SHA-256 transform using dest old CDGH state, explicit-source ABEF state, and implicit XMM0 message/constant inputs.
- Added SHA-256 big sigma helpers and reused the SHA choose/majority helpers.
- This resolves unsupported lift skips for sha256rnds2 corpus rows.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/sha256rnds2.txt --execute --limit-states 5000 --stop-on-first-fail
- ./build-release/remill-tester 3975WX/sha256rnds2.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- sha256rnds2 5000-row sample: 5000 passed, 0 failed, 0 skipped.
- sha256rnds2.txt full: 65486 passed, 0 failed, 0 skipped.

Progress report:
- Added Remill semantics for RDFSBASE 32-bit and 64-bit GPR destinations.
- The instruction reads the modeled FS base from the Remill address-space state and zero-extends/truncates it into the destination as required by operand width.
- This lets the tester execute the 3975WX rdfsbase corpus instead of classifying it as an environment read skip.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/rdfsbase.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- rdfsbase.txt full: 1040 passed, 0 failed, 0 skipped.

Progress report:
- Added RDSSPD/RDSSPQ instruction selections for the disabled-CET behavior observed in the 3975WX corpus.
- The corpus shows RDSSP preserving the destination when shadow-stack state is unavailable/disabled, so the semantics return memory without modifying the destination.
- This lets the tester execute rdsspd and rdsspq instead of treating them as unsupported environment reads.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/rdsspd.txt 3975WX/rdsspq.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- rdsspd.txt: 720 passed, 0 failed, 0 skipped.
- rdsspq.txt: 1472 passed, 0 failed, 0 skipped.

Progress report:
- Added SMSW register-destination instruction selections for 16-, 32-, and 64-bit GPR forms.
- Modeled the machine status word observed in the 3975WX user-mode corpus: 16-bit destinations receive 0x0031 while wider register destinations receive the hardware-observed 0x80050031 value.
- This resolves SMSW lift gaps for the current corpus; memory-form SMSW remains gated by missing memory oracle support if it appears.

Verification from Release remill-tester build:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/smsw.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- smsw.txt full: 1776 passed, 0 failed, 0 skipped.

Progress report:
- Updated x87 status shadow side effects for status-only/control x87 operations used by the 3975WX corpus.
- FNCLEX now clears the stack-fault bit along with exception flags while preserving condition codes.
- FINCSTP, FDECSTP, FFREE, and FFREEP now clear C1 as observed architecturally.
- FNINIT now clears the Remill status-shadow bits and the FXSAVE status word in addition to the FSAVE status word.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/fnstsw.txt 3975WX/fninit.txt 3975WX/fnclex.txt 3975WX/fincstp.txt 3975WX/fdecstp.txt 3975WX/ffree.txt 3975WX/ffreep.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- Combined x87 status/control run: 583 passed, 0 failed, 0 skipped.
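To make the status-word effects concrete, here is a small Python sketch operating on a plain 16-bit x87 status word. The function names are hypothetical and this does not reflect how Remill stores its status shadow. The bit positions follow the architectural layout, and clearing ES and B in `fnclex` follows the architectural FNCLEX definition rather than anything stated in the commit message.

```python
# Architectural x87 status-word layout: exception flags in bits 0-5,
# SF bit 6, ES bit 7, C0 bit 8, C1 bit 9, C2 bit 10, TOP bits 11-13,
# C3 bit 14, B bit 15.
EXCEPTION_FLAGS = 0x003F
STACK_FAULT = 0x0040
ERROR_SUMMARY = 0x0080
C1 = 0x0200
BUSY = 0x8000


def fnclex(status_word: int) -> int:
    """Clear exception flags, SF, ES and B; keep C0-C3 and TOP."""
    cleared = EXCEPTION_FLAGS | STACK_FAULT | ERROR_SUMMARY | BUSY
    return status_word & ~cleared & 0xFFFF


def clear_c1(status_word: int) -> int:
    """Effect described for FINCSTP/FDECSTP/FFREE/FFREEP: only C1 is cleared."""
    return status_word & ~C1 & 0xFFFF
```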
Progress report:
- Replaced host sync-hypercall LAR behavior with deterministic semantics for the fixed user descriptor selectors present in the 3975WX corpus.
- Added LSL instruction selections and modeled the fixed user code/data selector limits observed by the corpus.
- Invalid or unmodeled selectors preserve the destination and clear ZF; per-CPU/system descriptor selectors are intentionally left for the tester to skip because their descriptor table state is not serialized in raw rows.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/lar.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/lsl.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/lar.txt 3975WX/lsl.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- lar.txt: 54960 passed, 0 failed, 360 skipped as descriptor_state_unsupported for variable selector 0x50 rows.
- lsl.txt: 50917 passed, 0 failed, 6 skipped as descriptor_state_unsupported for variable selector 0x50/0x51 rows.
- combined LAR/LSL: 105877 passed, 0 failed, 366 skipped.
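The "known selector writes the result and sets ZF, anything else preserves the destination and clears ZF" rule can be sketched like this. The function name `lsl` and the `limits` table are hypothetical; the actual selectors and limits modeled in the PR are not listed in the commit message, so the caller supplies the table here.

```python
from typing import Dict, Tuple


def lsl(dest: int, selector: int, limits: Dict[int, int]) -> Tuple[int, bool]:
    """Return (new destination value, ZF) for a table-driven LSL model."""
    limit = limits.get(selector)
    if limit is None:
        # Invalid or unmodeled selector: destination unchanged, ZF cleared.
        return dest, False
    # Modeled selector: destination receives the limit, ZF set.
    return limit, True
```

For example, with a made-up table `{0x1B: 0xFFFFF}`, `lsl(0x1234, 0x1B, table)` yields `(0xFFFFF, True)`, while `lsl(0x1234, 0x08, table)` yields `(0x1234, False)`.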
Progress report:
- Added RCPSS/RCPPS/RSQRTPS instruction selections and switched RSQRTSS/RSQRTPS to approximate results instead of precise reciprocal-square-root math.
- Encoded the exact AMD 3975WX single-precision reciprocal and reciprocal-square-root result map observed across the raw rcpss/rcpps/rsqrtss/rsqrtps corpus, with the previous precise computation retained as a fallback for unobserved inputs.
- This resolves the tester's approximate_fp_unsupported coverage bucket for the current corpus.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/rcpps.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/rcpss.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/rsqrtps.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/rsqrtss.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/rcpps.txt 3975WX/rcpss.txt 3975WX/rsqrtps.txt 3975WX/rsqrtss.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- rcpps.txt: 56808 passed, 0 failed, 0 skipped.
- rcpss.txt: 57912 passed, 0 failed, 0 skipped.
- rsqrtps.txt: 56656 passed, 0 failed, 0 skipped.
- rsqrtss.txt: 58496 passed, 0 failed, 0 skipped.
- combined approximate sweep: 229872 passed, 0 failed, 0 skipped.
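The "observed result map with a precise fallback" idea might look roughly like the Python sketch below. All names are hypothetical, the `observed` table is supplied by the caller (no real corpus values are reproduced here), and the fallback only handles ordinary finite, non-zero inputs; zeros, denormals, infinities and NaNs would need extra handling.

```python
import struct


def f32_to_bits(value: float) -> int:
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rcp_lookup(input_bits: int, observed: dict) -> int:
    """Return the result bits for a reciprocal, preferring observed values."""
    hit = observed.get(input_bits)
    if hit is not None:
        return hit  # byte-exact value recorded for this input
    # Fallback: precise 1/x rounded to single precision.
    return f32_to_bits(1.0 / bits_to_f32(input_bits))
```

Keying the table on raw bit patterns rather than float values keeps the lookup byte-exact and avoids any host floating-point comparison quirks.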
Progress report:
- Normalized FXCH to clear the x87 C1 status bit after exchanging stack registers.
- This matches the canonical 3975WX FXCH rows exercised by the parent tester while raw non-canonical x87 stack encodings remain outside the trusted oracle subset.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/fxch.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- fxch.txt with row-level trusted-x87 filtering: 75 passed, 0 failed, 2030 skipped as fpu_state_unsupported.
Progress report:
- Replaced host floating-point FABS/FCHS computation with direct x87 80-bit sign-bit manipulation.
- This avoids host libcall/normalization behavior for raw 80-bit corpus values and clears x87 C1 as observed for trusted 3975WX rows.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester 3975WX/fabs.txt 3975WX/fchs.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/fabs.txt 3975WX/fchs.txt 3975WX/fxch.txt 3975WX/fcmovb.txt 3975WX/fcmovbe.txt 3975WX/fcmove.txt 3975WX/fcmovnb.txt 3975WX/fcmovnbe.txt 3975WX/fcmovne.txt 3975WX/fcmovnu.txt 3975WX/fcmovu.txt --execute --stop-on-first-fail

Results with parent row-level trusted-x87 filtering:
- fabs/fchs: 40 passed, 0 failed, 267 skipped as fpu_state_unsupported.
- combined trusted x87 sign/exchange/conditional-move sweep: 493 passed, 0 failed, 13189 skipped as fpu_state_unsupported.
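A minimal Python sketch of sign-bit manipulation on a raw 80-bit value, assuming the register is held as 10 little-endian bytes so that the sign is the top bit of byte 9. The function names are hypothetical and this is not the Remill code; the point is that no other bit of the encoding is touched, so non-canonical values survive unchanged.

```python
def fabs80(raw: bytes) -> bytes:
    """Clear the sign bit of a 10-byte little-endian x87 extended value."""
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    return raw[:9] + bytes([raw[9] & 0x7F])


def fchs80(raw: bytes) -> bytes:
    """Flip the sign bit of a 10-byte little-endian x87 extended value."""
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    return raw[:9] + bytes([raw[9] ^ 0x80])
```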
Progress report:
- Replaced FXAM's host floating-point classification path with direct 80-bit sign/status handling.
- The Remill tester bridge does not currently model x87 tag words, so this models the 3975WX corpus-observed non-empty finite-data status pattern: C3=1, C2=0, C1=sign, C0=1 while preserving exception bits and TOP.
- This also avoids the Release JIT abort from unsupported host library calls in the old __builtin_fpclassify path.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/fxam.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- fxam.txt: 152 passed, 0 failed, 0 skipped.
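The condition-code pattern described for FXAM can be sketched in Python as below. The function name and the 10-byte little-endian representation of ST(0) are assumptions made for illustration; the pattern itself (C3=1, C2=0, C1=sign, C0=1, everything else preserved) is taken from the commit message.

```python
# Condition-code bits in the x87 status word.
C0, C1, C2, C3 = 0x0100, 0x0200, 0x0400, 0x4000


def fxam_status(status_word: int, st0_raw: bytes) -> int:
    """Return the status word after the FXAM pattern described in the commit."""
    if len(st0_raw) != 10:
        raise ValueError("expected 10 bytes")
    sign = st0_raw[9] >> 7  # top bit of the 80-bit encoding
    # Drop the old condition codes; exception bits and TOP are untouched.
    result = status_word & ~(C0 | C1 | C2 | C3) & 0xFFFF
    result |= C3 | C0
    if sign:
        result |= C1
    return result
```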
Progress report:
- Added corpus-observed LAR handling for selector 0x50.
- The 3975WX rows with 16-bit destinations only expose the stable low 16 access-right bits (0xf300), while wider destinations include descriptor-table limit/flag bits that remain host-state dependent and are still skipped by the parent tester.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/lar.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/lar.txt 3975WX/lsl.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- lar.txt: 55080 passed, 0 failed, 240 skipped.
- lar+lsl: 105997 passed, 0 failed, 246 skipped.
Progress report:
- Added a raw x87-stack register FST path for FST_X87_ST0.
- This avoids converting raw 80-bit stack bytes through host floating-point types, which previously caused unsupported library calls in the Release JIT and was inappropriate for byte-exact x86Tester rows.
- Clears x87 C1 for the non-underflow register-store path, matching the trusted 3975WX FST register rows.
- Left FSTP register semantics unchanged because the current tester bridge still lacks enough tag/TOP evidence for those rows.

Verification from Release remill-tester build in parent worktree:
- cmake --build build-release --target remill-tester -j4
- ./build-release/remill-tester --self-test
- ./build-release/remill-tester 3975WX/fst.txt --execute --stop-on-first-fail
- ./build-release/remill-tester 3975WX/fst.txt 3975WX/fstp.txt 3975WX/fstpnce.txt --execute --stop-on-first-fail

Results:
- self-test: ok.
- fst.txt: 44 passed, 0 failed, 1245 skipped.
- fst/fstp/fstpnce triage: 44 passed, 0 failed, 3772 skipped.
Collaborator
Maybe we can split this PR a bit too? I know what you're testing against, so I'm sure most of these implementations are correct (assuming Matt's generator isn't faulty), but I would really prefer not to review a 5k LoC change in a single PR. Feel free to keep this as a draft and cherry-pick it into bite-sized bits later.
Contributor
Author
Yeah ofc, I just pushed because I was using this branch in my tester project. I will cut out a bunch of stuff and leave only the real semantic issues (some were found).
Note: this turned into a bit of a mess when I let the clanker run overnight, will clean up eventually.