The initial changes were submitted by Intel. I refactored them to simplify prefix handling in instruction encoding (added simd_prefix methods) and fixed the VEX encoding to generate the 2-byte prefix when possible. The changes in the .ad files were incomplete (especially in the 32-bit .ad file) and not as aggressive as I wanted, so I converted more mach node encodings to use MacroAssembler instructions. I also added the missing decoding parts in Assembler::locate_operand() and NativeMovRegMem::instruction_start().
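The rule for when the shorter VEX prefix can be used comes straight from the prefix layout: the 2-byte form (C5) only has room for R, vvvv, L and pp, so it works only when X and B are unused, W is 0 and the opcode is in the 0F map. The sketch below illustrates that choice, following the field layout in the Intel SDM; `VexFields` and `encode_vex` are hypothetical names for illustration, not the HotSpot simd_prefix methods.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of the fields that go into a VEX prefix.
// Register numbers are plain values; the encoder applies the 1's-complement
// inversion that the VEX format requires for R, X, B and vvvv.
struct VexFields {
  bool rex_r = false;   // high bit of ModRM.reg (xmm8-xmm15)
  bool rex_x = false;   // high bit of SIB.index
  bool rex_b = false;   // high bit of ModRM.rm / SIB.base
  bool w = false;       // VEX.W
  uint8_t vvvv = 0;     // extra source register; 0 when unused (encodes as 1111b)
  bool l256 = false;    // VEX.L: false = 128-bit, true = 256-bit
  uint8_t pp = 0;       // implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
  uint8_t mmmmm = 1;    // opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A
};

// Emits the shortest legal VEX prefix for the given fields.
std::vector<uint8_t> encode_vex(const VexFields& f) {
  const uint8_t not_r = f.rex_r ? 0 : 1;
  const uint8_t not_vvvv = static_cast<uint8_t>(~f.vvvv & 0xF);
  const uint8_t tail =
      static_cast<uint8_t>((not_vvvv << 3) | (f.l256 ? 4 : 0) | (f.pp & 3));
  // 2-byte form: C5 [R' vvvv' L pp]; only possible without X, B, W and
  // with the implied 0F opcode map.
  if (!f.rex_x && !f.rex_b && !f.w && f.mmmmm == 1) {
    return {0xC5, static_cast<uint8_t>((not_r << 7) | tail)};
  }
  // 3-byte form: C4 [R' X' B' mmmmm] [W vvvv' L pp].
  const uint8_t not_x = f.rex_x ? 0 : 1;
  const uint8_t not_b = f.rex_b ? 0 : 1;
  return {0xC4,
          static_cast<uint8_t>((not_r << 7) | (not_x << 6) | (not_b << 5) |
                               (f.mmmmm & 0x1F)),
          static_cast<uint8_t>((f.w ? 0x80 : 0) | tail)};
}
```

For example, `vaddps xmm0, xmm1, xmm2` (vvvv = 1, 0F map) gets the prefix C5 F0, while `vptest xmm0, xmm1` (pp = 66, 0F38 map) needs the 3-byte prefix C4 E2 79.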
Note: these changes add no new AVX instructions, and no 3-operand format was added to MacroAssembler; both will come in separate changes. Where applicable, the current implementation uses the destination operand as the second source.
The float compare implementation in x86_32.ad was replaced with the one from x86_64.ad, which uses fewer branches and does not destroy the EAX register. Note: the ucomiss instruction produces the same result as comiss because we mask numeric exceptions. Also, ucomiss may be slightly faster since it does not need to check the control word for QNaN values.
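The branch structure of the x86_64.ad compare can be modeled in plain C++ to show why two conditional jumps suffice and why no fixed register is needed. This is a sketch, assuming the CmpF3 rule emits `ucomiss src1, src2; movl dst, -1; jp done; jb done; setne dst; movzbl dst, dst` and returns -1 for unordered operands (the semantics of Java's `fcmpl`); `Flags`, `ucomiss_flags` and `cmp_f3` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// The EFLAGS bits that ucomiss/comiss set (OF, SF and AF are cleared).
struct Flags {
  bool zf, pf, cf;
};

// Flag results of ucomiss (and comiss, with exceptions masked):
// unordered -> ZF=PF=CF=1, less -> CF=1, equal -> ZF=1, greater -> all clear.
Flags ucomiss_flags(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return {true, true, true};
  if (a < b) return {false, false, true};
  if (a == b) return {true, false, false};
  return {false, false, false};
}

// Three-way compare modeled on the assumed sequence above. Unordered and
// "below" both leave the preloaded -1; otherwise setne yields 0 for equal
// and 1 for greater. dst can be any register, not specifically EAX.
int cmp_f3(float src1, float src2) {
  Flags f = ucomiss_flags(src1, src2);
  int dst = -1;                 // movl dst, -1
  if (f.pf) return dst;         // jp done: unordered
  if (f.cf) return dst;         // jb done: src1 < src2
  dst = f.zf ? 0 : 1;           // setne dst; movzbl dst, dst
  return dst;
}
```

Because the parity check comes first, a NaN operand never reaches setne, so the unordered case shares the -1 exit with the "less" case.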
Vector instructions with a VEX prefix accept unaligned memory operands, whereas the legacy (REX-prefixed) encodings require 16-byte alignment. Memory-operand versions of these instructions were added for that purpose, but they must be used only with the VEX prefix; an assert enforces this. ANDPD and XORPD were already used with memory operands before, but only on 16-byte aligned memory (we have special code to guarantee that). I added an assert to check the address alignment for these instructions.
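The two checks described above can be summarized as a small predicate. This is a sketch of the rule only, not the HotSpot asserts themselves; `is_16_byte_aligned` and `memory_form_allowed` are hypothetical names, and the `vex_only_instruction` flag stands for the newly added memory-operand forms.

```cpp
#include <cassert>
#include <cstdint>

// Legacy SSE forms such as ANDPD/XORPD xmm, m128 fault on a memory operand
// that is not 16-byte aligned.
bool is_16_byte_aligned(uintptr_t addr) { return (addr & 0xF) == 0; }

// Hypothetical model of the two checks:
//  - newly added memory-operand forms may only be emitted with a VEX prefix;
//  - ANDPD/XORPD may also use the legacy encoding, but only on aligned memory.
// (Explicitly aligned moves such as VMOVDQA still require alignment even when
// VEX-encoded; they are outside this model.)
bool memory_form_allowed(bool vex_only_instruction, bool use_vex, uintptr_t addr) {
  if (use_vex) return true;                 // VEX arithmetic forms allow unaligned access
  if (vex_only_instruction) return false;   // legacy encoding not allowed at all
  return is_16_byte_aligned(addr);          // legacy ANDPD/XORPD need alignment
}
```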
As part of these changes, the REX.W prefix was removed from instructions that do not need it: MOVDQA, MOVDQU, PCMPESTRI, PSRLQ, PSRLDQ, PTEST.
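The cost of an unneeded REX.W is easiest to see in the raw bytes: a REX byte (0100WRXB) is only required when one of its bits is set, so dropping W saves a byte whenever no extended register (xmm8-xmm15) is involved. A sketch for the register-register form of a `66 0F <opcode>` instruction; `encode_66_0f_rr` is a hypothetical helper, not HotSpot's Assembler API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes "66 [REX] 0F <opcode> ModRM" with ModRM.mod = 11 (register operands).
// The REX byte is emitted only if W, R or B is needed, and it must sit after
// the 66 prefix, directly before the 0F escape byte.
std::vector<uint8_t> encode_66_0f_rr(uint8_t opcode, int reg, int rm, bool rex_w) {
  std::vector<uint8_t> out{0x66};
  const uint8_t r = static_cast<uint8_t>((reg >> 3) & 1);  // REX.R extends ModRM.reg
  const uint8_t b = static_cast<uint8_t>((rm >> 3) & 1);   // REX.B extends ModRM.rm
  if (rex_w || r || b) {
    out.push_back(static_cast<uint8_t>(0x40 | (rex_w ? 8 : 0) | (r << 2) | b));
  }
  out.push_back(0x0F);
  out.push_back(opcode);
  out.push_back(static_cast<uint8_t>(0xC0 | ((reg & 7) << 3) | (rm & 7)));
  return out;
}
```

For example, `movdqa xmm0, xmm1` (opcode 6F) encodes as 66 0F 6F C1 without REX.W and as 66 48 0F 6F C1 with it; `movdqa xmm8, xmm1` needs REX.R anyway (66 44 0F 6F C1).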
Tested with UseAVX=1|0, UseSSE=4|2|1|0, CTW, VM regression tests, nsk.