For VEX-encoded ones, all three involved vector registers have to be
distinct. For EVEX-encoded ones an actual mask register has to be in use
and zeroing-masking cannot be used (violation of either will #UD).
Additionally both involved vector registers have to be distinct for
EVEX-encoded gathers.
Commit 6ff00b5e12 ("x86/Intel: correct permitted operand sizes for
AVX512 scatter/gather") brought the assembler side of AVX512 S/G insn
handling in line with AVX2's, but the disassembler side was forgotten.
This has the benefit of
- allowing to fold a number of table entries,
- rendering a few #define-s and enumerators unused.
When the VEX.L=1 decode matches that of both EVEX.L'L=1 and EVEX.L'L=2
(typically when all three are invalid) the (smaller) VEX table entry can
be reused by EVEX, instead of duplicating data. (Note that XM and XMM as
well as EXxmm_md and EXd are equivalent at least for the purposes here.)
The order of decodes influences the overall number of table entries.
Reduce table size quite a bit by first decoding few-alternatives
attributes common to all valid leaves.
This also adds a PREFIX_DATA 7531c61332 ("x86: simplify decode of
opcodes valid with (embedded) 66 prefix only") missed to apply to
vbroadcastf64x4.
Rdq, Rd, and MaskR can be replaced by Edq, Ed / Rm, and MaskE
respectively, as OP_R() doesn't enforce ModRM.mod == 3, and hence where
MOD matters but hasn't been decoded yet it needs to be anyway. (The case
of converting to Rm is temporary until a subsequent change.)
The only valid (embedded or explicit) prefix being the data size one
(which is a fairly common pattern), avoid going through prefix_table[].
Instead extend the "required prefix" logic to also handle PREFIX_DATA
alone in a table entry, now used to identify this case. This requires
moving the (adjusted) ->prefix_requirement logic ahead of the printing
of stray prefixes, as the latter needs to observe the new setting of
PREFIX_DATA in used_prefixes.
Also add PREFIX_OPCODE on related entries when previously there was
mistakenly no decode step through prefix_table[].
Unlike for the EVEX-encoded versions, the VEX ones failed to decode
VEX.W. Once the necessary adjustments are done, it becomes obvious that
the EVEX and VEX table entries for VCVTPS2PH are identical and can hence
be folded.
The duplication is not only space inefficient, but also risks entries
going out of sync (some of which that I became aware of while doing this
work will get addressed subsequently). Right here note that for
VGF2P8MULB this also addresses the prior lack of EVEX.W decoding (i.e. a
first example of out of sync entries).
This introduces EXxEVexR to some VEX templates, on the basis that this
operand is benign there and only relevant when EVEX encoding ends up
reaching these entries.
For major opcodes allowing only packed FP kinds of operands, i.e. the
ones where legacy and AVX decoding uses the X macro, we can do so for
AVX512 as well, by attaching to the checking logic the "EVEX.W must
match presence of embedded 66 prefix" rule. (Encodings not following
this general pattern simply may not gain the PREFIX_OPCODE attribute.)
Note that testing of the thus altered decoding has already been put in
place by "x86: correct decoding of packed-FP-only AVX encodings".
This can also be at least partly applied to scalar-FP-only insns (i.e.
V{,U}COMIS{S,D}) as well as the vector-FP forms of insns also allowing
scalar encodings (e.g. VADDP{S,D}).
Take the opportunity and also fix EVEX-encoded VMOVNTP{S,D} as well as
to-memory forms of VMOV{L,H}PS and both forms of VMOV{L,H}PD to wrongly
disassemble with only register operands.
Break i386-dis-evex.h into small files such that each file is included
just once.
* i386-dis-evex.h: Break into ...
* i386-dis-evex-len.h: New file.
* i386-dis-evex-mod.h: Likewise.
* i386-dis-evex-prefix.h: Likewise.
* i386-dis-evex-reg.h: Likewise.
* i386-dis-evex-w.h: Likewise.
* i386-dis.c: Include i386-dis-evex-reg.h, i386-dis-evex-prefix.h,
i386-dis-evex.h, i386-dis-evex-len.h, i386-dis-evex-w.h and
i386-dis-evex-mod.h.
For the flavors having a GPR operand EVEX.W is ignored outside of 64-bit
mode. The mnemonic should therefore not be KMOVQ, the GPR operand should
not name a non-existing 64-bit register, just like is already the case
for the AVX counterparts, and the Disp8 scaling factor should be 4
rather than 8.
PEXTR{B,W} and PINSR{B,W}, just like for AVX512BW, are WIG, no matter
that the SDM uses a nonstandard description of that fact.
PEXTRD, even with EVEX.W set, ignores that bit outside of 64-bit mode,
just like its AVX counterpart.
Update x86 disassembler to ignore the EVEX.W bit in EVEX vcvt[u]si2s[sd]
instructions in 32-bit mode.
gas/
PR binutils/23655
* testsuite/gas/i386/evex.d: New file.
* testsuite/gas/i386/evex.s: Likewise.
* testsuite/gas/i386/i386.exp: Run evex.
opcodes/
PR binutils/23655
* i386-dis-evex.h (evex_table): Replace Eq with Edqa for
vcvtsi2ss%LQ, vcvtsi2sd%LQ, vcvtusi2ss%LQ and vcvtusi2sd%LQ.
* i386-dis.c (Edqa): New.
(dqa_mode): Likewise.
(intel_operand_size): Handle dqa_mode as m_mode.
(OP_E_register): Handle dqa_mode as dq_mode.
(OP_E_memory): Set shift for dqa_mode based on address_mode.
Just like for their AVX counterparts and CVTSI2S{D,S}, a memory source
here is ambiguous and hence
- in source files should be qualified with a suitable suffix or operand
size specifier (not doing so is an error in Intel mode, and will gain
a diagnostic in AT&T mode in the future),
- in disassembly should be properly suffixed (the Intel operand size
specifiers were emitted correctly already).
Certain conversion operations as well as vfpclassp{d,s} are ambiguous
when the input operand is in memory and no broadcast is being used.
While in Intel mode this gets resolved by printing suitable operand
size modifiers, AT&T mode need mnemonic suffixes to be added.
gas/testsuite/
2015-04-23 Jan Beulich <jbeulich@suse.com>
* gas/i386/avx512dq.d: Add 'z' suffix to vfpclassp{d,s} non-
register, non-broadcast cases.
* gas/i386/x86-64-avx512dq.d: Likewise.
* gas/i386/avx512dq_vl.d: Add 'x' and 'y' suffixes to
vcvt{,u}qq2ps and vfpclassp{d,s} non-register, non-broadcast
cases.
* gas/i386/x86-64-avx512dq_vl.d: Likewise.
* gas/i386/avx512f_vl.d: Add 'x' and 'y' suffixes to
vcvt{,t}pd2{,u}dq and vcvtpd2ps non-register, non-broadcast
cases.
* gas/i386/x86-64-avx512f_vl.d: Likewise.
opcodes/
2015-04-23 Jan Beulich <jbeulich@suse.com>
* i386-dis.c (putop): Extend "XY" handling to AVX512. Handle "XZ".
* i386-dis-evex.h.c (vcvtpd2ps, vcvtqq2ps, vcvttpd2udq,
vcvtpd2udq, vcvtuqq2ps, vcvttpd2dq, vcvtpd2dq): Add %XY.
(vfpclasspd, vfpclassps): Add %XZ.
For gathers with indices larger than elements (e. g.)
vpgatherqd ymm6{k1}, ZMMWORD PTR [ebp+zmm7*8-123]
We currently treat memory size as a size of index register, while it is
actually should be size of destination register:
vpgatherqd ymm6{k1}, YMMWORD PTR [ebp+zmm7*8-123]
This patch fixes it.
opcodes/
* i386-opc.tbl: Change memory size for vgatherpf0qps, vgatherpf1qps,
vscatterpf0qps, vscatterpf1qps, vgatherqps, vpgatherqd, vpscatterqd,
vscatterqps.
* i386-tbl.h: Regenerate.
gas/testsuite/
* gas/i386/avx512pf-intel.d: Change memory size for vgatherpf0qps,
vgatherpf1qps, vscatterpf0qps, vscatterpf1qps.
* gas/i386/avx512pf.s: Ditto.
* gas/i386/x86-64-avx512pf-intel.d: Ditto.
* gas/i386/x86-64-avx512pf.s: Ditto.
* gas/i386/avx512f-intel.d: Change memory size for vgatherqps,
vpgatherqd, vpscatterqd, vscatterqps.
* gas/i386/avx512f.s: Ditto.
* gas/i386/x86-64-avx512f-intel.d: Ditto.
* gas/i386/x86-64-avx512f.s: Ditto.