This patch implements ACLE intrinsics vget_low_bf16 and vget_high_bf16
to extract lower or higher half from a bfloat16x8 vector. The
vget_high_bf16 is done by 'dup' instruction. The vget_low_bf16 is just
to return the lower half of a vector register. Tests include both big-
and little-endian cases.
gcc/ChangeLog:
2020-11-03 Dennis Zhang <dennis.zhang@arm.com>
* config/aarch64/aarch64-simd-builtins.def (vget_lo_half): New entry.
(vget_hi_half): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_vget_lo_halfv8bf): New entry.
(aarch64_vget_hi_halfv8bf): Likewise.
* config/aarch64/arm_neon.h (vget_low_bf16): New intrinsic.
(vget_high_bf16): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/aarch64/advsimd-intrinsics/bf16_get.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/bf16_get-be.c: New test.
This makes sure that stack allocated SSA_NAMEs are
at least MODE_ALIGNED. Also increase the MEM_ALIGN
for the corresponding rtl objects.
gcc:
2020-11-03 Bernd Edlinger <bernd.edlinger@hotmail.de>
PR target/97205
* cfgexpand.c (align_local_variable): Make SSA_NAMEs
at least MODE_ALIGNED.
(expand_one_stack_var_at): Increase MEM_ALIGN for SSA_NAMEs.
gcc/testsuite:
2020-11-03 Bernd Edlinger <bernd.edlinger@hotmail.de>
PR target/97205
* gcc.c-torture/compile/pr97205.c: New test.
This patch enables intrinsics to convert BFloat16 scalar and vector
operands to Float32 modes. The intrinsics are implemented by shifting
each BFloat16 item 16 bits to left using shl/shll/shll2 instructions.
gcc/ChangeLog:
2020-11-03 Dennis Zhang <dennis.zhang@arm.com>
* config/aarch64/aarch64-simd-builtins.def(vbfcvt): New entry.
(vbfcvt_high, bfcvt): Likewise.
* config/aarch64/aarch64-simd.md(aarch64_vbfcvt<mode>): New entry.
(aarch64_vbfcvt_highv8bf, aarch64_bfcvtsf): Likewise.
* config/aarch64/arm_bf16.h (vcvtah_f32_bf16): New intrinsic.
* config/aarch64/arm_neon.h (vcvt_f32_bf16): Likewise.
(vcvtq_low_f32_bf16, vcvtq_high_f32_bf16): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
(test_vcvt_f32_bf16, test_vcvtq_low_f32_bf16): New tests.
(test_vcvtq_high_f32_bf16, test_vcvth_f32_bf16): Likewise.