Checking the number of pluses is unreliable since the vector size
isn't known. Instead see that the unwanted scalar compute is not
there.
2021-03-11 Richard Biener <rguenther@suse.de>
PR testsuite/98245
* gcc.dg/vect/bb-slp-46.c: Scan for the scalar compute
instead of verifying the total number of adds.
While we could at least vectorize it on targets which support
re-alignment tokens we fail to do this because of imperfections in
alignment analysis. XFAIL when the HW cannot deal with misaligned
vector accesses for now.
2021-03-11 Richard Biener <rguenther@suse.de>
PR testsuite/97494
* gcc.dg/vect/pr97428.c: XFAIL on !vect_hw_misalign.
This is a missed optimization due to bogus alignment analysis.
2021-03-11 Richard Biener <rguenther@suse.de>
PR testsuite/97494
* gcc.dg/vect/vect-complex-5.c: XFAIL on !vect_hw_misalign.
As reported in the PR all powerpc64 targets fail
FAIL: gcc.dg/vect/slp-21.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
because like on arm we now vectorize 4 opportunities. This adjusts
the testcase to follow the arm example.
2021-03-11 Richard Biener <rguenther@suse.de>
PR testsuite/97494
* gcc.dg/vect/slp-21.c: Adjust for powerpc64*-*-*.
This makes sure to dump SSA names without identifier in the
declaration part of a function dump. While we dump the
anonymous variable decls the SSA names referencing them appear
without a clear reference as to what anonymous variable is used
(_3 vs. D.1234).
2021-03-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/99523
* tree-cfg.c (dump_function_to_file): Dump SSA names
w/o identifier to the decls section as well, not only those
without a VAR_DECL.
The following testcase is miscompiled, because IPA-ICF considers the two
functions identical. They aren't, the types of the .VEC_CONVERT call
lhs is different. But for calls to internal functions, there is no
fntype nor callee with a function type to compare, so all we compare
is just the ifn, arguments and some call flags.
The following patch fixes it by checking the internal fn calls like e.g. gimple
assignments where the type of the lhs is checked too.
2021-03-11 Jakub Jelinek <jakub@redhat.com>
PR ipa/99517
* ipa-icf-gimple.c (func_checker::compare_gimple_call): For internal
function calls with lhs fail if the lhs don't have compatible types.
* gcc.target/i386/avx2-pr99517-1.c: New test.
* gcc.target/i386/avx2-pr99517-2.c: New test.
Beware, tm.texi doesn't tell the whole story: a defined
HARD_FRAME_POINTER_REGNUM (different to FRAME_POINTER_REGNUM) is
supposed to make work easier for reload, being able to easily
tell actual frame-pointer-related addresses from those that
happen to use the same register or something to that effect.
On reasonable code the performance effect is barely measurable.
Looking at libgcc changes for -march=v10, the effect (where
noticeable) is mostly indeterminate churn. Instances where it's
not just insns moved around at no obvious effect: one more insn
for addvdi3, subvdi3; two insns more in floatdisf; three insns
shorter fixunsdfdi. Some of those seem related to pairing r8
with r9. The only effect on coremark is an infinitesimal
positive effect from a three(!) cycles total (from the 15 calls)
faster execution paths in vfprintf_r. Local microbenchmarks
give similar results. With that in mind and not forgetting that
expectations in the register allocator and reload leaning
towards HARD_FRAME_POINTER_REGNUM defined (and different to)
FRAME_POINTER_REGNUM or to wit, "all the kids do it", why not.
Note that the offset at elimination really is 0.
gcc:
* config/cris/cris.h (HARD_FRAME_POINTER_REGNUM): Define.
Change FRAME_POINTER_REGNUM to correspond to a new faked
register faked_fp, part of GENNONACR_REGS like faked_ap.
(CRIS_FAKED_REGS_CONTENTS): New helper macro.
(FIRST_PSEUDO_REGISTER, FIXED_REGISTERS, CALL_USED_REGISTERS):
(REG_ALLOC_ORDER, REG_CLASS_CONTENTS, REGNO_OK_FOR_BASE_P)
(ELIMINABLE_REGS, REGISTER_NAMES): Adjust accordingly.
* config/cris/cris.md (CRIS_FP_REGNUM): Renumber to new faked
register.
(CRIS_REAL_FP_REGNUM): New constant.
* config/cris/cris.c (cris_reg_saved_in_regsave_area): Check
for HARD_FRAME_POINTER_REGNUM instead of FRAME_POINTER_REGNUM.
(cris_initial_elimination_offset): Handle elimination changes
to HARD_FRAME_POINTER_REGNUM instead of FRAME_POINTER_REGNUM
and add one from FRAME_POINTER_REGNUM to
HARD_FRAME_POINTER_REGNUM.
(cris_expand_prologue, cris_expand_epilogue): Emit code for
hard_frame_pointer_rtx instead of frame_pointer_rtx.
AIX word-aligns floating point doubles. This behavior also extends to
double _Complex, which had been overlooked when compiler support for
double _Complex was added.
This patch adds DCmode to the modes whose alignment is adjusted and
adds a testcase to confirm the correct alignment.
gcc/ChangeLog:
2021-03-10 David Edelsohn <dje.gcc@gmail.com>
PR target/99492
* config/rs6000/aix.h (ADJUST_FIELD_ALIGN): Add check for DCmode.
* config/rs6000/rs6000.c (rs6000_special_round_type_align): Same.
gcc/testsuite/ChangeLog:
2021-03-10 David Edelsohn <dje.gcc@gmail.com>
PR target/99492
* gcc.target/powerpc/pr99492.c: New testcase.
A character variable appearing as a data statement object cannot
be automatic, thus it shall have constant length.
gcc/fortran/ChangeLog:
PR fortran/99205
* data.c (gfc_assign_data_value): Reject non-constant character
length for lvalue.
* trans-array.c (gfc_conv_array_initializer): Restrict loop to
elements which are defined to avoid NULL pointer dereference.
gcc/testsuite/ChangeLog:
PR fortran/99205
* gfortran.dg/data_char_4.f90: New test.
* gfortran.dg/data_char_5.f90: New test.
Using CONSTRAINT__UNKNOWN was a bad idea, although it triggered a lot
hidden bugs. It is better to use X instead of empty constraint.
gcc/ChangeLog:
PR target/99422
* lra-constraints.c (process_address_1): Don't check unknown
constraint, use X for empty constraint.
It needs the int128 selector because it uses __int128, and the lp64
selector is the best we can do for -mcmodel=.
2021-03-10 Segher Boessenkool <segher@kernel.crashing.org>
gcc/testsuite/
* gcc.target/powerpc/pr98959.c: Add int128 and lp64 selectors.
My reworking of pending-entity loading introduced a GC problem. The
post-load processing needs to inhibit GCs (that would otherwise occur
in clone_decl). That wasn't happening on one code path, leading to
dangling pointers in the active call frames.
PR c++/99423
gcc/cp/
* module.cc (post_load_processing): Assert not gcable.
(laxy_load_pendings): Extend no-gc region around
post_load_processing.
gcc/testsuite/
* g++.dg/modules/pr99423_a.H: New.
* g++.dg/modules/pr99423_b.H: New.
As preparatory work for a fix to PR analyzer/96374, this patch
moves the core state-update logic from the loop in
exploded_path::feasible_p into a new class feasibility_state.
No functional change intended.
gcc/analyzer/ChangeLog:
PR analyzer/96374
* engine.cc (exploded_path::feasible_p): Move "snodes_visited" and
"model" locals into a new class feasibility_state. Move heart
of per-edge processing into
feasibility_state::maybe_update_for_edge.
(feasibility_state::feasibility_state): New.
(feasibility_state::maybe_update_for_edge): New, based on loop
body in exploded_path::feasible_p.
* exploded-graph.h (class feasibility_state): New.
Implement dyn_cast_callgraph_superedge once in callgraph_superedge,
rather than twice in the two subclasses.
Spotted whilst working on a patch for call summaries.
gcc/analyzer/ChangeLog:
* supergraph.h
(callgraph_superedge::dyn_cast_callgraph_superedge): New.
(call_superedge::dyn_cast_callgraph_superedge): Delete.
(return_superedge::dyn_cast_callgraph_superedge): Delete.
On unsigned_char targets, the cast stmt to unsigned char is obviously
not needed (and shouldn't be there). But it doesn't hurt to test
the rest also on targets where char is unsigned.
2021-03-10 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/99305
PR testsuite/99498
* g++.dg/opt/pr99305.C: Don't expect cast to unsigned char on
unsigned_char effective targets.
This is another place where our one-true-decl representation breaks
down. The fix here propagates the assembly name to the ns-scope
alias. that fixes the reported problem but changes the behaviour when
the user has explicitly declared the entity in its namespace.
However, we didn't handle that case 'correctly' anyway before.
Previously we'd also ignore the explicitly specified assembler name,
now we propagate it. It's not clear to me what the desired semantics
would be in decorating just one of the local extern declarations this
way. I don't think we can really do better without propagating this
aliasing property into the middle end (which is also needed for some
constexpr handling, see PR97306). I tried that before and it turned
into a rat-hole.
PR c++/99508
gcc/cp/
* decl.c (make_rtl_for_nonlocal_decl): Propagate local-extern's
assembler name to the ns alias.
gcc/testsuite/
* g++.dg/ext/pr99508.C: New.
This adds missing includes to internal library headers which get
included from more than one other header, so that they can be compiled
as a stand-alone header unit.
For existing use cases these includes are no-ops because they're already
done by the header that includes these files. For compiling them as a
header unit, this ensures that they include what they use.
libstdc++-v3/ChangeLog:
PR libstdc++/99413
* include/bits/align.h: Include debug/assertions.h.
* include/bits/codecvt.h: Include bits/c++config.h.
* include/bits/enable_special_members.h: Likewise.
* include/bits/erase_if.h: Likewise.
* include/bits/functional_hash.h: Include <type_traits>.
* include/bits/invoke.h: Include bits/move.h.
* include/bits/ostream_insert.h: Include bits/exception_defines.h.
* include/bits/parse_numbers.h: Include <type_traits>.
* include/bits/predefined_ops.h: Include bits/c++config.h.
* include/bits/range_access.h: Include bits/stl_iterator.h.
* include/bits/stl_bvector.h: Do not include bits/stl_vector.h.
* include/bits/stl_iterator.h: Include bits/stl_iterator_base_types.h.
* include/bits/stl_uninitialized.h: Include bits/stl_algobase.h.
* include/bits/uniform_int_dist.h: Include bits/concept_check.h.
* include/bits/unique_lock.h: Include bits/std_mutex.h.
* include/debug/assertions.h: Include bits/c++config.h.
The proposed resolution for this library issue simplifies the
constraints for compare_three_way, ranges::equal_to, ranges::less etc.
so that they do not work with types which are convertible to pointers
but which fail to meet the usual syntactic requirements for the
comparisons.
This affects the example in PR libstdc++/93628 but doesn't fix the
problem described in that report.
libstdc++-v3/ChangeLog:
* include/bits/ranges_cmp.h (__eq_builtin_ptr_cmp): Remove.
(ranges::equal_to, ranges::not_equal_to): Do not constrain
with __eq_builtin_ptr_cmp.
(ranges::less, ranges::greater, ranges::less_equal)
(ranges::greater_equal): Do not constrain with
__less_builtin_ptr_cmp.
* libsupc++/compare (compare_three_way): Do not constrain with
__3way_builtin_ptr_cmp.
* testsuite/18_support/comparisons/object/builtin-ptr-three-way.cc: Moved to...
* testsuite/18_support/comparisons/object/lwg3530.cc: ...here.
* testsuite/20_util/function_objects/range.cmp/lwg3530.cc: New test.
This fixes a typo in the description of
aarch64_vfp_is_call_or_return_candidate.
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_vfp_is_call_or_return_candidate):
Fix typo in comment describing "is_ha" argument.
A couple of analyzer testcases no longer have state explosions; updating
them accordingly in case they regress.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/pr94047.c: Remove "-Wno-analyzer-too-complex".
* gcc.dg/analyzer/zlib-2.c: Likewise.
The target selector should explicitly choose 256 bit hardware as
explicit 256 bit compiler options are used to trigger the bug.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr99102.c: Fix target selector.
Previously, IFN_MASK_SCATTER_STORE was used if 'loop_masks' was
non-null, but the mask used is 'final_mask'. This caused a bug where
a 'MASK_STORE' was vectorized into a 'SCATTER_STORE' instead of a
'MASK_SCATTER_STORE'. This fixes PR target/99102.
gcc/ChangeLog:
PR target/99102
* tree-vect-stmts.c (vectorizable_store): Fix scatter store mask
check condition.
(vectorizable_load): Fix gather load mask check condition.
gcc/testsuite/ChangeLog:
PR target/99102
* gcc.dg/vect/pr99102.c: New test.
The fix for PR94775 added more strict checking for type reuse
to check_aligned_type, specifically matching TYPE_USER_ALIGN.
But then build_aligned_type sets TYPE_USER_ALIGN on the built
variant so if the type we build an aligned variant for does not
have TYPE_USER_ALIGN we'll never re-use the newly created aligned
variant. This results in ~35000 identical variants being created
for polyhedron doduc.
The following instead checks that the candidate has TYPE_USER_ALIGN set.
2021-03-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/99510
* tree.c (check_aligned_type): Check that the candidate
has TYPE_USER_ALIGN set instead of matching with the
original type.
The code in build_round_expr implicitly assumes that __float128 exists,
which is *not* the common case among 64-bit architectures since the
"long double" type is generally already 128-bit for them.
gcc/fortran/
PR fortran/96983
* trans-intrinsic.c (build_round_expr): Do not implicitly assume
that __float128 is the 128-bit floating-point type.
This is a strange regression whereby an enumeration type declared as
atomic (or volatile) incorrectly triggers the ODR machinery for its
values in LTO mode.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): Build a TYPE_STUB_DECL
for the main variant of an enumeration type declared as volatile.
gcc/testsuite/
* gnat.dg/specs/lto25.ads: New test.
Returning a REGMODE_NATURAL_SIZE of 4 for DFmode in 64-bit mode is
just asking for trouble because sub-word SUBREGs are always treated
differently than the others, in particular by the register allocator.
gcc/
* config/sparc/sparc.c (sparc_regmode_natural_size): Return 4 for
float and vector integer modes only if the mode is not larger.
When DWARF_FRAME_REGISTERS isn't defined, the default is
FIRST_PSEUDO_REGISTER which means that if you add faked
registers to the port, used for frame-context related
elimination, room is allocated for them in the register
context used for frame-unwinding, which is wasteful because
they're eliminated before the final form of the code that is
emitted.
Stopping after MOF saves two register slots in the unwind
contest, compared to the current default. For regular C
programming this is uninteresting, but defining
DWARF_FRAME_REGISTERS now also avoids the need to remember
to define it later, when twiddling with additional faked
registers (alternatively suffering churn from comparing
differences in unwind context). As expected, no effect on
test-results, coremark or local (C-specific)
microbenchmarks.
gcc:
* config/cris/cris.h (DWARF_FRAME_REGISTERS): Define.
Fix build errors due to warnings such as:
gcc/config/v850/rtems.h:43: error: "RTEMS_STARTFILE_SPEC" redefined [-Werror]
43 | #define RTEMS_STARTFILE_SPEC ""
The problem was that "gcc/config/rtems.h" was included before the
architecture-specific "gcc/config/*/rtems.h" header file on some
architectures.
gcc/
* config.gcc (aarch64-*-rtems*): Include general rtems.h after
the architecture-specific rtems.h.
(aarch64-*-rtems*): Likewise.
(arm*-*-rtems*): Likewise.
(epiphany-*-rtems*): Likewise.
(riscv*-*-rtems*): Likewise.
Before my PR97690 changes, conditional_replacement would not set neg
when the nonzero arg was boolean true.
I've simplified the testing, so that it first finds the zero argument
and then checks the other argument for all the handled cases
(1, -1 and 1 << X, where the last case is what the patch added support for).
But, unfortunately I've placed the integer_all_onesp test first.
For unsigned precision 1 types such as bool integer_all_onesp, integer_onep
and integer_pow2p can all be true and the code set neg to true in that case,
which is undesirable.
The following patch tests integer_pow2p first (which is trivially true
for integer_onep too and tree_log2 in that case gives shift == 0)
and only if that isn't the case, integer_all_onesp.
2021-03-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/99305
* tree-ssa-phiopt.c (conditional_replacement): Test integer_pow2p
before integer_all_onesp instead of vice versa.
* g++.dg/opt/pr99305.C: New test.
The previous version returned true for all PowerPC. This is incorrect.
We only support floating point square root instructions if a) we support
floating point instructions at all, and b) we have _ARCH_PPCSQ defined.
2020-03-09 Segher Boessenkool <segher@kernel.crashing.org>
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_powerpc_sqrt): New.
(check_effective_target_sqrt_insn): Use it.
gcc/ChangeLog:
PR target/99454
* lra-constraints.c (process_address_1): Process constraint 'g'
separately and digital constraints containing more one digit.
gcc/testsuite/ChangeLog:
PR target/99454
* gcc.target/i386/pr99454.c: New.
The r11-7528 build_co_await changes broke coroutines on arm*-linux-gnuabi,
2780 ^FAIL.*coroutines/ in total.
The problem is that arm is targetm.cxx.cdtor_return_this target where
both ctors and dtors in the ABI return this pointer rather than
void, and build_new_method_call_1 does:
else if (call != error_mark_node
&& DECL_DESTRUCTOR_P (cand->fn)
&& !VOID_TYPE_P (TREE_TYPE (call)))
/* An explicit call of the form "x->~X()" has type
"void". However, on platforms where destructors
return "this" (i.e., those where
targetm.cxx.cdtor_returns_this is true), such calls
will appear to have a return value of pointer type
to the low-level call machinery. We do not want to
change the low-level machinery, since we want to be
able to optimize "delete f()" on such platforms as
"operator delete(~X(f()))" (rather than generating
"t = f(), ~X(t), operator delete (t)"). */
call = build_nop (void_type_node, call);
The new code in build_co_await relies on build_special_member_call
returned expression being a CALL_EXPR, but due to the build_nop
in there it is a NOP_EXPR around the CALL_EXPR. It can't be stripped
with STRIP_NOPS because void has different mode from the pointer mode.
2021-03-09 Jakub Jelinek <jakub@redhat.com>
PR c++/99459
* coroutines.cc (build_co_await): Look through NOP_EXPRs in
build_special_member_call return value to find the CALL_EXPR.
Simplify.
First, gcc.dg/array-quals-1.c does not pass if the compiler is configured
with --enable-default-pie because the sections change, so force -fno-pie.
Second, replace *-*-solaris* with sparc*-*-* for gfortran.dg/pr95690.f90
because this depends on the architecture rather than the OS. Third force
SRA to trigger on Aarch64 (like PowerPC) for gnat.dg/opt39.adb.
gcc/testsuite/
* gcc.dg/array-quals-1.c: Pass -fno-pie if supported.
* gcc.dg/loop-9.c: Likewise.
* gfortran.dg/pr95690.f90: Replace *-*-solaris* with sparc*-*-*.
* gnat.dg/opt39.adb: Pass --param option for Aarch64 too.
This boils down to the RTL expander trying to take the address of a DECL
whose RTX is a register.
gcc/
PR c++/90448
* calls.c (initialize_argument_information): When the argument
is passed by reference, do not make a copy in a thunk only if
the argument is already in memory. Remove redundant test for
the case of callee copy.
We need to process 0..9 constraints to fetch the right op constraint in
the function. Also 0..9 constraints gives unknown class constraint
class which can result in skipping address normalization for memory in asm.
gcc/ChangeLog:
PR target/99454
* lra-constraints.c (process_address_1): Process 0..9 constraints
in process_address_1.
Not all OSes have regex.h and not all OSes that do have REG_STARTEND macro support.
Conditionalize the test on that.
2021-03-09 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/98920
* c-c++-common/asan/pr98920.c: Only include regex.h if the header
exists. If REG_STARTEND macro isn't defined, just return 0 from main
instead of the actual test.
This clarifies that c++2[03] intentionally does not enable
c++20 modules.
PR c++/99472
gcc/cp/
* parser.c (cp_parser_diagnose_invalid_type_name): Clarify
that C++20 does not yet imply modules.